Kubernetes Cluster Consolidation - Merging Notus into Boreas
Our infrastructure currently consists of two separate Kubernetes clusters: Boreas (AMD64) and Notus (ARM64). To simplify management and reduce operational overhead, we're consolidating these into a single mixed-architecture cluster. This guide outlines the step-by-step process for safely migrating Notus into the Boreas cluster.
Current Infrastructure
The two clusters are set up as follows:
- Boreas: Single-node AMD64 cluster (x86_64)
  - Hosts: mercury-bot, mercury-worker, and other services
  - IP: 95.217.130.127
  - Recently reconfigured to use Zephyr (standalone mercury-ta dokku app) instead of the Notus-hosted mercury-ta
- Notus: Single-node ARM64 cluster
  - Hosts: mercury-ta and Redis (no longer in use)
  - IP: 195.201.35.33
  - Currently contains no critical data or services
Migration Goals
- Reset the Notus node to a clean state
- Join Notus as a worker node to the existing Boreas cluster
- Configure workload placement for the ARM64 node
- Ensure proper network connectivity between nodes
- Convert existing K8s playbooks to Ansible roles for better maintainability
Migration Plan
Phase 1: Reset Notus Node
Since Boreas has been reconfigured to use Zephyr instead of the Notus-hosted mercury-ta, there's no need for backups or complex preparation. We can simply reset the node:
# SSH into Notus
ssh root@195.201.35.33
# Reset the Kubernetes configuration
kubeadm reset -f
# Clean up any remaining Kubernetes directories
rm -rf /etc/kubernetes/ /var/lib/kubelet/ /var/lib/etcd/ /var/lib/cni/ /etc/cni/net.d/
# Restart containerd
systemctl restart containerd
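To confirm the reset left nothing behind, a small check like the one below can be run on Notus afterwards. This is a minimal sketch rather than part of the original procedure: the function name leftover_k8s_dirs and its optional root-prefix argument are hypothetical, and the directory list simply mirrors the rm -rf command above.

```shell
#!/bin/sh
# List the Kubernetes state directories that still exist.
# The optional first argument is a root prefix (hypothetical, for rehearsing
# the check against a scratch directory); it defaults to the real filesystem.
leftover_k8s_dirs() {
    root="${1:-}"
    for dir in /etc/kubernetes /var/lib/kubelet /var/lib/etcd /var/lib/cni /etc/cni/net.d; do
        if [ -e "${root}${dir}" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Example, on Notus after the reset:
#   leftover_k8s_dirs
```

An empty result suggests the cleanup removed everything listed; any path it prints is worth inspecting before joining the node to Boreas.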
Phase 2: Join Notus to Boreas Cluster
- Generate join command on Boreas
# SSH into Boreas
ssh root@95.217.130.127
# Generate join command
kubeadm token create --print-join-command
- Join Notus to Boreas cluster
# SSH into Notus
ssh root@195.201.35.33
# Run the join command (replace with actual command from previous step)
kubeadm join 95.217.130.127:6443 --token <token> --discovery-token-ca-cert-hash <hash> --cri-socket unix:///run/containerd/containerd.sock
- Verify node has joined
# On Boreas
kubectl get nodes -o wide
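The listing shows the new node, but a freshly joined node can report NotReady for a short while before it settles. As a sketch, the helper below (node_is_ready is a hypothetical name) reads the output of kubectl get nodes --no-headers and succeeds only when the named node reports a Ready status. It assumes the node registered under the name notus, as the labeling commands in this guide do.

```shell
#!/bin/sh
# Read `kubectl get nodes --no-headers` output on stdin and succeed only if
# the named node is listed with a STATUS that starts with "Ready".
node_is_ready() {
    awk -v node="$1" '
        $1 == node { found = 1; ready = ($2 ~ /^Ready/) }
        END { exit !(found && ready) }
    '
}

# Example, on Boreas:
#   kubectl get nodes --no-headers | node_is_ready notus && echo "notus is Ready"
```

kubectl wait --for=condition=Ready node/notus --timeout=120s is a built-in alternative that blocks until the condition holds or the timeout expires.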
Phase 3: Configure Node Labels and Taints
- Label the Notus node with its architecture
# The kubelet normally sets this label itself at registration; the command below makes it explicit
kubectl label node notus kubernetes.io/arch=arm64
- Add custom labels for workload placement
kubectl label node notus node-type=mercury-ta
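To double-check that both labels ended up on the node, the label list can be inspected directly. The helper below is a minimal sketch: has_label is a hypothetical name, and it assumes the comma-separated format that kubectl prints in the LABELS column when --show-labels is used.

```shell
#!/bin/sh
# Succeed if the comma-separated label list on stdin contains exactly "key=value".
has_label() {
    tr ',' '\n' | grep -Fxq -- "$1"
}

# Example, on Boreas:
#   labels=$(kubectl get node notus --no-headers --show-labels | awk '{print $NF}')
#   printf '%s\n' "$labels" | has_label node-type=mercury-ta && echo "label present"
```

Matching whole entries rather than substrings keeps a query such as arch=arm64 from accidentally matching kubernetes.io/arch=arm64.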
Phase 4: Future Deployment Planning
After the cluster consolidation, we'll have the flexibility to deploy workloads to either node based on architecture requirements:
- ARM64-specific workloads can use node selectors:
nodeSelector:
  kubernetes.io/arch: arm64
- Node-specific workloads can use hostname selectors:
nodeSelector:
  kubernetes.io/hostname: notus
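Put together, a workload pinned to the ARM64 node might look like the sketch below. It writes a minimal Deployment manifest to a file so it can be reviewed or dry-run before applying. Everything specific here is a placeholder and not part of the actual setup: the arm64-demo name, the busybox:1.36 image, and the write_arm64_demo helper.

```shell
#!/bin/sh
# Write a minimal Deployment manifest that schedules only onto ARM64 nodes.
# The name "arm64-demo" and the image are placeholders for illustration.
write_arm64_demo() {
    cat > "$1" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: arm64-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: arm64-demo
  template:
    metadata:
      labels:
        app: arm64-demo
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64
      containers:
        - name: demo
          image: busybox:1.36
          command: ["sleep", "3600"]
EOF
}

# Example:
#   write_arm64_demo /tmp/arm64-demo.yaml
```

The file can then be checked with kubectl apply --dry-run=client -f /tmp/arm64-demo.yaml before a real kubectl apply.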
Infrastructure Improvement Plan
As part of our ongoing infrastructure improvements, we'll be:
- Converting K8s playbooks to Ansible roles:
  - Create standardized roles for Kubernetes node setup
  - Integrate with our existing role-based architecture
  - Make K8s node management consistent with other infrastructure
- Standardizing deployment processes:
  - Create unified deployment workflows
  - Implement consistent labeling and tainting strategies
  - Document best practices for multi-architecture deployments
This work will make future infrastructure management much easier and more consistent across our environment.
Conclusion
By consolidating our Kubernetes clusters, we'll simplify our infrastructure management while maintaining the benefits of both AMD64 and ARM64 architectures. The migration is straightforward since we've already reconfigured Boreas to use Zephyr instead of the Notus-hosted mercury-ta.
The process is designed to be minimally disruptive, and after completion, we'll have a more efficient and manageable infrastructure. The follow-up work to convert our Kubernetes playbooks to Ansible roles will further improve our infrastructure management capabilities for the future.
