Building Highly Available Kubernetes Clusters: Best Practices
High availability is critical for production Kubernetes deployments. This article outlines architectural patterns and implementation details for building resilient Kubernetes clusters across various deployment models.
Core Principles of Kubernetes HA
When designing high-availability Kubernetes architectures, we must address multiple failure domains:
- Node failures - Individual nodes in a cluster may become unavailable
- Zone failures - An entire availability zone could experience an outage
- Region failures - Although rare, entire regions can be impacted
- Control plane availability - Access to the Kubernetes API must be maintained
- Data persistence - Stateful workloads require special considerations
Let’s examine how to address each of these concerns across different deployment scenarios.
Control Plane High Availability
The Kubernetes control plane consists of several components that must be highly available:
- etcd - Distributed key-value store for all cluster data
- kube-apiserver - API server that exposes the Kubernetes API
- kube-scheduler - Component that assigns pods to nodes
- kube-controller-manager - Component that runs controller processes
etcd Considerations
etcd requires a quorum to commit writes, meaning a majority of members must be available. For production environments:
# Example etcd configuration in a multi-node setup
apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  containers:
  - name: etcd
    image: k8s.gcr.io/etcd:3.5.6-0
    command:
    - etcd
    - --name=etcd-0                                    # must match this member's entry in --initial-cluster
    - --advertise-client-urls=https://192.168.2.1:2379
    - --initial-advertise-peer-urls=https://192.168.2.1:2380
    - --initial-cluster=etcd-0=https://192.168.2.1:2380,etcd-1=https://192.168.2.2:2380,etcd-2=https://192.168.2.3:2380
    - --initial-cluster-state=new
    - --data-dir=/var/lib/etcd
    - --client-cert-auth
    # Additional parameters (listen URLs, TLS certificate and key files) omitted for brevity
For etcd, always deploy an odd number of members (typically 3, 5, or 7). Quorum is floor(n/2) + 1, so a 3-member cluster tolerates one failure and a 5-member cluster tolerates two; adding an even member gains no extra fault tolerance.
Load Balancing the API Server
The API server is stateless and can be deployed behind a load balancer:
                 ┌─────────────┐
                 │    Load     │
                 │  Balancer   │
                 └──────┬──────┘
                        │
       ┌────────────────┼────────────────┐
       │                │                │
┌──────▼───────┐ ┌──────▼───────┐ ┌──────▼───────┐
│kube-apiserver│ │kube-apiserver│ │kube-apiserver│
└──────────────┘ └──────────────┘ └──────────────┘
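How the load balancer is wired in depends on your tooling. With kubeadm, for example, the load balancer address is usually set as the control plane endpoint so that every node and client reaches the API through it. A minimal sketch, assuming a load balancer reachable at api.example.com:6443 (the hostname and version are illustrative):

# Excerpt of a kubeadm ClusterConfiguration, passed to `kubeadm init --config`
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.0
controlPlaneEndpoint: "api.example.com:6443"   # all API traffic goes through the load balancer
apiServer:
  certSANs:
  - "api.example.com"                          # include the LB hostname in the API server certificate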
Node Availability and Auto-Repair
For worker nodes, we need mechanisms to automatically replace failed nodes:
- Self-healing node groups - Automatically replace failed instances
- Node auto-repair - Detect and repair unhealthy nodes
- Proper pod distribution - Using pod anti-affinity to distribute workloads
Pod Anti-Affinity Example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web
        image: nginx:1.25    # placeholder container for illustration
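The required rule above guarantees at most one replica per node, but it also leaves pods Pending when fewer nodes than replicas are available. Where that trade-off is unacceptable, the soft variant can be swapped in; a minimal sketch of the affinity block (the weight is illustrative), used in place of the one above:

      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: web
              topologyKey: "kubernetes.io/hostname"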
Multi-Zone Deployment Architecture
To protect against zone failures, distribute your Kubernetes cluster across multiple availability zones:
AWS EKS Multi-AZ Configuration
When creating an EKS cluster, select at least three availability zones:
resource "aws_eks_cluster" "ha_cluster" {
name = "ha-cluster"
role_arn = aws_iam_role.eks_cluster_role.arn
vpc_config {
subnet_ids = [
aws_subnet.private_a.id,
aws_subnet.private_b.id,
aws_subnet.private_c.id
]
}
}
resource "aws_eks_node_group" "ha_nodes" {
cluster_name = aws_eks_cluster.ha_cluster.name
node_group_name = "ha-nodes"
node_role_arn = aws_iam_role.eks_node_role.arn
subnet_ids = [
aws_subnet.private_a.id,
aws_subnet.private_b.id,
aws_subnet.private_c.id
]
scaling_config {
desired_size = 6
max_size = 9
min_size = 3
}
}
Azure AKS Multi-Zone Configuration
az aks create \
  --resource-group myResourceGroup \
  --name ha-aks-cluster \
  --generate-ssh-keys \
  --node-count 6 \
  --zones 1 2 3
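Zone-redundant nodes only pay off if pods are also spread across those zones. One way to enforce this is with topology spread constraints on the workload itself; a minimal sketch as a standalone Pod (in practice the same block goes into a Deployment or StatefulSet pod template; the labels and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: web-zone-spread-example
  labels:
    app: web
spec:
  topologySpreadConstraints:
  - maxSkew: 1                                  # at most one pod of difference between zones
    topologyKey: topology.kubernetes.io/zone    # spread across availability zones
    whenUnsatisfiable: DoNotSchedule            # hard constraint; use ScheduleAnyway for a soft one
    labelSelector:
      matchLabels:
        app: web
  containers:
  - name: web
    image: nginx:1.25                           # placeholder container for illustration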
StatefulSet Configuration for HA
Stateful applications require special consideration. Use StatefulSets with appropriate storage classes:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: "postgres"          # requires a matching headless Service
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:13
        env:
        - name: POSTGRES_PASSWORD  # the postgres image refuses to start without a password
          valueFrom:
            secretKeyRef:
              name: postgres-secret  # assumes a Secret with this name exists
              key: password
        ports:
        - containerPort: 5432
          name: postgres
        volumeMounts:
        - name: postgres-data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: postgres-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "gp3-multi-az"
      resources:
        requests:
          storage: 100Gi
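The storageClassName above refers to a class that has to exist in the cluster; the name gp3-multi-az is just a label, since EBS volumes themselves are zonal. A hedged sketch of such a class on AWS with the EBS CSI driver; the important detail for multi-AZ clusters is WaitForFirstConsumer, which delays provisioning until the pod is scheduled so each replica's volume is created in that replica's zone:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-multi-az
provisioner: ebs.csi.aws.com              # AWS EBS CSI driver
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer   # provision the volume in the zone where the pod lands
allowVolumeExpansion: true

Cross-zone durability still comes from application-level replication (for PostgreSQL, streaming replication between the replicas), not from the volumes themselves.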
Multi-Region and Multi-Cloud Architectures
For the highest level of availability, consider deploying across multiple regions or even multiple cloud providers. This introduces complexity but provides protection against entire region failures.
Multi-Region Architecture with Global Load Balancing
┌─────────────────────┐      ┌─────────────────────┐
│      Region A       │      │      Region B       │
│ ┌─────────────────┐ │      │ ┌─────────────────┐ │
│ │  K8s Cluster A  │ │      │ │  K8s Cluster B  │ │
│ └─────────────────┘ │      │ └─────────────────┘ │
└──────────┬──────────┘      └──────────┬──────────┘
           │                            │
           └─────────────┬──────────────┘
                         │
              ┌──────────▼──────────┐
              │   Global Traffic    │
              │     Management      │
              └──────────┬──────────┘
                         │
              ┌──────────▼──────────┐
              │        Users        │
              └─────────────────────┘
Data Synchronization Approaches
- Active-Passive - Single write region with replication to secondary region(s)
- Active-Active - Multiple write regions with conflict resolution
- Partitioned - Data sharding across regions based on access patterns
Implementing multi-region or multi-cloud deployments requires:
- Global DNS and traffic management - Route users to appropriate regions
- Data synchronization - Keep data consistent across regions
- Configuration management - Ensure consistent application configuration
- Backup and disaster recovery - Enable rapid recovery from failures
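For the last point, one common approach is to back up cluster state and persistent volumes on a schedule and restore them into another region or provider. A hedged sketch using Velero, assuming Velero is already installed in the velero namespace with a backup storage location configured (the schedule and retention values are illustrative):

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"              # daily at 02:00
  template:
    includedNamespaces:
    - "*"                            # back up every namespace
    snapshotVolumes: true            # also snapshot persistent volumes
    ttl: 168h0m0s                    # keep each backup for 7 days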
Monitoring and Observability for HA Clusters
A highly available cluster is only as good as its observability system. Implement:
- Multi-cluster monitoring with Prometheus and Thanos
- Distributed tracing with Jaeger
- Log aggregation with Elasticsearch/Loki
- Synthetic testing to validate user experience
Conclusion
Building highly available Kubernetes clusters requires careful architecture at multiple levels:
- Infrastructure level - Multi-zone, multi-region deployment
- Kubernetes control plane - Redundant etcd and API servers
- Application deployment - Pod anti-affinity and StatefulSets
- Data management - Replicated storage and backup strategies
The approaches outlined in this article represent production-tested patterns I’ve implemented for enterprise clients. While complexity increases with each level of redundancy, the resulting resilience is essential for business-critical applications.
In my next article, I’ll explore the financial considerations of various high-availability strategies, helping you balance cost with resilience requirements.
If you’re planning a high-availability Kubernetes implementation and need expert guidance, contact me for consulting services.