Disruption¶
Disruption is the process by which Karpenter terminates nodes in the Kubernetes cluster.
Planning phase (disruption controller)¶
Search candidates¶
The disruption controller continuously discovers nodes that can be disrupted for one of these reasons:
- drift
- consolidation (empty)
- consolidation (underutilized)
The number of candidates per reason is exposed in the karpenter_voluntary_disruption_eligible_nodes Prometheus metric.
First, the disruption controller searches for drift-disruption candidates and prioritizes them. If a node in that list has pods that cannot be evicted, the node is ignored for now.
Disruption can be blocked at this point by:
- Pod disruption budgets
Pods on the node covered by a PodDisruptionBudget that currently reports 0 ALLOWED DISRUPTIONS (see the sketch after this list).
- karpenter.sh/do-not-disrupt
Pods on the node carrying the karpenter.sh/do-not-disrupt: "true" annotation (also shown in the sketch below). If the NodeClaim has a terminationGracePeriod configured, the node is still eligible for disruption via drift.
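As an illustration, here is a minimal sketch of both blockers (the names, replica counts, and images are hypothetical): a PodDisruptionBudget whose minAvailable equals the number of matching replicas reports 0 ALLOWED DISRUPTIONS, and a pod opts out of voluntary disruption via the annotation.

```yaml
# Hypothetical PDB: if "web" runs 2 replicas and minAvailable is 2,
# ALLOWED DISRUPTIONS is 0 and Karpenter cannot evict these pods.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
---
# Hypothetical pod that opts out of voluntary disruption entirely.
apiVersion: v1
kind: Pod
metadata:
  name: batch-job
  annotations:
    karpenter.sh/do-not-disrupt: "true"
spec:
  containers:
  - name: worker
    image: busybox
    command: ["sleep", "3600"]
```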
We can see blocked nodes with:

```bash
kubectl get events --all-namespaces --field-selector involvedObject.kind=Node | grep DisruptionBlocked
```
If no nodes can be disrupted for drift, the same process starts again for the consolidation reasons.
Evaluate candidates¶
- NodePool’s disruption budget
The next step is to check whether the node respects the NodePool's disruption budget, a mechanism to control the pace of the disruption process (see the sketch after this list).
- Evaluate if new nodes are needed
Then the disruption controller runs a scheduling simulation to estimate whether any replacement nodes are needed.
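As a reference, here is a sketch of how a NodePool's disruption budget can be expressed, assuming the karpenter.sh/v1 API (the values are illustrative): at most 10% of the NodePool's nodes may be disrupted at a time, and no voluntary disruption is allowed during a nightly two-hour window.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  # template omitted for brevity
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
    budgets:
    - nodes: "10%"            # at most 10% of this NodePool's nodes at a time
    - nodes: "0"              # block all voluntary disruption...
      schedule: "0 0 * * *"   # ...starting at midnight every day (cron syntax)
      duration: 2h            # ...for two hours
```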
Taint the nodes¶
Next, the chosen node(s) are tainted with karpenter.sh/disrupted:NoSchedule to prevent new pods from being scheduled on them.
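On the Node object, that roughly looks like the following snippet of the spec (sketch):

```yaml
spec:
  taints:
  - key: karpenter.sh/disrupted
    effect: NoSchedule
```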
Deploy replacement nodes¶
If replacement nodes are needed, the disruption controller triggers their deployment and waits until they are ready. If the deployment fails, the node(s) are untainted and the whole process starts again.
Node deletion¶
Here the disruption controller deletes the node. All Nodes and NodeClaims deployed via Karpenter carry the Kubernetes finalizer karpenter.sh/termination, so the deletion is blocked and the remaining work is left to the termination controller.
When the termination controller terminates the node, the whole process starts again.
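The finalizer mentioned above sits in the object metadata, roughly like this (the node name is hypothetical):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-12-34.example.internal  # hypothetical node name
  finalizers:
  - karpenter.sh/termination            # blocks deletion until the termination controller is done
```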
Execution phase (termination controller)¶
The termination controller is responsible for finally deleting the node; until then, the deletion is blocked by the finalizer. The deletion can be triggered by:
- the disruption controller
- a user using manual disruption
- an external system that deletes the node resource
In every case, the API server has already set the deletionTimestamp on the node.
Taint¶
The chosen node(s) are tainted with karpenter.sh/disrupted:NoSchedule to prevent new pods from being scheduled on them. Depending on the disruption method, that taint may already be present.
Eviction¶
The termination controller starts evicting the pods using the Kubernetes Eviction API.
- This respects Pod disruption budgets
- Static pods, pods tolerating the karpenter.sh/disrupted:NoSchedule taint, and succeeded/failed pods are ignored
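For example, a pod that tolerates the disruption taint (a hypothetical node agent here) is skipped by the eviction loop:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-agent  # hypothetical agent pod
spec:
  tolerations:
  # Tolerating the Karpenter disruption taint means this pod is not
  # evicted by the termination controller.
  - key: karpenter.sh/disrupted
    operator: Exists
    effect: NoSchedule
  containers:
  - name: agent
    image: busybox
    command: ["sleep", "3600"]
```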
Cleaning¶
- When the node is drained, the NodeClaim is deleted
- Finally, the finalizer is removed from the node so the API server can delete it
Forceful deletion¶
With the expiration and interruption methods, the disruption controller triggers tainting and draining immediately, as soon as the event is detected (an interruption signal or expireAfter elapsing).
- These methods do not respect the NodePool's disruption budgets.
- Pod disruption budgets can still be used to control the disruption speed at the application level.
- These methods do not wait for a replacement node to be healthy.
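For reference, here is a sketch of where expireAfter and terminationGracePeriod are configured, assuming the karpenter.sh/v1 API (the values are illustrative):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      # nodeClassRef, requirements, etc. omitted for brevity
      expireAfter: 720h          # nodes are expired after 30 days
      terminationGracePeriod: 1h # hard deadline for draining once deletion starts
```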