Disruption methods¶
Manual deletion¶
- Node
A node can be deleted manually by a user or by an external system. Karpenter adds a finalizer to the nodes it manages. When a node without this finalizer is deleted (via kubectl or an API call), the instance is not deleted from AWS EC2; it is only unregistered from the Kubernetes cluster. All of its pods are deleted by garbage collection and are rescheduled on another node. The Karpenter finalizer improves this process: the node is drained and the instance is terminated before the node object is removed.
- Nodeclaim
Each Karpenter node is associated with a nodeclaim. Deleting the nodeclaim also deletes the node.
- Nodepool
The nodepools are the owners of the nodeclaims (ownerReferences). If a nodepool is deleted, the associated nodeclaims and nodes will be deleted.
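The ownership chain above (nodepool, then nodeclaim, then node) can be sketched as a toy model. The classes and function names here are hypothetical and only illustrate the cascade; they are not Karpenter's or Kubernetes' real API.

```python
from dataclasses import dataclass, field


# Toy model of the ownership chain; all names are illustrative assumptions.
@dataclass
class Node:
    name: str


@dataclass
class NodeClaim:
    name: str
    node: Node


@dataclass
class NodePool:
    name: str
    nodeclaims: list = field(default_factory=list)


def delete_nodeclaim(claim):
    """Deleting a nodeclaim also deletes the node associated with it."""
    return [f"nodeclaim/{claim.name}", f"node/{claim.node.name}"]


def delete_nodepool(pool):
    """Deleting a nodepool cascades to the nodeclaims it owns and their nodes."""
    deleted = [f"nodepool/{pool.name}"]
    for claim in pool.nodeclaims:
        deleted.extend(delete_nodeclaim(claim))
    return deleted
```

For example, deleting a pool that owns two nodeclaims returns five deleted objects: the pool, both nodeclaims and both nodes.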
Automatic graceful methods: Consolidation¶
With this method, Karpenter tries to reduce the cluster cost by deleting or replacing nodes.
consolidationPolicy¶
Karpenter can delete nodes:
- when the node is empty, that is, when it runs no pods other than daemonset pods. (deletion mechanism)
- when the workloads can run on other nodes. (deletion mechanism)
- when the nodes can be replaced with cheaper variants. (replace mechanism)
There are 2 consolidation policies: WhenEmptyOrUnderutilized (default) and WhenEmpty. The policy is specified in spec.disruption.consolidationPolicy of the nodepool.
The order of the actions that the consolidation tries to do is:
- delete all the empty nodes in parallel
- delete 2 or more nodes and possibly create a new one, if this is a cheaper solution
- delete a single node and possibly create a new one, if this is a cheaper solution
Nodes with fewer pods, an upcoming expiration, or lower-priority pods are better candidates for consolidation.
Constraints such as anti-affinity, pod disruption budgets and topology spread constraints affect how effective consolidation can be.
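The candidate selection described above can be sketched as follows. This is a simplified illustration, not Karpenter's actual algorithm: the `Candidate` fields are hypothetical, and combining the three criteria into a single sort key (pods first, then expiration, then priority) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    pod_count: int          # pods other than daemonset pods
    hours_to_expiry: float  # time left before the node expires
    max_pod_priority: int   # highest priority among the pods on the node


def empty_nodes(candidates):
    """First step: empty nodes, which can all be deleted in parallel."""
    return [c.name for c in candidates if c.pod_count == 0]


def rank_candidates(candidates):
    """Order nodes so that better consolidation candidates come first.

    Assumed ordering: fewer pods, then sooner expiration, then lower
    pod priority. The real weighting used by Karpenter may differ.
    """
    return sorted(
        candidates,
        key=lambda c: (c.pod_count, c.hours_to_expiry, c.max_pod_priority),
    )
```

With three nodes running 3, 0 and 1 pods, `empty_nodes` returns only the second one, and `rank_candidates` places it first, followed by the node with a single pod.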
Spot to spot consolidation¶
By default, spot nodes are consolidated only with the deletion mechanism.
The replace mechanism can be enabled through the SpotToSpotConsolidation feature flag. In that case Karpenter considers other factors in addition to the lowest price: it needs a minimum of 15 instance types to work, and the likelihood of interruption is also taken into account.
consolidateAfter¶
After a pod is added to or removed from a node, Karpenter waits for the duration specified in spec.disruption.consolidateAfter before evaluating whether the node can be consolidated. This lets us make Karpenter more cautious or more aggressive about consolidation.
Consolidation can be disabled by setting this field to "Never".
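As a rough illustration of the consolidateAfter timing, the sketch below checks whether enough time has passed since the last pod event on a node. The function names are hypothetical, and the duration parser only handles single-unit values such as "30s", "5m" or "1h", which is a simplification.

```python
from datetime import datetime, timedelta, timezone

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(value):
    """Parse a single-unit duration like '30s', '5m' or '1h'.

    Returns None for 'Never', which disables consolidation.
    """
    if value == "Never":
        return None
    return timedelta(**{_UNITS[value[-1]]: float(value[:-1])})


def is_consolidatable(last_pod_event, consolidate_after, now):
    """True once consolidate_after has elapsed since the last pod add/remove."""
    wait = parse_duration(consolidate_after)
    if wait is None:
        return False
    return now - last_pod_event >= wait
```

A short value such as "30s" makes consolidation more aggressive, while a longer one such as "1h" gives workloads more time to settle before a node is considered.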
Automatic graceful methods: Drift¶
The drift method tries to reconcile the desired state of the nodepools and ec2nodeclasses with the actual one. To check whether there is drift, Karpenter compares some fields in those resources. It also maintains some hashes in the resources.
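The hash-based part of drift detection can be sketched like this. It is a minimal illustration of the idea, assuming a SHA-256 hash over a canonical JSON form of the spec; the hashing scheme and the set of fields Karpenter actually uses are not taken from the source and may differ.

```python
import hashlib
import json


def spec_hash(spec):
    """Hash a spec dict in a way that does not depend on key order."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_drifted(desired_spec, hash_recorded_on_node):
    """A node is drifted when the desired spec no longer matches the recorded hash."""
    return spec_hash(desired_spec) != hash_recorded_on_node
```

The hash is recorded when the node is created; if the nodepool or ec2nodeclass is edited later, the hashes stop matching and the node becomes a candidate for replacement.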
Automated forceful methods: Expiration¶
Nodes can be expired with the spec.template.spec.expireAfter field. The default value is 720 hours (30 days).
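The expiration arithmetic is simple enough to sketch: a node expires once its age reaches expireAfter. The function names below are hypothetical, and the creation time is assumed to be known.

```python
from datetime import datetime, timedelta, timezone

# Default from the text above: 720 hours, i.e. 30 days.
DEFAULT_EXPIRE_AFTER = timedelta(hours=720)


def expiration_time(created_at, expire_after=DEFAULT_EXPIRE_AFTER):
    """Moment at which a node created at created_at expires."""
    return created_at + expire_after


def is_expired(created_at, now, expire_after=DEFAULT_EXPIRE_AFTER):
    """True once the node's age has reached expire_after."""
    return now >= expiration_time(created_at, expire_after)
```

With the default value, a node created on 1 January at 00:00 UTC expires on 31 January at 00:00 UTC.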
Automated forceful methods: Interruption¶
With this method, Karpenter watches for events that can cause involuntary interruptions:
- AWS is about to reclaim a spot instance
- Maintenance tasks
- Instance deletion events
- Instance stopping events
Karpenter then taints, drains and deletes the node.
With spot interruption warnings, there are 2 minutes to react. To receive these events, an SQS queue and some EventBridge rules must be configured. Also, by default, Karpenter does not handle Spot Rebalance Recommendations.
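The event handling described above can be sketched as a small classifier over messages read from the queue. This is an illustration only, not Karpenter's handler: the function name is hypothetical, and the mapping from EventBridge `detail-type` values to a decision is an assumption based on the event categories listed above.

```python
import json

# States of an "EC2 Instance State-change Notification" treated as
# stopping or deletion events (assumed mapping for this sketch).
_DISRUPTIVE_STATES = {"stopping", "stopped", "shutting-down", "terminated"}


def should_disrupt(message_body):
    """Decide whether a queue message should trigger taint, drain and delete."""
    event = json.loads(message_body)
    detail_type = event.get("detail-type")
    if detail_type == "EC2 Spot Instance Interruption Warning":
        return True  # AWS is about to reclaim the spot instance (2 minutes left)
    if detail_type == "AWS Health Event":
        return True  # scheduled maintenance
    if detail_type == "EC2 Instance State-change Notification":
        return event.get("detail", {}).get("state") in _DISRUPTIVE_STATES
    # Rebalance recommendations and unknown events are ignored by default.
    return False
```

A message with detail-type "EC2 Instance Rebalance Recommendation" returns False here, matching the default behavior of not acting on rebalance recommendations.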