AWS Node Termination Handler (NTH)

Overview

This post covers:

Written for SREs and DevOps Engineers managing Kubernetes clusters on AWS.

Background

Considerations for EKS Spot Worker Nodes

Spot instances reduce costs and improve scalability on EKS, but they can be terminated at any time.

When using Spot instances as EKS worker nodes, you must handle Spot Instance Interruption Notifications (ITN). AWS EC2 sends an interruption notification event 2 minutes before terminating a Spot instance.
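The interruption notice surfaces through the instance metadata service. As a minimal sketch of what a handler reads (the payload below is a canned example; on a real instance this JSON comes from the IMDSv2 `spot/instance-action` endpoint, which returns 404 until an interruption is scheduled):

```shell
# Canned Spot instance-action payload; on EC2 this JSON is returned by
#   http://169.254.169.254/latest/meta-data/spot/instance-action (IMDSv2)
# once an interruption is scheduled.
PAYLOAD='{"action": "terminate", "time": "2025-05-16T08:31:24Z"}'

# Extract the action and the scheduled termination time.
ACTION=$(echo "$PAYLOAD" | sed -n 's/.*"action": *"\([^"]*\)".*/\1/p')
TERMINATE_AT=$(echo "$PAYLOAD" | sed -n 's/.*"time": *"\([^"]*\)".*/\1/p')

echo "action=$ACTION at=$TERMINATE_AT"
# → action=terminate at=2025-05-16T08:31:24Z
```

Anything that reacts to this payload has at most the 2-minute window before the instance is reclaimed.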

Required actions when handling Spot ITN in EKS:

Cordon and drain concepts

Without handling Spot ITN, application code may not shut down properly, or new pods could accidentally be scheduled on a node about to be terminated. Manually handling every interruption event is not feasible.
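Done by hand, the remediation NTH automates would look like this (illustrative kubectl commands; the node name is hypothetical):

```shell
# Mark the node unschedulable so no new pods are placed on it.
kubectl cordon ip-10-0-1-23.ap-northeast-2.compute.internal

# Evict non-DaemonSet pods through the Eviction API (respects PDBs).
kubectl drain ip-10-0-1-23.ap-northeast-2.compute.internal \
  --ignore-daemonsets \
  --delete-emptydir-data
```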

This is why the Node Termination Handler controller was created.

NTH operation flow

NTH detects Spot ITN events, evicts pods, and cordons the affected node. The Kubernetes Eviction API respects PDB and terminationGracePeriodSeconds, ensuring graceful pod shutdown.
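For instance, a PDB like the following (app and resource names are illustrative) caps how many replicas a drain may remove at once, and NTH's evictions wait on it:

```yaml
# Keep at least 2 replicas of "my-app" available during any drain.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```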

The following diagram shows NTH operating in IMDS mode:

---
title: Node Termination Handler in IMDS (Instance Metadata Service) mode
---
flowchart LR
    AWS[EC2 Service] --> |1 Spot ITN event| IMDS

    NTH --> |2 Watch Spot ITN event from IMDS| IMDS
    NTH <--> |3 Cordon + Eviction API| API[Kubernetes API Server]
    API --> |4 Cordon| Node[Node Resource]

    API <--> |5 Verify PDB compliance| PDB["PDB<br>(Pod Disruption Budget)"]
    kubelet[kubelet] --> |6 Watch pod deletion command| API
    kubelet --> |7 Send **SIGTERM** signal| Pod["Pod(s)"]

    Pod -.-> |8 Graceful Shutdown| Shutdown(Graceful Shutdown)

    subgraph Cluster["⎈ Kubernetes Cluster"]
      subgraph WorkerNode["Worker Node (Spot EC2)"]
        IMDS
        NTH
        kubelet
        Pod
      end
      API
      PDB
      Node
      Shutdown
    end

    classDef aws fill:#FF9900,stroke:#232F3E,color:white;
    classDef k8s fill:#326CE5,stroke:#fff,color:white;
    classDef shutdown fill:#D3455B,stroke:#fff,color:white;

    class AWS,IMDS aws;
    class API,PDB,Pod,NTH k8s;
    class Shutdown shutdown;

NTH lets you save costs with Spot instances while mitigating the instability risk of unexpected terminations.

How NTH Works

aws-node-termination-handler (NTH) operates in one of two modes. Both IMDS and Queue Processor modes monitor EC2 instance events, but each supports different event types.

| Feature | IMDS Processor | Queue Processor |
|---|---|---|
| Spot Instance Termination Notifications (ITN) | ✅ | ✅ |
| Scheduled Events | ✅ | ✅ |
| Instance Rebalance Recommendation | ✅ | ✅ |
| ASG Termination Lifecycle Hooks | ❌ | ✅ |
| ASG Termination Lifecycle State Change | ❌ | ✅ |
| AZ Rebalance Recommendation | ❌ | ✅ |
| Instance State Change Events | ❌ | ✅ |
| Issue Lifecycle Heartbeats | ❌ | ✅ |

For broader event coverage, Queue Processor mode is recommended. It requires additional SQS and EventBridge infrastructure, making setup more complex than IMDS mode.

NTH Installation Modes

IMDS and Queue Processor modes differ in pod deployment method and configuration.

IMDS (Instance Metadata Service) Mode

NTH configuration in IMDS mode

Queue Processor Mode

NTH configuration in Queue Processor mode

Environment

Local

Kubernetes

Installation

Goal: Install NTH in IMDS mode using a Helm chart.

Download the Chart

Clone the AWS NTH official repository:

$ git clone https://github.com/aws/aws-node-termination-handler.git

The repository includes a Helm chart.

Navigate to the chart directory:

$ cd aws-node-termination-handler/config/helm/aws-node-termination-handler

Chart directory structure:

$ tree .
.
├── Chart.yaml
├── README.md
├── example-values-imds-linux.yaml
├── example-values-imds-windows.yaml
├── example-values-queue.yaml
├── templates
│   ├── NOTES.txt
│   ├── _helpers.tpl
│   ├── clusterrole.yaml
│   ├── clusterrolebinding.yaml
│   ├── daemonset.linux.yaml
│   ├── daemonset.windows.yaml
│   ├── deployment.yaml
│   ├── pdb.yaml
│   ├── podmonitor.yaml
│   ├── psp.yaml
│   ├── service.yaml
│   ├── serviceaccount.yaml
│   └── servicemonitor.yaml
└── values.yaml

2 directories, 19 files

Chart Configuration

Three key settings to modify in values.yaml:

  1. Target node selection: Use nodeSelector or nodeAffinity to deploy NTH only to specific node groups.
  2. Resource limits: Set resource requests and limits for NTH DaemonSet pods.
  3. webhookURL: Slack incoming webhook URL for NTH event notifications.

daemonsetNodeSelector

Deploy NTH pods only to Spot instance nodes:

# values.yaml
daemonsetNodeSelector:
  eks.amazonaws.com/capacityType: SPOT

Without this constraint, NTH pods deploy to all nodes including On-Demand, wasting resources.

nodeAffinity

For more complex node selection, use nodeAffinity instead of nodeSelector.

nodeAffinity behavior

Example targeting specific node groups:

# values.yaml
daemonsetAffinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "eks.amazonaws.com/compute-type"
              operator: NotIn
              values:
                - fargate
            - key: node.kubernetes.io/name
              operator: In
              values:
              - hpc-spot
              - data-batch-spot

Multiple matchExpressions conditions are ANDed. The above deploys NTH pods only to nodes that:

  1. Are not Fargate type (On-Demand or Spot)
  2. Belong to hpc-spot or data-batch-spot node groups

Pod Resource Limits

The official NTH Helm chart has no default resource limits. DaemonSet pods typically need minimal resources:

# values.yaml
resources:
  requests:
    cpu: 10m
    memory: 40Mi
  limits:
    cpu: 100m
    memory: 100Mi

PSP Configuration

PSP (Pod Security Policy) was removed in Kubernetes v1.25. Set rbac.pspEnabled to false on EKS v1.25 or later:

# aws-node-termination-handler/values_ENV.yaml
...
  rbac:
    create: true
-     pspEnabled: true
+     pspEnabled: false

When rbac.pspEnabled is false, the following resources are not created (controlled by the psp.yaml template):

NTH pod's PSP resource relationship:

NTH PSP

webhookURL (Optional)

Set a Slack incoming webhook URL for cordon & drain notifications:

# values.yaml
webhookURL: "${SLACK_WEBHOOK_URL}"

Leave empty if Slack notifications are not needed:

# values.yaml
webhookURL: ""

webhookTemplate (Optional)

Customize the Slack notification message template via webhookTemplate. Use YAML Folded Scalar (>) for readability:

# values.yaml
webhookTemplate: >
  {
    "text": ":rotating_light: *EC2 Spot instance is about to be interrupted.* :rotating_light:\n
    *_Account:_* `{{ .AccountId }}`\n
    *_Instance ID:_* `{{ .InstanceID }}`\n
    *_Node Name:_* `{{ .NodeName }}`\n
    *_Instance Type:_* `{{ .InstanceType }}`\n
    *_Start Time:_* `{{ .StartTime }}`\n
    *_Description:_* {{ .Description }}\n
    *_Affected Pod(s):_* `{{ .Pods }}`"
  }

Slack notification example with folded scalar

Alternatively, store the Slack template in a separate ConfigMap (or Secret) resource.
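If your chart version supports it, the template can be referenced via the chart's webhookTemplateConfigmapName and webhookTemplateConfigmapKey values instead of being inlined (the resource name and key below are illustrative):

```yaml
# values.yaml — read the Slack template from an existing ConfigMap
webhookTemplateConfigmapName: "nth-slack-template"
webhookTemplateConfigmapKey: "webhook-template"
```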

Default webhookTemplate (when left unset):

# values.yaml
webhookTemplate: "{\"Content\":\"[NTH][Instance Interruption] InstanceId: {{ .InstanceID }} - InstanceType: {{ .InstanceType }} - Kind: {{ .Kind }} - Start Time: {{ .StartTime }}\"}"

Default Slack message example

See the End to End test code on NTH GitHub for more details.

NTH pods on worker nodes in private subnets must be able to reach the Slack API, typically through a NAT Gateway:

Network flow for Slack notifications

Install NTH with Helm

Install NTH in IMDS mode to kube-system namespace:

CHART_VERSION=0.21.0
helm upgrade \
  --install \
  --namespace kube-system \
  aws-node-termination-handler ./aws-node-termination-handler \
  --version $CHART_VERSION \
  --wait

NTH installs in IMDS mode by default.

To use Queue Processor mode, set enableSqsTerminationDraining to true:

# values.yaml
enableSqsTerminationDraining: true

Queue Processor mode also requires SQS queue and IAM permission setup. This post only covers IMDS mode. See the NTH official docs for details.

Example installation output:

Release "aws-node-termination-handler" has been upgraded. Happy Helming!
NAME: aws-node-termination-handler
LAST DEPLOYED: Sun Jun 11 17:40:56 2023
NAMESPACE: kube-system
STATUS: deployed
REVISION: 5
TEST SUITE: None
NOTES:
***********************************************************************
* AWS Node Termination Handler                                        *
***********************************************************************
  Chart version: 0.21.0
  App version:   1.19.0
  Image tag:     public.ecr.aws/aws-ec2/aws-node-termination-handler:v1.19.0
  Mode :         IMDS
***********************************************************************

Confirms NTH v1.19.0 (chart v0.21.0) installed in IMDS mode.

To reinstall an existing NTH deployment, use --recreate-pods and --force:

$ helm upgrade \
    --install \
    --namespace kube-system \
    aws-node-termination-handler ./aws-node-termination-handler \
    --version $CHART_VERSION \
    --recreate-pods \
    --force
Flag --recreate-pods has been deprecated, functionality will no longer be updated. Consult the documentation for other methods to recreate pods

The deprecation warning for --recreate-pods can be safely ignored — the reinstall proceeds normally.

Verify the release:

$ helm list -n kube-system
NAME                           NAMESPACE     REVISION   UPDATED                                STATUS     CHART                                 APP VERSION
aws-node-termination-handler   kube-system   5          2023-06-11 17:40:56.273914 +0900 KST   deployed   aws-node-termination-handler-0.21.0   1.19.0

Check applied values:

$ helm get values aws-node-termination-handler -n kube-system

Verify that the three settings (node selection, resource limits, Slack webhook URL) are properly applied.

NTH Pod Status

Check the DaemonSet:

$ kubectl get daemonset -n kube-system aws-node-termination-handler
NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                AGE
aws-node-termination-handler   2         2         2       2            2           eks.amazonaws.com/capacityType=SPOT,kubernetes.io/os=linux   3h56m

Two things to verify:

Since IMDS mode is primarily for handling Spot ITN events, deploying NTH to On-Demand nodes wastes resources unless there are specific events to handle.

When Spot worker nodes scale out via Kubernetes Autoscaler (or Karpenter), the DaemonSet automatically adjusts:

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                AGE
aws-node-termination-handler   2         2         2       2            2           eks.amazonaws.com/capacityType=SPOT,kubernetes.io/os=linux   3h56m

After one more Spot instance joins (3 worker nodes total):

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                AGE
aws-node-termination-handler   3         3         3       3            3           eks.amazonaws.com/capacityType=SPOT,kubernetes.io/os=linux   3h57m

In IMDS mode, NTH pods are managed by a DaemonSet, so they automatically scale with Spot worker nodes.

Graceful Pod Shutdown

Graceful shutdown in Kubernetes ensures applications terminate safely. This is achieved by combining terminationGracePeriodSeconds and preStop hooks.

  1. spec.terminationGracePeriodSeconds: Time to wait between receiving a termination signal and actual termination. Default is 30 seconds.
  2. preStop hook: Executes a command or HTTP request after the pod enters the Terminating state but before the container receives SIGTERM. Useful for session saving, log backup, etc.

Scenarios That Trigger Termination

Graceful Shutdown Configuration Example

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  # [1] Set pod termination grace period to 60 seconds
  terminationGracePeriodSeconds: 60
  containers:
  - name: example-container
    image: nginx
    lifecycle:
      preStop:
        exec:
          # [2] Wait 55 seconds before shutdown
          command: ["sh", "-c", "sleep 55"]

With terminationGracePeriodSeconds: 60, the preStop hook runs sleep 55 before the container receives SIGTERM. During this time, the pod can complete data processing, release DB connections, and clean up resources.

Pod lifecycle timeline with terminationGracePeriodSeconds and preStop:

Graceful shutdown pod lifecycle

For graceful shutdown to work correctly, the sum of preStop duration and application shutdown time must be less than spec.terminationGracePeriodSeconds.

terminationGracePeriodSeconds > preStop duration + application shutdown time

For Spring Framework applications, these settings affect application shutdown time:
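As a sketch, Spring Boot's built-in graceful shutdown is controlled by two properties (values shown are assumptions to be tuned per workload):

```yaml
# application.yaml (Spring Boot)
server:
  # Reject new requests and let in-flight ones finish on shutdown.
  shutdown: graceful
spring:
  lifecycle:
    # Upper bound for the graceful shutdown phase; keep it below
    # terminationGracePeriodSeconds minus the preStop duration.
    timeout-per-shutdown-phase: 30s
```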

Example of incorrect configuration (preStop duration + application shutdown time exceeds terminationGracePeriodSeconds):

Incorrect graceful shutdown lifecycle

Advanced Configuration

NTH Metrics

NTH provides Prometheus metrics on port 9092 at /metrics:

# aws-node-termination-handler/values.yaml
enablePrometheusServer: true
prometheusServerPort: 9092

Available metrics:

Metrics collection differs depending on NTH mode (IMDS vs Queue).

serviceMonitor and podMonitor are custom resources provided by Prometheus Operator. See the API reference.

In Queue mode, metrics can be collected via:

Example scrape_configs for Prometheus Helm chart:

# charts/prometheus/values.yaml
extraScrapeConfigs: |
  - job_name: 'aws-node-termination-handler'
    static_configs:
      - targets:
          - 'aws-node-termination-handler.kube-system.svc.cluster.local:9092'

In IMDS mode, no ClusterIP service is created, so use podMonitor to scrape metrics directly from pods:

IMDS metrics collection

Without Prometheus Operator, add annotations to all NTH pods for auto-discovery:

# charts/aws-node-termination-handler/values.yaml
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9092"
  prometheus.io/path: "/metrics"

enablePrometheusServer: true
prometheusServerPort: 9092
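
To spot-check the endpoint, port-forward one NTH pod and curl it (the pod name is illustrative):

```shell
kubectl port-forward -n kube-system pod/aws-node-termination-handler-b57xt 9092:9092 &
curl -s http://localhost:9092/metrics | head
```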

Testing

To test NTH, a Spot interruption event (Spot ITN) must occur on a cluster using Spot instances.

Use AWS Fault Injection Service to trigger Spot instance interruptions and observe NTH behavior:

---
title: Spot ITN triggered by Fault Injection Service
---
graph LR
  F[Fault Injection Service]
  subgraph Worker Node
    NTH[Node Termination Handler <br> Pod]
    P[Pod]
  end

  subgraph NN[Worker Node]
    NP[New Pod]
  end

  F[Fault Injection Service] -- Spot ITN --> NTH
  NTH --Evict application pod--> P
  P -- Evicted --> NP

  note ~~~ NP

  note["**Note**: All worker nodes are Spot instances"]

  style NTH fill:darkorange,color:white
  style note fill:null,color:gray,stroke-width:2px
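With a FIS experiment template already created for the aws:ec2:send-spot-instance-interruptions action, triggering the test is a single CLI call (the template ID is illustrative):

```shell
aws fis start-experiment --experiment-template-id EXT1a2b3c4d5e6f7
```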

NTH ignores DaemonSet-managed pods and performs cordon + drain on the target node, evicting all non-DaemonSet pods. NTH pod logs after a Spot Interruption Notification:

2025/05/16 08:29:28 ??? WARNING: ignoring DaemonSet-managed Pods: calico-system/calico-node-hhhj4, calico-system/csi-node-driver-zg5vt, kube-system/aws-node-44nls, kube-system/aws-node-termination-handler-b57xt, kube-system/ebs-csi-node-qcp7n, kube-system/eks-pod-identity-agent-xpkpm, kube-system/kube-proxy-hwp8q, kube-system/secrets-store-csi-driver-provider-aws-4c98d, kube-system/secrets-store-csi-driver-wdj4p
2025/05/16 08:29:28 ??? evicting pod pfops/argocd-repo-server-97d8976d9-fkwxd
2025/05/16 08:29:28 ??? evicting pod kube-system/ebs-csi-controller-59fb89d9c-52rvk
2025/05/16 08:29:28 ??? evicting pod kube-system/ebs-csi-controller-59fb89d9c-jdhq7
2025/05/16 08:29:28 ??? evicting pod kyverno/kyverno-background-controller-5579d9bdbc-4mgn8
2025/05/16 08:29:28 ??? evicting pod netbox/netbox-redis-master-0
2025/05/16 08:29:28 ??? evicting pod netbox/netbox-redis-replicas-1
2025/05/16 08:29:28 ??? evicting pod netbox/netbox-redis-replicas-2
2025/05/16 08:29:28 ??? evicting pod kube-system/aws-load-balancer-controller-85679cfcc6-jh488
2025/05/16 08:29:28 ??? evicting pod kube-system/aws-load-balancer-controller-85679cfcc6-9q2gc
2025/05/16 08:29:28 ??? evicting pod kube-system/coredns-8f66cf8b8-5cwld
2025/05/16 08:29:28 ??? evicting pod kube-system/coredns-8f66cf8b8-wvrjn
2025/05/16 08:29:29 ??? error when evicting pods/"coredns-8f66cf8b8-wvrjn" -n "kube-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
2025/05/16 08:29:29 ??? error when evicting pods/"ebs-csi-controller-59fb89d9c-52rvk" -n "kube-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
2025/05/16 08:29:34 ??? evicting pod kube-system/coredns-8f66cf8b8-wvrjn
2025/05/16 08:29:34 ??? evicting pod kube-system/ebs-csi-controller-59fb89d9c-52rvk
2025/05/16 08:29:34 ??? error when evicting pods/"ebs-csi-controller-59fb89d9c-52rvk" -n "kube-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
2025/05/16 08:29:39 ??? evicting pod kube-system/ebs-csi-controller-59fb89d9c-52rvk
2025/05/16 08:29:39 ??? error when evicting pods/"ebs-csi-controller-59fb89d9c-52rvk" -n "kube-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
2025/05/16 08:29:44 ??? evicting pod kube-system/ebs-csi-controller-59fb89d9c-52rvk
2025/05/16 08:29:44 ??? error when evicting pods/"ebs-csi-controller-59fb89d9c-52rvk" -n "kube-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
2025/05/16 08:29:49 ??? evicting pod kube-system/ebs-csi-controller-59fb89d9c-52rvk
2025/05/16 08:30:05 INF Node successfully cordoned and drained node_name=ip-10-xxx-x-xx.ap-northeast-2.compute.internal reason="Spot ITN received. Instance will be interrupted at 2025-05-16T08:31:24Z \n"

Summary

References

Articles:

Node Termination Handler:

AWS EC2 Docs: