Getting Started with Red Hat Build of Karpenter (AutoNode) on ROSA
This content is authored by Red Hat experts, but has not yet been tested on every supported configuration. This guide has been validated on OpenShift 4.22. Operator CRD names, API versions, and console paths may differ on other versions.
Red Hat build of Karpenter (AutoNode) brings workload-aware, just-in-time node provisioning to Red Hat OpenShift Service on AWS (ROSA) with Hosted Control Planes. Instead of managing static machine pools with pre-defined instance types, Karpenter evaluates the exact CPU, memory, and scheduling constraints of pending pods and provisions the optimal EC2 instance automatically — then consolidates underutilized nodes when they are no longer needed.
This guide walks through enabling AutoNode on a ROSA HCP cluster, configuring a NodePool and EC2NodeClass, and exploring use cases including right-sizing, Spot optimization, and consolidation.
Prerequisites
- A ROSA HCP cluster running OpenShift 4.22 or later with AutoNode enabled
- oc CLI authenticated to the cluster
- rosa CLI configured
- AWS CLI configured
Set Environment Variables
Set your cluster name once and reuse it throughout the guide:
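A minimal sketch of this step, assuming a bash-compatible shell. The value my-rosa-hcp is a placeholder, not a name taken from this guide.

```shell
# Placeholder cluster name (assumption); replace with your ROSA HCP cluster name.
export CLUSTER_NAME="${CLUSTER_NAME:-my-rosa-hcp}"
echo "Using cluster: ${CLUSTER_NAME}"
```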
Deploy a Karpenter-Enabled ROSA Cluster
Option 1 — Automated (Recommended)
Use the terraform-rosa Terraform module to deploy a fully configured ROSA HCP cluster with AutoNode enabled in a single command. Set karpenter = true alongside your cluster variables and Terraform handles the IAM role, trust policy, cluster wiring, and default NodePool/EC2NodeClass automatically.
Set the required environment variables:
Create a tfvars file:
Deploy:
Terraform will create the cluster, configure the Karpenter IAM role, and apply the default OpenshiftEC2NodeClass and NodePool automatically.
Option 2 — Manual
Follow the official Red Hat documentation to:
- Create the Karpenter IAM policy and role
- Tag the cluster security group with karpenter.sh/discovery
- Enable AutoNode via rosa edit cluster --autonode=enabled --autonode-iam-role-arn=<role_arn>
Verify AutoNode is Active
Expected output:
Log in and confirm the Karpenter CRDs are installed:
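One way to run this check, sketched as a small helper. It assumes the Karpenter CRD names contain the string karpenter (as the upstream nodepools.karpenter.sh and nodeclaims.karpenter.sh do); the function name is illustrative.

```shell
# List CRDs whose name mentions Karpenter; return non-zero if none are found.
check_karpenter_crds() {
  oc get crd -o name | grep -i karpenter \
    || { echo "no Karpenter CRDs found" >&2; return 1; }
}

# Usage (after oc login):
#   check_karpenter_crds
```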
Configure NodePool and EC2NodeClass
ROSA uses OpenshiftEC2NodeClass instead of the upstream EC2NodeClass. ROSA automatically manages subnet and security group selectors via karpenter.sh/discovery tags — no manual configuration is needed in the spec.
Note: If you deployed via terraform-rosa with karpenter = true, these resources are already applied. Skip to Use Case 1.
Apply the OpenshiftEC2NodeClass:
Apply the NodePool:
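A minimal sketch of what such a NodePool could look like, assuming the upstream karpenter.sh/v1 schema. The autonode label and the consolidateAfter: 30s setting come from this guide; the resource name default, the capacity-type requirement, and the nodeClassRef values are assumptions and should be checked against the Red Hat documentation for your version before applying.

```shell
# Print a NodePool manifest for review (nothing is applied by this function).
render_nodepool() {
  cat <<'EOF'
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default                      # assumed name
spec:
  template:
    metadata:
      labels:
        autonode: "true"             # label referenced later in this guide
    spec:
      nodeClassRef:                  # assumed values; verify for your version
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
EOF
}

# Usage, once the manifest has been reviewed:
#   render_nodepool | oc apply -f -
```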
Verify both resources are ready:
Expected output:
NODES: 0 is correct — Karpenter provisions nodes on demand when pods are pending.
Create the Test Namespace
All workloads run in a dedicated namespace:
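A short sketch of this step. The namespace name autonode-demo is hypothetical; the guide does not state the name it uses.

```shell
NS="${NS:-autonode-demo}"   # hypothetical namespace name

# Create the namespace only if it does not already exist.
create_test_namespace() {
  oc get namespace "$NS" >/dev/null 2>&1 || oc create namespace "$NS"
}

# Usage:
#   create_test_namespace
```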
Use Case 1 — Basic Scale-Up
Deploy a workload that exceeds current capacity and watch Karpenter provision a right-sized node automatically.
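As an illustration, the workload could be a Deployment of 10 pods that each request 1 CPU and 1Gi of memory (the sizes this section refers to). The Deployment name, the namespace, and the pause image are placeholders chosen for this sketch.

```shell
NS="${NS:-autonode-demo}"   # hypothetical namespace name

# Print a Deployment of 10 pods, each requesting 1 CPU and 1Gi of memory.
render_scale_up() {
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-up-demo
  namespace: ${NS}
spec:
  replicas: 10
  selector:
    matchLabels:
      app: scale-up-demo
  template:
    metadata:
      labels:
        app: scale-up-demo
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # placeholder image (assumption)
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
EOF
}

# Usage:
#   render_scale_up | oc apply -f -
```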
Watch Karpenter respond:
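A possible way to take a snapshot of the cluster while Karpenter reacts. The helper name and the autonode-demo namespace are illustrative; re-run it every few seconds, or add -w to an individual oc get command to stream changes.

```shell
NS="${NS:-autonode-demo}"   # hypothetical namespace name

# Show pods, NodeClaims, and nodes (with instance type) in one pass.
show_scale_up_status() {
  echo "== Pods =="
  oc get pods -n "$NS" -o wide
  echo "== NodeClaims =="
  oc get nodeclaims
  echo "== Nodes =="
  oc get nodes -L node.kubernetes.io/instance-type
}
```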
What to observe:
- Pods enter Pending state because no capacity is available on existing nodes
- Within ~30 seconds, Karpenter detects pending pods and creates a NodeClaim
- A new node joins the cluster (~2–4 minutes)
- All pods schedule and move to Running
Karpenter evaluated the total pending resource requests (10 × 1 CPU / 1Gi) and provisioned a single right-sized instance through bin-packing rather than multiple smaller nodes.
Use Case 2 — Instance Type Flexibility (Right-Sizing)
Show how Karpenter selects different instance families for memory-heavy vs CPU-heavy workloads.
Important: Resource requests must be large enough that the workloads cannot efficiently share a single node. Karpenter always optimizes for cost, so small requests are bin-packed onto one large instance instead of triggering specialized nodes. Adding topologySpreadConstraints to each workload forces its pods to spread across separate nodes.
Deploy a memory-heavy workload (12Gi per pod → drives r-family selection):
Deploy a CPU-heavy workload (6 CPU per pod → drives c-family selection):
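Both workloads can be sketched with one parameterized helper. The 12Gi and 6 CPU requests come from this section; the workload names, the default of 3 replicas, the secondary request values, and the pause image are assumptions made for illustration.

```shell
NS="${NS:-autonode-demo}"   # hypothetical namespace name

# render_heavy NAME CPU MEMORY [REPLICAS]
# Prints a Deployment whose pods are spread across nodes by hostname.
render_heavy() {
  name="$1"; cpu="$2"; mem="$3"; replicas="${4:-3}"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
  namespace: ${NS}
spec:
  replicas: ${replicas}
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: ${name}
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # placeholder image (assumption)
          resources:
            requests:
              cpu: "${cpu}"
              memory: ${mem}
EOF
}

# Usage:
#   render_heavy memory-heavy 1 12Gi | oc apply -f -
#   render_heavy cpu-heavy 6 1Gi | oc apply -f -
```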
After nodes provision (~3–4 minutes):
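To see which instance families were chosen, a helper along these lines counts Karpenter nodes per family. It assumes the nodes carry the autonode=true label described in this guide; the function name is illustrative.

```shell
# Count Karpenter-provisioned nodes per instance family (text before the first dot).
instance_families() {
  oc get nodes -l autonode=true \
    -o jsonpath='{range .items[*]}{.metadata.labels.node\.kubernetes\.io/instance-type}{"\n"}{end}' \
    | cut -d. -f1 | sort | uniq -c
}
```

Each output line is a count followed by a family prefix, which makes it easy to confirm that the memory-heavy and CPU-heavy pods landed on different families.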
The memory workload lands on r-family instances; the CPU workload lands on c-family instances — no manual node group configuration required.
Use Case 3 — Spot Instance Optimization
Show cost savings through automatic Spot instance usage.
Spot instances can deliver 60–90% cost savings vs On-Demand. Karpenter monitors EC2 Spot markets across instance types and Availability Zones to find the cheapest available capacity, with automatic fallback to On-Demand when Spot is unavailable.
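A sketch of a fault-tolerant workload pinned to Spot capacity through the upstream karpenter.sh/capacity-type node label. The name, replica count, request sizes, and image are assumptions. Note that a hard nodeSelector on spot removes the On-Demand fallback for this particular workload; omit the selector to let the NodePool requirements decide.

```shell
NS="${NS:-autonode-demo}"   # hypothetical namespace name

# Print a Deployment that only schedules onto Spot-backed nodes.
render_spot_demo() {
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-demo
  namespace: ${NS}
spec:
  replicas: 5
  selector:
    matchLabels:
      app: spot-demo
  template:
    metadata:
      labels:
        app: spot-demo
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: spot
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # placeholder image (assumption)
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
EOF
}

# Usage:
#   render_spot_demo | oc apply -f -
#   oc get nodes -L karpenter.sh/capacity-type
```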
Use Case 4 — Consolidation (Scale Down)
Show Karpenter automatically reclaiming unused capacity.
Within ~60 seconds Karpenter identifies underutilized nodes, cordons and drains them, reschedules remaining pods onto fewer nodes, and terminates the unused EC2 instances.
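One way to trigger this behavior is to scale every demo Deployment to zero and then watch the NodeClaims disappear. The namespace and helper name below are hypothetical.

```shell
NS="${NS:-autonode-demo}"   # hypothetical namespace name

# Remove all demo pods, then list the NodeClaims Karpenter still holds.
trigger_consolidation() {
  oc scale deployment --all --replicas=0 -n "$NS"
  oc get nodeclaims
}

# Usage:
#   trigger_consolidation
#   oc get nodeclaims -w    # stream changes as nodes are drained and removed
```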
Use Case 5 — Coexistence with Machine Pools
Karpenter-managed nodes and existing ROSA machine pool nodes run side by side in the same cluster. You can use node selectors and affinity rules to direct specific workloads to either provisioner. This enables a gradual migration — existing workloads stay on managed machine pools while new workloads adopt Karpenter at your own pace.
View existing machine pools
Optionally enable Cluster Autoscaler on a machine pool
To compare Karpenter with traditional Cluster Autoscaler scaling, enable autoscaling on an existing machine pool:
Verify autoscaling is enabled:
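The enable and verify steps might look like the following. The machine pool ID workers and the replica bounds 2 and 4 are assumptions; take the real ID from the machine pool listing.

```shell
CLUSTER_NAME="${CLUSTER_NAME:-my-rosa-hcp}"   # placeholder cluster name
MACHINE_POOL="${MACHINE_POOL:-workers}"       # hypothetical machine pool ID

# Enable autoscaling on one machine pool, then list pools to confirm.
enable_machinepool_autoscaling() {
  rosa edit machinepool "$MACHINE_POOL" --cluster "$CLUSTER_NAME" \
    --enable-autoscaling --min-replicas 2 --max-replicas 4
  rosa list machinepools --cluster "$CLUSTER_NAME"
}
```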
Deploy a workload targeting Karpenter nodes
Karpenter-provisioned nodes carry the autonode: "true" label from the NodePool template. Use a nodeSelector to direct workloads exclusively to these nodes:
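A sketch of such a workload, using the karpenter-only name that appears later in this section. The replica count, request sizes, namespace, and image are assumptions.

```shell
NS="${NS:-autonode-demo}"   # hypothetical namespace name

# Print a Deployment restricted to nodes labeled autonode=true.
render_karpenter_only() {
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-only
  namespace: ${NS}
spec:
  replicas: 3
  selector:
    matchLabels:
      app: karpenter-only
  template:
    metadata:
      labels:
        app: karpenter-only
    spec:
      nodeSelector:
        autonode: "true"
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # placeholder image (assumption)
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
EOF
}

# Usage:
#   render_karpenter_only | oc apply -f -
```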
Deploy a workload targeting machine pool nodes only
Use node affinity to ensure the workload never schedules on Karpenter-managed nodes. The replica count and CPU request are sized to exceed the available capacity of the existing machine pool nodes, which forces the Cluster Autoscaler to provision additional machine pool nodes:
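The affinity rule can be expressed with the DoesNotExist operator on the autonode label key, as in this sketch. The machinepool-only name comes from this section; the replica count and CPU request are assumed values and need to be tuned so they exceed the free capacity of your machine pool.

```shell
NS="${NS:-autonode-demo}"   # hypothetical namespace name

# Print a Deployment that refuses any node carrying the autonode label.
render_machinepool_only() {
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: machinepool-only
  namespace: ${NS}
spec:
  replicas: 8
  selector:
    matchLabels:
      app: machinepool-only
  template:
    metadata:
      labels:
        app: machinepool-only
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: autonode
                    operator: DoesNotExist
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # placeholder image (assumption)
          resources:
            requests:
              cpu: "2"
              memory: 1Gi
EOF
}

# Usage:
#   render_machinepool_only | oc apply -f -
```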
Verify the Cluster Autoscaler scaled the machine pool
Watch for new nodes and confirm they came from the machine pool (no autonode label) rather than Karpenter:
Once new nodes appear, verify the pods scheduled correctly:
Confirm the machine pool replica count increased:
What to observe: Pods targeting machine pool nodes go Pending because existing nodes are full. The Cluster Autoscaler detects the unschedulable pods, scales the machine pool up, and the new nodes carry standard machine pool labels — no autonode label, no karpenter.sh/nodepool label. This confirms the two provisioners are operating independently on the same cluster.
Verify workload placement side by side
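A possible way to produce the comparison is to print the relevant node labels as columns and then list pods with their node assignments. The helper name and namespace are illustrative.

```shell
NS="${NS:-autonode-demo}"   # hypothetical namespace name

# Show node labels as columns, then the node each demo pod landed on.
show_placement() {
  oc get nodes -L autonode,node.kubernetes.io/instance-type,karpenter.sh/capacity-type
  oc get pods -n "$NS" -o wide
}
```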
Expected result — two distinct groups of nodes:
| Node | autonode | Instance Type | Capacity Type | Provisioner |
|---|---|---|---|---|
| ip-10-x-x-x | true | c7i-flex.2xlarge | spot | Karpenter |
| ip-10-x-x-x | (none) | m5.xlarge | (none) | Cluster Autoscaler |
Pods from karpenter-only will appear on nodes with autonode=true; pods from machinepool-only will appear on machine pool nodes with no autonode label.
Cleanup
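A minimal cleanup sketch, assuming all demo workloads live in a single namespace (autonode-demo is a hypothetical name).

```shell
NS="${NS:-autonode-demo}"   # hypothetical namespace name

# Delete the demo namespace, then list any NodeClaims that remain.
cleanup_demo() {
  oc delete namespace "$NS"
  oc get nodeclaims
}
```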
After the namespace is deleted, both provisioners will reclaim their nodes automatically. Karpenter terminates its nodes within ~30 seconds of the workloads being removed (based on the consolidateAfter: 30s setting in the NodePool). The Cluster Autoscaler will scale the machine pool back down to its minimum replica count within a few minutes once the nodes are no longer needed.
Summary
| Capability | What It Shows | Business Value |
|---|---|---|
| Right-sizing | CPU vs memory workloads get different instance families | No over-provisioning; pay only for what you need |
| Spot optimization | Batch workloads automatically use Spot | 60–90% cost reduction for fault-tolerant workloads |
| Consolidation | Scale down → nodes disappear in ~60s | No stranded capacity; cluster continuously optimizes |
| Zero overhead | No Karpenter pods in oc get pods -A | Hosted control plane takes the operational burden |
| Coexistence | Machine pool + Karpenter nodes side by side | Gradual migration, no big-bang cutover required |
| 400+ instance types | oc get nodes -L node.kubernetes.io/instance-type shows variety | No manual node group configuration per instance type |