ROSA with Nvidia GPU Workloads - Manual
This content is authored by Red Hat experts, but has not yet been tested on every supported configuration.
This is a guide to manually enable NVIDIA GPU workloads on a ROSA cluster, as an alternative to our Helm chart guide.
Prerequisites
- ROSA cluster (4.14+)
- rosa CLI
- oc CLI
1. Setting up GPU machine pools
In this tutorial, we use g5.4xlarge nodes for the GPU machine pool, with auto-scaling enabled up to 4 nodes. Create the machine pool with the command below, replacing your-cluster-name with the name of your cluster.
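A minimal sketch of the rosa command is shown below; the machine pool name gpu-pool is just an example and can be anything you like.

```bash
# Create an auto-scaling GPU machine pool (1-4 nodes) using g5.4xlarge instances.
rosa create machinepool \
  --cluster=your-cluster-name \
  --name=gpu-pool \
  --instance-type=g5.4xlarge \
  --enable-autoscaling \
  --min-replicas=1 \
  --max-replicas=4
```

You can check the result with `rosa list machinepools --cluster=your-cluster-name` and wait for the new nodes to join the cluster.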
Note that you can also use a different instance type or create the machine pool without auto-scaling.
2. Installing NFD operator
The Node Feature Discovery (NFD) operator discovers hardware features such as NVIDIA GPUs on your nodes, and the NFD instance labels those nodes so you can target them for workloads. Please refer to the official OpenShift documentation for more details. First, install the NFD operator, for example with the manifests below.
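As a sketch, the operator can be installed by creating the openshift-nfd namespace, an OperatorGroup, and a Subscription. The channel and operator names below follow the OpenShift documentation; verify them against your cluster's catalog (for example with `oc get packagemanifests nfd -n openshift-marketplace`).

```bash
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-nfd
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-nfd
  namespace: openshift-nfd
spec:
  targetNamespaces:
  - openshift-nfd
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nfd
  namespace: openshift-nfd
spec:
  channel: stable
  installPlanApproval: Automatic
  name: nfd
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
```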
Note that the operator installation above might take a few minutes. Next, we will create the NFD instance.
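A minimal NodeFeatureDiscovery instance is sketched below. The operand image tag is an assumption and should match your OpenShift minor version; the operator also ships a default example you can apply as-is from the console.

```bash
cat <<EOF | oc apply -f -
apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: openshift-nfd
spec:
  instance: ""
  topologyupdater: false
  operand:
    image: registry.redhat.io/openshift4/ose-node-feature-discovery:v4.14
    imagePullPolicy: Always
EOF
```

Once the NFD workers are running, GPU nodes should be labeled with feature.node.kubernetes.io/pci-10de.present=true (10de is NVIDIA's PCI vendor ID).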
3. Installing GPU operator
Next, we will set up the NVIDIA GPU Operator, which manages the NVIDIA software components and the ClusterPolicy object to ensure NVIDIA GPUs are configured correctly in the OpenShift environment. Please refer to the official NVIDIA documentation for more details.
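A sketch of the operator installation, modeled on the NVIDIA documentation, is to look up the default channel for the certified operator and then create the namespace, OperatorGroup, and Subscription. The package name gpu-operator-certified and the certified-operators catalog source are the usual values, but confirm them against your cluster.

```bash
# Look up the default channel for the certified GPU operator
CHANNEL=$(oc get packagemanifest gpu-operator-certified -n openshift-marketplace \
  -o jsonpath='{.status.defaultChannel}')

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: nvidia-gpu-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nvidia-gpu-operator-group
  namespace: nvidia-gpu-operator
spec:
  targetNamespaces:
  - nvidia-gpu-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: gpu-operator-certified
  namespace: nvidia-gpu-operator
spec:
  channel: "${CHANNEL}"
  installPlanApproval: Automatic
  name: gpu-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
EOF
```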
And finally, let's create (or update) the ClusterPolicy.
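One way to do this, as a sketch, is to extract the operator's default ClusterPolicy example from its ClusterServiceVersion and apply it; this assumes jq is installed and that the GPU operator's CSV has finished installing.

```bash
# Find the GPU operator's CSV and extract its default ClusterPolicy example
CSV=$(oc get csv -n nvidia-gpu-operator -o name | grep gpu-operator-certified)
oc get "${CSV}" -n nvidia-gpu-operator \
  -o jsonpath='{.metadata.annotations.alm-examples}' | jq '.[0]' > clusterpolicy.json

# Create or update the ClusterPolicy
oc apply -n nvidia-gpu-operator -f clusterpolicy.json
```

The operator then rolls out the driver, container toolkit, device plugin, and monitoring components; the ClusterPolicy status should eventually report a state of ready.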
Validating GPU (optional)
By now your GPUs should be set up correctly. However, if you'd like to validate the setup, you can run the following in a terminal.
In essence, we verify that NFD has detected the GPUs, run nvidia-smi in a GPU driver daemonset pod, run a simple CUDA vector addition test pod, and then delete it.
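A sketch of such a validation script is shown below; the driver daemonset label (app=nvidia-driver-daemonset) and the CUDA sample image tag are assumptions that may need adjusting for your GPU operator version.

```bash
#!/bin/bash
set -euo pipefail

# 1. Verify that NFD detected the GPUs (10de is NVIDIA's PCI vendor ID)
GPU_NODES=$(oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true -o name)
if [ -z "${GPU_NODES}" ]; then
  echo "No GPU nodes detected"
  exit 1
fi
echo "GPU nodes: ${GPU_NODES}"

# 2. Run nvidia-smi inside one of the GPU driver daemonset pods
DRIVER_POD=$(oc get pods -n nvidia-gpu-operator -l app=nvidia-driver-daemonset -o name | head -n 1)
if ! oc exec -n nvidia-gpu-operator "${DRIVER_POD}" -- nvidia-smi; then
  echo "Failed to run nvidia-smi"
  exit 1
fi

# 3. Run a simple CUDA vector addition test pod
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
  namespace: default
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
oc wait --for=jsonpath='{.status.phase}'=Succeeded pod/cuda-vectoradd -n default --timeout=300s
oc logs -n default cuda-vectoradd

# 4. Delete the test pod
oc delete pod cuda-vectoradd -n default
```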
Note that this validation step can take a few minutes to complete. If you see errors such as "No GPU nodes detected" or "Failed to run nvidia-smi", the driver and toolkit pods may still be initializing, so try again after a few minutes.