
Deploy ROSA + Nvidia GPU + RHOAI with Automation

This content is authored by Red Hat experts, but has not yet been tested on every supported configuration.

Getting Red Hat OpenShift AI up and running with NVIDIA GPUs on a Red Hat OpenShift Service on AWS (ROSA) cluster can involve a series of detailed steps, from installing various operators to managing dependencies. While manageable, this process can be time-consuming when you’re eager to start leveraging OpenShift AI for your projects.

This guide and its accompanying Git repository are designed to streamline your setup significantly. We focus on getting you productive faster by using Terraform to deploy a ROSA cluster with GPUs from the start. From there, Ansible scripts take over, automating the deployment and configuration of all necessary operators for both NVIDIA GPUs and Red Hat OpenShift AI. This means less manual configuration for you and more time spent on what matters: innovating with AI.

Prerequisites

  • terraform
  • git
  • ansible cli
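
If you want to confirm these tools are installed and on your PATH before proceeding, each one can print its version:

terraform version
git --version
ansible --version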

Create a Red Hat Hybrid Cloud Console Service Account

Please refer to this guide to create a service account that will be used to create the cluster.

Note: Make sure to add the service account to a group that has ‘OCM cluster provisioner’ access. Refer to this guide on adding a service account to a group.

Set Environment Variables

Set and adjust the following variables to meet your requirements. Terraform automatically reads environment variables prefixed with TF_VAR_ as input variables, so these values will be picked up by the plan and apply steps below.

# Credentials of the Red Hat Hybrid Cloud Console service account created above
export TF_VAR_client_secret="OCM Service Account Client Secret"
export TF_VAR_client_id="OCM Service Account Client ID"
export TF_VAR_cluster_name="rosa-rhoai"
export TF_VAR_ocp_version=4.19.2
export TF_VAR_private=false
# Instance type for the default (non-GPU) worker machine pool
export TF_VAR_compute_machine_type=m5.8xlarge
# Instance type for the GPU machine pool (g5 instances provide NVIDIA A10G GPUs)
export TF_VAR_gpu_machine_type=g5.4xlarge
export TF_VAR_admin_password=<admin password>
export TF_VAR_developer_password=<developer password>
# Set to true to deploy a ROSA with Hosted Control Planes (HCP) cluster
export TF_VAR_hosted_control_plane=true
export TF_VAR_multi_az=true
export TF_VAR_region=<AWS Region>
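
As an optional sanity check, you can list the exported variables before running Terraform (note that this prints the secret and password values to your terminal):

env | grep '^TF_VAR_'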

Create the ROSA cluster

The ROSA cluster will be created with the following:

  • A second machine pool with NVIDIA GPU worker nodes
  • The Node Feature Discovery (NFD) Operator
  • The NVIDIA GPU Operator
  • The OpenShift Serverless Operator
  • The Service Mesh Operator
  • The Authorino Operator
  • Red Hat OpenShift AI, deployed and configured
  • An Accelerator Profile, deployed and configured

Clone the git repository

git clone https://github.com/rh-mobb/terraform-rosa-rhoai
cd terraform-rosa-rhoai

Run terraform to deploy

terraform init && \
  terraform plan -out tf.plan && \
  terraform apply tf.plan
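
After the apply completes and you have logged in to the cluster with oc (for example as the admin user configured above), a few spot checks confirm the GPU nodes and operators are in place. These commands are an optional sanity check: the node.kubernetes.io/instance-type label is the Kubernetes default, but the exact operator (CSV) names can vary with the versions the automation installs.

# Show the outputs defined by the Terraform module (names vary by repository)
terraform output

# Confirm the GPU worker nodes from the second machine pool joined the cluster
oc get nodes -L node.kubernetes.io/instance-type

# Confirm the operators listed above were installed
oc get csv -A | grep -Ei 'nfd|gpu-operator|serverless|servicemesh|authorino|rhods'

# Confirm the GPUs are exposed as allocatable resources on the GPU nodes
oc describe node -l node.kubernetes.io/instance-type=$TF_VAR_gpu_machine_type | grep 'nvidia.com/gpu'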
