How to run and deploy LLMs using Red Hat OpenShift AI on a Red Hat OpenShift Service on AWS cluster

Learn how to install the Red Hat® OpenShift® AI (RHOAI) operator and Jupyter notebook, create an Amazon S3 bucket, and run the LLM model on a Red Hat OpenShift Service on AWS (ROSA) cluster.

Disclaimer: this content is authored by Red Hat experts, but has not yet been tested on every supported configuration.

Learn how to install the Red Hat® OpenShift® AI (RHOAI) operator and Jupyter notebook, create an Amazon S3 bucket, and run the LLM model on a Red Hat OpenShift Service on AWS (ROSA) cluster.

Disclaimer: this content is authored by Red Hat experts, but has not yet been tested on every supported configuration.

Performing hyperparameter tuning

15 mins

This learning path has been a very simple guide aimed to get you started with Red Hat® OpenShift® AI (RHOAI) and Red Hat OpenShift Service on AWS (ROSA). As mentioned previously, you could improve the accuracy in your training by increasing the dataset size and running a more robust model. We can leverage GPU to support that. Another idea is to extend the workload to AWS SageMaker and/or to AWS Lambda. In addition, RHOAI itself has a section where you can run the code as a pipeline.

What will you learn?

  • How to conduct hyperparameter tuning to increase accuracy

What do you need before starting?

  • Met all prerequisites
  • Completed previous steps

Steps to perform hyperparameter tuning

There are many ways to go about performing hyperparameter tuning for your model to improve the model accuracy. Here we’ll be using optuna, which is a popular library to optimize hyperparameters. It essentially allows you to define the search space for each hyperparameter and automatically finds the best combination based on the specified objective.

This is the code example that you can run on the notebook. Once again, instructions and descriptions are included in-line and denoted by the “#” character, so that you may copy the entire code block:

!pip install transformers datasets torch evaluate accelerate boto3 optuna

import numpy as np
import evaluate
import optuna
import boto3
import os
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer

os.environ["TOKENIZERS_PARALLELISM"] = "false"

dataset = load_dataset("ag_news")
small_dataset = dataset["train"].shuffle(seed=42).select(range(500)) 

model_name = "prajjwal1/bert-tiny"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=4)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = small_dataset.map(tokenize_function, batched=True)

def compute_metrics(eval_pred):
    metric = evaluate.load("accuracy")
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# define the objectives, i.e. the hyperparameters to tune, the training arguments, train the model, and evaluate them
def objective(trial):
    learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 5e-5)
    per_device_train_batch_size = trial.suggest_categorical("per_device_train_batch_size", [4, 8, 16])
    num_train_epochs = trial.suggest_int("num_train_epochs", 2, 4)
    
    training_args = TrainingArguments(
        output_dir="./results",
        eval_strategy="epoch",
        save_strategy="epoch",
        learning_rate=learning_rate,
        per_device_train_batch_size=per_device_train_batch_size,
        num_train_epochs=num_train_epochs,
        weight_decay=0.01,
        load_best_model_at_end=True,
    )

    metric = evaluate.load("accuracy")

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return metric.compute(predictions=predictions, references=labels)

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets,
        eval_dataset=tokenized_datasets,  
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )
    
    trainer.train()
    
    eval_metrics = trainer.evaluate()
    return eval_metrics["eval_accuracy"]

# run hyperparameter search
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=9)

# find the best parameter and accuracy
best_params = study.best_params
best_accuracy = study.best_value

print("Best hyperparameters:", best_params)
print("Best accuracy:", best_accuracy)

# train the model with the best hyperparameters
best_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=4)
trainer = Trainer(
    model=best_model,
    args=TrainingArguments(
        output_dir="./results",
        eval_strategy="epoch",
        save_strategy="epoch",
        learning_rate=best_params["learning_rate"],
        per_device_train_batch_size=best_params["per_device_train_batch_size"],
        num_train_epochs=best_params["num_train_epochs"],
        weight_decay=0.01,
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_datasets,
    eval_dataset=tokenized_datasets,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()

model_save_dir = "./model"
tokenizer.save_pretrained(model_save_dir)
best_model.save_pretrained(model_save_dir)

s3_client = boto3.client('s3')
bucket_name = 'llm-bucket-dsari' # please change this to your bucket name
model_save_path = 'model/'

for file_name in os.listdir(model_save_dir):
    s3_client.upload_file(
        os.path.join(model_save_dir, file_name),
        bucket_name,
        model_save_path + file_name
    )

Here we are using the same dataset and the same model. However, the main difference between this code and the one before is that here we define an objective function that takes an optuna trial as input, and we create a Trainer instance with the tuned hyperparameters and train the model.

Then, we create an optuna study and optimize the objective function. Finally, we retrieve the best hyperparameter and its accuracy. Note that the code could run for a bit longer than before, as it keeps running trials and the final results may vary. In this case, the final epoch reaches 94.6% accuracy.

Screenshot of the tuning output
Tuned output with increased accuracy.

 

Note that this is just an example of hyperparameter tuning and that there are many other methods that you can try, such as grid and random search or Bayesian optimization. Many machine learning frameworks and libraries already have built-in utilities for hyperparameter tuning, which makes it easier to apply in practice.

Previous resource
Training the LLM model
Next resource
Conclusion

This learning path is for operations teams or system administrators.

Developers might want to check out how to create a natural language processing (NLP) application using Red Hat OpenShift AI on developers.redhat.com.

Get started on developers.redhat.com

Hybrid Cloud Logo LinkedIn YouTube Facebook Twitter

Products

Tools

Try, buy, sell

Communicate

About Red Hat

We’re the world’s leading provider of enterprise open source solutions—including Linux, cloud, container, and Kubernetes. We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.