
This is a guest blog by Dr. Ronen Dar, CTO and co-founder of Run:AI.

At each stage of the machine learning / deep learning (ML/DL) process, researchers complete specific tasks, and those tasks place unique requirements on their AI infrastructure. Data scientists’ compute needs are typically aligned to the following tasks:

  • Build sessions - data scientists consume GPU power in interactive sessions to develop and debug their models. These sessions need GPUs that are instantly and always available, but they consume relatively little compute and memory.
  • Training models - DL models are generally trained in long sessions. Training is highly compute-intensive, can run on multiple GPUs, and typically requires very high GPU utilization. Performance (in terms of time-to-train) is highly important. Over a project’s lifecycle, there are long periods during which many concurrent training workloads run (e.g., while optimizing hyperparameters), but also long stretches in which only a small number of experiments are using GPUs.
  • Inference - in this phase, trained DL models serve requests from real-time applications or from periodic systems that apply offline batch inferencing. Inference typically has low GPU utilization and a small memory footprint compared to training sessions.

[Chart: GPU compute and memory requirements across the build, training, and inference phases]

Two for me, two for you

As you can see in the chart above, in order to accelerate AI research, data scientists need access to as many or as few GPUs as their development phase requires. Unfortunately, the standard Kubernetes scheduler typically allocates a static, fixed number of GPUs – ‘two for me, two for you’. This “one size fits all” approach to scheduling and allocating GPU compute means that when scientists are building models or working on inferencing they have too many GPUs, but when they try to train models they often have too few. In these environments, a significant number of GPUs sit idle at any given time. Simple resource sharing, like using another data scientist’s GPUs while they are idle, is not possible with static allocations.
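To make the problem concrete, here is a minimal sketch of a static allocation using the official Kubernetes Python client: the pod claims a fixed number of GPUs through the NVIDIA device plugin resource, and those GPUs stay bound to it whether they are busy or idle. The pod name, image, and namespace are illustrative assumptions, not details from this post.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

# A pod that statically claims two GPUs via the NVIDIA device plugin resource.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="build-session"),        # illustrative name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="notebook",
                image="quay.io/example/jupyter-gpu:latest",     # illustrative image
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "2"}              # fixed allocation: 2 GPUs
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="team-a", body=pod)
```

With this kind of spec, the two GPUs cannot be borrowed by anyone else until the pod terminates, even if the session barely uses them.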

Making GPU allocations dynamic

At Run:AI, we envisaged a scheduler that implements a concept which we refer to as “guaranteed quotas” to solve the static allocation challenge. Guaranteed quotas let users go over their static quota as long as idle GPUs are available. 

How does a guaranteed quota system work?

Projects with guaranteed quotas, as opposed to projects with static allocations, can use more GPUs than their quota specifies. The system allocates available resources to a job submitted to a queue even if that queue is already over quota. When a job is submitted to an under-quota queue and there are not enough free resources to launch it, the scheduler pauses a job from an over-quota queue, taking priorities and fairness into account.
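The following is a minimal Python sketch of that decision logic under simplifying assumptions. The Queue, Cluster, and submit names are invented for this illustration and are not Run:AI’s implementation or API, and the priority and fairness checks mentioned above are omitted.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Queue:
    name: str
    quota: int                          # guaranteed GPUs for this project
    allocated: int = 0                  # GPUs its running jobs currently hold
    running: List[int] = field(default_factory=list)   # GPU count per running job

@dataclass
class Cluster:
    total_gpus: int
    queues: List[Queue]

    @property
    def free_gpus(self) -> int:
        return self.total_gpus - sum(q.allocated for q in self.queues)

def submit(cluster: Cluster, queue: Queue, job_gpus: int) -> bool:
    """Try to start a job that needs `job_gpus` GPUs in `queue`."""
    # 1. Idle GPUs are handed out even if this pushes the queue over its quota.
    if cluster.free_gpus >= job_gpus:
        queue.allocated += job_gpus
        queue.running.append(job_gpus)
        return True

    # 2. An under-quota queue may reclaim GPUs by pausing over-quota jobs
    #    elsewhere (priority and fairness checks omitted for brevity).
    if queue.allocated + job_gpus <= queue.quota:
        for other in cluster.queues:
            while other.allocated > other.quota and other.running:
                if cluster.free_gpus >= job_gpus:
                    break
                other.allocated -= other.running.pop()   # pause an over-quota job
        if cluster.free_gpus >= job_gpus:
            queue.allocated += job_gpus
            queue.running.append(job_gpus)
            return True

    # 3. Otherwise the job waits until resources free up.
    return False
```

In this sketch, a job that pushes its queue over quota keeps running as long as idle GPUs exist; only when an under-quota queue needs those GPUs does an over-quota job get paused, rather than the new job being rejected.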

Guaranteed quotas essentially break the boundaries of fixed allocations and make data scientists more productive, freeing them from limits on the number of concurrent experiments they can run or the number of GPUs they can use for multi-GPU training sessions. Researchers accelerate their data science, IT gains control over the full GPU cluster, and better scheduling greatly increases utilization across the cluster.

How do Run:AI and Red Hat OpenShift Container Platform work together?

Run:AI creates an acceleration layer over GPU resources that manages granular scheduling, prioritization and allocation of compute power. A dedicated Kubernetes-based batch scheduler, running on top of OpenShift Container Platform (OCP), manages GPU-based workloads. It includes mechanisms for creating multiple queues, setting fixed and guaranteed resource quotas, and managing priorities, policies, and multi-node training, providing an elegant solution that simplifies complex ML scheduling processes. Companies using OpenShift-managed Kubernetes clusters can easily install Run:AI by using the operator available from the Red Hat Container Catalog.
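Once a custom batch scheduler is installed, individual workloads are typically routed to it through the pod’s schedulerName field instead of the default kube-scheduler. Here is a minimal sketch with the Kubernetes Python client, assuming the scheduler is registered under the name runai-scheduler (an assumption for illustration; check how the operator registers it in your cluster) and using illustrative pod, image, and namespace names:

```python
from kubernetes import client, config

config.load_kube_config()

# Route a GPU training pod to the batch scheduler rather than the default
# kube-scheduler. "runai-scheduler" is an assumed name, not taken from this post.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="train-job"),             # illustrative name
    spec=client.V1PodSpec(
        scheduler_name="runai-scheduler",                        # assumed scheduler name
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="quay.io/example/pytorch-train:latest",    # illustrative image
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "4"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="team-a", body=pod)
```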

Learn more about advanced scheduling for GPUs on OpenShift Container Platform at www.run.ai


