As Artificial Intelligence (AI) and Machine Learning (ML) applications become more popular, the ability to serve and optimize a machine learning model becomes more important, especially on newer development platforms such as OpenShift, the enterprise Kubernetes platform. When a workload is migrated to OpenShift, the traditional application is containerized into a container image in which binaries, dependencies, the model, and environment variables are encapsulated into one bundle. The same image is deployed in the testing and production environments, which reduces the missing-dependency and misconfiguration issues that we usually see when deploying enterprise applications. For machine learning applications, the model is usually developed through a model training process and delivered to the application team. The application can either embed the model, download it at run time, or consume it through a RESTful API.

In this article, we will focus on how a practitioner can serve a model on OpenShift using Seldon. Seldon is part of Open Data Hub, the reference architecture for AI/ML workflows on OpenShift. Seldon allows a user to serve and deploy machine learning models in multiple formats, and it also provides toolkits for model explainability and validation. These steps are usually performed by the data scientist at the end of model training in order to validate the model before deploying it. In Part 1 of this series, we will only focus on a simple model serving use case where a model is trained in OpenShift, stored in an object storage bucket, and served using the Seldon Core API and the Seldon Operator.

Components

Here is the list of components and how they were used in this article:

OpenShift (OCP): Enterprise Kubernetes platform

OpenDataHub (ODH): AI/ML reference architecture on OpenShift

OpenShift Container Storage (OCS): Container storage solution for OpenShift

OpenShift Client (oc): OpenShift command-line client

JupyterHub: Data science notebook environment

Seldon Operator: OpenShift Operator for Seldon

Seldon Core API: Python API for Seldon

Red Hat Universal Base Image (UBI): Red Hat container base image

Model Training and Serving Workflow

Figure 1. Model Serving Workflow

The figure above shows a simple model training and serving workflow on OpenShift. The ML Practitioner can develop the model through an instance of a Jupyter Notebook deployed within OpenShift and upload the model to an S3 object storage bucket. This bucket can be provisioned using the OpenShift Container Storage (OCS) solution, where the user can manage different instances and versions of the model. The ML Practitioner can also deploy a small bucket using Ceph Nano, which is part of OpenDataHub (version 0.9.0 or later); it encapsulates the Ceph storage cluster into a minimal container and runs as a standalone pod within OpenShift.

When the model is ready to be deployed, the user can start a build with OpenShift Build to produce an application image. The build process, implemented by the practitioner and the OpenShift administrator, wraps the model with the Seldon Core API and serves it through the Seldon server. The new image is built on top of the Red Hat Python Universal Base Image with Python 3.6. When the build completes, the image is pushed into the internal image registry within OpenShift.

Finally, when the new image is built and available in OpenShift, the ML Practitioner can deploy it using the Seldon Operator. The Seldon Operator deploys the image through a Seldon Deployment object, encapsulating the image into a container with a Seldon monitoring sidecar. Within the Seldon Deployment object description, we can also expose a set of metrics that Prometheus can use for monitoring. In this article, we will only focus on a simple model serving scenario without monitoring. When the Seldon Deployment is applied to the OpenShift environment, a new set of Deployment and Pod objects is spun up with all of the details that we provided in the Seldon Deployment object. Instead of using the Seldon Operator, the ML Practitioner can also deploy an ML model using a simple Python Flask API; however, that approach does not include the features that Seldon provides, such as the model explainability and model monitoring described above.
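Although monitoring is out of scope for this article, the following minimal sketch shows what the custom-metrics hook of the Seldon Core Python wrapper looks like. The class and metric keys here are illustrative only and are not part of the model we deploy below:

class MyMonitoredModel(object):
    def predict(self, X, feature_names, **meta):
        return X

    def metrics(self):
        # Seldon collects these values and exposes them for Prometheus scraping
        return [
            {"type": "COUNTER", "key": "prediction_requests_total", "value": 1},
            {"type": "GAUGE", "key": "model_ready", "value": 1},
        ]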

Implementation Detail

Since this article focuses only on model serving, we will assume that model development has already been done by the ML Practitioner. The following steps will help the practitioner serve the model on OpenShift.

Model Submission to S3 Object Bucket

Within the Jupyter Notebook, after development is done, the ML Practitioner can submit the model to the S3 bucket using the following function.

import os
import sys

import boto
import boto.s3.connection
from boto.s3.key import Key

# Max size in bytes before uploading in parts; between 1 and 5 GB recommended
MAX_SIZE = 20 * 1000 * 1000
# Size of each part when uploading in parts
PART_SIZE = 6 * 1000 * 1000
# Files to skip
BAD_FILES = ['.gitkeep']


def percent_cb(complete, total):
    sys.stdout.write('.')
    sys.stdout.flush()


def upload_files(s3_host, s3_access_key, s3_secret_key, s3_bucket, file_path):
    # Connect to the S3-compatible endpoint
    conn = boto.connect_s3(
        aws_access_key_id=s3_access_key,
        aws_secret_access_key=s3_secret_key,
        host=s3_host,
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    bucket = conn.get_bucket(s3_bucket)
    for subdir, dirs, files in os.walk(file_path):
        for fileName in files:
            if fileName in BAD_FILES:
                continue
            full_path = os.path.join(subdir, fileName)

            filesize = os.path.getsize(full_path)
            if filesize > MAX_SIZE:
                # Large files are uploaded in PART_SIZE segments
                mp = bucket.initiate_multipart_upload(full_path)
                fp = open(full_path, 'rb')
                fp_num = 0
                while fp.tell() < filesize:
                    fp_num += 1
                    print("uploading part %i" % fp_num)
                    mp.upload_part_from_file(fp, fp_num, cb=percent_cb, num_cb=10, size=PART_SIZE)

                mp.complete_upload()
                fp.close()
            else:
                # Small files are uploaded in a single request
                k = Key(bucket)
                k.key = full_path
                k.set_contents_from_filename(full_path, cb=percent_cb, num_cb=10)
    conn.close()

The upload_files function above takes all files within the provided path and uploads them to the object storage bucket. The user needs to provide the required connection information (host, access key, secret key, and bucket) for the code to access the bucket. Large files are uploaded in segments: the maximum file size before segmentation is 20 MB, and each segment is 6 MB. The user can adjust these sizes through the MAX_SIZE and PART_SIZE variables. The BAD_FILES variable lists the files that the process will skip. Before running the code, the user also needs to install the boto package that the script imports:

pip install boto
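As an example, a call to the function above might look like the following, where the connection values are placeholders and the trained model is assumed to live in a local model/ directory:

# Placeholder connection values; replace them with the details of your own bucket
upload_files(
    s3_host="<S3_HOST>",
    s3_access_key="<S3_ACCESS_KEY>",
    s3_secret_key="<S3_SECRET_KEY>",
    s3_bucket="<BUCKET_NAME>",
    file_path="model/",
)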

Building Seldon Service Image from Model

In this part, we will build the image using the Docker strategy within OpenShift Build through a BuildConfig object. To set up the BuildConfig (BC) object, the ML Practitioner can create it either with the OpenShift Client command or from a YAML description. The following OpenShift Client (oc) command creates the BC object:

oc new-build --strategy docker --docker-image registry.redhat.io/ubi8/python-36 --name <MODEL_NAME> -l app=<MODEL_NAME> --binary

The following Build Config will be created:

apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewBuild
  labels:
    app: <MODEL_NAME>
  name: <MODEL_NAME>
  namespace: <NAMESPACE>
spec:
  failedBuildsHistoryLimit: 5
  nodeSelector: null
  output:
    to:
      kind: ImageStreamTag
      name: <MODEL_NAME>:latest
  postCommit: {}
  resources: {}
  runPolicy: Serial
  source:
    binary: {}
    type: Binary
  strategy:
    dockerStrategy:
      from:
        kind: ImageStreamTag
        name: python-36:latest
    type: Docker

For the next step, we need to create a Python class that loads the binary model file and provides the prediction method. The purpose of this class is to map the logic of the model onto the RESTful interface. It also enforces the format that all models need to follow in order to work with the Seldon Core server. The model below predicts the weather condition based on the temperature; different models may require different implementations of this class. Here we use Keras as the library to load the model, but different libraries may be used for different models.

from keras.models import load_model
import os

# Model file; can be overridden through the MODEL environment variable
prediction_model = os.getenv('MODEL', 'model.h5')

# The feature name for the temperature value sent in the HTTP request body
feature_name = "temperature"


class MyModel(object):
    result = {}

    def __init__(self):
        print("Initializing with " + prediction_model + " model")
        self.model = load_model(prediction_model, compile=False)
        self.model._make_predict_function()

    def predict(self, X, feature_names, **meta):
        if len(X) > 0 and len(feature_names) > 0:
            idx = 0
            while idx < len(feature_names):
                if str(feature_names[idx]).lower() == feature_name:
                    try:
                        temp = int(X[idx])
                    except ValueError:
                        self.result['error'] = 'Please provide temperature value as Integer'
                        return X
                    except Exception:
                        self.result['error'] = 'Cannot recognize ' + str(X[idx]) + ' as temperature'
                        return X

                    weather_state = self.model.predict(temp)
                    self.result['weather_state'] = str(weather_state)
                idx = idx + 1

        return X

    def tags(self):
        return self.result
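Before building the image, this class can be smoke-tested locally. The following is a minimal sketch, assuming the trained model.h5 file and the class above are available in the same Python environment:

# Quick local check of the Seldon wrapper class before building the image
model = MyModel()
features = model.predict(["55"], ["temperature"])  # the input features are returned unchanged
print(features)
print(model.tags())  # for example: {'weather_state': '...'}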

In addition to the Python class, we also need to provide a Dockerfile that captures how the class is run within the container. The requirement.txt file below lists all Python libraries that are required for the Python class to run.

 

Dockerfile

FROM image-registry.openshift-image-registry.svc:5000/openshift/python-36
USER root
COPY . /app
WORKDIR /app
EXPOSE 5000
RUN pip install --upgrade pip && pip install -r requirement.txt

ENV MODEL_NAME {MODEL_NAME}
ENV API_TYPE REST
ENV SERVICE_TYPE MODEL
ENV PERSISTENCE 0

CMD exec seldon-core-microservice $MODEL_NAME $API_TYPE --service-type $SERVICE_TYPE --persistence $PERSISTENCE

requirement.txt

seldon-core==1.2.2
keras==2.2.4
boto==2.49.0

With this Dockerfile, the build process will install all packages, including the Seldon Core API, and when the container starts, the Seldon Core server will host the model as a RESTful service.

When everything is ready, the user can start the build either through the OpenShift console or with the OpenShift Client command below:

oc start-build <MODEL_NAME> --from-dir=. --follow

When the process is completed, we will have a new application image in the internal OpenShift image registry.
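To confirm the result, the user can list the ImageStream that the build pushed to, for example:

oc get imagestream <MODEL_NAME> -n <NAMESPACE>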

Deploying Model Using Seldon

After the image is created, we need to deploy the Seldon Operator into the OpenShift cluster. Because Seldon is part of OpenDataHub, we can either deploy it through the OpenDataHub KfDef file or install it separately from OperatorHub. The user can follow the user interface of the OpenShift console to complete the installation.

After the operator is installed, the user can go through the OpenShift console to deploy an instance of a Seldon Deployment using the YAML below.

Seldon Deployment Object

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: ${MODEL_NAME}
spec:
  name: ${GROUP_NAME}
  predictors:
    - componentSpecs:
        - spec:
            containers:
              - image: >-
                  ${INTERNAL_IMAGE_URL}
                name: classifier
      graph:
        children: []
        endpoint:
          type: REST
        name: classifier
        type: MODEL
      name: ${GROUP_NAME}
      replicas: 1

After the Seldon Deployment object is created, a new pod will be spun up along with a service so that users can start testing it. If the user needs to access the service from outside the cluster, they can expose it by creating a Route object.
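For example, the service created by the Seldon Operator can be located and exposed with the oc client. The service name below is a placeholder, since the actual name is derived from the Seldon Deployment and predictor names, and port 8000 is the REST port typically used by the Seldon service in Seldon Core 1.x:

oc get svc
oc expose service <SELDON_SERVICE_NAME> --port=8000
oc get route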

Validation

After the model is deployed, the user can either interact with the service directly from an intelligent application or validate it with the curl command below:

curl -X POST -H 'Content-Type: application/json' -d '{"meta": { "tags": {}}, "data": {"names":["temperature"], "ndarray":["55"]}}' <ROUTE_URL> --connect-timeout 300 --max-time 300

The service will return something similar to this:

{"data":{"names":[],"ndarray":["5"]},"meta":{"tags":{"weather_state":"HUMID"}}}

Model Management in OpenShift

There are various ways to maintain a model within OpenShift. The user can maintain the model as an object within an S3 object storage bucket and leverage the object version control within OCS to manage its versions, which is the approach discussed in this article. The user can also leverage a more specialized tool such as Data Version Control (DVC) to version data and large model files, or MLflow to track model training and store artifacts and model files. Depending on the data pipeline and model training processes that the ML Practitioner uses, one tool may work better than another. In this article, we focus on maintaining the model as a deployable artifact, and within OpenShift the best practice is to maintain it through an application image. An application image captures all dependencies needed for the model to be consumed as a RESTful service; no other dependencies are required for the model to operate within OpenShift, which accelerates model promotion between different environments and clusters.
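As a hedged illustration of the object-versioning approach, the same boto library used earlier can turn on S3 versioning for the model bucket and list the stored versions of a model object, assuming the OCS/Ceph S3 endpoint supports versioning and using placeholder connection values:

import boto
import boto.s3.connection

# Placeholder connection values; reuse the same credentials as the upload step
conn = boto.connect_s3(
    aws_access_key_id="<S3_ACCESS_KEY>",
    aws_secret_access_key="<S3_SECRET_KEY>",
    host="<S3_HOST>",
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.get_bucket("<BUCKET_NAME>")
# Enable object versioning on the bucket
bucket.configure_versioning(True)

# List the stored versions of the model file
for version in bucket.list_versions(prefix="model.h5"):
    print(version.name, version.version_id)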

Conclusion

This article goes over a series of steps that Machine Learning practitioners need to perform in order to deploy a basic machine learning model into OpenShift using various tools. These steps are intended for practitioners to practice and gain an understanding of how to interact with OpenShift. In a production system, all of these steps should be automated using one or several CI/CD pipelines to build, deploy, and promote the model through different environments.

This article also does not discuss other aspects of running machine learning models in production, such as model explainability or validation. Seldon provides different toolkits that can assist with these use cases in OpenShift, and some of them will be covered in future articles of this series.