This is a guest blog written by Dotscience's Luke Marsden, CEO and founder of Dotscience.

In this blog, we explain why you need DevOps for machine learning (also known as MLOps), what is the difference between regular DevOps for software engineering and DevOps for ML, and how DevOps for ML can be implemented with OpenShift + Dotscience.

Why MLOps?

Many people may ask, “What is MLOps (DevOps for ML), and why do I need it?” The data scientist may say, “DevOps is only for the engineers and IT.” The engineers may ask, “I know DevOps (the combining of software development and its deployment in production via IT operations), but why is it different for ML versus software engineering?” And managers may ask, “Is DevOps something urgent I need now, or is it something that would be nice to have in the future?” The answer to all of these questions points the same way: if you want to use ML in the real world to create value for your business, with reproducibility and accountability, then you need DevOps for ML.

There is a fundamental reason for this: data science is iterative (Figure 1). At each of the major stages of the process (data preparation, model development, model serving, inferencing, and monitoring), issues can occur that require modifying one of the other stages. The most obvious issue is that a model's performance degrades in production and the model has to be retrained. Another possible issue is that a model works well on its training, validation, and test data but fails in production during inferencing; if this occurs, the user has to go back and retrain the model or re-prepare the data. Or, during data preparation, a feature passed to the model may turn out to be a “cheat” variable that leaks label information into the model's inputs, producing unrealistically good model performance. While these particular scenarios may not happen every time, in general some earlier step will need revisiting from a later step.
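The leakage failure mode described above is easy to reproduce. The sketch below (plain Python, no ML library; the numbers are invented for illustration) shows how computing data-preparation statistics over the full dataset smuggles test-set information into the training features:

```python
# Illustrative only: normalising with statistics computed over the *whole*
# dataset leaks test-set information into the prepared training features.
train = [1.0, 2.0, 3.0, 4.0]
test = [100.0]  # an outlier the model should not see during preparation

def mean(xs):
    return sum(xs) / len(xs)

# Wrong: preparation statistics computed on train + test together.
leaky_mu = mean(train + test)
# Right: statistics computed on training data only, then reused at test time.
clean_mu = mean(train)

leaky_train = [x - leaky_mu for x in train]
clean_train = [x - clean_mu for x in train]

# The leaky features have shifted to reflect the test outlier; a model fit
# on them has already "seen" part of the test distribution.
assert leaky_mu != clean_mu
```

The fix, as in the text, is to revisit the earlier stage: re-prepare the data using training-set statistics only, then retrain.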

Figure 1: Data science is iterative at all stages.

The iterative nature of data science means that when you are adding AI to your business by using data science and machine learning, you cannot try experiments, build models, and then “code it properly later” by handing off to an engineering team. The input data and the models will change, as will business requirements and key personnel. Reflecting those changes in a workflow that was handed off as a finished, static, end-to-end piece of code is a lot of work. What is needed is a process in which each component (the code, datasets, models, metrics, and run environment) is automatically tracked and versioned, so that changes can be made quickly and easily while preserving accountability and auditability. The fact is, most companies, even those with data science teams, do not use such a process. The result is a large amount of ad hoc work and technical debt, which costs companies time and money in wasted opportunity.
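To make the idea of automatic tracking and versioning concrete, here is a minimal sketch in plain Python. It is not how any particular product implements this; it simply shows the core trick of identifying each run artifact by a content hash, so that any change to code or data yields a new, distinct version:

```python
import hashlib

def content_hash(payload: bytes) -> str:
    """Version an artifact by the SHA-256 hash of its content."""
    return hashlib.sha256(payload).hexdigest()[:12]

def record_run(code: str, dataset: bytes, params: dict, metrics: dict) -> dict:
    """Capture one training run: inputs are identified by content,
    so a later change to code or data produces a different version id."""
    return {
        "code_version": content_hash(code.encode()),
        "dataset_version": content_hash(dataset),
        "params": params,
        "metrics": metrics,
    }

# Two runs with the same code but a changed dataset:
run1 = record_run("train.py v1", b"col_a,col_b\n1,2\n", {"lr": 0.01}, {"acc": 0.91})
run2 = record_run("train.py v1", b"col_a,col_b\n1,3\n", {"lr": 0.01}, {"acc": 0.88})

# The dataset change is detected automatically via its hash; the code is not.
assert run1["dataset_version"] != run2["dataset_version"]
assert run1["code_version"] == run2["code_version"]
```

With records like these, a drop in a production metric can be traced back to exactly which code and data versions produced the deployed model.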

So how does DevOps for ML help? In the 1990s, software engineering was siloed and inefficient. Releases took months to ship and involved many manual steps. Now, thanks to DevOps and practices like continuous integration and continuous delivery (CI/CD), software can be shipped in seconds, because the steps involved are automated. At present, ML models are in a similar situation to software in the 1990s: their creation is siloed and inefficient, they take months to ship into production, and they require many manual steps. At Dotscience, we believe that the transformation DevOps brought to software engineering can be achieved for ML, and our tool helps lower the barriers to this transformation for businesses seeking to get more value from AI.

Through our collaboration with Red Hat, we’ve delivered a Dotscience MLOps pipeline on top of Red Hat OpenShift. This solution enables you to accelerate the development and delivery of ML models and AI-powered intelligent applications across data-center, edge, and public clouds.

The difference between DevOps and MLOps (DevOps for ML)

DevOps for ML, also known as MLOps, differs from the original DevOps because the data science and machine learning process is intrinsically complex in ways that software engineering is not, and contains elements that software DevOps does not cover. While software engineering is by no means easy or simple, data science and ML require the user to track several conceptually new parts of their activity that are fundamental to the workflow: data provenance, datasets, models, model parameters and hyperparameters, metrics, and the outputs of models in production. This is in addition to code versioning, compute environment, CI/CD, and general production monitoring. Table 1 summarizes this:


Table 1: Extra requirements of DevOps for ML versus DevOps for software

How can DevOps for ML be implemented?

So let’s say we are convinced of the need for DevOps for ML and would like to implement it. How can this be done?


Dotscience – MLOps platform for collaboration, deployment & tracking

Dotscience is an MLOps platform which delivers all of the requirements described above out of the box:

  1. Data provenance
  2. Data versioning
  3. Model versioning
  4. Hyperparameter and metric tracking
  5. Workflows
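To illustrate what hyperparameter and metric tracking looks like from the data scientist's side, here is a minimal, hypothetical tracker in plain Python. This is not the Dotscience client API; the class and method names are invented for illustration:

```python
class RunTracker:
    """Minimal stand-in for an MLOps run tracker: records the
    hyperparameters a run was launched with and the metrics it produced."""

    def __init__(self, run_name):
        self.run_name = run_name
        self.params = {}
        self.metrics = {}

    def parameter(self, name, value):
        """Record a hyperparameter and return it, so it can be used inline."""
        self.params[name] = value
        return value

    def metric(self, name, value):
        """Record a result metric for this run."""
        self.metrics[name] = value
        return value

    def summary(self):
        """Everything needed to compare this run against others."""
        return {"run": self.run_name, "params": self.params, "metrics": self.metrics}

run = RunTracker("churn-model-v3")           # hypothetical run name
lr = run.parameter("learning_rate", 0.01)
epochs = run.parameter("epochs", 20)
# ... training would happen here, using lr and epochs ...
run.metric("accuracy", 0.93)

assert run.summary()["params"]["learning_rate"] == 0.01
```

The value of such instrumentation is that every run becomes comparable and reproducible: given the recorded parameters, you can rerun any past experiment and check whether the metrics match.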

Red Hat OpenShift – ML with the production-ready power of Enterprise Kubernetes

Containers and Kubernetes help accelerate the machine learning lifecycle: they give data scientists and software developers the agility, flexibility, portability, and scalability to train, test, and deploy ML models and associated intelligent applications in production. Red Hat OpenShift is the industry’s most comprehensive Kubernetes hybrid cloud platform. It provides the necessary benefits for machine learning by leaning on Kubernetes Operators, integrating DevOps capabilities, and integrating with GPU hardware accelerators. Red Hat OpenShift enables better collaboration between data scientists and software developers, accelerating the roll-out of intelligent applications across the hybrid cloud.

Kubernetes Operators codify operational knowledge and workflows to automate the installation and lifecycle management of containerized applications with Kubernetes. For further details on Red Hat OpenShift Kubernetes Platform for accelerating AI/ML workflows, please visit the AI/ML on OpenShift webpage.
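Conceptually, an Operator runs a reconcile loop: it compares the desired state declared in a custom resource with the actual state of the cluster and takes whatever actions close the gap. A schematic sketch of one reconcile pass (plain Python, not a real controller; the state fields are invented for illustration):

```python
def reconcile(desired: dict, actual: dict) -> list:
    """One pass of an Operator-style reconcile loop: emit the actions
    needed to move the actual state toward the declared desired state."""
    actions = []
    if actual.get("installed") != desired.get("installed", True):
        actions.append("install application")
    if actual.get("replicas", 0) < desired.get("replicas", 1):
        actions.append(f"scale up to {desired['replicas']} replicas")
    if actual.get("version") != desired.get("version"):
        actions.append(f"upgrade to {desired['version']}")
    return actions

desired = {"installed": True, "replicas": 3, "version": "1.2"}
actual = {"installed": True, "replicas": 1, "version": "1.1"}

# The loop acts only on the differences...
assert reconcile(desired, actual) == ["scale up to 3 replicas", "upgrade to 1.2"]
# ...and becomes a no-op once the cluster matches the declaration.
assert reconcile(desired, desired) == []
```

A real Operator runs this loop continuously against the Kubernetes API, which is what lets it manage installation, upgrades, and recovery without manual intervention.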

By configuring Dotscience to deploy ML models to OpenShift with the Dotscience OpenShift Operator, you combine Dotscience's powerful, data scientist-friendly workflows with Red Hat's reliable, scalable enterprise Kubernetes platform and its integrated DevOps capabilities.

For example, from within Dotscience you can edit an ML model deployment and increase the number of replicas, and OpenShift will automatically scale the model horizontally across multiple GPU-capable servers. If a server fails, OpenShift can automatically reschedule the ML model pods onto other available servers, reducing downtime for your critical AI-enabled business applications. And all of this happens without needing to engage IT operations to scale in and out.
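Under the hood, such a scale-out is just a declarative change to the deployment's replica count. A minimal sketch of the patch involved is below; the deployment name and namespace are invented, and the Kubernetes client call is shown in a comment only, since it requires a configured cluster:

```python
def make_replica_patch(replicas: int) -> dict:
    """Build the strategic-merge patch that changes only the replica count."""
    if replicas < 1:
        raise ValueError("a serving deployment needs at least one replica")
    return {"spec": {"replicas": replicas}}

patch = make_replica_patch(3)
assert patch == {"spec": {"replicas": 3}}

# Applying it with the official Kubernetes Python client would look like
# this (not executed here, as it needs cluster credentials):
#
#   from kubernetes import client, config
#   config.load_kube_config()
#   apps = client.AppsV1Api()
#   apps.patch_namespaced_deployment(
#       name="ml-model-serving",        # hypothetical deployment name
#       namespace="dotscience-models",  # hypothetical namespace
#       body=patch,
#   )
```

Because the change is declarative, the platform, not the operator of the system, is responsible for converging on three running pods and keeping them running.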

DevOps for ML is appropriate for any real-world data science project, especially one that drives business value in production. Regular software engineering DevOps tools are not enough on their own, because DevOps for ML has several intrinsically new concepts to track. While it is possible to build such a pipeline yourself from open source and/or available cloud tools, many businesses lack the time or expertise to do so. Products such as Dotscience on OpenShift can help these companies bridge the gap and derive greater value from their data via AI and machine learning.

Try Dotscience on OpenShift today

If you are convinced that an MLOps pipeline is critical to your business's successful adoption of AI/ML, contact the Dotscience product solutions team for a demo. In the coming weeks, we'll publish a full how-to blog here, so stay tuned!



About the author

Red Hatter since 2018, tech historian, founder of themade.org, serial non-profiteer.
