Introduction

This blog provides a high-level overview of the OpenShift sandboxed containers operator, which is available as a tech preview in OpenShift 4.8. More information about the product is available in the OpenShift sandboxed containers documentation.

Using the operator, a cluster administrator can add the Kata Containers runtime and manage its life cycle. The operator supports installing, removing, and monitoring the status of the Kata Containers runtime installation. It does this by using the Operator Lifecycle Manager (OLM), a component of the Operator Framework.

In the following sections, we will take you through how Kata was historically deployed, how the operator deployment approach evolved, and how it converged on using RHCOS extensions.

We will provide an overview of the operator architecture and present its key features and capabilities. 

Life Before the OpenShift Sandboxed Containers Operator

Other solutions to deploy Kata in Kubernetes clusters have been available for some time. They build the binaries (often statically linked), deploy them to the nodes of a cluster, and remove them again when they are no longer needed. This is a good solution for CI and similar environments with throwaway infrastructure. However, what they lack, and what the operator adds on top, is software life-cycle management using tools that are supported by the operating system (the Machine Config Operator and rpm-ostree in the case of OpenShift sandboxed containers).

Standing on the shoulders of OLM, the OpenShift sandboxed containers operator can handle day-two tasks such as upgrades. In the future, additional capabilities such as fine-grained configuration management of the runtime, CRI-O, and others will be handled by the operator.

From Proof of Concept to Tech Preview

The architecture of the OpenShift sandboxed containers operator has evolved over time. When we released the first alpha version alongside OpenShift 4.6, the installation method was different. A DaemonSet was used to deploy the Red Hat Package Manager (RPM) packages on all worker nodes. The RPMs themselves were stored in yet another container image that the DaemonSet downloaded and unpacked.

The following diagram depicts this approach:

It was a flexible and simple solution; however, it also had several disadvantages. It added RPMs on top of the default installation, which could leave the system in a “dirty” state (as new content was placed on the host), and it interacted directly with the OS, which could interfere with other OpenShift components. What we needed was a solution that was better integrated with the OpenShift node and system architecture. This is where RHCOS extensions come into play. This feature, introduced in OpenShift 4.6, allows users to enhance the core RHCOS functionality with small features by enabling an extension. For OpenShift 4.8, we added the sandboxed containers extension. By creating a machine config that enables the extension, the software that is required to run Kata Containers, including QEMU and its dependencies, is installed in a supported way by the Machine Config Operator (MCO).
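For illustration, enabling the extension boils down to a MachineConfig along these lines. This is a minimal sketch: the object name is made up, and the manifest the operator actually generates may differ.

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 50-enable-sandboxed-containers-extension  # illustrative name
  labels:
    machineconfiguration.openshift.io/role: worker  # target machine config pool
spec:
  extensions:
    - sandboxed-containers  # RHCOS extension that pulls in QEMU, kata-containers, and their dependencies

The MCO picks up the new MachineConfig, installs the extension packages through rpm-ostree, and updates the nodes in the pool one by one.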

Enough history. Let’s take a look at the current architecture using the sandboxed-containers RHCOS extension.

Overall Operator Architecture 

The Operator Component

The following diagram shows how the operator components are connected to the OpenShift overall architecture: 


The diagram above shows the basic parts of OpenShift clusters. 

Let’s start with a quick summary of what the components are:

  • OpenShift clusters consist of control plane (master) and worker nodes, organized in machine config pools.
  • The control-plane nodes run all the services that are required to control the cluster such as the API server, etcd, controller-manager, and the scheduler. The OpenShift sandboxed containers operator runs on a control plane node.
  • The cluster worker nodes (sometimes called the data plane) run all the workloads requested by the user and fulfilled by the kubelet. The container engine CRI-O uses either the default container runtime runc or, in our case, the Kata Containers runtime.

KataConfig Custom Resources 

The operator owns and controls a Custom Resource Definition (CRD) called ‘KataConfig.’ When a cluster admin creates an instance of this CRD, the sandboxed containers operator ‘sees’ it because it watches the custom resource. When a KataConfig instance is created, the operator enables the RHCOS extension, and OpenShift’s Machine Config Operator handles the extension on the nodes. What does ‘handles it’ mean? When a MachineConfig object is created that enables the sandboxed-containers extension, the MCO installs the RPMs contained in the extension on all nodes in the targeted pool. The RPMs required for sandboxed containers are QEMU, kata-containers, and their dependencies.
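To give a feel for the API, a minimal KataConfig custom resource could look like this. Treat it as a sketch: the field names are taken from the tech-preview API and may change, and the pool selector label is a hypothetical example.

apiVersion: kataconfiguration.openshift.io/v1
kind: KataConfig
metadata:
  name: example-kataconfig
spec:
  # Optional: restrict the installation to nodes carrying this label.
  # Without a selector, the extension is enabled on all worker nodes.
  kataConfigPoolSelector:
    matchLabels:
      custom-kata: "true"  # hypothetical label set by the cluster admin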

The following diagram describes the interactions between the OpenShift sandboxed containers operator, the Machine Config Operator, and the RHCOS extension: 

A few points to mention:

  • When the RHCOS extension is enabled, the runtime is installed on the nodes, but there is a missing link. CRI-O does not know about the new runtime yet. But what is CRI-O? 

CRI-O implements the Kubernetes Container Runtime Interface (CRI), and by doing so it enables OCI-compatible runtimes such as the Kata Containers runtime. The CRI lets the kubelet use different container runtimes. One task of the OpenShift sandboxed containers operator is to make sure CRI-O knows about the Kata Containers runtime. To achieve this, it distributes a CRI-O configuration file to the worker nodes that adds the Kata Containers runtime to the list of runtime handlers. The runtime_handler is a CRI term and is passed to CRI-O via the kubelet.

Kata Runtime Class 

What’s a runtime class?

This brings us to the role of the runtime class. A runtime class lets you select the container runtime configuration that will be used to run a Pod's containers.

After the operator enables sandboxed containers, the last thing it does is create a runtime class with the name ‘kata’ (as the Kata Containers runtime is the only one it manages at the moment). The name of the runtime class is the same as the name of the handler in the CRI-O configuration file. This is what connects the runtime class to the Kata Containers runtime.
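Conceptually, the RuntimeClass the operator creates looks like the following. This is a sketch; the object the operator actually generates also carries the overhead and scheduling fields discussed later in this post.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata   # the name users reference via runtimeClassName
handler: kata  # must match the runtime handler name in the CRI-O configuration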

To run sandboxed containers, all the user needs to do is set the runtimeClassName to ‘kata’ in the Pod’s specification, as in the example below, or in the Pod template of their favorite application.
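A minimal Pod that requests the Kata Containers runtime might look like this (the name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-example  # placeholder name
spec:
  runtimeClassName: kata   # run this Pod's containers with the Kata Containers runtime
  containers:
    - name: app
      image: registry.access.redhat.com/ubi8/ubi-minimal  # placeholder image
      command: ["sleep", "infinity"]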

Why do we need a runtime class?

To be clear, the operator will not replace the default container runtime. It adds Kata Containers as a secondary runtime that is chosen per deployment unit (Pod, Deployment, StatefulSet, and so on) by specifying the runtimeClassName.
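For higher-level objects, the field simply goes into the Pod template. As a sketch with placeholder names, a Deployment would look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sandboxed-app  # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sandboxed-app
  template:
    metadata:
      labels:
        app: sandboxed-app
    spec:
      runtimeClassName: kata  # every replica runs with the Kata Containers runtime
      containers:
        - name: app
          image: registry.access.redhat.com/ubi8/ubi-minimal  # placeholder image
          command: ["sleep", "infinity"]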

Using Runtime Classes for Separating Workloads 

A quote from the Kubernetes documentation:

You can set a different RuntimeClass between different Pods to provide a balance of performance versus security. For example, if part of your workload deserves a high level of information security assurance, you might choose to schedule those Pods so that they run in a container runtime that uses hardware virtualization. You would then benefit from the extra isolation of the alternative runtime, at the expense of some additional overhead.

You can also use RuntimeClass to run different Pods with the same container runtime but with different settings.

With runtime classes, we can choose a different container runtime for different kinds of workloads. The runtime class specified will be used to run all the containers in a Pod. The case of Kata Containers is even mentioned as a reference example in the official documentation of runtime classes.

But wait. We learned above that the Kata Containers runtime can be installed on a subset of worker nodes. How do we make sure that a Pod with runtimeClassName: kata will run on such a node and not somewhere else?

Scheduling Containers to the Right Nodes

The RuntimeClass has a field called scheduling. It lets admins completely separate sandboxed workloads from others and also scale trusted and untrusted workloads independently of each other. For developers, it means they do not have to worry about where their application runs.

By adding a nodeSelector to the scheduling field, we can let the scheduler know which nodes are capable of running containers with our runtime class ‘kata.’ The operator lets the user specify the node selector when they create the KataConfig custom resource.

[Note: The tech-preview version of the operator has a bug. The workaround is to add the scheduling.nodeSelector field manually to the RuntimeClass after the operator has created it, as sketched below.]
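With the workaround applied, the RuntimeClass might look like this. It is a sketch: the kata-oc=true label matches the example further below, and the label on your cluster may differ.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
scheduling:
  nodeSelector:
    kata-oc: "true"  # schedule kata Pods only to nodes in the kata-oc machine config pool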

All nodes that have the Kata Containers runtime installed need to indicate this capability with a label. When a Pod is created, the control plane decides which node the Pod is scheduled to. One factor in this decision is the Pod’s runtimeClassName and the scheduling.nodeSelector field of the referenced RuntimeClass.

Let’s go through an example with the operator:

  1. We have a label attached to the nodes where the sandboxed containers extension was enabled. The label is kata-oc=true because the operator has created a custom machine config pool called ‘kata-oc’.
  2. We create a RuntimeClass and set its scheduling.nodeSelector field to kata-oc=true.
  3. A user creates a Pod.
  4. The scheduler fills in the Node field. It takes into account:
    1. the runtimeClassName in the Pod’s spec
    2. the scheduling field in the RuntimeClass spec
  5. The Pod is created on the node the scheduler selected for it, using the Kata Containers runtime.

Pod Overhead

Pods consume resources when they run that go beyond what their containers request: the pause container, system components such as the container runtime and the kubelet, and even kernel resources. Because sandboxed containers run in a virtual machine, they need additional resources, such as the memory required for the QEMU process and the guest OS. In the specification of the runtime class, we can specify the additional overhead in terms of memory and CPU that sandboxed containers need on top of containers using the default runtime. The scheduler can then take this into account when looking for nodes with enough capacity. For sandboxed containers, we set the Pod overhead default to:

overhead:
  podFixed:
    memory: "350Mi"
    cpu: "250m"

The values above were used in our team’s testing. Changing those is possible but not recommended.  

Summary

We have scratched the surface of the OpenShift sandboxed containers operator, giving you an introduction to the topic and the basic functionality. With the operator, you can deploy a second runtime, in our case Kata Containers, the same way you deploy and manage any other application or service on OpenShift. In the following blog, we will go a bit deeper into the technology of the operator, such as what it does under the hood and the features we have planned for the future. Stay tuned. 


About the authors

Jens Freimann is a Software Engineering Manager at Red Hat with a focus on OpenShift sandboxed containers and Confidential Containers. He has been with Red Hat for more than six years, during which he has made contributions to low-level virtualization features in QEMU, KVM and virtio(-net). Freimann is passionate about Confidential Computing and has a keen interest in helping organizations implement the technology. Freimann has over 15 years of experience in the tech industry and has held various technical roles throughout his career.
