We are happy to announce that cgroup v2 is GA in OpenShift 4.13. It is not the default, but you can select it at cluster installation time. See the OpenShift documentation for how to enable it on a cluster.

About cgroup v2

On Linux, control groups constrain resources that are allocated to processes.

The kubelet and the underlying container runtime (CRI-O) interface with cgroups to enforce resource management for pods and containers, including CPU and memory requests and limits for containerized workloads.

There are two versions of cgroup in the Linux kernel: cgroup v1 and cgroup v2. cgroup v2 is the new generation of the cgroup API. Kubernetes was slow to switch to v2 because it was waiting for the container runtimes to implement cgroup v2 support. cgroup v2 support went GA in upstream Kubernetes 1.25.
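To tell which of the two versions a given node is running, one common check is to look at the cgroup mount point. The sketch below assumes the conventional mount at /sys/fs/cgroup and relies on the fact that the unified (v2) hierarchy exposes a cgroup.controllers file at its root. The function name and the "unknown" fallback are ours, for illustration only.

```python
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup") -> str:
    """Return "v2", "v1", or "unknown" for the cgroup mount at `root`."""
    base = Path(root)
    # The unified (v2) hierarchy has a cgroup.controllers file at its root.
    if (base / "cgroup.controllers").is_file():
        return "v2"
    # A v1 host typically mounts one directory per controller (memory/, cpu/, ...).
    if (base / "memory").is_dir() or (base / "cpu").is_dir():
        return "v1"
    return "unknown"
```

Calling cgroup_version() on a node (for example from a debug shell) reports the mode for that host. A hybrid setup, where v1 controllers sit at the root, is reported as "v1" by this sketch.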

What is cgroup v2?

FEATURE STATE: OpenShift 4.13 [GA, not the default]

Cgroup v2 is the next version of the Linux cgroup API. Cgroup v2 provides a unified control system with enhanced resource management capabilities.

Cgroup v2 offers several improvements over cgroup v1, such as the following:

  • Next generation of cgroups in the kernel. All new development happens in v2. 
  • Better node stability under OOM pressure scenarios.
  • Better page cache write-back accounting.
  • The current implementation in Kubernetes and OpenShift maps 1:1 to v1 behavior, but it opens the door to consuming new v2-specific features.
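To make the 1:1 mapping concrete, here is a minimal sketch of how two v1 CPU settings translate to their v2 counterparts. It assumes the linear cpu.shares to cpu.weight conversion described in the upstream Kubernetes cgroup v2 enhancement proposal; a given runtime version may use a different mapping. The function names are hypothetical.

```python
def cpu_shares_to_weight(shares: int) -> int:
    """Convert v1 cpu.shares (2..262144) to v2 cpu.weight (1..10000)."""
    # Clamp to the valid v1 range before converting.
    shares = max(2, min(shares, 262144))
    # Linear mapping (assumed, see lead-in): [2, 262144] -> [1, 10000].
    return 1 + ((shares - 2) * 9999) // 262142


def cpu_max(quota_us: int, period_us: int = 100000) -> str:
    """Render v1 cpu.cfs_quota_us / cpu.cfs_period_us as a v2 cpu.max value."""
    # v1 uses a negative quota for "no limit"; v2 spells it "max".
    quota = "max" if quota_us < 0 else str(quota_us)
    return f"{quota} {period_us}"


# A container requesting 1 CPU gets 1024 shares on v1, which becomes weight 39.
assert cpu_shares_to_weight(1024) == 39
# A 500m CPU limit is a 50000us quota per 100000us period.
assert cpu_max(50000) == "50000 100000"
```

The workload sees the same requests and limits either way. Only the kernel interface files underneath change (for example, memory.limit_in_bytes on v1 becomes memory.max on v2).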

Upcoming Kubernetes features exclusively use cgroup v2 for enhanced resource management and isolation. For example, the MemoryQoS feature improves memory QoS and relies on newer cgroup v2 primitives. Other improvements, such as PSI (Pressure Stall Information) and user-space OOM killer implementations, are possible with cgroup v2. We continue to work in upstream Kubernetes to add these features and make them available on OpenShift.
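As an illustration of what PSI data looks like, the kernel reports pressure as "some" and "full" lines with 10, 60, and 300 second averages plus a cumulative total, both system-wide under /proc/pressure and per cgroup on v2 (for example memory.pressure). The parser below is a minimal sketch of reading that format. The function name and the sample values are made up for the example, and this is not how Kubernetes or OpenShift consume PSI.

```python
def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text into {"some": {...}, "full": {...}}."""
    result: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


# Sample content in the kernel's PSI format (values are illustrative).
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)

pressure = parse_psi(SAMPLE)
assert pressure["some"]["avg10"] == 1.5
```

A user-space OOM killer or a monitoring agent could read these averages per cgroup and react before the kernel OOM killer has to step in.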

Conclusion 

We have seen better node stability with cgroup v2 when there is I/O pressure due to throttling. On cgroup v1, such nodes go NotReady, whereas on v2 the node stays stable.

We highly recommend that most users switch to cgroup v2, as that’s where bug fixes and improvements will continue to land. Telco customers who use the real-time kernel or disable CPU load balancing should stay on v1 while we work to get those use cases better covered with v2 in the kernel and in OpenShift.