The Kernel Module Management (KMM) Operator manages, builds, signs, and deploys out-of-tree kernel modules and device plugins on OpenShift Container Platform clusters.
KMM adds a new ManagedClusterModule CRD for the hub/spoke scenario, which describes an out-of-tree kernel module and its associated device plugin. You can use ManagedClusterModule resources to configure how to load the module, define ModuleLoader images for specific kernel versions, and include instructions for building and signing modules for specific kernel versions.
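As a sketch, a ManagedClusterModule tying a module build to matching spoke clusters could look like the following. This is based on the upstream KMM hub API; the namespace, labels, registry, and ConfigMap names are illustrative assumptions and the exact fields may differ between versions:

```yaml
apiVersion: hub.kmm.sigs.x-k8s.io/v1beta1
kind: ManagedClusterModule
metadata:
  name: sample-module
spec:
  # Namespace on the spoke clusters where the module objects are created
  spokeNamespace: kmm-tests
  # Label selector choosing which managed (spoke) clusters get the module
  selector:
    matchLabels:
      wants-sample-module: "true"
  # Same shape as a regular Module spec
  moduleSpec:
    moduleLoader:
      container:
        modprobe:
          moduleName: sample_module
        kernelMappings:
          # One entry per kernel family; KMM substitutes the kernel version
          - regexp: '^.+$'
            containerImage: registry.example.internal:5000/kmm/sample-module:$KERNEL_FULL_VERSION
            build:
              dockerfileConfigMap:
                name: sample-module-dockerfile
```

With a spec like this, the hub builds one image per kernel version found on the selected spokes, and ACM delivers the resulting resources to them.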
KMM is designed to accommodate multiple kernel versions at once for any kernel module, allowing for seamless node upgrades and reduced application downtime.
For more information, refer to the product documentation: Kernel Module Management Operator | Specialized hardware and driver enablement | OpenShift Container Platform 4.12.
KMM is also a community project at kubernetes-sigs/kernel-module-management and can be tested on upstream Kubernetes. The project has a community channel on Kubernetes Slack.
To keep things concise, this text uses several acronyms and product names. Here are the most common ones:
- KMM: Kernel Module Management
- ACM: Advanced Cluster Management
- OpenShift: Red Hat's Kubernetes-based product
- Hub: Central management cluster that manages spoke clusters via ACM
- Spoke: Cluster managed via ACM from the central management cluster (the hub)
- SNO: Single-node OpenShift
- CRD: Custom Resource Definition
- Edge: Relevant to Telco 5G and other use cases; refers to systems placed close to the end users of a service to improve performance
- OOT: Out-of-tree; refers to kernel modules built outside the kernel source tree
The test goal
One of the new features coming in KMM 1.1 is the ability to work in hub-spoke architectures by leveraging Advanced Cluster Management capabilities. Deploying KMM in this hub/spoke architecture is straightforward.
This kind of setup is very common at the Edge, where data centers have better-provisioned servers while Edge devices are resource-constrained. Every resource saved on the Edge devices can be used to provide a better experience to nearby users.
In this mode, KMM builds new kernel modules for specific releases on the hub cluster and delivers the built images to the spoke clusters. The spokes use fewer resources but still benefit from hardware enablement, which is automatically updated for each kernel when a newer OpenShift version is released.
The KMM team wanted to run these tests and gather metrics in a large-scale environment so they could evaluate the behavior of the hub/spoke scenario.
The goal of this test was to deploy the KMM module to 1,000 nodes and monitor actual resource utilization throughout the process.
The test environment consisted of 69 systems in total, all identical, with the following specifications:
- Dell R650 with 512 GB of RAM, 3 TB NVMe, 2x 1.8 TB SSD + 1x 447 GB SSD, powered by Intel(R) Xeon(R) Gold 6330 CPUs @ 2.00 GHz, reported as 112 CPUs (dual processor, 56 cores, 112 threads).
To set up the environment, we used four hosts for the base infrastructure:
- One bastion for deploying the other hosts and interacting with the environment (Ansible playbook execution, mirror registry, etc.).
- Three nodes forming an OpenShift bare-metal compact cluster that acts as the hub.
The remaining 65 hosts were configured as hypervisors for running virtual machines with KVM to be used as Single Node OpenShift (SNO) clusters.
| Role | Hardware | CPUs | RAM (GB) | Disk | Count |
|---|---|---|---|---|---|
| Bastion | Dell R650 | 112 | 512 | 446 GB SSD / 2x 1.8 TB SSD / 2.9 TB NVMe | 1 |
| Hub BM cluster | Dell R650 | 112 | 512 | 446 GB SSD / 2x 1.8 TB SSD / 2.9 TB NVMe | 3 |
| SNO hypervisor | Dell R650 | 112 | 512 | 446 GB SSD / 2x 1.8 TB SSD / 2.9 TB NVMe | 65 |
| SNO VM | Libvirt KVM | 8 | 18 | 120 GB | 1755 (max) |
It might sound easy to achieve, but in reality, there are several things to take into account to set up the whole environment:
- Configure all the relevant hosts and get access to them using Ansible.
- Enable the required packages and services for virtualization.
- Define the network cards for the VMs to be interconnected and able to reach the hub.
- Size and place the VMs properly so that the faster drives hold the VM disks. This avoids VM density causing disk pressure on a single drive.
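In practice these steps are scripted, but as an illustration, a minimal Ansible play for preparing a hypervisor host might look like this (the host group and package list are assumptions for sketching purposes, not the actual roles used):

```yaml
# Hypothetical play: enable KVM virtualization on the SNO hypervisors
- hosts: sno_hypervisors
  become: true
  tasks:
    - name: Install virtualization packages
      ansible.builtin.dnf:
        name:
          - libvirt
          - qemu-kvm
          - virt-install
        state: present

    - name: Enable and start libvirtd
      ansible.builtin.systemd:
        name: libvirtd
        state: started
        enabled: true
```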
Most of this logic is already present in the scripts in the https://github.com/redhat-performance/jetlag/ repository, which is closely tied to the environment we used for this setup to prepare and configure the relevant resources.
We achieved 27 VMs per host, for a total of 1755 potential SNOs. Note the word potential here… we're setting up the infrastructure to mimic real hardware:
- Remote power on/off of the VMs (using sushy-tools to interact with libvirt).
- KVM for running the VMs.
Each SNO is configured with 8 vCPUs and 18 GB of RAM, which is in line with the bare-minimum requirements for an OpenShift deployment.
For example, we had to raise the limits on the KMM deployment to allow it to use more memory during the module build: it was initially limited to 128 MB of RAM, which was not enough to compile the module.
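The fix amounts to raising the memory limit on the relevant Deployment. A sketch of the patched fragment follows; the deployment and container names are assumptions and vary by install:

```yaml
# Illustrative fragment: raise the memory limit for the KMM controller
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kmm-operator-controller   # assumed name, check your install
  namespace: openshift-kmm
spec:
  template:
    spec:
      containers:
        - name: manager
          resources:
            limits:
              memory: 512Mi   # the 128Mi limit was not enough for builds
```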
Note that some other spokes will be discarded in the next paragraphs.
For this setup, and to avoid external issues, we decided to do the installation following the disconnected IPv6 approach: the bastion runs an internal registry, which is later configured in all clusters via a CatalogSource.
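The CatalogSource pointing the clusters at the bastion's mirror registry looks roughly like this (the registry hostname and index image path are placeholders for our environment):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: mirrored-redhat-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  # Operator index image served from the bastion's internal registry
  image: registry.example.internal:5000/olm/redhat-operator-index:v4.12
  displayName: Mirrored Red Hat Operators
  publisher: Red Hat
```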
This process is performed automatically by the scripts used to deploy the environment, but we had to validate and fix some hosts that failed to apply the policy that configures it.
To prepare the setup, we also had to mirror the KMM operator into the registry, along with the Grafana operator used for reporting.
The Grafana instance
OpenShift already includes a Grafana dashboard, together with ACM, that can show some information about the bare-metal cluster used for management.
However, installing a custom instance is required to customize the dashboard. This custom instance is shown in the next screenshots to highlight the cluster behavior and the numbers obtained.
Of the 1,755 potential SNOs, 1,754 were deployed and in operation, with one SNO down due to a registration issue.
Of the 1,754 SNO spoke clusters, six were removed as they failed in different steps (test upgrade of OpenShift, operators in a non-working state, etc.).
For testing, 1,748 SNOs were used for deploying KMM. This is well over the 1,000 target set as a requirement prior to deployment and testing at scale.
For deploying KMM, the operator was first deployed on the hub.
Using ACM, we defined a policy to add it to the managed clusters, as seen in the next graphs.
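A simplified sketch of such a policy, enforcing an OLM Subscription for KMM on the selected clusters, is shown below. The names, namespaces, and channel are illustrative, and the Namespace, OperatorGroup, and PlacementRule/PlacementBinding objects that a real policy needs are omitted for brevity:

```yaml
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: install-kmm
  namespace: policies
spec:
  disabled: false
  remediationAction: enforce
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: install-kmm-subscription
        spec:
          remediationAction: enforce
          severity: low
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: operators.coreos.com/v1alpha1
                kind: Subscription
                metadata:
                  name: kernel-module-management
                  namespace: openshift-kmm
                spec:
                  channel: stable
                  name: kernel-module-management
                  source: mirrored-redhat-operators   # mirrored catalog name (assumed)
                  sourceNamespace: openshift-marketplace
```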
First, the KMM Hub installation started at 11:42, and we can see some data on the following graph:
As we started the spokes installation in the period between 12:17 and 12:23, we see that after the initial setup of the hub and some activity during the spokes installation, the usage of resources is more or less steady. However, there is a bump around 13:30. This occurred after all the deployment activity was finished.
We can compare this pattern with the SNOs' resource usage:
The SNOs were active but without too much RAM or CPU load as a big part of the installation is just deploying the required components in the clusters.
If we focus on the average graphs to avoid the spikes of regular activity, patterns emerge:
...and for the spokes:
In general, the hub cluster was unaffected. For the SNOs, there is slightly more average CPU activity and RAM usage; the numbers show a difference of around 200 MB of RAM, which is small in terms of resource consumption.
For KMM controllers, we can see it clearly here:
The number of KMM controllers increased to the total number of operative SNOs in a very short period of time.
Now that KMM has been installed, we need to actually deploy a module. This causes KMM to compile it and prepare the relevant images that the spokes will consume.
The example used for this test is a really simple module called kmm-kmod, which is part of the KMM project. It just prints a “Hello World” and “Goodbye World” message to the kernel log, but it is suitable for testing building and loading kernel modules with KMM like any other production module would be.
In our case, the kernel module is compiled and installed prior to testing so we can compare the workload increase when the module is already prepared in the hub and ready for distribution.
The container image already built on the hub started being deployed, and we can see the count change here:
Around 13:30, all SNOs deployed and started the module container. After repeating the graphs provided earlier for the whole period, we can see that the hub increased memory usage:
...and for the spokes:
When KMM was installed, there was a bump in memory usage (under 200 MB), but no appreciable change occurred once the module was loaded.
Note that in this situation, the hub creates the module (compilation, etc.), builds the image, and then ACM deploys to the spokes.
From the numbers, we see that 100% of the operative SNOs deployed the KMM controller and loaded the module within a really short timeframe with no noticeable impact.
One of the tests performed was to upgrade the kernel on the SNOs. This triggers KMM to compile the module for the new kernel and deliver the updated kernel module to the spoke clusters.
The kernel upgrade is performed as part of an OpenShift upgrade. In the testing, we performed a cluster upgrade on each SNO spoke cluster.
This might sound easy, but it means having to mirror the required images for the newer OpenShift release, apply an ICSP to all the SNOs, add the signatures for validating the new image, and launch the upgrade itself.
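For instance, the mirror part is applied as an ImageContentSourcePolicy along these lines (the internal registry hostname and repository paths are placeholders for our environment):

```yaml
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: ocp-release-mirror
spec:
  repositoryDigestMirrors:
    # Release payload images
    - source: quay.io/openshift-release-dev/ocp-release
      mirrors:
        - registry.example.internal:5000/ocp4/openshift4
    # Release component images
    - source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
      mirrors:
        - registry.example.internal:5000/ocp4/openshift4
```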
The hub started the upgrade at 16:20, going from OpenShift 4.12.3 to 4.12.6, which had been previously mirrored with the relevant signatures added.
The upgrade took around 90 minutes in total for the hub. Once the hub upgraded to 4.12.6, it recompiled the kernel module.
We manually tested the upgrade on the first SNO; once validated, it was launched for all the remaining active nodes.
The timing was the following:
- 18:22 First SNO (sno0001) and others start applying YAML for version signatures and for ICSP (just missing the upgrade itself)
- 18:24 Script to launch all the upgrades in parallel started
- 19:38 1500 updated, 249 in progress
- 19:49 4 nodes failed the OpenShift Upgrade
- 22:51 all SNOs, including the failing ones, are upgraded to 4.12.6
Out of the total count, four nodes didn't come back (API not responding). A manual reboot of those SNOs got their APIs responding again and the upgrade progressing towards the 4.12.6 release.
The upgrade itself caused no appreciable load on the hub, but as highlighted, several SNO hosts did not perform the upgrade properly.
Finally, a confirmation of what was expected:
The hub did perform a build once the first SNO required the new kernel, but the spokes did no builds at all (which is the exact use case of hub/spoke architecture).
Timeline and milestones
The milestones for the whole process were:
- 11:42 KMM installation at hub.
- 12:17-12:23 KMM installation at SNOs (controller).
- 13:30 KMM KMOD added and deployed to all SNOs.
- 16:20 Hub OpenShift upgrade from 4.12.3 to 4.12.6.
- 17:50 Hub upgrade finished.
- 18:22 First SNO (sno0001) and others start applying YAML for version signatures and for ICSP (just missing the upgrade itself).
- 18:24 Script to launch all the upgrades in parallel started.
- 19:38: 1500 updated, 249 in progress.
- 19:49: Four nodes failed the OpenShift upgrade.
- 22:51: All SNOs, including the failing ones, are upgraded to 4.12.6.
Metrics about spokes:

| Metric | Count |
|---|---|
| Potential SNO VMs | 1755 |
| SNOs deployed and operational | 1754 |
| Spokes not working properly | 7 |
| SNOs used for KMM testing | 1748 |
For the KMM Operator deployment at the hub, the OperatorHub UI was used, and a policy was applied with the oc client to deploy the controller on the spokes:
| KMM deployment | Deployment time | CPU utilization post-deployment | MEM utilization post-deployment |
|---|---|---|---|
| After | <7 min | negligible | 200 MB |
For the KMM-KMOD deployment, a ManagedClusterModule has been applied so the image is built on the hub and then deployed to all spokes:
| KMM KMOD deployment | Build time | Deployment time | CPU utilization | MEM utilization |
|---|---|---|---|---|
| Hub | <2 mins | N/A | 30% peak | |
| Spoke (per) | N/A | <1 min after build | | |
| Spoke (avg.) | N/A | <1 min | 0.08% | 80 MB |
| Spoke (total) | N/A | 11 mins | 0.2% | No appreciable change in RAM usage |
For the KMM-KMOD upgrade, we used the different kernel versions between RHCOS shipped on OCP 4.12.3 and 4.12.6. Both hub and spokes were upgraded to 4.12.6, so the new kernel version was detected by KMM and a new KMM-KMOD was built at the hub and automatically deployed to all spokes:
Note that some values are reported as N/A for the spokes as the operation came as part of the OpenShift upgrade itself.
| KMM KMOD upgrade | Build time | Deployment time | CPU utilization | MEM utilization |
|---|---|---|---|---|
| Hub | <2.5 mins | N/A | Peak of 60% (one host, one core) briefly during compilation | Peak of 4.3 GB, stabilizing at 3.3 GB |
As highlighted previously, this test was affected by several infrastructure issues:
- OpenShift deployment (SNO installation) by ACM.
- OpenShift Upgrades (SNOs failing in the upgrade and requiring manual intervention).
The infrastructure uses a high density of VMs to simulate the clusters. This can cause:
- Bottlenecks on the network during the installation of SNOs.
- Netmask requiring adjustment when using IPv4 or IPv6.
- Using IPv6 requires using a disconnected environment, resulting in extra work for "simple" things like:
  - Mirroring the images.
  - Deploying new operators (like the custom Grafana for customizing the dashboard).
We contributed fixes for the infrastructure issues we found back to the jetlag repository. However, a lot of manual tasks were still required to set up the environment itself.
In the end, the KMM test went very smoothly, and the results were in line with expectations: little to no impact on the clusters.
We want to give special thanks to the following who helped contribute to the success of this endeavor:
- Scale Lab team
- Alex Krzos
- Edge Pillar, Partner Accelerator team