How to create and scale 6,000 virtual machines in 7 hours with Red Hat OpenShift Virtualization

In this learning path by Boaz Ben Shabat, learn how to create a large number of virtual machines for a production infrastructure in less than a weekend’s worth of time.

Hour 4-6: Deploying OpenShift

2 hrs

Upon the successful deployment and setup of Red Hat® Ceph® Storage, it’s time to deploy Red Hat OpenShift® for the remaining clusters.

What will you learn?

  • Deploying Red Hat OpenShift for the purposes of this large-scale example

What you need before starting:

  • Configured network and tuned profiles
  • Red Hat Ceph Storage deployed

OpenShift Deployment

To deploy the OpenShift cluster we used JetSki, a tool that utilizes IPI (installer-provisioned installation) and can deploy an OpenShift cluster of any size. However, there are other ways to deploy the cluster, such as the Assisted Installer, which is an interactive installation wizard.

KubeletConfig

The KubeletConfig we applied is related to BZ#1984442. It helps achieve a more balanced distribution of VM pods across all nodes; it is not a must, but we recommend it.

For more information please refer to the Knowledge-centered support (KCS) tuning guide.

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-scheduling
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: enabled
  kubeletConfig:
    nodeStatusMaxImages: -1

The custom KubeletConfig is enabled for worker nodes by applying a label that matches its machineConfigPoolSelector:

oc label machineconfigpool worker custom-kubelet=enabled

Note that KubeletConfig modifications will reboot the associated nodes.
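
Because the label change rolls the KubeletConfig out through the worker MachineConfigPool and reboots the nodes, it can help to watch the rollout and confirm the new setting before moving on. A minimal check, using the resource names from the example above:

oc get mcp worker -w
oc get kubeletconfig custom-scheduling -o yaml | grep nodeStatusMaxImages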

Rate Limiter

To accommodate the potential surge in on-the-fly requests that a large-scale setup may generate during significant actions, we enhanced the rate-limiting configuration. Specifically, we increased the number of queries per second (QPS) from 5 to 100, allowing the system to handle a higher request rate. Additionally, we raised the burst limit from 10 to 200, enabling the system to momentarily handle a burst of requests beyond the specified QPS threshold. For more information, please refer to the KCS tuning guide.

The updated configuration was applied using the following command, so the system is better equipped to handle a greater volume of requests during critical operations:

oc annotate -n openshift-cnv hco kubevirt-hyperconverged hco.kubevirt.io/tuningPolicy='{"qps":100,"burst":200}'

Now we just need to enable the annotated values by setting the tuningPolicy field:

oc patch -n openshift-cnv hco kubevirt-hyperconverged --type=json -p='[{"op": "add", "path": "/spec/tuningPolicy", "value": "annotation"}]'
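
To confirm that both the annotation and the tuning policy were accepted, the values can be read back from the HyperConverged resource. A quick check, assuming the default openshift-cnv namespace used above:

oc get hco -n openshift-cnv kubevirt-hyperconverged -o yaml | grep tuningPolicy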

Prometheus Config

By default, Prometheus stores metrics on ephemeral storage, which presents a limitation: any metrics data saved will be erased upon restarting or terminating the Prometheus pods. To overcome this challenge, we used a local persistent storage solution for the Prometheus database. This was achieved by allocating two SSDs, each with a 2.9TB capacity, on two designated worker nodes that host the Prometheus pods.

Based on the guidelines in the database storage requirements documentation, we calculated that 2.9TB is sufficient to store all the metrics data collected over the course of this release's scale test. This approach keeps critical metrics data available even across pod restarts or terminations.
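
For reference, the Prometheus storage documentation estimates the required disk space as retention_time_seconds × ingested_samples_per_second × bytes_per_sample, with roughly 1 to 2 bytes per sample. A back-of-the-envelope check in shell, using an assumed ingestion rate of 200,000 samples per second (a placeholder, not a measurement from this cluster):

# 12 weeks of retention x 200,000 samples/s x 2 bytes/sample, converted to GiB
echo $(( 12 * 7 * 24 * 3600 * 200000 * 2 / (1024 * 1024 * 1024) ))
# prints 2703, which fits within the 2975Gi volume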

  1. To configure a local volume for the Prometheus pods, we first format the two SSDs on the worker nodes:

    $ mkfs.ext4 /dev/nvme0n1
  2. Create the mount point and mount the device:

    $ sudo mkdir -p /var/promdb
    $ sudo mount /dev/nvme0n1 /var/promdb
  3. To make the mount persist after a reboot, we create a systemd mount unit file. Note that the unit file name must match the mount path, with ‘/’ replaced by ‘-’:

    vi var-promdb.mount
    [Unit]
    Description=Prometheus Database Mount
    [Mount]
    What=/dev/disk/by-uuid/6b0e3351-1082-4ae7-a934-77a64cb886cd
    Where=/var/promdb
    Type=ext4
    Options=defaults
    [Install]
    WantedBy=multi-user.target
  4. Now enable the mount unit so the mount is restored automatically after a reboot:

    $ systemctl enable var-promdb.mount
  5. Manually provision the PV using the following YAML:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: worker000-r650
    spec:
      capacity:
        storage: 2975Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: manual-pv
      local:
        path: /var/promdb
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - worker000-r650

A parallel set of steps needs to be executed on the second node, since there are two replicas of the Prometheus pods, to keep the configuration consistent and redundant.
Also note that the default persistentVolumeReclaimPolicy is "Delete", which is why we set it to "Retain": this safeguards against accidental deletion of the Persistent Volume (PV) or Persistent Volume Claim (PVC) and the resulting loss of data.
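
With the mounts in place on both nodes, the two PV manifests can be applied and verified before touching the monitoring configuration. A short sketch, where pv-worker000-r650.yaml and pv-worker001-r650.yaml are hypothetical file names for the manifest above and its copy for the second node:

oc create -f pv-worker000-r650.yaml
oc create -f pv-worker001-r650.yaml
oc get pv worker000-r650 worker001-r650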

  1. Label the node with the corresponding node name:

    oc label node worker000-r650 nodePool=promdb
    oc label node worker001-r650 nodePool=promdb
  2. We use the following ConfigMap YAML to bind the Prometheus PVCs to the PVs and attach the Prometheus pods to the corresponding PVCs. The storage request size is set to 2975Gi to give the Prometheus pods a claim on the full SSD.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          retention: 12w
          nodeSelector:
            nodePool: promdb
          volumeClaimTemplate:
            metadata:
              name: prometheusdb
            spec:
              storageClassName: manual-pv
              resources:
                requests:
                  storage: 2975Gi

By default, Prometheus retains data for 15 days; we extended the retention time to 12 weeks. The nodePool label is used to ensure that the Prometheus pods are always scheduled on the worker nodes that hold the Prometheus database.

Note that the above procedure can also be done using a MachineConfig; we chose this approach to avoid the node reboots that occur when a MachineConfig is applied.
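
Once the ConfigMap is applied, the Cluster Monitoring Operator redeploys the prometheus-k8s pods, so it is worth confirming that they land on the labeled nodes and that their PVCs bind to the manual PVs. A quick check, assuming the default prometheus-k8s names:

oc -n openshift-monitoring get pods -o wide | grep prometheus-k8s
oc -n openshift-monitoring get pvc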

Multus Secondary Network Config

When dealing with high-throughput applications, like managing 6,000 VMs, it's advisable to use a dedicated network interface to ensure optimal performance and avoid bottlenecks. In this case, we used the available "ens1f1" interface as the dedicated NIC for the VMs' network traffic. We set this up by deploying the NMState operator, either through the user interface (UI) for easy configuration and management, or by using the following steps:

  1. Create the namespace for nmstate:

    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        kubernetes.io/metadata.name: openshift-nmstate
        name: openshift-nmstate
      name: openshift-nmstate
    spec:
      finalizers:
      - kubernetes
  2. Add the subscription:

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      labels:
        operators.coreos.com/kubernetes-nmstate-operator.openshift-nmstate: ""
      name: kubernetes-nmstate-operator
      namespace: openshift-nmstate
    spec:
      channel: stable
      installPlanApproval: Automatic
      name: kubernetes-nmstate-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
  3. Create the operator group:

    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      annotations:
        olm.providedAPIs: NMState.v1.nmstate.io
      generateName: openshift-nmstate-
      name: openshift-nmstate-tn6k8
      namespace: openshift-nmstate
    spec:
      targetNamespaces:
      - openshift-nmstate
  4. Create an empty VLAN on top of ens1f1 by creating a Linux network bridge:

    apiVersion: nmstate.io/v1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: linux-bridge
    spec:
      desiredState:
        interfaces:
        - bridge:
            options:
              stp:
                enabled: false
            port:
            - name: ens1f1
              vlan: {}
          description: Linux bridge on secondary device
          ipv4:
            dhcp: true
            enabled: true
          name: linbridge
          state: up
          type: linux-bridge
      nodeSelector:
        node-role.kubernetes.io/worker: ""
  5. Create the network attachment definition:

    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: br-vlan
    spec:
      config: '{
        "cniVersion": "0.3.1",
        "name": "br-vlan-nad",
        "type": "cnv-bridge",
        "bridge": "linbridge",
        "macspoofchk": true,
        "vlan": 1
      }'

    VM YAML example:

     volumes:
          - name: node-os-vm-root-disk
            dataVolume:
              name: node-os-vm-root-disk
          - name: cloudinitdisk
            cloudInitNoCloud:
              networkData: |
                version: 2
                ethernets:
                  enp2s0:
                    addresses: [ 172.17.0.1/16 ]
              userData: |
                runcmd:
                  - ["sysctl", "-w", "net.ipv6.conf.enp2s0.disable_ipv6=1"]

Migration Settings

We also edited the HyperConverged custom resource managed by the hyperconverged-cluster-operator:

oc edit hco -n openshift-cnv kubevirt-hyperconverged

And set the following migration settings to increase the number of parallel migrations:

  liveMigrationConfig:
    completionTimeoutPerGiB: 800 
    parallelMigrationsPerCluster: 25 # default 5
    parallelOutboundMigrationsPerNode: 5 # default 2
    progressTimeout: 150
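
These values propagate from the HyperConverged resource into the KubeVirt custom resource, so the effective migration configuration can be read back there, and in-flight migrations can be followed through VirtualMachineInstanceMigration objects. A quick check, assuming the default resources created by the operator in openshift-cnv:

oc get kubevirt -n openshift-cnv -o yaml | grep -A6 'migrations:'
oc get vmim -A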

Next, we get ready for the cloning setup and virtual machine creation.

Previous resource
Hour 3 - Storage deploy
Next resource
Hour 7 - Cloning setup and VMs
