Overview

In this article we will look at how to configure OpenShift and VMware across failure domains. In the cloud, this is fairly simple: a failure domain is a region. Inside the data center, however, a failure domain could be a rack, a row, a building, or even an entire data center. When deploying OpenShift on VMware, most of the setup is taken care of for you: creating networking, virtual machines, storage, and so on. A key missing piece, however, is creating failure domains and ensuring they are visible to OpenShift.

If the VMware topology and its failure domains are not exposed to OpenShift, OpenShift has no way to spread workloads across them. For example, a developer might deploy an application with three replicas, and OpenShift could schedule those pods on three different workers that all happen to run on the same ESXi server.

To ensure high availability, we first need to configure DRS rules in VMware so that OpenShift masters and workers are spread across failure domains. Next, we need to make the VMware failure domain topology visible within OpenShift. Finally, we need to make sure roles and responsibilities are clearly understood between VMware administrators, OpenShift administrators, and developers or application owners.

VMware Distributed Resource Scheduler (DRS)

VMware DRS makes it possible to create failure domains, or groups of ESXi servers. Using VM groups, host groups, and VM/Host rules, the OpenShift virtual machine nodes can be spread evenly across those groups and kept on a preferred set of ESXi servers. If those virtual machines are moved, DRS eventually migrates them back to their preferred hosts, maintaining the integrity of the topology.

Configure DRS to balance the workloads across ESXi hosts. A scripted alternative using the govc CLI is sketched after these steps.

1. Ensure that DRS is enabled on the cluster.

2. Create a VM group for each failure domain, select the VMs for that zone, and add them to the group.

3. Create a host group for each failure domain and assign its ESXi hosts to the zone.

4. Once all the VM and host groups are created, create the DRS rules: add a VM/Host rule for each zone that keeps the zone's VM group on its host group.
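The same groups and rules can also be created from the command line instead of the vSphere UI. The following is a minimal sketch using the govc CLI; the cluster path, group names, and rule names are illustrative, the VM and ESXi host names must be replaced with the ones in your environment, the steps should be repeated for zone2 and zone3, and the exact flags may differ slightly between govc versions:

# Create a VM group containing the OpenShift VMs that belong to zone1
govc cluster.group.create -cluster <cluster-path> -name zone1-vm-group -vm <zone1-vm-1> <zone1-vm-2>

# Create a host group containing the ESXi hosts that make up zone1
govc cluster.group.create -cluster <cluster-path> -name zone1-host-group -host <zone1-esxi-1> <zone1-esxi-2>

# Create a VM/Host rule that keeps the zone1 VMs on the zone1 hosts
govc cluster.rule.create -cluster <cluster-path> -name zone1-vm-host-rule -enable -vm-host -vm-group zone1-vm-group -host-affine-group zone1-host-group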

OpenShift Topology

Topology is defined in OpenShift and Kubernetes using labels. A node can be labeled to show its relationship to a failure domain defined in VMware. OpenShift provides a special node label, topology.kubernetes.io/zone, which is commonly used in cloud configurations to expose regions and zones to end users. Here, we will create three zones that correlate to the groups, or failure domains, defined in VMware. To spread workloads across the topology, a developer or application owner simply needs to add a placement rule to their application deployment: a pod anti-affinity rule with topologyKey: "topology.kubernetes.io/zone". This ensures that the pods created for a replica set will not be scheduled into the same failure domain or zone.

Configure the node labels and placement rules for workloads:

1. Label the worker nodes with the proper topology label.

oc label node <worker-name> topology.kubernetes.io/zone=zone2
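For example, using the three worker nodes shown in the output of the next step, each worker can be assigned its own zone (the node names are specific to this environment):

oc label node demo-d7zpb-worker-gn57w topology.kubernetes.io/zone=zone1
oc label node demo-d7zpb-worker-nxg7n topology.kubernetes.io/zone=zone2
oc label node demo-d7zpb-worker-t6hps topology.kubernetes.io/zone=zone3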

2. Show all labels on the worker nodes and verify that the topology.kubernetes.io/zone label is set.

$ oc get nodes --show-labels |grep "worker-"
demo-d7zpb-worker-gn57w   Ready    worker   9d    v1.21.6+b82a451   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=demo-d7zpb-worker-gn57w,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=zone1
demo-d7zpb-worker-nxg7n   Ready    worker   9d    v1.21.6+b82a451   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=demo-d7zpb-worker-nxg7n,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=zone2
demo-d7zpb-worker-t6hps   Ready    worker   9d    v1.21.6+b82a451   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=demo-d7zpb-worker-t6hps,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=zone3
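
The zone label can also be displayed as its own column, which is easier to read than the full label list:

$ oc get nodes -L topology.kubernetes.io/zone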

 

3. Deploy an application with the placement rule shown below.

The configuration uses a pod anti-affinity rule and specifies the topologyKey.

kind: Deployment
spec:
  replicas: 3
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - test2
              topologyKey: topology.kubernetes.io/zone
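
Assuming the manifest above is saved to a file (the filename below is illustrative), it can be applied to the demo namespace used in this example:

$ oc apply -f deployment.yaml -n demo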

4. There are 3 failure domains and 3 replicas, and pods are now distributed across all zones.

$ oc describe pods -n demo |grep Node:
Node:         demo-d7zpb-worker-t6hps/192.168.1.100
Node:         demo-d7zpb-worker-gn57w/192.168.1.75
Node:         demo-d7zpb-worker-nxg7n/192.168.1.76
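
Because the placement rule uses requiredDuringSchedulingIgnoredDuringExecution, it is a hard constraint: scaling the deployment beyond the number of zones leaves the extra pods unscheduled, since no zone can satisfy the rule. For example, assuming the deployment is named test2 (the name is taken from the app label above and may differ in your environment):

$ oc scale deployment/test2 -n demo --replicas=4

The fourth pod is expected to remain Pending. If a best-effort spread is preferred over a hard guarantee, preferredDuringSchedulingIgnoredDuringExecution can be used instead.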

Summary

In this article, we discussed the importance of failure domains in an environment where OpenShift workloads run on VMware. Both OpenShift and VMware need to be made aware of each other's topology. Because on-premises failure domains vary so much from one environment to the next, this is a day-two operation and, unfortunately, one that is often missed. This article provided step-by-step instructions for configuring both OpenShift and VMware so that failure domains are enforced. Following these recommended practices will improve application availability for OpenShift workloads running on VMware.