Introduction

We are proud to announce the general availability of Bring Your Own Host (BYOH) support for Windows nodes in Red Hat OpenShift. With this offering, you can onboard your custom Windows nodes (aka “pets”) into an OpenShift cluster. We recognize that customers have dedicated Windows Server instances in their data centers that they regularly update, patch, and manage. Often these instances run on vSphere, OpenStack, or bare metal platforms. Taking advantage of these servers to run containerized workloads lets their computing power be harnessed in a hybrid cloud world. Enabling BYOH support for these Windows servers can help customers lift and shift their on-premises workloads to a cloud-native world.

Pets vs Cattle (A brief history)

The BYOH feature lets users bring their “pet” Windows server instances to OpenShift. In the broadest sense, “cattle” are akin to the Machines in an OpenShift cluster that are managed by the Machine API, and “pets” are nodes that are not managed by the Machine API. Read on for a brief background on pets vs. cattle.

Pets

At the end of the day - in small quantities, pets are actually quite easy to take care of - when they are young, you take them to the vet for their shots. As they grow, you provide them with food, water, and a clean litter box (or take them outside once in a while) and they are pretty much "good to go". Like pets, you give traditional virtual machines their "shots" when they are first created (via Puppet, Chef, Ansible, or through manual updates) and they are pretty much "good to go".  Of course, if they get "sick", you take virtual machines to "the vet" - you log into them, troubleshoot problems, fix problems, or run update scripts. Usually by hand, or driven by some automation, but managed individually. The problem is raising pets in a house doesn't scale. It's hard to manage 2,000 cats and dogs in a house.

Cattle

Raising cattle is quite different from raising a household pet. It's actually quite a bit more complex. Cows, sheep, and chickens are raised on farms because it's more efficient. Farms are set up to handle the scale. This requires large amounts of land, tractors, fences, silos for grain/feed, specialized trailers for your truck, specialized train cars, and specialized processing plants. In addition, farms have to keep shifting which fields are used for grazing so that they don't become unusable over time.  Farms are more efficient, but quite a bit more expensive than a house to run day to day. Cloud platforms (e.g. OpenShift) are more akin to farms than houses. Firing up a cloud is like setting up a farm from scratch. It requires a lot of planning and execution. After firing up your cloud, there is constant technical care and maintenance - e.g. adding/removing storage, fixing hung instances, adding/removing VLANS, fixing pods stuck in a pending state, returning highly available services (Cinder, API nodes, OSE/Kube Master, Hawkular Metrics) back to production, upgrading the cloud platform, etc., etc. There is a lot of farm work with a cloud. Farms are quite efficient at raising thousands of animals. I do not think, however, that you just tear down an entire farm when it is no longer running in an optimal state - instead - you fix it.  Clouds are quite similar. 

Clouds are more work for operators, but less work for developers. Raising large numbers of chickens is harder for farmers and easier for consumers. The farmers hide the complexity from consumers.

Using the Windows Machine Config Operator (WMCO) to add and remove Windows nodes 

The intent of the Windows Machine Config Operator (WMCO) is to allow a cluster administrator to add a Windows compute node as a day 2 operation with a prescribed configuration to an OpenShift cluster (OpenShift Container Platform/OKD) and enable scheduling of Windows workloads. Windows instances can be added either by creating a MachineSet, or by specifying existing instances through a ConfigMap. Through either method, the Windows instance must have the Docker container runtime installed. The operator will do all the necessary steps to configure the instance so that it can join the cluster as a worker node. 

Prerequisites

Before setting up a Windows BYOH node to be a member of an OpenShift cluster, you must make sure that the prerequisites are in place. Some of these prerequisites must be met as part of the cluster installation and cannot be added afterwards, so take special note of them and make sure they are handled at installation time.

The prerequisites are as follows:

  • OpenShift version 4.8+
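  • Use OVNKubernetes as the SDN (needs to be set up at installation)
  • Set up Hybrid-Overlay Networking (needs to be set up at installation)
  • A supported Windows Server version: LTSC version 1809 (Windows Server 2019) or SAC version 2004, depending on the platform (see the table under Setting Up The Windows Server)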
  • The instance must be on the same network as the Linux worker nodes in the cluster
  • Detailed Cluster and OS prerequisites
  • BYOH instance prerequisites

The Windows Server specific requirements will be covered in a later section. For more information on Windows Containers, please refer to the official documentation. Pay special attention to the networking prerequisites: as noted earlier, they can only be configured at installation time.
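On a cluster that is already installed, the two networking prerequisites can be checked from the command line. The following is a minimal sketch, assuming `oc` is logged in as a cluster administrator; the function names are made up for illustration, and the field paths are those of the cluster Network operator configuration.

```shell
# Sketch: check the networking prerequisites on a running cluster.
# Assumes `oc` is logged in as a cluster administrator.
# The function names below are illustrative, not part of any tool.

# Print the cluster's default network type (expected: OVNKubernetes).
network_type() {
  oc get network.operator.openshift.io cluster \
    -o jsonpath='{.spec.defaultNetwork.type}'
}

# Print the hybrid overlay configuration; empty output means it is not set.
hybrid_overlay_config() {
  oc get network.operator.openshift.io cluster \
    -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.hybridOverlayConfig}'
}

# Return 0 only when both prerequisites appear to be satisfied.
check_network_prereqs() {
  [ "$(network_type)" = "OVNKubernetes" ] || {
    echo "SDN is not OVNKubernetes" >&2
    return 1
  }
  [ -n "$(hybrid_overlay_config)" ] || {
    echo "hybrid overlay is not configured" >&2
    return 1
  }
  echo "network prerequisites look satisfied"
}
```

Running `check_network_prereqs` before installing the WMCO gives an early warning if the cluster was installed without these settings.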

Installing the WMCO

Once the cluster is installed and you’ve verified that the prerequisites are satisfied, you can proceed to install the Windows Machine Config Operator (WMCO). The WMCO is the entry point for OpenShift administrators who want to run containerized Windows workloads on their clusters. You can install this Operator as a Day 2 task from the OperatorHub.

Log in to your cluster as a cluster administrator and navigate to Operators -> OperatorHub using the left side navigation.

Now type Windows Machine Config Operator in the "Filter by keyword…" box. Click on the Windows Machine Config Operator card.

This will bring up the overview page. Here, you will click on the “Install” button.

On the “Install Operator” overview page, make sure “stable” is selected in the "Update channel" section. In the "Installation mode" section, leave “A specific namespace on the cluster” selected. Leave the "Installed Namespace" section set to “Operator recommended Namespace” and tick “Enable Cluster Monitoring”. Finally, leave the "Approval strategy" as “Automatic”, then click “Install”.

The "Installing Operator" status page will come up. It stays up for the duration of the installation.

When the screen says "ready for use", the WMCO is successfully installed.

You can verify the Operator has been installed successfully by checking to see if the Operator Pod is running.

$ oc get pods -n openshift-windows-machine-config-operator
NAME                                               READY   STATUS    RESTARTS   AGE
windows-machine-config-operator-749bb9db45-7vzfh   1/1     Running   0          148m

Once the WMCO is installed and running, you will need to create/provide an SSH key for the WMCO to use. This same SSH key is also going to be installed on the Windows node and will be the way the WMCO configures it for OpenShift. You can use an existing SSH key. If you don’t have one, or would like to create one specifically for Windows Nodes, you can do so by running:

$ ssh-keygen -t rsa -f ${HOME}/.ssh/winkey -q -N ''

Once you have your key ready, add the private key as a secret for the WMCO to use in the openshift-windows-machine-config-operator namespace.

$ oc create secret generic cloud-private-key \
    --from-file=private-key.pem=${HOME}/.ssh/winkey \
    -n openshift-windows-machine-config-operator

Please note that you can only have one Windows Server SSH key pair in the cluster. There is no way, currently, to provide a key pair per individual Windows Server.

For more information about how to install the WMCO, please refer to the official documentation.

Setting Up The Windows Server 

The following table outlines the supported Windows Server version based on the applicable cloud provider. Note: Any unlisted Windows Server versions are NOT supported and will cause errors. To prevent these errors, only use the appropriate version according to the cloud provider in use.

Cloud Provider    Supported Windows Server Version
AWS               Windows Server Long-Term Servicing Channel (LTSC): Windows Server 1809
Azure             Windows Server Long-Term Servicing Channel (LTSC): Windows Server 1809
VMware vSphere    Windows Server Semi-Annual Channel (SAC): Windows Server 2004

An Azure Example

To set up a Windows server instance on Azure, you need to ensure you select a supported server version (1809) and make sure that the instance is on the same network as the Linux worker nodes in the cluster.

Log in to Azure and select a supported Windows Server version.

For simplicity, you can select the same Resource Group in which you created the OpenShift cluster. Also make sure you use lowercase names for your instance name.

Set up the Administrator account and click Next to select the Disks.

Select your Disks and then click Next to set up Networking. Ensure that you select the same vnet for the Windows server instance as the one used by your OCP cluster, and place the instance on the worker subnet. Then create the VM.

After the VM is created, navigate to the Network Security Group and add the following rules for allowing network access.

Ensure you can RDP and SSH into the virtual machine, and note its private IP.

 

An AWS Example

Log in to AWS EC2 and select a supported Windows Server version.

Select the same network as the OCP cluster and the subnet used by the worker nodes.

After configuring custom storage and tags, select a security group.

Ensure the selected security group has an RDP inbound rule.

Configure the instance to use an SSH key pair and then launch the instance.

Ensure you can RDP and SSH into the virtual machine, and note the instance’s private IP.

 

A vSphere Example

Read here for trying the solution on vSphere

Adding The Windows Server As An OpenShift Node

Now that the WMCO is installed and the Windows node is set up, you can add it as an OpenShift node. The WMCO watches for a ConfigMap named windows-instances to be created in the openshift-windows-machine-config-operator namespace, describing the instances that should be joined to the cluster. The required information to configure an instance is:

  • An address to SSH into the instance with. This can be a DNS name or an IPv4 address.
  • The name of the administrator user.

Each Windows node entry in the data section of the ConfigMap should be formatted with the address of the Windows node as the key, and a value with the format of username=<username>. In my environment, the ConfigMap looks like this:

kind: ConfigMap
apiVersion: v1
metadata:
  name: windows-instances
  namespace: openshift-windows-machine-config-operator
data:
  192.168.1.85: |-
    username=anachand

Save the ConfigMap. It usually takes about 10-15 minutes to prepare the Windows Server instance and join it to the OpenShift cluster as a worker node.
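When several instances have to be listed, the ConfigMap can also be generated rather than written by hand. The following is a small sketch; the helper name `render_windows_instances` and its `address=username` argument format are made up for this illustration, while the ConfigMap name, namespace, and entry format are the ones described above.

```shell
# Sketch: print a windows-instances ConfigMap for a list of instances.
# Each argument is a hypothetical "address=username" pair.
render_windows_instances() {
  cat <<'EOF'
kind: ConfigMap
apiVersion: v1
metadata:
  name: windows-instances
  namespace: openshift-windows-machine-config-operator
data:
EOF
  for pair in "$@"; do
    address=${pair%%=*}   # text before the first '='
    username=${pair#*=}   # text after the first '='
    printf '  %s: |-\n    username=%s\n' "$address" "$username"
  done
}
```

For example, `render_windows_instances 192.168.1.85=anachand | oc apply -f -` would create or update the ConfigMap with a single entry, assuming `oc` is logged in to the cluster.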

You can look at the Operator logs to follow the status of the BYOH node addition.

After it is successfully added, you can navigate to Compute->Nodes and view the BYOH node.
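The same checks can be made from the command line. This is a sketch that assumes the Operator's Deployment is named windows-machine-config-operator (inferred from the pod name shown earlier) and that the new node carries the standard `kubernetes.io/os=windows` label; the function names are illustrative.

```shell
# Sketch: follow the WMCO logs while the instance is being configured.
# The Deployment name is an assumption inferred from the pod name.
wmco_logs() {
  oc logs -f deployment/windows-machine-config-operator \
    -n openshift-windows-machine-config-operator
}

# List nodes that carry the standard Kubernetes Windows OS label.
windows_nodes() {
  oc get nodes -l kubernetes.io/os=windows -o wide
}
```

Once `windows_nodes` lists the instance with a Ready status, the node has joined the cluster.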

Windows nodes can be removed by removing the instance's entry in the ConfigMap. This will revert the Windows node back to the state that it was in before, barring any logs and container runtime artifacts.

For example, if the ConfigMap above also held an entry for a second instance, 192.168.1.92, removing that instance would mean deleting its entry so that only the remaining instance is listed:

kind: ConfigMap
apiVersion: v1
metadata:
  name: windows-instances
  namespace: openshift-windows-machine-config-operator
data:
  192.168.1.85: |-
    username=anachand

In order for a Windows node to be cleanly removed, it must be accessible with the current private key provided to WMCO. 

Deploying a Sample Workload

Now that we have a Windows OpenShift node up and running, we can test running a sample Windows container. Before we begin, you’ll have to keep a few things in mind. The Windows containers you want to run need to be compatible with the Windows Server version of the node they run on. For instance, you cannot run a Windows container built on Server 2016 on Server 2019. Even more granular, you may not be able to run a container built on Windows 10 on Server 2019. Please consult the official Microsoft documentation for more information about compatibility.

Another thing to keep in mind is that Windows Container images can be VERY large. In some cases, a small base image can be 8GB in size! This can pose trouble for the Kubernetes scheduler which has a default timeout of 2 minutes. To work around this, it’s suggested to pre-pull any base images you need. 
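One way to pre-pull is to run `docker pull` on the Windows node over the same SSH access the WMCO uses. The sketch below assumes the key created earlier at ${HOME}/.ssh/winkey, the Docker runtime on the node, and a hypothetical helper name and `WINKEY` variable; the image shown in the usage note is only an example and must match the node's Windows Server version.

```shell
# Sketch: pre-pull a base image on a Windows node over SSH.
# $1 = user@address of the Windows node, $2 = image reference.
# WINKEY is a hypothetical override for the private key path.
prepull_image() {
  ssh -i "${WINKEY:-$HOME/.ssh/winkey}" "$1" docker pull "$2"
}
```

For example, `prepull_image anachand@192.168.1.85 mcr.microsoft.com/windows/servercore:ltsc2019` would pull the Server Core base image for an LTSC 2019 host.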

Let's deploy this sample Windows web server application. Note that it has a toleration set to match the corresponding taint created by WMCO on the Windows node.

tolerations:
  - key: "os"
    value: "Windows"
    effect: "NoSchedule"
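To confirm which taint the toleration has to match, the taints on the Windows node can be printed directly. A minimal sketch, assuming `oc` is logged in; `node_taints` is an illustrative name and the node name is whatever `oc get nodes` reports for the Windows instance.

```shell
# Sketch: print the taints set on a node.
# $1 = node name as shown by `oc get nodes`.
node_taints() {
  oc get node "$1" -o jsonpath='{.spec.taints}'
}
```

The key, value, and effect in the output should line up with the toleration in the workload manifest.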

Deploy the application using the following command:

$ oc create -f WinWebServer.yaml

Expose the application's service as a route:

$ oc expose service win-webserver

In the OpenShift console, navigate to Networking->Routes and click on the route's URL to access the application.
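The route can also be looked up from the command line. The sketch below assumes the route is named win-webserver (the name `oc expose service` gives it by default), lives in the current project, and is served over plain HTTP; `route_url` is an illustrative helper name.

```shell
# Sketch: print the URL of a route in the current project.
# $1 = route name; fails if the route has no host.
route_url() {
  host=$(oc get route "$1" -o jsonpath='{.spec.host}') || return 1
  [ -n "$host" ] || return 1
  printf 'http://%s\n' "$host"
}
```

For example, `curl -s "$(route_url win-webserver)"` would fetch the page served by the sample application.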

Conclusion

With the general availability of Bring Your Own Host support for Windows nodes in Red Hat OpenShift, cluster administrators can onboard “pet” Windows nodes into an OpenShift cluster.

The Windows Machine Config Operator (WMCO) helps you add and remove Windows compute nodes in an OpenShift cluster (OpenShift Container Platform/OKD) and enables scheduling of Windows workloads.

Stay tuned to what’s happening with Windows Containers by visiting this topic page. Please take the Windows Machine Config Operator for a spin from the on-cluster OperatorHub and provide feedback by opening GitHub issues.