In a distributed bare metal environment, there have traditionally been two options for control plane architecture when dealing with large-scale OpenShift deployments:

  1. Run one control plane at a centralized location, with many remote workers.
  2. Run the control plane on each remote node (Single Node OpenShift).

Hosted Control Planes, based on the upstream HyperShift project, is currently in tech preview for bare metal, offering us a third option:

  3. Run many containerized control planes at a centralized location, each connected to a small number of remote workers.

[Architecture diagram: many hosted control planes running centrally on a management cluster, each serving a small number of remote workers]

This approach can offer a few benefits:

  • Reduced resource consumption on the remote nodes compared to Single Node OpenShift.
  • Faster deployment times at the remote sites compared to Single Node OpenShift.
  • A much more scalable architecture when compared to a single control plane with many remote workers. Rather than having one control plane (limited to three nodes) serving hundreds of workers, you can deploy hosted control planes as needed, which run as pods on worker nodes as part of a centralized cluster. The number of worker nodes in this centralized cluster can be scaled as demand dictates.

A previous blog post has shown how to create Hosted Control Plane clusters using the CLI. In a production environment, you need something that allows you to scale your deployments consistently and reliably. This is where the OpenShift GitOps Operator, based on ArgoCD, comes in.

Setting up the management cluster

My management cluster consists of three bare metal servers and has ODF installed to provide persistent storage. An available StorageClass is required for ACM/MCE, as well as for HyperShift, which uses it to store each hosted cluster's etcd database.
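As a quick sanity check, you can confirm that a suitable StorageClass exists and, if needed, mark one as the default. The class name below is just an example from an ODF installation and may differ in your environment:

oc get storageclass

# Optionally mark a class as the cluster default, here assuming the
# ODF-provided ocs-storagecluster-ceph-rbd class:
oc patch storageclass ocs-storagecluster-ceph-rbd --type=merge \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'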

ACM/MCE

The management cluster requires either RHACM or the Multicluster Engine Operator. Once either of these operators is installed, we will enable the hypershift-preview feature, as well as CIM (Central Infrastructure Management), to allow zero-touch provisioning of bare metal worker nodes. Using CIM to provision bare metal workers is known as the Agent platform when dealing with Hosted Control Planes.

See this document for instructions on how to enable the hypershift-preview feature. Since ACM 2.7/MCE 2.2, the HyperShift Operator is automatically installed on the local-cluster after it is enabled.
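For reference, enabling the feature is roughly a single patch against the MultiClusterEngine resource. This sketch assumes the default resource name (multiclusterengine); check the linked document for the exact procedure for your version:

oc patch mce multiclusterengine --type=merge \
  -p '{"spec":{"overrides":{"components":[{"name":"hypershift-preview","enabled":true}]}}}'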

See this document for instructions on how to enable Central Infrastructure Management (the Assisted Service). This feature will allow the management cluster to provision bare metal worker nodes by automatically mounting the installer ISO and installing OpenShift.
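At its core, enabling CIM means creating an AgentServiceConfig resource that tells the Assisted Service where to store its database, filesystem data, and ISO images. A minimal sketch (the storage sizes are only examples, and the PVCs will be bound using your default StorageClass; follow the linked document for your environment):

apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  name: agent
spec:
  databaseStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
  filesystemStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
  imageStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi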

MetalLB

Since the control plane for our new cluster will be running as pods on the management cluster, we will use MetalLB to expose the API endpoint for the new cluster.

See this document for instructions on how to install the MetalLB Operator.
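Note that installing the operator alone is not enough; a MetalLB instance also has to be created (the address pool for the API endpoint will be handled by the Helm chart later on). Something along these lines should work, assuming the operator was installed into the metallb-system namespace:

apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system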

GitOps

See this document for instructions on how to install the GitOps Operator.

We will provide an install-config.yaml file as the values for a Helm chart that will deploy a hosted cluster on top of our management cluster. The GitOps Operator will apply the Helm chart for us and ensure that the specified configuration stays in sync with what is running on the management cluster.
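If you want to preview what the chart generates before wiring it into ArgoCD, you can render it locally with the Helm CLI. This is just a convenience sketch; the repository URL, chart name, and version are the same ones used in the ArgoCD Application shown later, and the values file path is whatever you named your install config:

helm repo add hypershift-helm https://loganmc10.github.io/hypershift-helm
helm template example hypershift-helm/deploy-cluster \
  --version 0.1.12 -f install-config.yaml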

DNS

The hosted cluster still needs DNS entries for api.<cluster-name>.<domain> and *.apps.<cluster-name>.<domain>.

Since we will be serving the API using MetalLB (layer 2) on the management cluster, the address we choose for api.<cluster-name>.<domain> needs to be in the same subnet as the management cluster.

*.apps.<cluster-name>.<domain> will be served by MetalLB running on the hosted cluster workers; therefore, that IP address needs to be in the same subnet as the hosted cluster workers.
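As a purely hypothetical illustration, for a hosted cluster named hyper under example.com, the zone file entries might look like this, with the first address taken from the management cluster subnet and the second from the hosted cluster worker subnet:

api.hyper.example.com.     IN A  192.168.10.200   ; MetalLB on the management cluster
*.apps.hyper.example.com.  IN A  192.168.20.200   ; MetalLB on the hosted cluster workers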

Creating an Install Config

We can use a regular install-config.yaml file as the input to our Helm chart, with just a few additional parameters. Since the control plane will be hosted on the management cluster, we only need to specify worker nodes in our install config.

The install-config.yaml file needs to be stored in a Git repository. Since this file contains sensitive information, we'll keep it in a private repository that requires authentication to access.

apiVersion: v1

# Additional parameters for hypershift-helm
hypershift:
  clusterImageSet: quay.io/openshift-release-dev/ocp-release:4.12.2-x86_64 # Required

baseDomain: <cluster_domain>
compute:
- name: worker
  replicas: 1
metadata:
  name: example-cluster-name
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  baremetal:
    apiVIP: <api_address> # Should be in the same subnet as the management cluster
    ingressVIP: <ingress_address> # Should be in the same subnet as the hosted cluster worker nodes
    hosts:
    - name: openshift-worker-0
      role: worker
      bmc:
        address: "redfish-virtualmedia://<bmc_ip_address>/redfish/v1/Systems/1"
        username: <username>
        password: <password>
      bootMACAddress: <nic1_mac_address>
      rootDeviceHints:
        hctl: "1:0:0:0"
      networkConfig:
        interfaces:
        - name: eno1
          type: ethernet
          macAddress: <nic1_mac_address>
          state: up
          ipv4:
            enabled: true
            dhcp: true
          ipv6:
            enabled: false
pullSecret: "<pull secret>"
sshKey: |
  ssh-rsa ...


Creating the ArgoCD app

Since the release of OpenShift GitOps 1.8, the operator has supported defining multiple sources for an application. This matters for our Helm chart, since the values file (install-config.yaml) lives in a different repository than the Helm chart itself.

Since the Git repository that hosts our install-config.yaml requires authentication, we need to configure it with credentials in ArgoCD first. See the ArgoCD docs for instructions on adding a private repository.
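One declarative way to do that (an alternative to adding the repository through the ArgoCD UI) is a Secret labeled as a repository in the openshift-gitops namespace. The repository URL below matches the one used in the Application later on, while the username and token are placeholders:

apiVersion: v1
kind: Secret
metadata:
  name: edge-installer-configs
  namespace: openshift-gitops
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://github.com/loganmc10/edge-installer-configs.git
  username: <git_username>
  password: <git_token>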

Once we have created our install-config.yaml and committed it to our private repository, we can create an ArgoCD app:

cat << 'EOF' | oc apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: hypershift-cluster
  namespace: openshift-gitops
spec:
  destination:
    server: https://kubernetes.default.svc
  project: default
  sources:
  - repoURL: 'https://loganmc10.github.io/hypershift-helm'
    chart: deploy-cluster
    targetRevision: 0.1.12
    helm:
      valueFiles:
      - $values/helm/install-config-hypershift.yaml
  - repoURL: 'https://github.com/loganmc10/edge-installer-configs.git'
    targetRevision: main
    ref: values
EOF

In the example above, we are pulling the Helm chart from a publicly available repository, https://loganmc10.github.io/hypershift-helm, and the install-config.yaml file from a private repository, https://github.com/loganmc10/edge-installer-configs.git. This Helm chart is not supported by Red Hat; it is just a personal project of mine.

The Helm chart is essentially a collection of YAML templates that use the install-config.yaml file as an input to create the required resources on the management cluster (BareMetalHosts, HostedCluster, NodePool, etc).

Syncing

Once we have created the ArgoCD app, we can log in to the Web UI to look at the status:

[Screenshot: ArgoCD application status showing OutOfSync]

The status will show "OutOfSync". Once we click "Sync", ArgoCD will go to work applying the Helm chart on our cluster.

The Helm chart will:

  • Create a new namespace for our cluster.
  • Create BareMetalHost, NMStateConfig, and InfraEnv objects, which will be used by ACM/MCE to provision our bare metal hosts and install OpenShift.
  • Create HostedCluster and NodePool objects. These objects define the configuration for our new hosted cluster.
  • Configure MetalLB on the management cluster to provide access to the hosted cluster API endpoint.
  • Scale the NodePool, which will allow the BareMetalHosts to be attached to the new cluster (Agent platform).
  • Install and configure MetalLB on the hosted cluster workers, to serve the Ingress endpoint.
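Rather than clicking "Sync" by hand each time, ArgoCD can also be told to keep the app in sync automatically. A small, optional addition to the Application spec above would do it (prune and selfHeal keep the cluster matching Git even if resources are changed or deleted out of band):

spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true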

After about 25-30 minutes, if we check the status of our app again, we will see:

[Screenshot: ArgoCD application status showing Synced]

We can check the status of our new hosted cluster via the CLI as well:

[loganmc10@fedoralogan ~]$ oc get hostedcluster,nodepool -n hyper
NAME                                          VERSION   KUBECONFIG                PROGRESS    AVAILABLE   PROGRESSING   MESSAGE
hostedcluster.hypershift.openshift.io/hyper   4.12.2    hyper-admin-kubeconfig    Completed   True        False         The hosted control plane is available

NAME                                     CLUSTER   DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
nodepool.hypershift.openshift.io/hyper   hyper     1               1               False         False        4.12.2

Accessing the cluster

To access the new hosted cluster, we need to get the kubeconfig:

oc get secret -n <cluster-name> <cluster-name>-admin-kubeconfig -o jsonpath='{.data.kubeconfig}' | base64 -d > ~/hosted-kubeconfig

We can then check the status of our new cluster:

[loganmc10@fedoralogan ~]$ export KUBECONFIG=~/hosted-kubeconfig 
[loganmc10@fedoralogan ~]$ oc get co
NAME                                        VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console                                     4.12.2    True        False         False      97s
csi-snapshot-controller                     4.12.2    True        False         False      24m
dns                                         4.12.2    True        False         False      2m24s
image-registry                              4.12.2    True        False         False      2m19s
ingress                                     4.12.2    True        False         False      23m
insights                                    4.12.2    True        False         False      3m3s
kube-apiserver                              4.12.2    True        False         False      24m
kube-controller-manager                     4.12.2    True        False         False      24m
kube-scheduler                              4.12.2    True        False         False      24m
kube-storage-version-migrator               4.12.2    True        False         False      2m33s
monitoring                                  4.12.2    True        False         False      36s
network                                     4.12.2    True        False         False      3m14s
node-tuning                                 4.12.2    True        False         False      5m42s
openshift-apiserver                         4.12.2    True        False         False      24m
openshift-controller-manager                4.12.2    True        False         False      24m
openshift-samples                           4.12.2    True        False         False      2m
operator-lifecycle-manager                  4.12.2    True        False         False      23m
operator-lifecycle-manager-catalog          4.12.2    True        False         False      23m
operator-lifecycle-manager-packageserver    4.12.2    True        False         False      24m
service-ca                                  4.12.2    True        False         False      3m1s
storage                                     4.12.2    True        False         False      24m
[loganmc10@fedoralogan ~]$ oc get node
NAME                 STATUS   ROLES    AGE     VERSION
openshift-worker-0   Ready    worker   9m10s   v1.25.4+a34b9e9


Managing the cluster

In the future, if we want to update the cluster to a newer version, all we need to do is update the clusterImageSet in our original install-config.yaml:

hypershift:
  clusterImageSet: quay.io/openshift-release-dev/ocp-release:4.12.3-x86_64


Once we update the Git repository that stores the install-config.yaml file, ArgoCD will notice the change and mark the app as "OutOfSync". Re-syncing the app will cause ArgoCD to update the HostedCluster and NodePool resources, triggering an update to our hosted cluster. This approach will allow us to manage the lifecycle of the cluster the GitOps way.
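The same sync status is also visible from the CLI if you would rather not open the web UI. For the app created earlier, something like this prints the sync and health state:

oc get applications.argoproj.io hypershift-cluster -n openshift-gitops \
  -o jsonpath='{.status.sync.status}{"  "}{.status.health.status}{"\n"}'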