Background

Expanding an existing cluster on bare metal can be challenging, since the procedure depends on the method used to deploy the cluster in the first place (Assisted Installer (AI), IPI, or UPI). Here we propose a way to expand your cluster regardless of the deployment methodology, using the MultiCluster Engine (MCE). Multicluster engine for Kubernetes provides the foundational components necessary for the centralized management of multiple Kubernetes-based clusters across data centers, public clouds, and private clouds. You can use the engine to create Red Hat OpenShift Container Platform clusters on selected providers, or import existing Kubernetes-based clusters. Once the clusters are managed, you can use the APIs provided by the engine to distribute configuration based on placement policy. Placement policy is a significant part of building sophisticated multicluster management applications because it lets you select the applicable clusters. One key advantage of MCE is that it is included with your OpenShift subscription, so there is no additional cost.

Environment Setup

The environment we are using for this blog consists of a set of bare metal servers. Our existing cluster is a compact cluster with three masters and no workers, running OCP 4.10.25. We plan to add three workers to this cluster using MCE.

Initial setup

$ oc get nodes
NAME STATUS ROLES AGE VERSION
master-0.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com Ready master,worker 6h24m v1.23.5+012e945
master-1.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com Ready master,worker 5h55m v1.23.5+012e945
master-2.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com Ready master,worker 6h23m v1.23.5+012e945

$ oc get clusterversions.config.openshift.io
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.25 True False 5h56m Cluster version is 4.10.25

Prerequisite

To add worker nodes, you need the following prerequisites in your infrastructure:

  • An existing cluster (or hub cluster) with MCE (Multicluster Engine for Kubernetes) version 2.1+
  • Persistent volumes (here we use Ceph storage via ODF)

Steps to follow to expand your cluster

  • Install the MCE operator and deploy the MultiClusterEngine
  • Enable the Central Infrastructure Management (CIM) service and create the AgentServiceConfig
  • Import the existing cluster into MCE
  • Add the new workers

Installing the MultiCluster Engine Operator

You can install MCE using either the GUI or the CLI; here we use the CLI. You first need to create the namespace, then the OperatorGroup and the Subscription:

  • Install MCE Operator using CLI
    mce_operator.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: multicluster-engine
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: multicluster-engine-cggrq
  namespace: multicluster-engine
spec:
  targetNamespaces:
  - multicluster-engine
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: multicluster-engine
  namespace: multicluster-engine
spec:
  channel: stable-2.1
  installPlanApproval: Automatic
  name: multicluster-engine
  source: redhat-operators
  sourceNamespace: openshift-marketplace

$ oc apply -f mce_operator.yaml
  • Initially, there are two pods created:
$ oc -n multicluster-engine get pods  
NAME READY STATUS RESTARTS AGE
multicluster-engine-operator-7d58847669-rbxfx 1/1 Running 0 20s
multicluster-engine-operator-7d58847669-g2j9h 1/1 Running 0 20s
  • Enable and deploy the MCE components using the CLI (multicluster-engine.yaml):
apiVersion: multicluster.openshift.io/v1
kind: MultiClusterEngine
metadata:
  name: multiclusterengine-sample
spec:
  availabilityConfig: High
  overrides:
    components:
    - enabled: true
      name: assisted-service
    - enabled: true
      name: cluster-lifecycle
    - enabled: true
      name: cluster-manager
    - enabled: true
      name: discovery
    - enabled: true
      name: hive
    - enabled: true
      name: server-foundation
    - enabled: true
      name: cluster-proxy-addon
    - enabled: false
      name: managedserviceaccount-preview
    - enabled: false
      name: hypershift-preview
    - enabled: true
      name: console-mce
  targetNamespace: multicluster-engine

$ oc apply -f multicluster-engine.yaml
  • Check MCE PODs
$ oc -n multicluster-engine get po 
NAME READY STATUS RESTARTS AGE
cluster-curator-controller-5d57b56b7f-gtkxj 1/1 Running 0 156m
cluster-curator-controller-5d57b56b7f-qnrvv 1/1 Running 0 156m
cluster-manager-5b4d79f7c9-48tm7 1/1 Running 0 156m
cluster-manager-5b4d79f7c9-5s6gt 1/1 Running 0 156m
cluster-manager-5b4d79f7c9-wqr45 1/1 Running 0 156m
cluster-proxy-86c5cb5b65-hdfdk 1/1 Running 0 155m
cluster-proxy-86c5cb5b65-tjnpm 1/1 Running 0 155m
cluster-proxy-addon-manager-6fd57776d9-4cb69 1/1 Running 0 156m
cluster-proxy-addon-manager-6fd57776d9-r8bbn 1/1 Running 0 156m
cluster-proxy-addon-user-5f5b4957f7-nn8nz 1/1 Running 0 155m
cluster-proxy-addon-user-5f5b4957f7-qt6k2 1/1 Running 0 155m
clusterclaims-controller-df8c484f9-q6jjx 2/2 Running 0 156m
clusterclaims-controller-df8c484f9-tnd7k 2/2 Running 0 156m
clusterlifecycle-state-metrics-v2-747dbd6f9f-cq5km 1/1 Running 0 156m
console-mce-console-87bbcfdfc-hs6s7 1/1 Running 0 156m
console-mce-console-87bbcfdfc-xhfrt 1/1 Running 0 156m
discovery-operator-8558fdc9c7-8xpf2 1/1 Running 0 156m
hive-operator-7978d99789-d5fbc 1/1 Running 0 156m
infrastructure-operator-fcddcfbf-qj4fs 1/1 Running 0 156m
managedcluster-import-controller-v2-8678fcfbf8-c49qc 1/1 Running 0 156m
managedcluster-import-controller-v2-8678fcfbf8-xmtq7 1/1 Running 0 156m
multicluster-engine-operator-7d58847669-g2j9h 1/1 Running 0 158m
multicluster-engine-operator-7d58847669-rbxfx 1/1 Running 0 158m
ocm-controller-5f4bf7d744-c8zgq 1/1 Running 0 156m
ocm-controller-5f4bf7d744-vlxv5 1/1 Running 0 156m
ocm-proxyserver-6d484c6776-kxgp5 1/1 Running 0 156m
ocm-proxyserver-6d484c6776-x8vzf 1/1 Running 0 156m
ocm-webhook-c9444bdcd-2g92q 1/1 Running 0 156m
ocm-webhook-c9444bdcd-2l8hx 1/1 Running 0 156m
provider-credential-controller-76fc6b89d4-gm7wv 2/2 Running 0 156m
  • Access the MCE console GUI
    mce-gui
  • Enable the Central Infrastructure Management (CIM) service
    Central Infrastructure Management (CIM) is an implementation of the Assisted Installer service in RHACM/MCE. CIM is not configured out of the box because it requires persistent storage. The CIM instance can be configured manually by running the following commands on the hub cluster.

  • To enable the CIM service, complete the following steps:

    • Patch the provisioning configuration so that it watches all namespaces:
$  oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"watchAllNamespaces": true }}'
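A merge patch only overlays the fields you supply and leaves the rest of the object untouched. The sketch below illustrates that semantics locally with jq on a sample spec (the `provisioningNetwork` field is just an illustrative placeholder, not the full CR):

```shell
# Illustrative only: what the merge patch above does to the Provisioning spec.
cat > /tmp/provisioning-spec.json <<'EOF'
{"spec": {"provisioningNetwork": "Disabled"}}
EOF

# Overlay watchAllNamespaces; existing fields are preserved.
jq '.spec += {"watchAllNamespaces": true}' /tmp/provisioning-spec.json
```

The result keeps `provisioningNetwork` and gains `watchAllNamespaces: true`, which is exactly what `oc patch --type merge` does server-side.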
  • Create the AgentServiceConfig
00_create-agent-service.yaml:
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  name: agent
spec:
  databaseStorage:
    storageClassName: ocs-storagecluster-cephfs
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
  filesystemStorage:
    storageClassName: ocs-storagecluster-cephfs
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
  imageStorage:
    storageClassName: ocs-storagecluster-cephfs
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
  osImages:
  - openshiftVersion: "4.10"
    version: "410.84.202207140725-0"
    url: "https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/4.10/4.10.3/rhcos-4.10.3-x86_64-live.x86_64.iso"
    rootFSUrl: "https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/4.10/4.10.3/rhcos-4.10.3-x86_64-live-rootfs.x86_64.img"
    cpuArchitecture: "x86_64"

$ oc apply -f 00_create-agent-service.yaml
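Within `osImages`, the `url` (live ISO) and `rootFSUrl` (rootfs image) must point to the same RHCOS build, or discovered hosts will fail to boot. As a sanity check, the rootfs URL can be derived from the ISO URL (a sketch assuming the mirror's current naming convention):

```shell
# Derive the matching rootfs URL from a live ISO URL (assumes the mirror's
# rhcos-<ver>-x86_64-live[-rootfs] naming layout).
ISO_URL="https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/4.10/4.10.3/rhcos-4.10.3-x86_64-live.x86_64.iso"
ROOTFS_URL="$(echo "$ISO_URL" | sed 's/-live\.x86_64\.iso$/-live-rootfs.x86_64.img/')"
echo "$ROOTFS_URL"
```

The derived value matches the `rootFSUrl` used in the AgentServiceConfig above.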
  • To validate that CIM is configured correctly, run the following command on the hub cluster:
$ oc get pods -n multicluster-engine | grep assisted
assisted-image-service-0 1/1 Running 0 5h42m
assisted-service-774fd45fdf-gd2ms 2/2 Running 0 5h42m

Import the existing cluster into MCE

In order to import the existing cluster, the custom resources (CRs) below are needed.

  • Create Namespace for imported cluster:
    01_create-ns.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: noknom-aicli

$ oc apply -f 01_create-ns.yaml
  • Create PullSecret
    02_create-pull-secret.yaml:
apiVersion: v1
data:
  .dockerconfigjson: xxxxxxxxxxxxxxxxxxxxxxxxxx
kind: Secret
metadata:
  name: pull-secret
  namespace: noknom-aicli
type: kubernetes.io/dockerconfigjson

$ oc apply -f 02_create-pull-secret.yaml
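The `.dockerconfigjson` value in the Secret is simply the base64 encoding of your pull-secret JSON. A minimal sketch, using a dummy pull secret (the path and auth string here are made up):

```shell
# Write a dummy pull secret; in practice this is the JSON downloaded from the
# Red Hat console.
cat > /tmp/pull-secret.json <<'EOF'
{"auths":{"registry.example.com":{"auth":"dXNlcjpwYXNz"}}}
EOF

# -w0 disables line wrapping so the output is a single line, valid for the
# Secret manifest's data field.
base64 -w0 /tmp/pull-secret.json
```

On macOS, `base64` has no `-w` flag; use `base64 < /tmp/pull-secret.json` instead.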
  • Create InfraEnv
    03_create-infraenv.yaml:
apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: noknom-aicli-infraenv
  namespace: noknom-aicli
  labels:
    agentclusterinstalls.extensions.hive.openshift.io/location: Dallas
    networkType: static
spec:
  pullSecretRef:
    name: pull-secret
  sshAuthorizedKey: 'ssh-rsa xxxxxxxxxxxxxxxxxxxxxxxx'
  nmStateConfigLabelSelector:
    matchLabels:
      infraenvs.agent-install.openshift.io: noknom-aicli

$ oc apply -f 03_create-infraenv.yaml

It is good to provide the SSH public key here rather than through the MCE GUI.
Note: ignitionConfigOverride can be omitted. The InfraEnv above uses static networking.

  • Create Cluster Image Set
    04_create-cluster-image-set.yaml:
apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: openshift-v4.10.23
spec:
  releaseImage: quay.io/openshift-release-dev/ocp-release:4.10.23-x86_64

$ oc apply -f 04_create-cluster-image-set.yaml
  • Create Agent Cluster Install
    AgentClusterInstall name should match the namespace name.
    05_create-agent-cluster-install.yaml:
apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
  name: noknom-aicli
  namespace: noknom-aicli
spec:
  apiVIP: 192.168.24.79
  clusterDeploymentRef:
    name: noknom-aicli
  imageSetRef:
    name: openshift-v4.10.23
  ingressVIP: 192.168.24.78
  platformType: BareMetal
  networking:
    clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    serviceNetwork:
    - 172.30.0.0/16
  provisionRequirements:
    controlPlaneAgents: 3
  sshPublicKey: "ssh-rsa xxxxxxxxxxxxxxxxxxxxxxxxx"

$ oc apply -f 05_create-agent-cluster-install.yaml

Note: controlPlaneAgents must be 3 for a compact cluster and 1 for SNO.
imageSetRef references the ClusterImageSet from step 04. sshPublicKey must be provided.

  • Create the kubeconfig and kubeadmin password secrets
    06_create-kubeadmin-secrets.sh:
oc -n noknom-aicli create secret generic noknom-aicli-admin-kubeconfig --from-file=kubeconfig=/path/to/your/kubeconfig/noknom-kubeconfig
oc -n noknom-aicli create secret generic noknom-aicli-admin-password --from-literal=username=kubeadmin --from-literal=password=nHf3h-mU6HU-hiRwq-kMjc5

$ bash 06_create-kubeadmin-secrets.sh

Note: this step creates the kubeconfig and kubeadmin user/password as secrets because they are referenced by the ClusterDeployment in the import step.

  • Create / Import Cluster (ClusterDeployment)
    07_create-cluster-deployment.yaml:
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: noknom-aicli
  namespace: noknom-aicli
spec:
  baseDomain: hubcluster-1.lab.eng.cert.redhat.com
  clusterInstallRef:
    group: extensions.hive.openshift.io
    kind: AgentClusterInstall
    name: noknom-aicli
    version: v1beta1
  clusterMetadata:
    adminKubeconfigSecretRef:
      name: noknom-aicli-admin-kubeconfig
    adminPasswordSecretRef:
      name: noknom-aicli-admin-password
    clusterID: "aff38079-7233-4e78-8561-121a61024965" # see below for clusterID value
    infraID: "noknom-aicli-2w2kn" # see below for infraID value
  clusterName: noknom-aicli
  installed: true
  platform:
    agentBareMetal:
      agentSelector:
        matchLabels:
          cluster-name: noknom-aicli
  pullSecretRef:
    name: pull-secret

$ oc apply -f 07_create-cluster-deployment.yaml
  • How to get the clusterID and infraID:
    infraID: oc get infrastructure cluster -o json | jq .status.infrastructureName
    clusterID: oc get clusterversion version -o json | jq .spec.clusterID

Note: Most of the parameters in the ClusterDeployment are referenced from previous steps.
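The jq filters above can be exercised locally against canned API output (the values below are illustrative). Note that plain `jq` prints JSON-quoted strings, which match the quoted values in the ClusterDeployment manifest, while `jq -r` prints the raw value for use in scripts:

```shell
# Canned `oc get infrastructure cluster -o json` fragment (illustrative value).
cat > /tmp/infrastructure.json <<'EOF'
{"status": {"infrastructureName": "noknom-aicli-2w2kn"}}
EOF

jq .status.infrastructureName /tmp/infrastructure.json      # quoted JSON string
jq -r .status.infrastructureName /tmp/infrastructure.json   # raw string, for scripting
```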

  • Validations to run after the ClusterDeployment is created:
$ oc get clusterdeployments.hive.openshift.io -A
NAMESPACE NAME INFRAID PLATFORM REGION VERSION CLUSTERTYPE PROVISIONSTATUS POWERSTATE AGE
noknom-aicli noknom-aicli noknom-aicli-jzkzb agent-baremetal 4.10.25 Provisioned 15h

Once the ClusterDeployment CR is created successfully, the cluster appears in the MultiCluster Engine console, and you can start importing it.

mce-import-gui

import-cluster-button

Once you click on Import, it takes about 5-10 minutes (depending on your cluster size); then all of the installed cluster's hosts become visible in the menus and MCE manages the newly imported cluster.

$ oc get mcl
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE
noknom-aicli true https://api.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com:6443 True True 4d16h

Now you are ready to add a new worker to the existing cluster.

Adding New Worker

In order to add a new worker, you first need to add a host to the InfraEnv. Click on Host inventory, then select the infrastructure environment.

mce-infraenv-add-host

For a DHCP installation, you can simply download the Discovery ISO and boot CoreOS from it, but since we are using static IP configuration we will use NMState CRs to add the static network information to the ISO for the new host.

worker-0-nmstate.yaml:

apiVersion: agent-install.openshift.io/v1beta1
kind: NMStateConfig
metadata:
  labels:
    infraenvs.agent-install.openshift.io: noknom-aicli
  name: worker-0
  namespace: noknom-aicli
spec:
  config:
    dns-resolver:
      config:
        server:
        - 192.168.24.80
    interfaces:
    - ipv4:
        address:
        - ip: 192.168.24.86
          prefix-length: 25
        enabled: true
      mac-address: b8:ce:f6:56:a9:ea
      name: eno1
      state: up
      type: ethernet
    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 192.168.24.1
        next-hop-interface: eno1
  interfaces:
  - macAddress: b8:ce:f6:56:a9:ea
    name: eno1
name: eno1

$ oc apply -f worker-0-nmstate.yaml

Note: If you have more than one worker node to add, create a separate NMStateConfig for each of them (e.g., worker-1).
In our case we added three workers.

  • Download the Discovery ISO for the new worker node
    Get the router CA secret and save it to the ca-trust path:
$ oc get secret router-ca -o json -n openshift-ingress-operator |jq -r ".data.\"tls.crt\""|base64 -d > /etc/pki/ca-trust/source/anchors/nokianom.crt
  • Update the CA trust:
$ update-ca-trust
  • Get the ISO download URL:
$ oc get infraenv -A -o json|jq -r '.items[0].status.isoDownloadURL'
  • Download the ISO with wget:
$ wget -O discovery.iso 'https://assisted-image-service-multicluster-engine.apps.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com/images/f826f079-1d00-42a4-bb32-0b8df4462499?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiJmODI2ZjA3OS0xZDAwLTQyYTQtYmIzMi0wYjhkZjQ0NjI0OTkifQ.J4v0kDXr8LiyUbauRgCjleYbj_DxuLvNlCCTKFLvuQyRPX4s3VnH7ypR7H7p4-JGZK7c4mhO7voeobTTPYwYlw&arch=x86_64&type=minimal-iso&version=4.10'

Note: If the ISO download does not work from the command line, paste the HTTPS link into your browser to get the ISO file (make sure to add the FQDN to your /etc/hosts).
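The router-ca extraction above is just jq plus base64 decoding. A self-contained sketch of the same pipeline, run against a canned Secret with dummy certificate data (only the first bytes of a PEM header, for illustration):

```shell
# Canned Secret fragment; "LS0tLS1CRUdJTg==" is base64 for the start of a PEM
# "-----BEGIN" header. Real secrets hold the full certificate.
cat > /tmp/router-ca.json <<'EOF'
{"data": {"tls.crt": "LS0tLS1CRUdJTg=="}}
EOF

# jq -r prints the raw base64 string (key quoted because it contains a dot);
# base64 -d recovers the PEM bytes.
jq -r '.data."tls.crt"' /tmp/router-ca.json | base64 -d
```

This prints the decoded bytes; in the real command, the decoded certificate is redirected into /etc/pki/ca-trust/source/anchors/ before running update-ca-trust.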

Boot the ISO and approve the new host

Boot the ISO from iLO, iDRAC, or another BMC via virtual CD/DVD.
On first boot, the host is discovered and then waits for the user to approve it before it becomes available.

approve-host
host-available

  • Go back to the main menu and click Clusters -> ClusterName
    Click Add host from the menu:
    mce-cluster-addhost

If you have more than one host available, click '+' to increase the count.
add-host-submit

add-host-submit-2

After clicking the Submit button, there will be two reboots before two CSRs appear as Pending, waiting for manual approval by the user. Approving these CSRs is required before a new host can join the existing cluster.

  • Approve CSR
$ oc get csr -A | grep Pending|awk '{print $1}' | xargs oc adm certificate approve
certificatesigningrequest.certificates.k8s.io/csr-kq2b5 approved
certificatesigningrequest.certificates.k8s.io/csr-lvptv approved

Note: There are at least two CSRs per node that need to be approved, so keep checking.
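The filtering half of the approval one-liner can be exercised against canned `oc get csr` output (the names and columns below are illustrative, and `echo` stands in for `oc adm certificate approve`):

```shell
# Canned CSR listing: two Pending, one already approved.
cat > /tmp/csr.txt <<'EOF'
NAME        AGE   SIGNERNAME                                    CONDITION
csr-kq2b5   2m    kubernetes.io/kube-apiserver-client-kubelet   Pending
csr-lvptv   1m    kubernetes.io/kubelet-serving                 Pending
csr-aaaaa   5m    kubernetes.io/kubelet-serving                 Approved,Issued
EOF

# grep keeps only Pending rows, awk extracts the name column, and xargs hands
# the names to the approval command (echo here, oc adm certificate approve in real use).
grep Pending /tmp/csr.txt | awk '{print $1}' | xargs echo approving:
```

Only the Pending CSR names reach the final command, which is why the one-liner is safe to re-run as new CSRs appear.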

After approving the CSRs, check whether the new workers have joined the imported cluster:

$ oc get nodes
NAME STATUS ROLES AGE VERSION
master-0.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com Ready master,worker 147m v1.23.5+012e945
master-1.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com Ready master,worker 119m v1.23.5+012e945
master-2.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com Ready master,worker 147m v1.23.5+012e945
worker-0.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com Ready worker 15m v1.23.5+012e945
worker-1.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com NotReady worker 34s v1.23.5+012e945
worker-2.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com NotReady worker 36s v1.23.5+012e945
  • Validate the cluster after adding the new workers:
$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.10.25 True False False 4d20h
baremetal 4.10.25 True False False 4d21h
cloud-controller-manager 4.10.25 True False False 4d21h
cloud-credential 4.10.25 True False False 4d21h
cluster-autoscaler 4.10.25 True False False 4d21h
config-operator 4.10.25 True False False 4d21h
console 4.10.25 True False False 4d20h
csi-snapshot-controller 4.10.25 True False False 4d21h
dns 4.10.25 True False False 4d21h
etcd 4.10.25 True False False 4d21h
image-registry 4.10.25 True False False 4d20h
ingress 4.10.25 True False False 4d20h
insights 4.10.25 True False False 4d21h
kube-apiserver 4.10.25 True False False 4d20h
kube-controller-manager 4.10.25 True False False 4d21h
kube-scheduler 4.10.25 True False False 4d21h
kube-storage-version-migrator 4.10.25 True False False 4d21h
machine-api 4.10.25 True False False 4d21h
machine-approver 4.10.25 True False False 4d21h
machine-config 4.10.25 True False False 4d21h
marketplace 4.10.25 True False False 4d21h
monitoring 4.10.25 True False False 4d20h
network 4.10.25 True False False 4d21h
node-tuning 4.10.25 True False False 4d21h
openshift-apiserver 4.10.25 True False False 4d20h
openshift-controller-manager 4.10.25 True False False 4d2h
openshift-samples 4.10.25 True False False 4d21h
operator-lifecycle-manager 4.10.25 True False False 4d21h
operator-lifecycle-manager-catalog 4.10.25 True False False 4d21h
operator-lifecycle-manager-packageserver 4.10.25 True False False 4d21h
service-ca 4.10.25 True False False 4d21h
storage 4.10.25 True False False 4d21h

$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.25 True False 4d20h Cluster version is 4.10.25
$ oc get no
NAME STATUS ROLES AGE VERSION
master-0.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com Ready master,worker 4d21h v1.23.5+012e945
master-1.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com Ready master,worker 4d20h v1.23.5+012e945
master-2.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com Ready master,worker 4d21h v1.23.5+012e945
worker-0.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com Ready worker 4d21h v1.23.5+012e945
worker-1.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com Ready worker 4d21h v1.23.5+012e945
worker-2.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com Ready worker 4d17h v1.23.5+012e945

$ oc get bmh -A
NAMESPACE NAME STATE CONSUMER ONLINE ERROR AGE
openshift-machine-api master-0.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com unmanaged noknom-aicli-2w2kn-master-0 true 4d21h
openshift-machine-api master-1.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com unmanaged noknom-aicli-2w2kn-master-1 true 4d21h
openshift-machine-api master-2.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com unmanaged noknom-aicli-2w2kn-master-2 true 4d21h
openshift-machine-api worker-0.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com unmanaged noknom-aicli-2w2kn-worker-0-gsgj2 true 4d21h
openshift-machine-api worker-1.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com unmanaged noknom-aicli-2w2kn-worker-0-k484h true 4d21h
openshift-machine-api worker-2 externally provisioned noknom-aicli-worker-2 true 4d17h

$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-c1bf8e8c8b0094eb12ede70a11e83d4b True False False 3 3 3 0 4d21h
worker rendered-worker-4bcf9a7afd843619051835018677413a True False False 3 3 3 0 4d21h

$ oc get agent -A
NAMESPACE NAME CLUSTER APPROVED ROLE STAGE
noknom-aicli ebe515c4-f9c8-528d-9b21-3613a16336e5 noknom-aicli true worker Done

$ oc get mcl
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE
noknom-aicli true https://api.noknom-aicli.hubcluster-1.lab.eng.cert.redhat.com:6443 True True 4d19h

$ oc get infraenvs.agent-install.openshift.io -A
NAMESPACE NAME ISO CREATED AT
noknom-aicli noknom-aicli-infraenv 2022-09-14T22:59:37Z

$ oc get agentserviceconfigs.agent-install.openshift.io
NAME AGE
agent 4d19h
$ oc get agentclusterinstalls.extensions.hive.openshift.io
NAME CLUSTER STATE
noknom-aicli-agent-cluster-install noknom-aicli adding-hosts

$ oc get machineset -A
NAMESPACE NAME DESIRED CURRENT READY AVAILABLE AGE
openshift-machine-api noknom-aicli-2w2kn-worker-0 2 2 2 2 4d21h

Conclusion

In this post, we have shown how to add worker nodes to an existing cluster (a Day-2 operation) using MCE, regardless of how the cluster was originally deployed.
