In this article, I will describe the disaster recovery procedure for the loss of a single master node.
To begin, remember that in any disaster, a backup of the platform is of paramount importance for recovery.
So before going ahead, check out part 1 of the disaster recovery series, in which I explain how to set up an automated procedure for generating ETCD backups:
OCP Disaster Recovery Part 1 - How to create Automated ETCD Backup in OpenShift 4.x
With the backup of ETCD done, the next steps will be essential for a successful recovery.
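Before touching the cluster, it is worth confirming that a recent backup actually exists. A minimal sketch, assuming the backups live under /home/core/assets/backup on a healthy master, as in part 1 (adjust the node name and path to your environment):
````
$ oc debug node/zmaciel-f9fbb-master-0 -- chroot /host ls -lh /home/core/assets/backup
````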
NOTE: It is only possible to recover an OpenShift cluster if at least one healthy master node remains. If you have lost all master nodes, the following steps cannot be applied successfully.
This procedure has been tested on OpenShift 4.7 and later versions.
When you lose more than one master node, the OpenShift API will be completely offline. The following steps cover recovering a single failed master node. Here is the healthy state of the cluster before we begin:
The cluster is functional with all the machines in the deployment:
````
$ oc get nodes
NAME STATUS ROLES AGE VERSION
zmaciel-f9fbb-master-0 Ready master 2d8h v1.20.0+551f7b2
zmaciel-f9fbb-master-1 Ready master 2d8h v1.20.0+551f7b2
zmaciel-f9fbb-master-2 Ready master 2d8h v1.20.0+551f7b2
zmaciel-f9fbb-worker-52tds Ready worker 2d8h v1.20.0+551f7b2
zmaciel-f9fbb-worker-nxhw8 Ready worker 2d8h v1.20.0+551f7b2
````
Verify that all machines are online:
````
$ oc get machines -A -ojsonpath='{range .items[*]}{@.status.nodeRef.name}{"\t"}{@.status.providerStatus.instanceState}{"\n"}' | grep -v running
zmaciel-f9fbb-master-0 poweredOn
zmaciel-f9fbb-master-1 poweredOn
zmaciel-f9fbb-master-2 poweredOn
zmaciel-f9fbb-worker-52tds poweredOn
zmaciel-f9fbb-worker-nxhw8 poweredOn
````
And the cluster operators are available:
````
$ oc get clusteroperators
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.7.3 True False False 39h
baremetal 4.7.3 True False False 2d8h
cloud-credential 4.7.3 True False False 2d8h
cluster-autoscaler 4.7.3 True False False 2d8h
config-operator 4.7.3 True False False 2d8h
console 4.7.3 True False False 24h
csi-snapshot-controller 4.7.3 True False False 2d8h
dns 4.7.3 True False False 2d8h
etcd 4.7.3 True False False 2d8h
image-registry 4.7.3 True False False 2d8h
ingress 4.7.3 True False False 24h
insights 4.7.3 True False False 2d8h
kube-apiserver 4.7.3 True False False 2d8h
kube-controller-manager 4.7.3 True False False 2d8h
kube-scheduler 4.7.3 True False False 2d8h
kube-storage-version-migrator 4.7.3 True False False 24h
machine-api 4.7.3 True False False 2d8h
machine-approver 4.7.3 True False False 2d8h
machine-config 4.7.3 True False False 2d8h
marketplace 4.7.3 True False False 24h
monitoring 4.7.3 True False False 2d8h
network 4.7.3 True False False 2d8h
node-tuning 4.7.3 True False False 2d8h
openshift-apiserver 4.7.3 True False False 2d8h
openshift-controller-manager 4.7.3 True False False 2d8h
openshift-samples 4.7.3 True False False 2d8h
operator-lifecycle-manager 4.7.3 True False False 2d8h
operator-lifecycle-manager-catalog 4.7.3 True False False 2d8h
operator-lifecycle-manager-packageserver 4.7.3 True False False 2d8h
service-ca 4.7.3 True False False 2d8h
storage 4.7.3 True False False 2d8h
````
NOTE: For this article, we used an OpenShift 4.7.3 IPI cluster on vSphere.
Verifications
Let's review the state of the cluster before starting the recovery procedure.
The master-1 machine was lost:
````
$ oc get nodes
NAME STATUS ROLES AGE VERSION
zmaciel-f9fbb-master-0 Ready master 2d8h v1.20.0+551f7b2
zmaciel-f9fbb-master-1 NotReady master 2d8h v1.20.0+551f7b2
zmaciel-f9fbb-master-2 Ready master 2d8h v1.20.0+551f7b2
zmaciel-f9fbb-worker-52tds Ready worker 2d8h v1.20.0+551f7b2
zmaciel-f9fbb-worker-nxhw8 Ready worker 2d8h v1.20.0+551f7b2
````
Operators that run pods on the master nodes become degraded because they expect three replicas, one per master:
````
$ oc get clusteroperators
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.7.3 False False True 13m
baremetal 4.7.3 True False False 2d8h
cloud-credential 4.7.3 True False False 2d8h
cluster-autoscaler 4.7.3 True False False 2d8h
config-operator 4.7.3 True False False 2d8h
console 4.7.3 True False False 24h
csi-snapshot-controller 4.7.3 True False False 2d8h
dns 4.7.3 True False False 2d8h
etcd 4.7.3 True False True 2d8h
image-registry 4.7.3 True False False 2d8h
ingress 4.7.3 True False False 24h
insights 4.7.3 True False False 2d8h
kube-apiserver 4.7.3 True False True 2d8h
kube-controller-manager 4.7.3 True False True 2d8h
kube-scheduler 4.7.3 True False True 2d8h
kube-storage-version-migrator 4.7.3 True False False 24h
machine-api 4.7.3 True False False 2d8h
machine-approver 4.7.3 True False False 2d8h
machine-config 4.7.3 False False True 113s
marketplace 4.7.3 True False False 24h
monitoring 4.7.3 False True True 6m45s
network 4.7.3 True False False 2d8h
node-tuning 4.7.3 True False False 2d8h
openshift-apiserver 4.7.3 True False True 11m
openshift-controller-manager 4.7.3 True False False 2d8h
openshift-samples 4.7.3 True False False 2d8h
operator-lifecycle-manager 4.7.3 True False False 2d8h
operator-lifecycle-manager-catalog 4.7.3 True False False 2d8h
operator-lifecycle-manager-packageserver 4.7.3 True False False 9m35s
service-ca 4.7.3 True False False 2d8h
storage 4.7.3 True False False 6m38s
````
The OpenShift cluster remains available, but the control plane operators listed above are degraded and begin to generate critical alerts.
The master MachineConfigPool starts updating in an attempt to recover the lost machine. However, it is unsuccessful and the machine-config operator becomes degraded:
````
$ oc get machineconfigpool
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-8ee9ac3bef7c772854cb539086c44835 False True False 3 2 3 0 2d8h
worker rendered-worker-1f0162bfc17dded5a238424783fb5b36 True False False 2 2 2 0 2d8h
````
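To understand exactly why the machine-config operator is degraded and what the pool is waiting on, describing both objects shows the relevant conditions and messages. For example:
````
$ oc describe clusteroperator machine-config
$ oc describe machineconfigpool master
````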
Determine if the machine is not running:
````
$ oc get machines -A -ojsonpath='{range .items[*]}{@.status.nodeRef.name}{"\t"}{@.status.providerStatus.instanceState}{"\n"}' | grep -v running
zmaciel-f9fbb-master-0 poweredOn
zmaciel-f9fbb-master-1 poweredOff
zmaciel-f9fbb-master-2 poweredOn
zmaciel-f9fbb-worker-52tds poweredOn
zmaciel-f9fbb-worker-nxhw8 poweredOn
````
The machine with a state other than poweredOn or running is the one that has failed.
Identify the unhealthy ETCD member:
````
$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="EtcdMembersAvailable")]}{.message}{"\n"}'
2 of 3 members are available, zmaciel-f9fbb-master-1 is unhealthy
````
With the result above, we can see that one of the ETCD members is unavailable.
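If you want to confirm this directly from ETCD itself, you can run a health check from one of the surviving members. A minimal sketch, assuming you point it at a pod that is not on the affected node (the pod name below is from this environment); depending on how ETCDCTL_ENDPOINTS is set in the etcdctl container, this checks all members, and the failed member's endpoint will report an error or time out:
````
$ oc rsh -n openshift-etcd etcd-zmaciel-f9fbb-master-0 etcdctl endpoint health
````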
Determine if the node is not ready:
````
$ oc get nodes -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{"\t"}{range .spec.taints[*]}{.key}{" "}' | grep unreachable
zmaciel-f9fbb-master-1 node-role.kubernetes.io/master node.kubernetes.io/unreachable node.cloudprovider.kubernetes.io/shutdown node.kubernetes.io/unreachable
````
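For more detail on the failed node than the taints alone provide, describing the node shows its NotReady conditions and the time of the last heartbeat. For example:
````
$ oc describe node zmaciel-f9fbb-master-1
````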
Recovering the Failed Master Node
After verifying everything above, you can begin the failed master node recovery procedure.
First, remove the unhealthy member.
a. In a terminal that has access to the cluster as a cluster-admin user, run the following command:
````
$ oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd
etcd-zmaciel-f9fbb-master-0 3/3 Running 0 2d8h
etcd-zmaciel-f9fbb-master-1 3/3 Running 0 2d8h
etcd-zmaciel-f9fbb-master-2 3/3 Running 0 2d8h
````
b. Connect to the running ETCD container, and pass in the name of a pod that is not on the affected node:
````
$ oc project openshift-etcd
Now using project "openshift-etcd" on server "https://api.zmaciel.rhbr-lab.com:6443".
$ oc rsh etcd-zmaciel-f9fbb-master-0
Defaulting container name to etcdctl.
Use 'oc describe pod/etcd-zmaciel-f9fbb-master-0 -n openshift-etcd' to see all of the containers in this pod.
````
c. View the member list:
````
sh-4.4# etcdctl member list -w table
+------------------+---------+------------------------+----------------------------+----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+------------------------+----------------------------+----------------------------+------------+
| 4319119f2850cd6a | started | zmaciel-f9fbb-master-0 | https://10.36.250.63:2380 | https://10.36.250.63:2379 | false |
| 654b8780898910de | started | zmaciel-f9fbb-master-1 | https://10.36.250.177:2380 | https://10.36.250.177:2379 | false |
| 88d623c9d503fcb1 | started | zmaciel-f9fbb-master-2 | https://10.36.250.77:2380 | https://10.36.250.77:2379 | false |
+------------------+---------+------------------------+----------------------------+----------------------------+------------+
````
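If you prefer to script the removal, the ID of the failed member can be extracted from the list output; double-check that it matches the failed node before removing anything. A small sketch, still inside the same etcdctl session:
````
sh-4.4# etcdctl member list | grep zmaciel-f9fbb-master-1 | cut -d',' -f1
654b8780898910de
````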
d. Remove the unhealthy ETCD member by providing the ID:
````
sh-4.4# etcdctl member remove 654b8780898910de
Member 654b8780898910de removed from cluster 9e399fdce41c910d
````
e. View the member list again and verify that the member was removed:
````
sh-4.4# etcdctl member list -w table
+------------------+---------+------------------------+---------------------------+---------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+------------------------+---------------------------+---------------------------+------------+
| 4319119f2850cd6a | started | zmaciel-f9fbb-master-0 | https://10.36.250.63:2380 | https://10.36.250.63:2379 | false |
| 88d623c9d503fcb1 | started | zmaciel-f9fbb-master-2 | https://10.36.250.77:2380 | https://10.36.250.77:2379 | false |
+------------------+---------+------------------------+---------------------------+---------------------------+------------+
````
f. Remove any old secrets from the unhealthy ETCD member that was removed:
````
$ oc get secrets -n openshift-etcd | grep zmaciel-f9fbb-master-1
etcd-peer-zmaciel-f9fbb-master-1 kubernetes.io/tls 2 2d9h
etcd-serving-metrics-zmaciel-f9fbb-master-1 kubernetes.io/tls 2 2d9h
etcd-serving-zmaciel-f9fbb-master-1 kubernetes.io/tls 2 2d9h
$ oc delete secrets etcd-peer-zmaciel-f9fbb-master-1 -n openshift-etcd
secret "etcd-peer-zmaciel-f9fbb-master-1" deleted
$ oc delete secrets etcd-serving-zmaciel-f9fbb-master-1 -n openshift-etcd
secret "etcd-serving-zmaciel-f9fbb-master-1" deleted
$ oc delete secrets etcd-serving-metrics-zmaciel-f9fbb-master-1 -n openshift-etcd
secret "etcd-serving-metrics-zmaciel-f9fbb-master-1" deleted
````
With the above commands you will remove the peer, serving, and metrics secrets.
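Since all three secret names follow the pattern <type>-<node-name>, they can also be deleted in a single command. A small sketch, equivalent to the three deletions above:
````
$ NODE=zmaciel-f9fbb-master-1
$ oc delete secrets -n openshift-etcd etcd-peer-${NODE} etcd-serving-${NODE} etcd-serving-metrics-${NODE}
````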
g. Obtain the machine configuration for the unhealthy member:
````
$ oc get machines -n openshift-machine-api -o wide
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
zmaciel-f9fbb-master-0 Running 2d9h zmaciel-f9fbb-master-0 vsphere://420156fd-d64a-ac6c-fcd0-0bb30524d146 poweredOn
zmaciel-f9fbb-master-1 Running 2d9h zmaciel-f9fbb-master-1 vsphere://4201f3ac-b3d4-384d-ec97-f3daf96b062f poweredOff
zmaciel-f9fbb-master-2 Running 2d9h zmaciel-f9fbb-master-2 vsphere://4201243a-a689-57be-50a1-6cc62aad599f poweredOn
zmaciel-f9fbb-worker-52tds Running 2d9h zmaciel-f9fbb-worker-52tds vsphere://4201bf12-613c-dda2-b877-34b504fd7622 poweredOn
zmaciel-f9fbb-worker-nxhw8 Running 2d9h zmaciel-f9fbb-worker-nxhw8 vsphere://4201f344-8f77-d579-e5cc-dc33d05ac7f7 poweredOn
````
h. Save the machine configuration of one of the healthy masters to a file; it will serve as a template for the new machine:
````
$ oc get machines zmaciel-f9fbb-master-0 -n openshift-machine-api -o yaml > new-master-machine.yml
````
i. Edit the new-master-machine.yml file created in the previous step to assign a new name and remove unnecessary fields.
When editing the file, you must remove the following parameters (a scripted alternative is sketched after this list):
Status section;
````
status:
addresses:
- address: 10.36.250.63
type: InternalIP
- address: fe80::fac:27e1:13f5:1645
type: InternalIP
- address: zmaciel-f9fbb-master-0
type: InternalDNS
lastUpdated: "2021-04-23T00:55:05Z"
nodeRef:
kind: Node
name: zmaciel-f9fbb-master-0
uid: 8d3e0a21-c41d-4b4e-9910-59a364bb6008
phase: Running
providerStatus:
conditions:
- lastProbeTime: "2021-04-20T16:35:49Z"
lastTransitionTime: "2021-04-20T16:35:49Z"
message: Machine successfully created
reason: MachineCreationSucceeded
status: "True"
type: MachineCreation
instanceId: 420156fd-d64a-ac6c-fcd0-0bb30524d146
instanceState: poweredOn
````
spec.providerID;
````
spec:
metadata: {}
providerID: vsphere://420156fd-d64a-ac6c-fcd0-0bb30524d146
````
metadata.annotations;
````
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
annotations:
machine.openshift.io/instance-state: poweredOn
...
````
metadata.generation;
````
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
...
generation: 2
...
````
metadata.resourceVersion;
````
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
...
resourceVersion: "871091"
...
````
metadata.uid;
````
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
...
uid: 310d6108-b46c-4d3c-a61e-95fa3f2ad07a
...
````
Once those fields are removed, you will need to update two fields in the file:
Change the metadata.name field to a new name:
````
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
...
name: zmaciel-f9fbb-master-3
...
````
Update the metadata.selfLink:
````
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
...
selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/zmaciel-f9fbb-master-3
...
````
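If you prefer to script these edits rather than make them by hand, and jq is available on your workstation, something along these lines should produce an equivalent manifest (the new machine name is just the one used in this article; review the result before applying it). Note that oc apply accepts JSON as well as YAML, so the resulting file can be used in the apply step below just like the YAML version:
````
$ oc get machine zmaciel-f9fbb-master-0 -n openshift-machine-api -o json \
    | jq 'del(.status, .spec.providerID, .metadata.annotations, .metadata.generation, .metadata.resourceVersion, .metadata.uid)
          | .metadata.name = "zmaciel-f9fbb-master-3"
          | .metadata.selfLink = "/apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/zmaciel-f9fbb-master-3"' \
    > new-master-machine.json
````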
j. Delete the machine of the unhealthy member:
````
$ oc delete machine zmaciel-f9fbb-master-1 -n openshift-machine-api
machine.machine.openshift.io "zmaciel-f9fbb-master-1" deleted
````
At this point, you will lose communication with the API for a few seconds.
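If you are scripting the procedure, a small wait loop makes sure the API is reachable again before continuing. A minimal sketch:
````
$ until oc get nodes > /dev/null 2>&1; do echo "waiting for the API..."; sleep 5; done
````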
k. Verify that the machine was deleted:
````
$ oc get machines -n openshift-machine-api -o wide
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
zmaciel-f9fbb-master-0 Running 2d9h zmaciel-f9fbb-master-0 vsphere://420156fd-d64a-ac6c-fcd0-0bb30524d146 poweredOn
zmaciel-f9fbb-master-2 Running 2d9h zmaciel-f9fbb-master-2 vsphere://4201243a-a689-57be-50a1-6cc62aad599f poweredOn
zmaciel-f9fbb-worker-52tds Running 2d9h zmaciel-f9fbb-worker-52tds vsphere://4201bf12-613c-dda2-b877-34b504fd7622 poweredOn
zmaciel-f9fbb-worker-nxhw8 Running 2d9h zmaciel-f9fbb-worker-nxhw8 vsphere://4201f344-8f77-d579-e5cc-dc33d05ac7f7 poweredOn
````
l. Create the new machine using the new-master-machine.yml file:
````
$ oc apply -f new-master-machine.yml
machine.machine.openshift.io/zmaciel-f9fbb-master-3 created
$ oc get machines -n openshift-machine-api
NAME PHASE TYPE REGION ZONE AGE
zmaciel-f9fbb-master-0 Running 2d9h
zmaciel-f9fbb-master-2 Running 2d9h
zmaciel-f9fbb-master-3 Provisioning 117s
zmaciel-f9fbb-worker-52tds Running 2d9h
zmaciel-f9fbb-worker-nxhw8 Running 2d9h
$ oc get machines -n openshift-machine-api
NAME PHASE TYPE REGION ZONE AGE
zmaciel-f9fbb-master-0 Running 2d9h
zmaciel-f9fbb-master-2 Running 2d9h
zmaciel-f9fbb-master-3 Running 8m34s
zmaciel-f9fbb-worker-52tds Running 2d9h
zmaciel-f9fbb-worker-nxhw8 Running 2d9h
$ oc get nodes
NAME STATUS ROLES AGE VERSION
zmaciel-f9fbb-master-0 Ready master 2d9h v1.20.0+551f7b2
zmaciel-f9fbb-master-2 Ready master 2d9h v1.20.0+551f7b2
zmaciel-f9fbb-master-3 Ready master 2m51s v1.20.0+551f7b2
zmaciel-f9fbb-worker-52tds Ready worker 2d9h v1.20.0+551f7b2
zmaciel-f9fbb-worker-nxhw8 Ready worker 2d9h v1.20.0+551f7b2
````
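On an IPI cluster, the certificates for the new node are normally approved automatically. If the new machine stays in Provisioning or the node never appears in oc get nodes, it is worth checking for pending certificate signing requests (only approve CSRs you recognize):
````
$ oc get csr
$ oc adm certificate approve <csr_name>
````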
m. Check that all ETCD pods are working correctly:
````
$ oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd
etcd-zmaciel-f9fbb-master-0 3/3 Running 0 82s
etcd-zmaciel-f9fbb-master-2 3/3 Running 0 6m21s
etcd-zmaciel-f9fbb-master-3 3/3 Running 0 2m31s
````
n. If the output from the previous command lists only two pods, you can manually force a redeployment of ETCD:
````
$ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
````
NOTE: The “forceRedeploymentReason” value must be unique, so a time stamp is attached.
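You can follow the redeployment by watching the ETCD pods roll; all three should return to Running on the new revision. For example, assuming the watch utility is available on your workstation (otherwise simply re-run the command):
````
$ watch "oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd"
````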
o. During the ETCD redeployment, the kube-apiserver operator will also redeploy its pods:
````
$ oc get clusteroperator kube-apiserver
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
kube-apiserver 4.7.3 True True False 2d9h
````
Once the kube-apiserver pods finish redeploying, your cluster will be fully available again.
p. Final verification of cluster operators:
````
$ oc get clusteroperators
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.7.3 True False False 9m24s
baremetal 4.7.3 True False False 2d10h
cloud-credential 4.7.3 True False False 2d10h
cluster-autoscaler 4.7.3 True False False 2d10h
config-operator 4.7.3 True False False 2d10h
console 4.7.3 True False False 26h
csi-snapshot-controller 4.7.3 True False False 12m
dns 4.7.3 True False False 2d10h
etcd 4.7.3 True False False 2d10h
image-registry 4.7.3 True False False 2d10h
ingress 4.7.3 True False False 26h
insights 4.7.3 True False False 2d9h
kube-apiserver 4.7.3 True False False 2d10h
kube-controller-manager 4.7.3 True False False 2d10h
kube-scheduler 4.7.3 True False False 2d10h
kube-storage-version-migrator 4.7.3 True False False 26h
machine-api 4.7.3 True False False 2d10h
machine-approver 4.7.3 True False False 2d10h
machine-config 4.7.3 True False False 22m
marketplace 4.7.3 True False False 26h
monitoring 4.7.3 True False False 14m
network 4.7.3 True False False 2d10h
node-tuning 4.7.3 True False False 2d10h
openshift-apiserver 4.7.3 True False False 109m
openshift-controller-manager 4.7.3 True False False 2d9h
openshift-samples 4.7.3 True False False 2d9h
operator-lifecycle-manager 4.7.3 True False False 2d10h
operator-lifecycle-manager-catalog 4.7.3 True False False 2d10h
operator-lifecycle-manager-packageserver 4.7.3 True False False 10m
service-ca 4.7.3 True False False 2d10h
storage 4.7.3 True False False 104m
````
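If you prefer a compact check over scanning the full list, the following filter prints only operators that are not available or are degraded; an empty result means the control plane has fully recovered. A small sketch:
````
$ oc get clusteroperators --no-headers | awk '$3 != "True" || $5 == "True"'
````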
Final Thoughts
This concludes part 2 of the OpenShift disaster recovery series. All of the checks described in this article are important, because they give you the true status of the cluster.
If you encounter a cluster that has lost two master nodes, do not worry: part 3 of the series will focus on recovering a cluster that has lost two masters.
I hope I have contributed to your knowledge.