
Linus Torvalds once wrote, "Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it." This quote has not aged well for a number of reasons: Real men? FTP? Tape backup? Well, actually, that last one isn't as outdated as you may think. Regardless of the type of media involved (and yes, tape still holds a significant—though shrinking—share of the world's archive data), backups still provide peace of mind that trusting fully in a cloud service just can't replace.

With OpenShift Virtualization bringing old workloads into the GitOps future, it can seem like the old way of running servers is no more, but it still makes sense to keep backups of those VMs while they await their turn at modernization. To that end, OpenShift API for Data Protection (OADP) provides a plugin to properly manage virtual machines as part of its cluster backup and restore capabilities.

This article will demonstrate the setup of OADP for VM backups, including snapshots of the VM disks, then walk through the basic use case of backing up and restoring a Namespace containing a running VirtualMachine resource.

Storage Environment

With the various storage options available to use with OpenShift, it is difficult to design a demonstration that will work for most environments. The examples in this blog are based on OpenShift Data Foundation (ODF) 4.10.3 and OpenShift API for Data Protection (OADP) 1.0.3.

Key requirements from the standpoint of backing up VirtualMachines include object storage and a snapshot-capable storage class. Similar to Amazon's S3 buckets, the object store is used to hold tarred backups of object manifests from the cluster. This allows OADP to back up VirtualMachines and DataVolumes, among other resource types. To back up PersistentVolumes (PVs), the storage back end should provide a StorageClass capable of RWX PVCs in either block or filesystem mode and an associated VolumeSnapshotClass capable of providing snapshots of those PVCs. For storage systems lacking a snapshot capability, OADP does provide for manual copies of PVs through Restic.

Please note: Because OADP relies on snapshots of VM PVCs, it is recommended to install the QEMU guest agent on any virtual machines that require online backups. The guest agent quiesces in-flight data in the guest OS during the snapshot process, avoiding possible data corruption.
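If the agent is not already present, it can typically be installed and enabled inside a RHEL guest with:

sudo dnf install -y qemu-guest-agent
sudo systemctl enable --now qemu-guest-agent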

This blog employs ODF on an external Ceph cluster. ODF provides all the requirements mentioned above. After completing the ODF installation, make sure to annotate the ceph-rbd StorageClass to be the default class:

oc annotate sc ocs-external-storagecluster-ceph-rbd storageclass.kubernetes.io/is-default-class="true" 
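You can verify the annotation by listing the StorageClasses and looking for the (default) marker next to the class name:

oc get storageclass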

With ODF and OADP working together to provide both snapshots of PVCs and backups of Kubernetes objects, we are able to provide an OpenShift friendly, scheduled backup and recovery service for virtual machines in the cluster.

At this point, it is a good idea to install OpenShift Virtualization if it has not already been installed, as some OADP options only appear if it is already present.

Set up OADP

Once storage has been installed and configured, it is time to set up the OpenShift API for Data Protection. Following the OADP documentation, there are five essential tasks:

  • Install the OADP Operator
  • Set up the VolumeSnapshotClass for PVC backups
  • Create an object store Bucket for resource backups
  • Create a Secret to access the bucket
  • Instantiate OADP with a Data Protection Application

Install the OADP Operator

Installing the OADP Operator from OperatorHub is straightforward. The default settings are fine; the operator will create the openshift-adp namespace.
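If you prefer to drive the installation from the command line or GitOps, the same result can be expressed as manifests. The following is a minimal sketch; the channel in particular may differ depending on the OADP version in your catalog:

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-adp
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: oadp-operator-group
  namespace: openshift-adp
spec:
  targetNamespaces:
    - openshift-adp
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: redhat-oadp-operator
  namespace: openshift-adp
spec:
  channel: stable-1.0
  name: redhat-oadp-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace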

Set up the Volume Snapshot Class

Locate the VolumeSnapshotClass that corresponds to the default StorageClass that will be used for VMs. In this example, we set ocs-external-storagecluster-ceph-rbd as the default StorageClass and ocs-external-storagecluster-rbdplugin-snapclass is the corresponding VolumeSnapshotClass.
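The available classes on the cluster can be listed with:

oc get volumesnapshotclass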

There are two important changes that need to be made here. First, the label velero.io/csi-volumesnapshot-class: "true" must be added to the VolumeSnapshotClass so OADP (specifically Velero) can identify this as the snapshot location for PVCs.

oc label vsclass ocs-external-storagecluster-rbdplugin-snapclass velero.io/csi-volumesnapshot-class=true

Second, for this to be a proper backup solution, the snapshots must persist even if the VolumeSnapshot objects are deleted. Otherwise, deleting a namespace would also delete the snapshots of every PVC ever backed up in it.

To protect VolumeSnapshots, patch the corresponding VolumeSnapshotClass, setting deletionPolicy to Retain:

oc patch volumesnapshotclass ocs-external-storagecluster-rbdplugin-snapclass --type=merge -p '{"deletionPolicy": "Retain"}'
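Confirm the policy took effect:

oc get volumesnapshotclass ocs-external-storagecluster-rbdplugin-snapclass -o jsonpath='{.deletionPolicy}{"\n"}'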

Create a Bucket for Backups

The next step is to create an ObjectBucketClaim. This can most readily be accomplished through the OpenShift Console by navigating to Storage -> ObjectBucketClaims and clicking "Create Object Bucket Claim". Name the claim something informative like oadp-backups and select the local cluster's openshift.storage.noobaa.io as the StorageClass.
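The same claim can also be created from the command line by applying an ObjectBucketClaim manifest. A minimal sketch (generateBucketName supplies the prefix for the machine-generated bucket name) looks like:

apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: oadp-backups
  namespace: openshift-adp
spec:
  generateBucketName: oadp-backups
  storageClassName: openshift-storage.noobaa.io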


Once the ObjectBucketClaim is created, scroll down to the "Object Bucket Claim Data" section and click "Reveal Values" to gather information needed in the next two steps. This data is also available from the command line by looking for a Secret and a ConfigMap with the same name as your ObjectBucketClaim.
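For example:

oc -n openshift-adp get secret,configmap oadp-backups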

Create a Secret for OADP to Access the Bucket

OADP will need access to the bucket just created via a Secret in the openshift-adp namespace. For use by Velero, the access keys from the ObjectBucketClaim's Secret must be formatted like an S3 credentials file, which looks like the following:

[default]
aws_access_key_id=<ACCESS_KEY_ID>
aws_secret_access_key=<SECRET_ACCESS_KEY>

The following script will generate the credentials file and then create a secret from it in the appropriate namespace:

NAMESPACE=openshift-adp
OBCNAME=oadp-backups

export ACCESS_KEY=$(oc -n $NAMESPACE get secret $OBCNAME -o json | jq -r '.data.AWS_ACCESS_KEY_ID|@base64d')
export SECRET_KEY=$(oc -n $NAMESPACE get secret $OBCNAME -o json | jq -r '.data.AWS_SECRET_ACCESS_KEY|@base64d')

cat << EOF > ./credentials-velero
[default]
aws_access_key_id=${ACCESS_KEY}
aws_secret_access_key=${SECRET_KEY}
EOF

oc -n $NAMESPACE create secret generic cloud-credentials --from-file cloud=./credentials-velero

Note: the preceding script requires jq 1.6 or later to handle the @base64d filter. If your version is incapable of inline decoding, you will need to decode manually with base64 -d.

Create the Data Protection Application

The last step in setting up OADP is to create a DataProtectionApplication (DPA). This tells Velero where to find its backup store, how PVCs should be snapshotted, and which plugins to install. In the case of our ODF-backed OADP install, the DPA will need the following features:

  • An S3 URL from the ObjectBucketClaim's ConfigMap mentioned above
  • The cloud-credentials secret created in the last section
  • The internal name of the Bucket that backs the ObjectBucketClaim above
  • Activation of the CSI Velero plugin to enable CSI based snapshots
  • Activation of the kubevirt plugin

First, use oc describe to nicely print all the helpful information encapsulated in the ObjectBucketClaim's ConfigMap:

oc -n openshift-adp describe cm oadp-backups

Name:         oadp-backups
Namespace:    openshift-adp
Labels:       app=noobaa
              bucket-provisioner=openshift-storage.noobaa.io-obc
              noobaa-domain=openshift-storage.noobaa.io
Annotations:  <none>

Data
====
BUCKET_HOST:
----
s3.openshift-storage.svc
BUCKET_NAME:
----
oadp-backups-35f2daff-c363-420c-97ad-b0d2ed8fafbb
BUCKET_PORT:
----
443
BUCKET_REGION:
----

BUCKET_SUBREGION:
----


BinaryData
====

Events: <none>

We create the S3 URL by joining the BUCKET_HOST and BUCKET_PORT: https://s3.openshift-storage.svc:443.

Much like StorageClass-provisioned PersistentVolumes have a machine-generated name that resembles that of their corresponding PersistentVolumeClaim, ObjectBuckets have a machine-generated name referred to in their corresponding ObjectBucketClaim. The BUCKET_NAME printed here will be needed below.

Constructing the DPA from the gathered information will yield something like the following:

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: velero-sample
  namespace: openshift-adp
spec:
  backupLocations:
    - velero:
        config:
          insecureSkipTLSVerify: "true"
          profile: default
          region: minio
          s3ForcePathStyle: "true"
          s3Url: https://s3.openshift-storage.svc:443
        credential:
          key: cloud
          name: cloud-credentials
        default: true
        objectStorage:
          bucket: oadp-backups-35f2daff-c363-420c-97ad-b0d2ed8fafbb
          prefix: velero
        provider: aws
  configuration:
    restic:
      enable: true
    velero:
      defaultPlugins:
        - openshift
        - aws
        - csi
        - kubevirt

Apply this yaml to the cluster, and the OADP operator will create a corresponding BackupStorageLocation:

oc -n openshift-adp get BackupStorageLocation

NAME              PHASE       LAST VALIDATED   AGE   DEFAULT
velero-sample-1   Available   17s              10m   true

Note the name of this BackupStorageLocation as you will need it momentarily when creating a Backup.

Test Backup and Restore of a Namespace with VMs

To set up a quick test for OADP to back up and restore, create a Namespace. In this example we use "wonderland". Within the Namespace, create a VirtualMachine using the available templates. Here, we create a RHEL9 VM and give it the name rhel9-white-rabbit. Log in to the VM and touch a file in the default user's home directory to help demonstrate persistence, as shown below. As the provided RHEL guest images all include the QEMU guest agent by default, nothing more needs to be done to make this VM ready for backups by live snapshot.
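For example, using the virtctl client (the file name here is arbitrary):

virtctl -n wonderland console rhel9-white-rabbit
# then, after logging in as the default user:
touch /home/cloud-user/backup-canary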

Make a Backup

Backups in OADP are custom resources representing targets to back up. When the targets in the Backup CR are backed up or snapshotted, the status field is updated with success or failure and timestamps.

An example Backup for the wonderland NameSpace looks like:

apiVersion: velero.io/v1
kind: Backup
metadata:
  generateName: wonderland-
  namespace: openshift-adp
spec:
  includedNamespaces:
    - wonderland
  snapshotVolumes: true
  storageLocation: velero-sample-1
  ttl: 720h0m0s

Note the use of generateName instead of name in the metadata. This will cause each backup to create a distinct object with a randomized name. Additionally, note that the storageLocation should be populated with the name of the BackupStorageLocation mentioned above. According to the time to live (ttl) field, this backup will remain on the system for 30 days.

Apply the Backup yaml to the cluster and observe it by checking its status.

oc create -f backup-wonderland.yaml 

backup.velero.io/wonderland-jlcpk created

Once the backup completes, its Phase should show as Completed:

oc -n openshift-adp describe backup wonderland-jlcpk

[ Output truncated for readability ]

. . .

Status:
  Completion Timestamp:  2022-08-04T18:47:21Z
  Expiration:            2022-09-03T18:46:40Z
  Format Version:        1.1.0
  Phase:                 Completed
  Progress:
    Items Backed Up:  81
    Total Items:      81
  Start Timestamp:  2022-08-04T18:46:40Z
  Version:          1

Inspecting Backups

With the Amazon Web Services aws command-line tool, it is possible to run aws s3 commands against the ObjectBucket to better understand the OADP (and Velero) backup process. The tool can be installed from https://aws.amazon.com/cli/ .

The following commands will set up a local port forward to the S3 service within the cluster, then alias a new command, s3, that encapsulates all the information needed to access the ObjectBucket created earlier.

oc port-forward -n openshift-storage service/s3 10443:443 &
ACCESS_KEY=$(oc -n openshift-adp get secret oadp-backups -o json | jq -r '.data.AWS_ACCESS_KEY_ID|@base64d')
SECRET_KEY=$(oc -n openshift-adp get secret oadp-backups -o json | jq -r '.data.AWS_SECRET_ACCESS_KEY|@base64d')
alias s3='AWS_ACCESS_KEY_ID=$ACCESS_KEY AWS_SECRET_ACCESS_KEY=$SECRET_KEY aws --endpoint https://localhost:10443 --no-verify-ssl s3'
s3 ls

[ insecure request warnings truncated ]

2022-08-04 19:05:32 oadp-backups-35f2daff-c363-420c-97ad-b0d2ed8fafbb
2022-08-04 19:05:32 first.bucket

Let's explore that oadp-backups bucket further. Here we add s3:// to denote an S3 URL and run a recursive copy:

s3 cp s3://oadp-backups-35f2daff-c363-420c-97ad-b0d2ed8fafbb/ . --recursive

This will yield the following directory and file structure:

tree

.
└── velero
    └── backups
        └── wonderland-jlcpk
            ├── velero-backup.json
            ├── wonderland-jlcpk-csi-volumesnapshotcontents.json.gz
            ├── wonderland-jlcpk-csi-volumesnapshots.json.gz
            ├── wonderland-jlcpk-logs.gz
            ├── wonderland-jlcpk-podvolumebackups.json.gz
            ├── wonderland-jlcpk-resource-list.json.gz
            ├── wonderland-jlcpk.tar.gz
            └── wonderland-jlcpk-volumesnapshots.json.gz

Items of interest include the resource list, logs, and the tar archive of all the resources in JSON format:

zcat wonderland-jlcpk-resource-list.json.gz | jq .

zless wonderland-jlcpk-logs.gz

tar tvf wonderland-jlcpk.tar.gz

At this point, backing up the underlying storage to offsite tape is an option; see your storage administrators for details, as getting from OADP to actual tape is beyond the scope of this blog.

Simulate Disaster

Now it's time to introduce a little controlled chaos into the environment. Say we decide to clean up unused namespaces, and delete wonderland. Some time later, the admin of that namespace comes around and says, "I was using that. Put it back please!" We know we have a valid backup of the namespace, so this request can easily be answered by initiating a Restore.
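For this walkthrough, the disaster is a single command:

oc delete namespace wonderland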

Restore from Backup

The Restore can be created in one of three ways: through a form in the OpenShift console, using the velero command-line client, or by applying a yaml manifest directly. In any case, the effect is the same: a Restore custom resource is created.

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-wonderland
  namespace: openshift-adp
spec:
  backupName: wonderland-jlcpk
  excludedResources:
    - nodes
    - events
    - events.events.k8s.io
    - backups.velero.io
    - restores.velero.io
    - resticrepositories.velero.io
  restorePVs: true

The important fields here are backupName, which corresponds to the generated name of the Backup we created earlier, and restorePVs, which must be set to true to get the VM's disks back from snapshot.
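As with the Backup, apply the manifest to the cluster and then watch the phase of the resulting Restore:

oc create -f restore-wonderland.yaml
oc -n openshift-adp get restore restore-wonderland -o jsonpath='{.status.phase}{"\n"}'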

Let's check on our VM:

oc -n wonderland get vm

NAME                 AGE     STATUS    READY
rhel9-white-rabbit   3m21s   Running   True

That appears to be in order. Pulling the ssh command from the VM's Details page (or looking up the rhel9-white-rabbit-ssh-service to get the proper NodePort), we are able to log in and verify the file we left in cloud-user's home directory earlier.

Note: Because the service is of type NodePort, it is highly likely that the port number will change post restore.
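For example (the node address and NodePort are placeholders that must be read from your cluster):

oc -n wonderland get service rhel9-white-rabbit-ssh-service
ssh cloud-user@<node-address> -p <node-port>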

Conclusion

We hope this article will prove useful to OpenShift VM admins who could use a little peace of mind when dealing with legacy services where backups may be critical. For more information about the capabilities of OADP, including setting up a backup schedule and managing old backups, see the OADP documentation.