Introduction

Red Hat Advanced Cluster Management for Kubernetes (RHACM) defines two main types of clusters: hub clusters and managed clusters.

The hub cluster is the main cluster with RHACM installed on it. You can create, manage, and monitor other Kubernetes clusters with the hub cluster. The managed clusters are Kubernetes clusters that are managed by the hub cluster. You can create some clusters by using the RHACM hub cluster, and you can also import existing clusters to be managed by the hub cluster. Since the hub cluster manages the cluster fleet, it is vital that there is a business continuity scenario built in so that when an unexpected event causes a hub cluster to fail, the cluster fleet can be managed by a new hub cluster.

The RHACM backup and restore feature, available starting with version 2.5, offers support for building a Disaster Recovery solution to recover the hub cluster when it fails. There is a shortcoming for this feature though: only managed clusters created using the Hive API are automatically connected to the restored hub cluster. Imported managed clusters must be manually reconnected on the new hub cluster.

RHACM 2.7 provides a solution to automatically import managed clusters when restoring on a new hub cluster.

The purpose of this blog is to provide a walk-through on how to enable and use the RHACM 2.7 solution that automatically imports managed clusters during a hub cluster restore operation. Before showing how to use the auto import feature, let's see why this approach is needed in the first place.

Why imported clusters must be manually reimported after restore

When the backup data is moved to another hub cluster, only Hive managed clusters are automatically connected with the new hub cluster. Hive clusters are managed clusters created on the hub cluster using the Create cluster action available from the Clusters tab in the console.

Managed clusters connected with the initial hub cluster by using the Import cluster action appear as Pending Import when the hub cluster data is restored on a new hub cluster, and the clusters must be manually imported back on the new hub cluster.

Hive managed clusters are automatically connected with the new hub cluster because Hive stores the managed cluster kubeconfig in the managed cluster namespace on the hub cluster, and this is being backed up and restored on the new hub cluster. The import controller updates the bootstrap kubeconfig on the managed cluster using this restored configuration. This information is only available for managed clusters created by using the Hive API and is not available for imported clusters.

The workaround provided with RHACM 2.5 and RHACM 2.6 for reconnecting imported clusters with the new hub cluster is to manually create the auto-import-secret after the restore operation is started. The auto-import-secret must be created on the restore hub cluster in the managed cluster namespace, for each cluster in Pending Import state. This auto-import-secret must use a kubeconfig or token with enough permissions for the import component to start the auto import on the new hub cluster.
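
For illustration, a manually created auto-import-secret that uses a token could look like the following sketch. The values in angle brackets are placeholders, not values from this blog, and the retry count is an arbitrary choice. A kubeconfig can be supplied instead of the token and server pair.

```yaml
# Sketch of the manual workaround: create one secret like this per
# managed cluster that is in Pending Import on the restore hub cluster.
apiVersion: v1
kind: Secret
metadata:
  name: auto-import-secret
  # Placeholder: the managed cluster namespace on the restore hub cluster
  namespace: <managed-cluster-name>
stringData:
  # Arbitrary example value for the number of import retries
  autoImportRetry: "5"
  # Placeholder: a token with klusterlet (or equivalent) permissions
  token: <token-to-access-the-managed-cluster>
  # Placeholder: the API server URL of the managed cluster
  server: <managed-cluster-api-url>
type: Opaque
```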

For a large number of imported managed clusters, this is a very tedious operation because it is run manually for each managed cluster. It increases the Recovery Time Objective and requires the user who runs the restore operation to have access to each managed cluster and to obtain a token that can be used to connect with it. This token must have a klusterlet role binding or a role with equivalent permissions.

Automatically reconnecting managed clusters with RHACM 2.7

The rest of this blog covers the new solution for automatically connecting imported clusters to the new hub cluster by using the ManagedServiceAccount feature, available with the backup and restore component in RHACM 2.7. The following sections show you how to enable this feature and explain possible limitations.

How the automatic connection works

The backup controller available with RHACM 2.7 uses the ManagedServiceAccount component on the primary hub cluster to create a token for each of the imported managed clusters.

This token is backed up in each managed cluster namespace and is set to use a klusterlet-bootstrap-kubeconfig ClusterRole binding, which allows the token to be used when importing the managed cluster with the auto import secret. The klusterlet-bootstrap-kubeconfig ClusterRole can only get or update the bootstrap-hub-kubeconfig secret, so there is limited access to the managed cluster.
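
To make the scope of that permission concrete, a ClusterRole restricted in this way could look like the following sketch. The rule is inferred from the description above and is not copied from the product, so the definition that ships with RHACM may differ.

```yaml
# Illustrative only: a ClusterRole limited to reading and updating a
# single named secret, as described for klusterlet-bootstrap-kubeconfig.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: klusterlet-bootstrap-kubeconfig
rules:
- apiGroups: [""]
  resources: ["secrets"]
  # Restricting by resourceNames limits access to this one secret
  resourceNames: ["bootstrap-hub-kubeconfig"]
  verbs: ["get", "update"]
```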

When the activation data is restored on the new hub cluster, the restore controller runs a post restore operation and looks for all managed clusters in the Pending Import state. For these managed clusters, it checks whether there is a valid token generated by the ManagedServiceAccount and, if one is found, creates an auto-import-secret by using this token. As a result, the cluster import component tries to reconnect the managed cluster, and if the cluster is accessible, the operation is successful.

Automatic import value

When the hub cluster backup data is restored on a new hub cluster, all managed clusters are automatically connected with the new hub cluster.

Prerequisites

See the following prerequisites to follow along in this blog.

For both active and passive hub clusters:

  • RHACM version 2.7 or later must be installed on your hub cluster.

    [Screen capture: acm-operator]

  • Enable the ManagedServiceAccount component on MultiClusterEngine by editing the MultiClusterEngine resource and setting enabled: true for the managedserviceaccount-preview component. See the following example:

    apiVersion: multicluster.openshift.io/v1
    kind: MultiClusterEngine
    metadata:
      name: multiclusterhub
    spec:
      overrides:
        components:
        - enabled: true
          name: managedserviceaccount-preview

  • Enable the cluster-backup operator on the hub cluster. Edit the MultiClusterHub resource and set enabled: true for the cluster-backup component. This also installs the OADP operator in the open-cluster-management-backup namespace. See the following example:

    apiVersion: operator.open-cluster-management.io/v1
    kind: MultiClusterHub
    spec:
      overrides:
        components:
        - enabled: true
          name: cluster-backup

    [Screen capture: oadp-operator]

  • You must create the DataProtectionApplication resource in the open-cluster-management-backup namespace and point to a valid storage location for backups.
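
As a rough sketch of the last prerequisite, a DataProtectionApplication that stores backups in an S3 bucket could look like the following. The resource name, AWS provider, bucket, prefix, region, and credentials secret are all placeholder assumptions for illustration. Use the values that match your own storage location.

```yaml
# Sketch only: every value below except the namespace is a placeholder.
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa-sample
  namespace: open-cluster-management-backup
spec:
  configuration:
    velero:
      defaultPlugins:
      - openshift
      - aws
  backupLocations:
  - name: default
    velero:
      provider: aws
      default: true
      objectStorage:
        bucket: <bucket-name>
        prefix: <backup-prefix>
      config:
        region: <region>
      # Secret holding the storage credentials, created beforehand in
      # the open-cluster-management-backup namespace
      credential:
        name: cloud-credentials
        key: cloud
```

Both the active and the passive hub cluster need to point to the same storage location so that the passive hub cluster can read the backups produced by the active one.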

Enabling the automatic import feature on active hub cluster

To enable the automatic import feature, set the useManagedServiceAccount property to true when creating the BackupSchedule.cluster.open-cluster-management.io resource on the active hub cluster. See the following example:

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: BackupSchedule
metadata:
  name: schedule-acm-msa
spec:
  veleroSchedule: 0 */1 * * *
  veleroTtl: 240h
  useManagedServiceAccount: true

Once useManagedServiceAccount is set to true, the backup controller starts processing imported managed clusters. The following happens for each of them:

  • The backup controller creates a ManagedClusterAddOn named managed-serviceaccount.

    [Screen capture: msa-addon]

  • The backup controller creates a ManagedServiceAccount resource named auto-import-account and sets the token validity as defined by the BackupSchedule.

    [Screen capture: msa-account-1]

  • The ManagedServiceAccount resource is processed by the ManagedClusterAddOn, which triggers the creation of a token with the same name on the managed cluster. This token is pushed back to the hub cluster, under the managed cluster namespace.

    [Screen capture: Managed Service Account token on managed cluster (msa-token-cls)]

    Note that the token is created only if the managed cluster is accessible. If the managed cluster is not accessible at the time the ManagedServiceAccount is created, the token is created later, when the managed cluster becomes available. The token secret on the hub cluster is included in the backup.

    [Screen capture: Managed Service Account token on hub cluster (msa-token-hub)]

  • For each ManagedServiceAccount resource, the backup controller creates a ManifestWork that sets up, on the managed cluster, a klusterlet-bootstrap-kubeconfig RoleBinding for the ManagedServiceAccount token. The klusterlet-bootstrap-kubeconfig ClusterRole can only get or update the bootstrap-hub-kubeconfig secret. This role is used in a post restore operation to auto import the managed cluster on the restored hub cluster.

    [Screen capture: Managed Service Account role binding on managed cluster (msa-role-binding)]
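
As an illustration of the first two steps in the list above, the resources created in a managed cluster namespace on the hub cluster might look like the following sketch. This is not output captured from a real hub cluster: the namespace is a placeholder, the install namespace and the ManagedServiceAccount API version are assumptions that can differ between releases, and the validity assumes the default of twice the veleroTtl from the BackupSchedule example (2 x 240h).

```yaml
# Hypothetical sketch of the per-cluster resources created by the
# backup controller. <managed-cluster-name> is a placeholder.
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: ManagedClusterAddOn
metadata:
  name: managed-serviceaccount
  namespace: <managed-cluster-name>
spec:
  # Assumed namespace where the add-on agent runs on the managed cluster
  installNamespace: open-cluster-management-agent-addon
---
# The API version is an assumption; verify it with `oc api-resources`
apiVersion: authentication.open-cluster-management.io/v1alpha1
kind: ManagedServiceAccount
metadata:
  name: auto-import-account
  namespace: <managed-cluster-name>
spec:
  rotation:
    enabled: true
    # Assumes the default: twice the veleroTtl of 240h
    validity: 480h0m0s
```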

    Notes:

    • You can disable the automatic import cluster feature at any time by setting the useManagedServiceAccount option to false on the BackupSchedule resource. Removing the property has the same result because the default value is false.

      • When you disable the automatic import cluster feature, the backup controller removes the resources it created (ManagedClusterAddOn, ManagedServiceAccount, and ManifestWork), which in turn deletes the auto import token on both the hub cluster and the managed cluster:

        apiVersion: cluster.open-cluster-management.io/v1beta1
        kind: BackupSchedule
        metadata:
          name: schedule-acm-msa
        spec:
          veleroSchedule: 0 */1 * * *
          veleroTtl: 240h
          useManagedServiceAccount: false

    • The ManagedServiceAccount auto-import-account token validity duration is automatically set to twice the value of veleroTtl, to maximize the chance that the token stays valid for the entire lifecycle of every backup that stores it. You can change this value if you want to control how long a token is valid, but keep in mind that this can produce backups with tokens that expire during the lifecycle of the backup. Use the managedServiceAccountTTL property to change the token TTL:

      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: BackupSchedule
      metadata:
        name: schedule-acm-msa
      spec:
        veleroSchedule: 0 */2 * * *
        veleroTtl: 120h
        useManagedServiceAccount: true
        managedServiceAccountTTL: 2h

Automatically reconnect imported clusters on restore hub cluster

The backup data is restored on the new hub cluster using a Restore resource, as shown in the following example:

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Restore
metadata:
  name: restore-acm
  namespace: open-cluster-management-backup
spec:
  cleanupBeforeRestore: CleanupRestored
  veleroManagedClustersBackupName: latest
  veleroCredentialsBackupName: latest
  veleroResourcesBackupName: latest

When the managed cluster backup data is restored on the new hub cluster, the restore controller runs a post restore operation and looks for all managed clusters in the Pending Import state.

For these managed clusters, it checks whether there is a valid auto-import-account token under the managed cluster namespace on the new hub cluster. If such a token is found, the post restore routine creates an auto-import-secret using this token.

As a result, the cluster import component tries to reconnect the managed cluster, and if the cluster is accessible, the operation is successful.

You should see the following status message for the Restore resource if the post restore operation has created an auto-import-secret, triggering the auto import operation for a managed cluster in the Pending Import state:

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Restore
metadata:
  name: restore-acm
  namespace: open-cluster-management-backup
spec:
  cleanupBeforeRestore: CleanupRestored
  veleroManagedClustersBackupName: latest
  veleroCredentialsBackupName: latest
  veleroResourcesBackupName: latest
status:
  lastMessage: Velero restores have run to completion
  messages:
  - Created auto-import-secret for managed cluster (vb-managed-cls-1)
  phase: Finished
  veleroCredentialsRestoreName: restore-acm-acm-credentials-schedule-20221021133542
  veleroManagedClustersRestoreName: restore-acm-acm-managed-clusters-schedule-20221021133542
  veleroResourcesRestoreName: restore-acm-acm-resources-schedule-20221021133542

Limitations with the automatic import feature

The following limitations of this approach can result in a managed cluster not being auto imported when moving to a new hub cluster:

  1. The automatic import operation uses the cluster import feature with the auto import secret, so the hub cluster must be able to access the managed cluster and run the cluster import operation.

  2. Since the auto-import-secret created on restore uses the ManagedServiceAccount token to connect to the managed cluster, the managed cluster must also provide the kube apiserver information. The apiserver must be set on the ManagedCluster resource, as in the sample below. Only OpenShift Container Platform (OCP) clusters have this apiserver set up automatically when the cluster is imported on the hub cluster. For any other type of managed cluster, such as EKS clusters, you must set this information manually. Otherwise, the automatic import feature ignores these clusters, and they stay in Pending Import when moved to the restore hub cluster:

    apiVersion: cluster.open-cluster-management.io/v1
    kind: ManagedCluster
    metadata:
      name: managed-cluster-name
    spec:
      hubAcceptsClient: true
      leaseDurationSeconds: 60
      managedClusterClientConfigs:
      - url: <apiserver>

  3. The backup controller regularly looks for imported managed clusters and creates the ManagedServiceAccount resource under the managed cluster namespace as soon as such a managed cluster is found. This should trigger a token creation on the managed cluster. However, if the managed cluster is not accessible at that time, for example because it is hibernating or down, the ManagedServiceAccount is unable to create the token. As a result, a hub backup that runs at this time does not contain a token to auto import the managed cluster.

  4. A ManagedServiceAccount secret might not be included in a backup if the backup schedule runs before the backup label is set on the secret. ManagedServiceAccount secrets do not have the cluster.open-cluster-management.io/backup label set on creation. For this reason, the backup controller regularly looks for ManagedServiceAccount secrets under the managed cluster namespaces and adds the backup label if it is missing.

  5. If the auto-import-account token is valid when it is backed up, but the restore operation runs after the token in the backup has expired, the auto import operation fails. In this case, the restore.cluster.open-cluster-management.io resource status should report the invalid token issue for each affected managed cluster.

Conclusion

This blog describes how to use the cluster backup and restore operator available with RHACM 2.7 to automatically reconnect imported managed clusters to the new hub after a restore operation. It shows how to enable the automatic connect feature and how it works.
