Azure Red Hat OpenShift (ARO) and Red Hat OpenShift Service on AWS (ROSA) are managed OpenShift offerings from Red Hat that run on Azure and AWS, respectively. As managed services, their value proposition is to reduce the administrative burden on the platform teams that run them. That said, today these services are provisioned for the most part through CLIs. Wouldn't it be nice to have a declarative, GitOps-friendly way to provision ARO and ROSA clusters?
In this article we will explore how this can be accomplished.
The following are the requirements that we are trying to meet:
- Declarative creation of ARO/ROSA clusters.
- Secure handling of the returned OpenShift admin credentials.
- Registration of the newly provisioned cluster to Red Hat Advanced Cluster Manager (ACM), if available, or the Multi Cluster Engine (MCE) operator.
- Seeding of the newly provisioned cluster with OpenShift GitOps, the Red Hat-supported distribution of Argo CD, in order to bootstrap the day-two configuration process.
How do we turn a CLI-driven process into a declarative one? First, the CLIs associated with each managed offering (rosa and az) simply make one or more calls to an API behind the scenes. For ROSA, the API is OpenShift Cluster Manager (OCM), a service managed by Red Hat. For ARO, the CLI calls the Azure API directly.
Because we are dealing with an API-driven approach, we have a good chance of being able to create declarative automation. To the best of my knowledge, these are the available options:
- Create an operator that interacts directly with the API. An operator for ROSA has actually been created and can be found here.
- Use a Terraform controller (an operator that can turn any Terraform script into declarative configuration) to wrap Terraform scripts that can provision either ROSA or ARO (both Flux [here] and Argo CD [here] have Terraform controllers).
- Use Crossplane to define the automation needed to spin up ROSA and ARO clusters.
The operator approach is a good one, provided that the options exposed by the operator closely track the options found within the CLI (or underlying API) and that the operator is maintained (if not supported). Normally, these conditions are met only when the organization creating the service also publishes the operator. Additionally, no operator currently exists for ARO.
The Terraform controller approach would work as there is a supported Terraform provider both for the OCM APIs and for the Azure ARO APIs (currently via the general-purpose AzAPI provider). That said, the Terraform controllers do not offer a very good level of abstraction since the manifest offered by these controllers is very generic: it includes just a pointer to a Terraform script and some input variables.
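To make the "very generic" point concrete, here is a minimal sketch of what a manifest for the Flux Terraform controller looks like. The resource names, the repository reference, the path, and the variable are hypothetical. Only the general shape (a pointer to a Terraform source plus input variables) is the point.

```yaml
# Sketch only: all names, paths and variables below are hypothetical.
apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: rosa-cluster
  namespace: flux-system
spec:
  interval: 10m            # how often the controller reconciles
  approvePlan: auto        # apply plans without manual approval
  path: ./rosa             # directory containing the Terraform script
  sourceRef:               # pointer to the Git source holding the script
    kind: GitRepository
    name: cluster-automation
    namespace: flux-system
  vars:                    # input variables passed to Terraform
    - name: cluster_name
      value: rosa-demo
```

Nothing in this manifest says that it provisions an OpenShift cluster. The meaning lives entirely in the referenced script, which is the lack of abstraction mentioned above.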
Crossplane offers both a supported solution (from Upbound) and the ability to build structured interfaces. On top of that, one of Crossplane's unique features is that it has a way to manage the admin credentials returned by services that are created. In our case, admin credentials for ROSA and ARO will be returned once the clusters have been created.
I chose to go with Crossplane mostly because an organization working with ARO and ROSA is very likely already invested in the cloud and is probably automating other cloud services. Crossplane has a strong offering for automating cloud infrastructure in a declarative way, so it provides a way to capture all of the cloud-related automation with a single tool (Crossplane can even automate beyond cloud services).
The high-level design of the automation presented in this article is depicted in the diagram below:
The yellow squares represent Custom Resources (CRs) that exist in a Git repository and are deployed by our GitOps Operator. The top yellow square represents a CR that describes either a ROSA or ARO cluster. The Crossplane operator interprets it and creates either a ROSA or ARO cluster.
During the cluster creation process, credentials are returned and the Crossplane operator stores them in HashiCorp Vault. Subsequently, the cluster can be registered to ACM/MCE, pointing to the extracted secret.
Crossplane high-level concepts
Before we continue, let’s review some of the high-level concepts in Crossplane.
Crossplane providers are little controllers in the Crossplane ecosystem that know how to interact with a set of APIs (AWS, Azure, etc). Because cloud providers have a very large API surface, the Crossplane providers for cloud services have been modularized into many smaller providers, each focused on a specific subset of APIs.
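As a rough sketch of what this modularization looks like in practice, each provider is installed with its own Provider manifest. The package names below are the Upbound family providers that match the ones used later in this article. The version tags are placeholders, not recommendations.

```yaml
# Sketch only: version tags are placeholders; pin them to released versions.
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-ec2
spec:
  package: xpkg.upbound.io/upbound/provider-aws-ec2:v1.0.0
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-azure-network
spec:
  package: xpkg.upbound.io/upbound/provider-azure-network:v1.0.0
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-azure-authorization
spec:
  package: xpkg.upbound.io/upbound/provider-azure-authorization:v1.0.0
```

Installing only the API subsets that are needed keeps the number of CRDs registered in the cluster manageable.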
Compositions are ways to describe a set of resources (for example network configurations, VM configurations) that need to be created and have high cohesion. In a way, compositions are reminiscent of Helm templates, but with one interesting additional feature: you can use the output of a created resource as an input for another resource in the same composition. This is very common for the way cloud provider APIs work.
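The output-to-input feature can be sketched as follows, assuming a classic patch-and-transform Composition. The composite type (XNetwork in example.org), the region, and the CIDR ranges are hypothetical, and the composite resource schema is assumed to declare a status.vpcId field.

```yaml
# Sketch only: the composite type, region and CIDR values are hypothetical.
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: example-network
spec:
  compositeTypeRef:
    apiVersion: example.org/v1alpha1
    kind: XNetwork
  resources:
    - name: vpc
      base:
        apiVersion: ec2.aws.upbound.io/v1beta1
        kind: VPC
        spec:
          forProvider:
            region: us-east-1
            cidrBlock: 10.0.0.0/16
      patches:
        # Output: copy the ID reported by the cloud provider up to the composite resource.
        - type: ToCompositeFieldPath
          fromFieldPath: status.atProvider.id
          toFieldPath: status.vpcId
    - name: subnet
      base:
        apiVersion: ec2.aws.upbound.io/v1beta1
        kind: Subnet
        spec:
          forProvider:
            region: us-east-1
            cidrBlock: 10.0.0.0/24
      patches:
        # Input: feed the VPC ID recorded above into the subnet.
        - type: FromCompositeFieldPath
          fromFieldPath: status.vpcId
          toFieldPath: spec.forProvider.vpcId
```

For this particular pair of resources, Crossplane's cross-resource references would be the more idiomatic choice. The patch pair is used here because it shows the general mechanism, which works for any field.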
Claims are namespace-scoped resources that tenants can create to request the execution of a composition. Claims are instances of a new CRD that Crossplane automatically creates when the Composite Resource Definition (XRD) behind a composition declares a claim type.
So, in the above design diagram, the ARO and ROSA CRs are Crossplane claims. The diagram does not show the complexity of the compositions beyond those claims.
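To illustrate how a claim type comes into existence, here is a minimal sketch of an XRD that offers a claim, followed by a claim that a tenant might create. The group, kinds, namespace, and spec fields are all hypothetical and are not the ones used in the repository accompanying this article.

```yaml
# Sketch only: group, kinds, namespace and fields are hypothetical.
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xopenshiftclusters.example.org   # must be <plural>.<group>
spec:
  group: example.org
  names:
    kind: XOpenShiftCluster              # cluster-scoped composite resource
    plural: xopenshiftclusters
  claimNames:
    kind: OpenShiftCluster               # namespace-scoped claim offered to tenants
    plural: openshiftclusters
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                region:
                  type: string
                workerCount:
                  type: integer
              required:
                - region
---
# A tenant requests a cluster by creating a claim in its own namespace.
apiVersion: example.org/v1alpha1
kind: OpenShiftCluster
metadata:
  name: demo
  namespace: team-a
spec:
  region: eastus
  workerCount: 3
```

The tenant only needs permission to create OpenShiftCluster objects in its namespace. The cloud credentials stay with the providers.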
While it would have been great to create all the needed cloud resources with Crossplane both for ARO and ROSA, this was not possible for reasons which will be explained later. For now, suffice it to say that in those situations, we fell back to using Terraform powered by the Crossplane Terraform provider.
The following prerequisites are needed to stand up the design explained above:
- An OpenShift cluster with ACM or MCE installed.
- A set of credentials with enough permissions to provision either ROSA or ARO. Follow the instructions described later on to create Kubernetes Secrets with the appropriate credentials.
- An instance of HashiCorp Vault configured to allow OpenShift pods to authenticate against it. In particular, this setup expects the default Service Account in the vault-admin namespace to have elevated privileges within Vault, including the ability to create Secret Engines and Kubernetes authentication roles.
The next few paragraphs describe the steps needed to install this automation. We assume that you are familiar with GitOps tools (we use Argo CD in this case) and we will reference a git repository with all of the automation included as it would be impossible to show every single manifest.
All of the configurations can be found within this repository. Before attempting to run this automation, make sure to create the needed Secrets as described in the repository README. In a real scenario, these credentials would likely be sourced from a secret management service.
After the prerequisites are met, we can deploy Crossplane. Crossplane consists of several components. To manage the dependencies between these components, it is recommended that the following four Argo CD Applications be created:
- crossplane-operator - This Argo CD Application deploys the Crossplane operator itself and configures the External Secret Store with the Crossplane Vault plugin. The External Secret Store feature allows credentials to be stored in Secret Stores. Currently, Vault and Kubernetes Secrets are supported as External Secret Stores. Luckily for us, Vault is exactly what we need. Both the primary Crossplane operator and the Vault plugin are deployed as Helm charts. You can find the configuration here. One thing to keep in mind is that all Crossplane pods run with a static non-root user, so we need to grant the nonroot-v2 SecurityContextConstraint in order for them to start correctly.
- crossplane-providers - This Argo CD Application deploys all the Crossplane providers that are needed. Because the Crossplane cloud providers have been modularized, as described previously, multiple providers are required. In addition to the modular cloud providers, the Terraform provider is also required. For ARO/Azure, we need the network and authorization providers. For ROSA/AWS, we need the EC2 provider. The entire configuration can be seen here.
- crossplane-provider-config - This Argo CD Application is used to set several provider-specific configurations. In our case, for each provider, we need to configure two items: the credentials used to access the cloud API and the use of Vault as the store for any returned secrets. For the Crossplane Terraform provider, it is possible to specify properties that will be shared by all of the Terraform scripts run by this provider. This configuration can be used to configure the Terraform providers and their credentials. By using this approach, we centrally control the credentials and avoid having to repeat that configuration in every Terraform script. The details for this Application can be found here.
- crossplane-vault - This Argo CD Application configures Vault for Crossplane: it creates a KV v2 secret engine where Crossplane can store secrets, along with a role for the Kubernetes authentication engine and the related policy that grants proper access to the KV v2 secret engine. In order to configure Vault declaratively, we use the vault-config-operator. The associated Argo CD Application can be found here.
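As an illustration of the shared Terraform configuration mentioned for crossplane-provider-config, here is a minimal sketch of a ProviderConfig for the Crossplane Terraform provider. It is not the configuration from the repository: the Secret name, its key and expected content, the variable name, and the namespace are assumptions made for the example.

```yaml
# Sketch only: Secret name/key, variable name and namespace are hypothetical.
apiVersion: tf.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    # The Secret content is written as a file next to each Terraform module.
    # Terraform loads *.auto.tfvars files automatically.
    - filename: ocm.auto.tfvars
      source: Secret
      secretRef:
        namespace: crossplane-system
        name: ocm-credentials
        key: credentials        # assumed content: ocm_token = "<token>"
  # HCL shared by every Workspace that uses this ProviderConfig.
  configuration: |
    terraform {
      required_providers {
        rhcs = {
          source = "terraform-redhat/rhcs"
        }
      }
      # The Terraform provider does not persist state itself,
      # so a remote backend is configured here.
      backend "kubernetes" {
        secret_suffix     = "providerconfig-default"
        namespace         = "crossplane-system"
        in_cluster_config = true
      }
    }

    variable "ocm_token" {
      type      = string
      sensitive = true
    }

    provider "rhcs" {
      token = var.ocm_token
    }
```

With this in place, individual Terraform scripts only need to declare resources. Provider setup and credentials are defined once.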
The following diagram summarizes the configuration:
For this architecture, we assume that cloud credentials are deployed manually. In a more realistic scenario, cloud credentials would likely be retrieved from Vault.
Creating the ARO/ROSA Compositions
Now that Crossplane is in place, we can create the Compositions that will contain all the manifests needed to create ARO and ROSA clusters.
A Composition is similar to a Helm chart in that it bundles multiple resources with high cohesion. The templating capability of a Composition is limited in comparison to what we can do with Helm charts. However, using Compositions offers two significant benefits that Helm charts cannot provide:
- A new CRD is automatically created for us from the Composite Resource Definition (XRD) that accompanies the Composition. This makes the use of a composition more ergonomic: instead of the values files used with Helm, we express the parameters of the template as fields of this new CRD.
- A claim type is optionally created for us. As described previously, claims enable multi-tenancy by allowing one to request the composition as a service, without giving tenants access to the highly privileged credentials needed to provision the resources behind that claim. In our case, we could use this feature to offer an ARO/ROSA-as-a-service capability to our tenants, if we chose to do so.
The compositions defined for ARO and ROSA can be found here and here, respectively. These compositions address simple use cases representing an ARO and ROSA deployment. More advanced use cases can be developed using these starting points.
Once these Compositions are deployed, we can simply create clusters by defining a claim.
The following represents a claim for an ARO cluster:
And, similarly for a ROSA cluster:
As you can probably see, both ARO and ROSA have some form of network prerequisites that can be met by creating Crossplane resources from the respective cloud provider. The bulk of the work, however, is done by the Terraform provider. The following diagram represents this type of situation:
In green, we can see the network requirements for this particular ROSA deployment. Within the red rectangle, we have the Workspace manifest representing the Terraform script needed to provision the cluster. On the left, rosa-decl is the claim (possibly created by a tenant) and rosa-decl-qcv46 is the corresponding composite resource (XR) that drives the cluster creation (following a pattern similar to the one between Kubernetes PersistentVolumeClaims and PersistentVolumes).
Let’s discuss why we needed to use a Terraform script to provision ARO and ROSA clusters.
ARO: while ARO is a primitive API in Azure, Crossplane does not yet have a provider that covers it. When falling back to Terraform, we realized that ARO is also not covered by azurerm, the standard HashiCorp Azure Terraform provider. So, we had to use the azapi Terraform provider, which is supported by Microsoft and covers all of the Azure APIs.
ROSA: ROSA provisioning is driven by OCM, not by AWS. At present, there is no Crossplane provider for OCM, so we had to find another way. We could have used the open source OCM operator, but in order to have better support and for symmetry with the ARO provisioning process, we used the rhcs Terraform provider, which is supported by Red Hat.
Creating ARO/ROSA Clusters
At this point, we are finally ready to create our ARO and ROSA clusters. We can simply create the previously described manifests in a namespace and the provisioning will start.
While the clusters are coming up, let’s discuss a few things.
First, we have instructed Crossplane to store the resulting admin credentials as Secrets within Vault. This is accomplished in the Workspace manifest with this fragment:
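As a minimal sketch of how a Workspace can be pointed at an External Secret Store, consider the manifest below. It is an illustration rather than the manifest from the repository: the Workspace name, the connection details name, the StoreConfig name (vault), and the placeholder Terraform output are all hypothetical. It also assumes that the External Secret Stores feature, which is alpha, is enabled and that a StoreConfig for Vault has been created for the Terraform provider.

```yaml
# Sketch only: names and the placeholder output are hypothetical.
apiVersion: tf.upbound.io/v1beta1
kind: Workspace
metadata:
  name: aro-cluster
spec:
  forProvider:
    source: Inline
    module: |
      # The Terraform that provisions the cluster would go here.
      # Outputs marked as sensitive become connection details.
      output "admin_password" {
        value     = "placeholder"
        sensitive = true
      }
  # Publish connection details to the store described by the referenced StoreConfig
  # instead of writing a Kubernetes Secret.
  publishConnectionDetailsTo:
    name: aro-cluster-admin
    configRef:
      name: vault
```

The publishConnectionDetailsTo stanza replaces the more common writeConnectionSecretToRef, which always writes to a Kubernetes Secret.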
If everything succeeds, we should see the following in Vault for the ARO deployment:
Now that the admin secrets are stored in Vault, we can extract them as a Kubernetes Secret and use that Secret with the auto-import-secret registration approach to import the newly created cluster into ACM.
This Helm chart, which assumes that the auto-import-secret already exists, demonstrates how this can be accomplished.
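For readers unfamiliar with the auto-import-secret approach, the two resources involved look roughly like the sketch below. The cluster name, API server URL, and token are placeholders. In the design described here, the token and server values would come from the credentials extracted from Vault rather than being written by hand.

```yaml
# Sketch only: cluster name, server URL and token are placeholders.
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: aro-demo
  labels:
    cloud: auto-detect
    vendor: auto-detect
spec:
  hubAcceptsClient: true
---
# The Secret must be named auto-import-secret and live in the namespace
# that has the same name as the managed cluster.
apiVersion: v1
kind: Secret
metadata:
  name: auto-import-secret
  namespace: aro-demo
type: Opaque
stringData:
  autoImportRetry: "5"
  token: REPLACE_WITH_ADMIN_TOKEN
  server: https://api.aro-demo.example.com:6443
```

A kubeconfig key can be used in place of the token and server pair. Either way, the hub uses the Secret to log in to the new cluster and install the klusterlet.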
Included within this Helm chart are several ManifestWork resources that push baseline configurations to the newly provisioned cluster to kickstart the Argo CD based day two configurations.
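A ManifestWork of this kind might look like the following sketch, which installs the OpenShift GitOps operator on the managed cluster through an OLM Subscription. The ManifestWork name and the managed cluster name (aro-demo) are hypothetical, and the manifests in the actual Helm chart may differ.

```yaml
# Sketch only: the ManifestWork name and cluster namespace are hypothetical.
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  name: openshift-gitops-bootstrap
  namespace: aro-demo              # hub namespace named after the managed cluster
spec:
  workload:
    manifests:
      # Everything listed here is applied on the managed cluster by its work agent.
      - apiVersion: operators.coreos.com/v1alpha1
        kind: Subscription
        metadata:
          name: openshift-gitops-operator
          namespace: openshift-operators
        spec:
          channel: latest
          name: openshift-gitops-operator
          source: redhat-operators
          sourceNamespace: openshift-marketplace
          installPlanApproval: Automatic
```

Further ManifestWork resources can then deliver the initial Argo CD Applications that take over the day-two configuration.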
Once that is completed, we have a new cluster that is part of the fleet managed by our ACM hub, is configured, and is ready to be used.
The cluster architectures demonstrated in this article are basic examples. In a real deployment, other environmental constraints, including specific network requirements, may not map to these examples (for instance, public endpoints are used here, but they may be forbidden in certain environments). The examples are designed as baselines that can be customized and extended as necessary to meet the needs of the target deployment.
In this article, we presented an approach to declaratively creating ARO and ROSA clusters. Neither ARO nor ROSA supports this capability out of the box, which makes them difficult to work with in a fully declarative environment.
That said, with the help of Crossplane and a good dose of Terraform, we were able to design a fully declarative approach for provisioning ARO and ROSA clusters.
Along the way, we learned a bit about Crossplane: how Crossplane compositions compare to Helm charts, how we can use Crossplane to store admin credentials in Vault, and how Crossplane supports multi-tenancy via the concept of Claims. These are all good capabilities that can turn out to be useful not just when working with ARO and ROSA clusters, but with any cloud resource.
We also saw how to register the newly-created clusters to ACM and kick start the day two configurations.
Wrapping a Terraform script in a Kubernetes controller (the Crossplane Terraform provider in our case) is the weakest part of our design. Issues are hard to troubleshoot, and there is a certain level of fragility inherent in wrapping imperative automation with a declarative approach. This situation could improve if Crossplane providers for ARO and ROSA were available. For ARO, the ARO APIs should soon become available in the azurerm Terraform provider; this should translate into ARO support in Crossplane, as the Crossplane providers for Azure are derived from the azurerm Terraform provider. For ROSA, a Crossplane provider for OCM would need to be developed, and thus far no efforts towards this goal have been initiated.
Making it easy to provision ready-to-use clusters is always useful, especially if an organization is trying to create a cluster-as-a-service capability. We leave the last logical step in this direction as an exercise to the reader: create a Red Hat Developer Hub (RHDH) scaffolder template to provision ARO and ROSA clusters using a nice UI.