At Red Hat, we have several Communities of Practice (CoPs). A Community of Practice is a group of individuals who share common interests and build consensus around practices that benefit the greater community at large.
In the Container CoP, we have a group of individuals developing and publishing operators for Kubernetes and OpenShift.
We have been working in this space for several years now, and given the adoption by customers and the community, this is a good time to summarize the operators we maintain and how they can be used to facilitate adopting OpenShift.
Here is a table summarizing the operators maintained by the Container CoP:
This operator syncs groups from several identity providers (IdPs).
This operator applies configurations to Namespaces, Users, and Groups based on label and annotation selectors. It helps manage multitenant environments at scale.
This operator enhances the experience around TLS certificates.
This operator supports declarative configuration workflows for HashiCorp Vault.
An operator to apply patches to Kubernetes objects in a declarative way.
This operator manages egress IPs, fully automating the IPAM process.
This operator creates keepalived-based self-hosted load balancers based on LoadBalancer and/or ExternalIP Kubernetes Service definitions.
This operator configures a global load balancer based on the status of the applications and the clusters that need to be load balanced.
This operator expands volumes when the used storage reaches a certain threshold.
This operator allows you to scale up nodes before you run out of capacity.
This operator simplifies the collection and upload of must-gather information.
It's worth noting that these operators are not supported by Red Hat and that the Container Community of Practice provides best-effort support. With that disclaimer out of the way, let's examine these operators one by one.
The group-sync-operator automates the synchronization of groups from several IDPs into OpenShift groups. Currently the following IDPs are supported: Azure, GitHub, GitLab, LDAP, Keycloak, Okta.
Being able to synchronize groups is a fundamental capability needed to set up a multitenant deployment of OpenShift. In particular, it allows for the assignment of RBAC permissions to users.
With the group-sync-operator, you can declaratively configure how groups should be obtained from the various identity providers and then the operator takes it from there, enforcing the synchronization over time.
If you own a fleet of OpenShift clusters, you can deploy this operator in each cluster, point it to your IDP, and you will have groups propagated to every instance of your fleet.
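As a sketch, a GroupSync resource for a GitHub organization might look like the following (the organization, secret names, and schedule are illustrative; check the operator's documentation for the full schema):

```yaml
apiVersion: redhatcop.redhat.io/v1alpha1
kind: GroupSync
metadata:
  name: github-groupsync
spec:
  # Re-run the synchronization periodically (cron syntax)
  schedule: "0 * * * *"
  providers:
    - name: github
      github:
        # Illustrative organization name
        organization: my-organization
        # Secret holding the GitHub credentials (e.g. a personal access token)
        credentialsSecret:
          name: github-group-sync
          namespace: group-sync-operator
```

Once applied, the operator keeps the matching OpenShift groups in sync on the configured schedule.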
OpenShift 4.10 introduced a feature that dynamically creates groups based on OIDC claims for users authenticating through the OpenID Connect identity provider. While this feature provides an out-of-the-box option, it does have limitations with regard to the identity providers that can be leveraged as well as the ways in which groups can be customized.
As one of the oldest operators produced by the CoP, the group-sync-operator has matured through use by the community and is fairly feature complete. Continued hardening and support for additional identity providers are being evaluated based on community interest.
When building a multi-tenant deployment of Kubernetes or OpenShift, each tenant namespace should be configured with a set of initial settings before it can actually be safely used by a tenant. Configurations typically include:
- RBAC permissions
- Limit Ranges
- Egress network policies
- Network Policies
- Namespace-level service mesh settings
- Egress IPs
- Pull secrets and other credentials
To enforce these configurations and customize them on a tenant and even namespace level, you can use the namespace-configuration-operator.
This operator allows the administrator to select a subset of namespaces using labels and annotations and to declaratively apply configurations to the selected subset.
With this operator you can manage namespace configurations at scale. For example, if you need to update a namespace configuration, the change is automatically rolled out to all selected namespaces.
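For instance, a minimal NamespaceConfig that stamps a ResourceQuota into every namespace labeled as belonging to a tenant could look like this (the label and quota values are illustrative):

```yaml
apiVersion: redhatcop.redhat.io/v1alpha1
kind: NamespaceConfig
metadata:
  name: tenant-quotas
spec:
  # Select the namespaces to configure by label
  labelSelector:
    matchLabels:
      tenant: "true"
  templates:
    # Each template is rendered once per selected namespace;
    # {{ .Name }} resolves to the namespace's name
    - objectTemplate: |
        apiVersion: v1
        kind: ResourceQuota
        metadata:
          name: standard-quota
          namespace: {{ .Name }}
        spec:
          hard:
            requests.cpu: "4"
            requests.memory: 8Gi
```

Updating the template rolls the new quota out to every selected namespace.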
This operator can be used in conjunction with the group-sync-operator to fully automate assigning permissions to OpenShift users. An end to end example of how to do that can be found in this blog post.
The cert-utils-operator provides the following capabilities around TLS certificates stored in Kubernetes secrets:
- Ability to inject TLS secrets into OpenShift routes.
- Ability to convert PEM-formatted TLS certificates into Java-consumable truststores and keystores.
- Ability to raise Prometheus alerts when certificates are about to expire.
- Ability to inject CAs into various Kubernetes objects that may need to consume them (enhancing the ability to configure the cluster declaratively).
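Most of these capabilities are driven by annotations. As an illustrative example (the annotation names are as I recall them from the project's README, so verify them before use), injecting a certificate from a TLS secret into a route and requesting Java keystores might look like:

```yaml
# Inject the certificate stored in a TLS secret into an OpenShift route
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app
  annotations:
    # Annotation name as documented by cert-utils-operator (verify)
    cert-utils-operator.redhat-cop.io/certs-from-secret: my-app-tls
spec:
  to:
    kind: Service
    name: my-app
  tls:
    termination: edge
---
# Ask the operator to derive Java keystores from an existing TLS secret
apiVersion: v1
kind: Secret
metadata:
  name: my-app-tls
  annotations:
    cert-utils-operator.redhat-cop.io/generate-java-keystores: "true"
type: kubernetes.io/tls
```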
This operator assumes that certificates are already provisioned and offers the above capabilities on existing TLS secrets. Being able to manage the lifecycle of certificates can be a challenge and it is highly recommended that there be an automated way of provisioning TLS certificates. Cert-manager (supported by Red Hat since OpenShift 4.10) is an operator that automates the provisioning of certificates and can integrate with an enterprise PKI. Cert-manager and cert-utils-operator work well together to provide a full end-to-end automation solution for certificate provisioning and rotation. This blog post details how such a solution can be implemented.
On the subject of automating credential provisioning (certificates are a form of credentials), HashiCorp Vault offers a good basis for building an enterprise credential management system where credentials can be consumed by several systems in the enterprise, including applications running in Kubernetes clusters. For Kubernetes, Vault can help enable a fully declarative GitOps approach, which can be supplemented by the use of the Vault Agent sidecar, the Vault CSI Driver for CSI Secret Volumes, or operators such as External Secrets to retrieve credentials.
However, Vault itself requires that configuration steps be completed in an imperative fashion before secrets can be retrieved from it. The vault-config-operator wraps the imperative Vault APIs, turning them into declarative Kubernetes APIs, thus achieving the goal of using GitOps methodologies with Vault and Kubernetes. For more information about this operator, see this blog post. As an example of how this operator can be used, refer to this blog post, which showcases a deployment for managing narrowly-scoped, short-lived pull secrets at scale.
Currently, this operator covers a limited portion of the Vault API and there are plans to increase the overall coverage in the future.
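As a sketch of the declarative style, a Vault policy could be expressed as a Policy custom resource along these lines (the field names follow my reading of the operator's examples and should be verified against its documentation):

```yaml
apiVersion: redhatcop.redhat.io/v1alpha1
kind: Policy
metadata:
  name: database-creds-reader
spec:
  # How the operator authenticates to Vault to apply this policy
  # (path and role names are illustrative)
  authentication:
    path: kubernetes
    role: policy-admin
  # The Vault policy document, written in Vault's HCL syntax
  policy: |
    path "database/creds/read-only" {
      capabilities = ["read"]
    }
```

Committing such resources to Git lets the Vault configuration itself flow through the same GitOps pipeline as the rest of the cluster.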
Managing cluster configuration with GitOps, as highlighted previously, is the recommended approach for managing Kubernetes, as it prevents drift and, at the same time, allows for scalability in terms of both the number of tenants hosted and the number of clusters that can be managed. Yet, when implementing GitOps at scale with a GitOps-based operator, one might run into limitations, usually due to the fact that a GitOps operator cannot change resources that it does not own. The patch-operator enables creating patches to non-owned objects in a declarative way, so that they can be managed by GitOps operators.
This blog post describes the driving forces behind why such an operator was created as well as how it can be used.
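To give an idea of the shape of the API, a Patch resource that adds an annotation to an object owned by another controller might look roughly like this (this is illustrative, not a verified schema; the annotation key is hypothetical):

```yaml
apiVersion: redhatcop.redhat.io/v1alpha1
kind: Patch
metadata:
  name: console-namespace-annotation
spec:
  # Service account with permissions to apply the patch
  serviceAccountRef:
    name: patch-operator-sa
  patches:
    add-annotation:
      # The non-owned object to patch
      targetObjectRef:
        apiVersion: v1
        kind: Namespace
        name: openshift-console
      patchType: application/merge-patch+json
      # The patch body, expressed as a template
      patchTemplate: |
        metadata:
          annotations:
            example.com/managed-by: gitops
```

Because the Patch resource itself is a plain Kubernetes object, it can live in Git and be applied by the GitOps operator like everything else.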
EgressIPs are a much-appreciated feature of OpenShift and can be used to support systems where there is a requirement for a known, fixed IP address to identify a workload. This feature is typically needed by legacy workloads, as more modern cloud-based systems do not use IPs as identifiers given their ephemeral nature.
By using EgressIPs, one can assign static IPs to workloads running in an OpenShift namespace and these IPs can then be safely used to identify these workloads and applied within firewall rules or authentication policies.
The egressip-ipam-operator helps manage EgressIPs at scale as it implements an IPAM process for them. When running within a cloud provider, this operator also manages the allocation of the EgressIPs to the underlying cloud VMs.
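A sketch of an EgressIPAM resource that carves egress IPs out of a CIDR and assigns them to worker nodes could look like this (the values and label names are illustrative; consult the operator's documentation for the exact schema):

```yaml
apiVersion: redhatcop.redhat.io/v1alpha1
kind: EgressIPAM
metadata:
  name: egressipam-bare-metal
spec:
  # CIDRs from which egress IPs are allocated, keyed by topology label value
  cidrAssignments:
    - labelValue: "true"
      CIDR: 192.168.0.0/24
  topologyLabel: egressGateway
  # Nodes eligible to host the egress IPs
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
```

Namespaces then opt in via an annotation referencing the EgressIPAM instance, and the operator allocates and tracks IPs from the pool on their behalf.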
The keepalived-operator helps honor the LoadBalancer and ExternalIP service type APIs for Kubernetes Services in those deployments in which there is no cloud provider or the cloud provider cannot create load balancers. This operator takes the place of those integrated options and manages the creation and lifecycle of keepalived instances.
MetalLB was introduced with OpenShift 4.9 and 4.10 and largely covers the initial use case for the keepalived-operator. We recommend using MetalLB whenever possible, as it is directly supported by Red Hat. That said, ExternalIPs are not covered by MetalLB, and for that use case you can still use the keepalived-operator. However, given the supported capabilities now provided within the OpenShift product itself, the CoP plans to put this operator into maintenance mode without further development.
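If you do use the keepalived-operator, you define a KeepalivedGroup that pins the virtual IPs to a set of nodes and a network interface, roughly like this (the interface name and node selector are illustrative):

```yaml
apiVersion: redhatcop.redhat.io/v1alpha1
kind: KeepalivedGroup
metadata:
  name: keepalivedgroup-workers
spec:
  # Network interface on which keepalived advertises the VIPs
  interface: ens3
  # Nodes that will run the keepalived instances
  nodeSelector:
    node-role.kubernetes.io/worker: ""
```

Services are then tied to a group via an annotation on the Service object (as I recall it, `keepalived-operator.redhat-cop.io/keepalivedgroup`; verify against the README), and the operator reconciles the keepalived configuration accordingly.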
Our third and last network-related operator is the global-load-balancer-operator. When deploying applications to multiple clusters and/or multiple data centers, a global load balancer is needed to send traffic to the different application instances. This global load balancer should allow for implementing different load balancing strategies as well as supporting different disaster recovery strategies. Several approaches are discussed in this blog post.
Once an approach for the global load balancer implementation has been decided, we need a way to configure it on a per-application basis. Ideally, we would like to declaratively apply global load balancing configurations based on the status of the applications as deployed across our fleet of clusters. This is where the global-load-balancer-operator can be of service.
This operator supports basic DNS-based global load balancing capabilities with any DNS product supported by externalDNS, and includes advanced global load balancing features with Route53 in AWS, Traffic Manager in Azure, and Global Load Balancer in Google Cloud.
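As a purely illustrative sketch (the field names here are hypothetical; refer to the operator's repository for the real CRDs), a global DNS record for an application might be declared along these lines:

```yaml
apiVersion: redhatcop.redhat.io/v1alpha1
kind: GlobalDNSRecord
metadata:
  name: myapp
spec:
  # Fully qualified name that clients will resolve (illustrative)
  name: myapp.global.example.com
  # Reference to a zone resource describing the DNS provider (hypothetical)
  globalZoneRef:
    name: example-com-zone
  ttl: 60
```

The operator then keeps the DNS answers aligned with the health and location of the application instances across the fleet.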
In Kubernetes, applications using persistent volumes can run out of storage as they consume more and more disk space from their initially allocated volume. Automated preventive maintenance is one of the tenets of the SRE approach, so, in a similar vein to pod autoscaling, we can implement a form of volume expansion based on the volume consumption metrics produced inside a Kubernetes cluster. The volume-expander-operator accomplishes this task, and this blog post explains some of the principles on which the operator was built.
As the CSI specification matures and more CSI drivers start emitting metrics, we plan to upgrade the operator to consume CSI metrics as opposed to the kubelet-produced metrics it leverages today.
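The operator is driven by annotations on the PersistentVolumeClaim. As an illustrative example (annotation names and values as I recall them from the project's README, so verify before use):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  annotations:
    # Opt this PVC into automatic expansion
    volume-expander-operator.redhat-cop.io/autoexpand: "true"
    # Expand when usage crosses this percentage of capacity
    volume-expander-operator.redhat-cop.io/expand-threshold-percent: "80"
    # Grow the volume by this percentage each time
    volume-expander-operator.redhat-cop.io/expand-by-percent: "25"
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
```

Note that the underlying StorageClass must support online volume expansion for this to work.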
Enabling node autoscaling is a handy feature when implementing a capacity management strategy for OpenShift. However, the node autoscaler is very reactive in the way it manages nodes, sometimes leading to a poor user experience (i.e., long pod deployment wait times during a node scale-up event).
The proactive-node-scaling-operator alleviates this situation by trading spare capacity for shorter wait times. For more information on how this tradeoff and the related automation are implemented, please see this blog post.
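The tradeoff is expressed as a watermark of spare capacity. A sketch of a NodeScalingWatermark resource might look like this (schema per my recollection of the CRD; verify against the operator's documentation):

```yaml
apiVersion: redhatcop.redhat.io/v1alpha1
kind: NodeScalingWatermark
metadata:
  name: workers
spec:
  # Keep roughly 20% of the currently requested capacity free at all
  # times; low-priority placeholder pods reserve it and are evicted
  # when real workloads need the space, triggering a node scale-up
  # ahead of actual demand.
  watermarkPercentage: 20
  # Which nodes the watermark applies to (illustrative selector)
  nodeSelector:
    node-role.kubernetes.io/worker: ""
```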
This simple operator helps you file support cases with Red Hat support for diagnosis and investigation purposes. It streamlines and automates the running of must-gather processes and the uploading of the collected information to a case. The information gathered by the must-gather tool is often requested by support to help solve a case, so the primary intention of this operator is to accelerate the initial information gathering step and shorten the time to resolution.
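Creating a MustGather resource kicks off the collection and upload. A sketch (the case number is a placeholder and the secret and service account names are illustrative):

```yaml
apiVersion: redhatcop.redhat.io/v1alpha1
kind: MustGather
metadata:
  name: example-mustgather
  namespace: openshift-must-gather-operator
spec:
  # The Red Hat support case the archive will be attached to (placeholder)
  caseID: "01234567"
  # Secret with the credentials used to upload to the case
  caseManagementAccountSecretRef:
    name: case-management-creds
  # Service account with the permissions needed to run must-gather
  serviceAccountRef:
    name: must-gather-admin
```

The operator runs the must-gather job, then uploads the resulting archive to the referenced case.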
In this post, we provided a high level overview of the operators maintained in the Red Hat Container Community of Practice. In our view, these operators complement and expand upon the existing features that are supported out of the box in OpenShift. Our hope is that these operators will help accelerate the journey of adopting Kubernetes, and specifically OpenShift.