Dynamic Certificates for ROSA Custom Domain
This content is authored by Red Hat experts, but has not yet been tested on every supported configuration.
There may be situations when you prefer not to use wild-card certificates. This ROSA guide talks about certificate management with cert-manager and letsencrypt, to dynamically issue certificates to routes created on a custom domain that’s hosted on AWS Route53.
- Prerequisites
- Set up environment
- Prepare AWS Account
- Set up cert-manager
- Create the Issuer and the Certficiate
- Create the Custom Domain, which will be used to access your applications.
- Dynamic Certificates for Custom Domain Routes.
- Test an application.
- Debugging
Prerequisites
- A Red Hat OpenShift on AWS (ROSA) cluster
- The oc CLI #logged in.
- The aws CLI #logged in.
- The rosa CLI #logged in.
- jq
- gettext
- A Public Route53 Hosted Zone, and the related Domain to use.
Set up environment
Export few environment variables
export CLUSTER_NAME="sts-pvtlnk-cluster" export OIDC_PROVIDER=$(oc get authentication.config.openshift.io cluster -o json \ | jq -r .spec.serviceAccountIssuer| sed -e "s/^https:\/\///") export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) export SCRATCH_DIR=/tmp/scratch export AWS_PAGER="" export LETSENCRYPT_EMAIL=youremail@work.com export HOSTED_ZONE_ID=ABCDEFGHEXAMPLE export HOSTED_ZONE_REGION=us-east-2 export DOMAIN=lab.domain.com #Custom Hosted Zone Domain for apps mkdir -p $SCRATCH_DIR
Install jq & gettext
Installing or ensuring the gettext & jq package is installed, will allow us to use envsubst to simplify some of our configuration so we can use output of CLI commands as input into YAMLs to reduce the complexity of manual editing.
brew install gettext jq # or, for Linux / Windows WSL #sudo dnf install gettext jq
Prepare AWS Account
In order to make changes to the AWS Route53 Hosted Zone to add/remove DNS TXT challenge records by the cert-manager pod, we first need to create an IAM role with specific policy permissions & a trust relationship to allow access to the pod.
My Custom Domain Hosted Zone is in the same accout as the ROSA cluster. If these are in different accounts, few additional steps for Cross Account Access will be required.
Prepare an IAM Policy file
cat <<EOF > $SCRATCH_DIR/cert-manager-r53-policy.json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "route53:GetChange", "Resource": "arn:aws:route53:::change/*" }, { "Effect": "Allow", "Action": [ "route53:ChangeResourceRecordSets", "route53:ListResourceRecordSets" ], "Resource": "arn:aws:route53:::hostedzone/*" }, { "Effect": "Allow", "Action": "route53:ListHostedZonesByName", "Resource": "*" } ] } EOF
Create the IAM Policy using the above created file.
This creates a named policy for the cluster, you could use a generic policy for multiple clusters to keep things simpler.
POLICY=$(aws iam create-policy --policy-name "${CLUSTER_NAME}-cert-manager-r53-policy" \ --policy-document file://$SCRATCH_DIR/cert-manager-r53-policy.json \ --query 'Policy.Arn' --output text) || \ echo $POLICY
Create a Trust Policy
cat <<EOF > $SCRATCH_DIR/TrustPolicy.json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "${OIDC_PROVIDER}:sub": [ "system:serviceaccount:cert-manager:cert-manager" ] } } } ] } EOF
Create an IAM Role for the cert-manager Operator, with the above trust policy.
ROLE=$(aws iam create-role \ --role-name "${CLUSTER_NAME}-cert-manager-operator" \ --assume-role-policy-document file://$SCRATCH_DIR/TrustPolicy.json \ --query "Role.Arn" --output text) echo $ROLE
Attach the permissions policy to the Role
aws iam attach-role-policy \ --role-name "${CLUSTER_NAME}-cert-manager-operator" \ --policy-arn $POLICY
Set up cert-manager
Create a project (namespace) in the ROSA cluster.
oc new-project cert-manager --display-name="Certificate Manager" --description="Project contains Certificates and Custom Domain related components."
Install the cert-manager Operator
cat <<EOF | oc create -f - apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: generateName: cert-manager- namespace: cert-manager --- apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: cert-manager namespace: cert-manager spec: channel: stable installPlanApproval: Automatic name: cert-manager source: community-operators sourceNamespace: openshift-marketplace EOF
It will take a few minutes for this operator to install and complete its set up.
Wait for cert-manager operator to finish installing.
Our next steps depends on the successful installation of the operator. I recommend that you login to your cluster console to confirm the succeess status of cert-manager operator, in the Installed Operators section.
Annotate the ServiceAccount.
This is to enable the AWS SDK client code running within the cert-manager pod to interact with AWS STS service for temporary tokens, by assuming the IAM Role that was created in an earlier step. This is referred to as IRSA .
oc annotate serviceaccount cert-manager -n cert-manager eks.amazonaws.com/role-arn=$ROLE
Normally, after ServiceAccount annotations, a restart of the pod is required. However, the next step will automatically cause a restart of the pod.
Update the CA truststore of the cert-manager pod.
This step is usually not required. However, it was noticed that the cert-manager pod had difficulties in trusting the STS & LetsEncrypt endpoints. So the below commands essentially downloads the CA chain of these endpoints, puts them into a ConfigMap, which then gets attached to the pod as a Volume. Along with this step, I’ll also be setting the NameServers that the cert-manager will use for DNS01 self-check The Volume attachment to the pod and the setting of NameServers will be done together by patching the cert-manager CSV resource to persist these changes to the cert-manager deployment. This will cause the rollout of a new deployment and restart of the cert-manager pod.
openssl s_client -showcerts -verify 5 -connect sts.amazonaws.com:443 < /dev/null 2> /dev/null | awk '/BEGIN/,/END/{ if(/BEGIN/){a++}; print}' > $SCRATCH_DIR/extra-ca.pem openssl s_client -showcerts -verify 5 -connect acme-v02.api.letsencrypt.org:443 < /dev/null 2> /dev/null | awk '/BEGIN/,/END/{ if(/BEGIN/){a++}; print}' >> $SCRATCH_DIR/extra-ca.pem oc create configmap extra-ca --from-file=$SCRATCH_DIR/extra-ca.pem -n cert-manager CERT_MANAGER_CSV_NAME=$(oc get csv | grep 'cert-manager' | awk '{print $1}') CLUSTER_DNS_SERVICE_IP=$(oc get svc -n openshift-dns | grep 'dns-default' | awk '{print $3}') echo $CERT_MANAGER_CSV_NAME echo $CLUSTER_DNS_SERVICE_IP
oc patch csv $CERT_MANAGER_CSV_NAME --type='json' -p '[{"op": "add", "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/args/-", "value":'--dns01-recursive-nameservers-only'}, {"op": "add", "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/args/-", "value":'--dns01-recursive-nameservers=$CLUSTER_DNS_SERVICE_IP:53'}]' oc patch csv $CERT_MANAGER_CSV_NAME --type='json' -p '[{"op": "add", "path": "/spec/install/spec/deployments/0/spec/template/spec/volumes", "value": [{"name": "extra-ca"}]}, {"op": "add", "path": "/spec/install/spec/deployments/0/spec/template/spec/volumes/0/configMap", "value": {"name": "extra-ca", "defaultMode": 420}}, {"op": "add", "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/volumeMounts", "value": [{"name": "extra-ca", "mountPath": "/etc/ssl/certs/extra-ca-bundle.pem", "readOnly": true, "subPath": "extra-ca-bundle.pem"}]}]'
During an Operator upgrade, the above changes might be lost. There seems to be improvement plans to facilitate these changes directly through the Operator config, but until then, it’d be a good idea to maintain some automation around this to persist these changes if it ever gets overridden to defaults.
Create the Issuer and the Certficiate
Configure Certificate Requestor
Create Cluster Issuer to use Let’s Encrypt
envsubst <<EOF | oc apply -f - apiVersion: cert-manager.io/v1 kind: ClusterIssuer metadata: name: letsencryptissuer spec: acme: server: https://acme-v02.api.letsencrypt.org/directory email: $LETSENCRYPT_EMAIL # This key doesn't exist, cert-manager creates it privateKeySecretRef: name: prod-letsencrypt-issuer-account-key solvers: - dns01: route53: hostedZoneID: $HOSTED_ZONE_ID region: $HOSTED_ZONE_REGION secretAccessKeySecretRef: name: '' EOF
Describe the ClusterIssuer to confirm it’s ready.
oc describe clusterissuer letsencryptissuer
You should see an output that mentions that the issuer is Registered/Ready. Note this can take a few minutes.
Conditions: Last Transition Time: 2022-11-17T10:29:37Z Message: The ACME account was registered with the ACME server Observed Generation: 1 Reason: ACMEAccountRegistered Status: True Type: Ready Events: <none>
Once the above command is complete, the status of the ClusterIssues in the cluster console will look similar to the below screenshot.
Create the Certificate, which will later be used by the Custom Domain.
*I’ve used a SAN certificate here to show how SAN certificates could be created, which will be useful for clusters intended to run only a fixed set of applications. However, this is optional; a single subject/domain certificate works too *
Configure the Certificate
envsubst <<EOF | oc apply -f - apiVersion: cert-manager.io/v1 kind: Certificate metadata: name: customdomain-cert namespace: cert-manager spec: secretName: custom-domain-certificate-tls issuerRef: name: letsencryptissuer kind: ClusterIssuer commonName: "x.apps.$DOMAIN" dnsNames: - "x.apps.$DOMAIN" - "y.apps.$DOMAIN" - "z.apps.$DOMAIN" EOF
View the Certificate status
It’ll take up to 5 minutes for the Certificate to show as Ready status. If it takes too long, the oc describe command will mention issues if any.
oc get certificate customdomain-cert -n cert-manager oc describe certificate customdomain-cert -n cert-manager
Create the Custom Domain, which will be used to access your applications.
Create the Custom Domain
envsubst <<EOF | oc apply -f - apiVersion: managed.openshift.io/v1alpha1 kind: CustomDomain metadata: name: appdomain spec: domain: x.apps.$DOMAIN certificate: name: custom-domain-certificate-tls namespace: cert-manager scope: Internal EOF
View the status of the Custom Domain
oc get customdomain appdomain -n cert-manager
It will take 2-3 minutes for the custom domain to change from NotReady to Ready status. When ready, an endpoint also will be visible in the output of the above command, as shown below
- Next, we need to add a DNS record in my Custom Domain Route53 Hosted Zone to CNAME the the wildcard applications domain to the above obtained endpoint, as shown below.
CUSTOM_DOMAIN_ENDPOINT=$(oc get customdomain appdomain -n cert-manager | grep 'appdomain' | awk '{print $2}')
echo $CUSTOM_DOMAIN_ENDPOINT
cat <<EOF > $SCRATCH_DIR/add_cname_record.json
{
"Comment":"Add apps CNAME to Custom Domain Endpoint",
"Changes":[{
"Action":"CREATE",
"ResourceRecordSet":{
"Name": "*.apps.$DOMAIN",
"Type":"CNAME",
"TTL":30,
"ResourceRecords":[{
"Value": "$CUSTOM_DOMAIN_ENDPOINT"
}]
}
}]
}
EOF
aws route53 change-resource-record-sets --hosted-zone-id $HOSTED_ZONE_ID --change-batch file://$SCRATCH_DIR/add_cname_record.json
The wild card CNAME’ing avoids the need to create a new record for every new application. The certificate that each of these applications use will NOT be a wildcard certificate
At this stage, you will be able to expose cluster applications on any of the listed DNS names that were specified in the previously created Certificate. But what if you have many more applications that will need to be securely exposed too. Well, one approach is to keep updating the Certificate resource with additional SAN names as more applications prepare to get onboarded, and this Certificate update which will trigger an update to the Custom Domain to honor the newly added SAN names. Another approach is to dynamically issue a Certificate to every new Route; Read on to find the details about this latter approach.
Dynamic Certificates for Custom Domain Routes.
- Create OpenShift resources required for issuing Dynamic Certificates to Routes.
This step will create a new deployment (and hence a pod) that’ll watch out for specifically annotated routes in the cluster, and if the issuer-kind and issuer-name annotations are found in a new route, it will request the Issuer (ClusterIssuer in my case) for a new Certificate that’s unique to this route and which will honor the hostname that was specified while creating the route.
oc apply -f https://github.com/cert-manager/openshift-routes/releases/latest/download/cert-manager-openshift-routes.yaml -n cert-manager
Additonal OpenShift resources such as a ClusterRole (with permissions to watch and update the routes across the cluster), a ServiceAccount (with these permissions, that will be used to run this newly created pod) and a ClusterRoleBinding to bind these two resources, will be created in this step too. If the cluster does not have access to github, you may as well save the raw contents locally, and run oc apply -f localfilename.yaml -n cert-manager
- View the status of the new pod.
Check if all the pods are running successfully and that the events do not mention any errors.
oc get po -n cert-manager
Test an application.
Create a test applciation in a new namespace.
oc new-project testapp oc new-app --docker-image=docker.io/openshift/hello-openshift -n testapp
Expose the test application Service.
Let’s create a Route to expose the application from outside the cluster, and annotate the Route to give it a new Certificate.
oc create route edge --service=hello-openshift testroute --hostname hello.apps.$DOMAIN -n testapp oc annotate route testroute -n testapp cert-manager.io/issuer-kind=ClusterIssuer cert-manager.io/issuer-name=letsencryptissuer
It will take a 2-3 minutes for the Certificate to be created. The renewal of the certitificate will automatically be managed by the cert-manager compoenents as it approaches expiry.
Access the application Route.
Do a curl test (or any http client of your preference) to confirm there are no certificate related errors.
Output should print “Hello OpenShfit!”, and you should also notice a line that says “subjectAltName: host hello.apps.$DOMIAN” matched cert’s “hello.apps.$DOMIAN”
curl -vv https://hello.apps.$DOMAIN
Debugging
Please note that while creating certificates, the validation process usually take upto 2-3 minutes to complete.
If you face issues during certificate create step, run ‘oc describe’ against each of - ‘certificate,certificaterequest,order & challenge’ resources to view the events/reasons that’ll mostly help with identifying the cause of the issue.
oc get certificate,certificaterequest,order,challenge
This is a very helpful guide in debugging certificates as well.
You may also use the cmctl CLI tool for various certificate management activities such as to check the status of certificates, testing renewals etc.