Overview

With the release of Red Hat Advanced Cluster Management for Kubernetes (RHACM) version 2.3, Ansible integration is supported in the governance lifecycle of the product. This feature lets you configure an Ansible Tower job to run when a policy violation occurs on one or more managed clusters. This blog provides an example of how this integration works and how you can use it in your environment.

The scenario explored in this blog post is a web application that runs on Red Hat OpenShift Container Platform (RHOCP) and requires an SSL/TLS certificate for HTTPS connections. The certificate is stored as an RHOCP secret, and is mounted in the container for the Apache web server to use. Since certificates often expire in relatively short time frames, it is critical for you to be notified about the certificate expiration. With a certificate policy in RHACM, you can monitor the expiration of the SSL/TLS certificate and associate Ansible automation with it. Visit the blog, How to use the Certificate Policy Controller to Identify Risks in Red Hat Advanced Cluster Management for Kubernetes, to learn more about certificate policies.

In more detail, I share the configuration of a CertificatePolicy resource, where a policy violation is reported when the SSL/TLS certificate is set to expire within 30 days. The policy violation that is listed in the RHACM console initiates an Ansible Tower job to create a ServiceNow incident (i.e. ticket), notifying you of the impending SSL/TLS certificate expiration. From there, you can renew the SSL/TLS certificate of the web application and update the RHOCP secret with the renewed SSL/TLS certificate. Though this is not explored in this blog post, Ansible can also be used to perform the renewal and replacement of the certificate automatically.

Please note that many of these steps can be performed with either the command line interface (CLI) or from the console. This blog uses a mixture based on whichever is easier to demonstrate.

Prerequisites

The following is required to perform the actions in this blog post:

Even if you don’t have access to the previously mentioned requirements, continue to follow along to learn how this all works.

Setting Up the Demo Application

As stated previously, let's walk through creating a demo web application that serves HTTPS using a custom SSL/TLS certificate. From the CLI, create a self-signed SSL/TLS certificate (not recommended for production). The certificate expires 25 days from now, which is necessary to initiate an RHACM policy violation later on. Create the SSL/TLS certificate with the following commands:

mkdir tls

openssl req \
-new \
-newkey rsa:4096 \
-days 25 \
-nodes \
-x509 \
-subj "/C=US/ST=NC/L=Raleigh/O=Example/CN=www.example.com" \
-keyout tls/tls.key \
-out tls/tls.crt
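As a quick sanity check before moving on, the following sketch confirms that the generated certificate falls inside the 30-day window that the policy checks later. It relies on the `-checkend` option of `openssl x509`; the function name `expires_within_days` is hypothetical and not part of the original walkthrough.

```shell
# Hypothetical helper: succeeds (exit 0) when the certificate in $1
# expires within the next $2 days, and fails (exit 1) otherwise.
# Note: a file that openssl cannot parse is also reported as expiring.
expires_within_days() {
  cert_file=$1
  days=$2
  # Exit 2 when the file cannot be read, so a typo is not mistaken for "expiring".
  [ -r "$cert_file" ] || return 2
  # openssl exits 0 when the certificate is still valid after the given number of seconds.
  if openssl x509 -in "$cert_file" -noout -checkend "$((days * 86400))" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}

# Report on the certificate created above, if it exists in the current directory.
if [ -r tls/tls.crt ]; then
  if expires_within_days tls/tls.crt 30; then
    echo "tls/tls.crt expires within 30 days"
  else
    echo "tls/tls.crt is valid for more than 30 days"
  fi
fi
```

For the 25-day certificate in this demo, the check against 30 days should report that the certificate expires within the window.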

From the CLI, log in to an RHOCP cluster that is either the RHACM hub cluster or a cluster that it manages. For this example, I use the hub cluster. Then create an RHOCP namespace for the demo application. Run the following commands:

oc login

oc create ns acm-grc-ansible-example

Next, create an RHOCP secret to contain the self-signed SSL/TLS certificate previously generated. Notice that the tls.crt secret key name is used. This is the default key name that RHACM looks for when it evaluates CertificatePolicy violations. To change the secret key name, see the Updating certificate policies documentation.

oc -n acm-grc-ansible-example create secret generic certs \
--from-file=tls.key=tls/tls.key \
--from-file=tls.crt=tls/tls.crt

Now it’s time to deploy the example web application. Run the following command to create the RHOCP objects:

oc -n acm-grc-ansible-example apply -f \
https://raw.githubusercontent.com/open-cluster-management/grc-ansible-integration-blog/main/openshift/app.yml

After you run this command, the following RHOCP objects are created:

  • An ImageStream that points to the quay.io/centos7/httpd-24-centos7 container image.

  • A DeploymentConfig that creates a container from the ImageStream with the certificate mounted and using the previously created secret.

  • A Service, which exposes port 8443 in the container as port 443.

  • A Route that points to the Service.

    Note: In a production use-case, the Route should be configured with a custom, fully-qualified domain name that matches the SSL/TLS certificate, but to simplify things, let's use the fully-qualified domain name that is generated by RHOCP.

To verify that the web application is deployed and using the self-signed certificate, run the following commands. The commands require a Linux or Mac system, but an alternative is explained later in this section. Note that it may take several seconds for RHOCP to get the demo web application running.

export ROUTE=$(oc -n acm-grc-ansible-example get route | tail -1 | tr -s ' ' | cut -f 2 -d ' ')
echo | openssl s_client -showcerts -connect "$ROUTE:443" 2>/dev/null | openssl x509 -inform pem -noout -text
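If parsing the tabular `oc get route` output feels fragile, a JSONPath query is one alternative. The sketch below assumes the namespace contains a single Route, so the first item is the demo application; the helper names `route_host` and `served_cert_dates` are hypothetical.

```shell
# Hypothetical helper: print the host of the first Route in the given namespace.
route_host() {
  oc -n "$1" get route -o jsonpath='{.items[0].spec.host}'
}

# Hypothetical helper: print the subject, issuer, and validity dates of the
# certificate served at <host>:443. -servername sends the host name via SNI.
served_cert_dates() {
  echo | openssl s_client -connect "$1:443" -servername "$1" 2>/dev/null |
    openssl x509 -noout -subject -issuer -dates
}
```

With an active `oc` session, `served_cert_dates "$(route_host acm-grc-ansible-example)"` prints only the fields that matter for this check.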

Here is a snippet of what should be displayed:

Issuer: C = US, ST = NC, L = Raleigh, O = Example, CN = www.example.com
Validity
Not Before: Jul 15 16:33:29 2021 GMT
Not After : Aug 9 16:33:29 2021 GMT
Subject: C = US, ST = NC, L = Raleigh, O = Example, CN = www.example.com

Alternatively, you can visit the URL of the RHOCP Route that is created in the acm-grc-ansible-example RHOCP namespace with your web browser and examine the certificate.

Setting Up Ansible Tower

To create a ServiceNow incident when the certificate is near expiration, you must create Ansible automation in Ansible Tower. The open-cluster-management/grc-ansible-integration-blog GitHub repository contains an example Ansible playbook that is used for this blog. This playbook uses a local Ansible connection to create a temporary Python virtual environment, installs into it the Python dependencies that the snow_record Ansible module requires, and creates a ServiceNow incident using that module. See Working With Modules for more information.

Creating the Fork

To configure the Ansible playbook to use your ServiceNow instance, start by creating a fork of the repository on a Git forge (e.g. GitHub, internal GitLab, etc.) that your Ansible Tower instance can read. Then complete the following steps:

  1. Clone the fork locally.

  2. From the CLI, create an Ansible vault (encrypted file) at ansible/vaults/secret-vars.yml, in the directory of the cloned forked repository. This vault must include the variables snow_host, snow_password, and snow_username. After creating the vault, store the vault password securely. If a Linux or Mac system is being used to create the file, the command might resemble the following example:

    cat <<EOT >> ansible/vaults/secret-vars.yml
    snow_host: dev86540.service-now.com
    snow_password: admin_password
    snow_username: admin
    EOT
    ansible-vault encrypt ansible/vaults/secret-vars.yml

  3. Commit the vaulted file using git and push it to the main branch.
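Because the vault file holds ServiceNow credentials, it can be worth confirming that it is actually encrypted before committing it. The following sketch relies on the fact that files encrypted with ansible-vault begin with a `$ANSIBLE_VAULT;` header line; the helper name `is_vault_encrypted` is hypothetical.

```shell
# Hypothetical helper: succeeds only when the first line of the file
# is an Ansible Vault header such as "$ANSIBLE_VAULT;1.1;AES256".
is_vault_encrypted() {
  [ -r "$1" ] && head -n 1 "$1" | grep -q '^\$ANSIBLE_VAULT;'
}

# Warn when the vault file from the steps above exists but is still plain text.
vault_file=ansible/vaults/secret-vars.yml
if [ -e "$vault_file" ] && ! is_vault_encrypted "$vault_file"; then
  echo "WARNING: $vault_file is not encrypted; do not commit it" >&2
fi
```

Run it from the root of the cloned fork; it prints nothing when the file is encrypted or absent.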

Configuring Ansible Tower

Project

Now that the Ansible configuration is all set, Ansible Tower needs to be configured to run it. Start by creating a new Ansible Tower project. For this blog, the SCM URL field value is the URL of the forked repository. The SCM UPDATE OPTIONS section is optional, but is relied upon in this blog post. View the following image of the Ansible Tower project named GRC Ansible Integration Blog:

[Image: Ansible Tower project named GRC Ansible Integration Blog]

Inventory

Once the project is created, create a new inventory and inventory source. This inventory contains a group called create_ticket and is set to use a local Ansible connection, so no external host is required to run the playbook.

Once the inventory is created, sync the project from the Projects page. Note that the UPDATE OPTIONS section is not required, but is relied upon in this blog post. If the inventory file is not shown in the drop-down menu, type it in manually and press the Enter key, as shown in the following images:

[Image: Ansible Tower inventory]

[Image: Ansible Tower inventory source]

Vault Credential

Next, create an Ansible Vault credential so that Ansible Tower can decrypt the Ansible vault that was previously created. The Vault credential should resemble the following image:

[Image: Ansible Tower Vault credential]

Job Template

At this point, Ansible Tower is configured to know about the playbook and inventory in the forked repository. It is also configured to decrypt the Ansible vault you previously created. Next, Ansible Tower needs to be configured to run the Ansible playbook. To do this, create an Ansible Tower job template. Note that the credential in the CREDENTIALS section is the Ansible Tower vault credential previously created. Also, the PROMPT ON LAUNCH checkbox next to the EXTRA VARIABLES section is required. This is because RHACM provides extra variables by default and also supports custom extra variables. View the following image as a reference:

[Image: Ansible Tower job template]

User Access Token

Lastly, RHACM requires permission to launch the Ansible Tower job template that was previously created. To grant it, create a token that has access to run the job template and store it securely.

Note: It is best practice to use a separate Ansible Tower service account with access to run that specific job template.
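The token can be created from the Ansible Tower user interface. As a rough sketch of a CLI alternative, the function below posts to the `/api/v2/tokens/` endpoint of the Tower REST API with basic authentication. The helper name `create_tower_token`, the description text, and the choice of the `write` scope are assumptions for illustration; check the API documentation for your Tower version before relying on it.

```shell
# Hypothetical helper: request a personal access token for user $2 on the
# Tower instance at base URL $1. curl prompts for the password because only
# the user name is passed to -u. The JSON response is assumed to include
# the new token value.
create_tower_token() {
  curl -sS -u "$2" -X POST \
    -H 'Content-Type: application/json' \
    -d '{"description": "RHACM governance integration", "scope": "write"}' \
    "$1/api/v2/tokens/"
}
```

For example, `create_tower_token https://ansible-tower.example.com my-service-account` would request a token for a hypothetical `my-service-account` user.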

Setting Up RHACM

Configuring the Policy

Now that there is a demo web application running and Ansible Tower is configured for the RHACM governance integration, it’s time to create a policy! This example scenario requires a certificate policy that detects when an SSL/TLS certificate is within 30 days of expiring in the acm-grc-ansible-example RHOCP namespace.

To do so, create the appropriate Policy, PlacementBinding, and PlacementRule objects with the following command:

oc apply -f \
https://raw.githubusercontent.com/open-cluster-management/grc-ansible-integration-blog/main/openshift/policy.yml

Note: The PlacementRule object matches every managed cluster, including the hub cluster. It’s recommended to make this more specific for a production use-case. This also utilizes the acm-grc-ansible-example RHOCP namespace previously created. If the demo web application is deployed on a managed cluster and not the hub cluster, an RHOCP namespace needs to be created on the hub cluster.

The interesting portion of this policy is how specific the policy configuration can be. Notice how include is set specifically to the acm-grc-ansible-example RHOCP namespace. This ensures that the Ansible automation, which is configured later, is only ever initiated for a certificate that is expiring in this RHOCP namespace. Additionally, the minimumDuration value is set to 720h (30 days), which means that a policy violation is reported only if the certificate expires in less than 30 days. View a portion of the created certificate policy with the aforementioned configuration:

policy-templates:
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: CertificatePolicy
      metadata:
        name: ansible-example-certificatepolicy
      spec:
        namespaceSelector:
          include:
            - acm-grc-ansible-example
          exclude:
            - kube-*
        remediationAction: inform
        severity: medium
        minimumDuration: 720h
When you examine the policies from the Governance page, your view might resemble the policy violation in the following image:

[Image: policy violation on the RHACM Governance page]

Configuring the Ansible Integration

Ansible Tower Credential

In order for RHACM to be able to connect to Ansible Tower to run the Ansible Tower job template previously created, a credential must be created within RHACM. Complete the following steps:

  1. From the navigation menu, click on Credentials and then click the Add credentials button. In the Automation & other credentials section, select Red Hat Ansible Automation Platform.

  2. Fill in the form with the following values:

    • Credentials name: ansible-tower
    • Namespace: acm-grc-ansible-example
    • Ansible Tower host: https://ansible-tower.example.com
      • Replace this value with the actual URL to the Ansible Tower instance.
    • Ansible Tower token: ansible-tower-token-generated-earlier
      • Replace this value with the actual token that has access to run the job template.
  3. Click Create.

Alternatively, you can create the credential from the CLI. For an example where the host and token values are replaced with the base64 encoding of the actual values, see the ansible-tower-credential.yml file.
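When preparing base64 values by hand for the CLI approach, a common pitfall is encoding a trailing newline along with the value, which happens with a plain `echo`. The sketch below uses `printf` to avoid that; the helper name `b64` is hypothetical, and the host shown is the placeholder from the form above.

```shell
# Hypothetical helper: base64-encode a value on a single line.
# printf '%s' avoids the trailing newline that echo would add to the input.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Encode the placeholder host; replace it with the real Ansible Tower URL.
b64 'https://ansible-tower.example.com'
echo
```

The token value can be encoded the same way before it is placed in the secret definition.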

Connecting the Policy to the Ansible Tower Job Template

Now it’s finally time to put all that hard work to use and configure the policy to run the configured Ansible Tower job template on policy violations. From the Governance page in RHACM, a policy that has a cluster violation should be displayed as shown in the following image:

[Image: policy with a cluster violation on the RHACM Governance page]

You’ll notice that there is a column named Automation. Click on the Configure link to view the side-panel. For the Credential section, select the ansible-tower credential created earlier. For the Job template field, select GRC Ansible Integration Blog. For the extra variables section, add policy_name: ansible-example-certificatepolicy and target_namespace: acm-grc-ansible-example. The configuration from the Automation violation policy side-panel might resemble the following image:

[Image: Automation side-panel with the credential, job template, and extra variables]

In the Schedule automation section, select Run once mode. This mode runs the Ansible Tower job template upon the first policy violation. Afterwards, it is immediately set to disabled until an RHACM administrator re-enables it. Finally, click the Save button. Your configuration form might resemble the following image:

[Image: Schedule automation section set to Run once mode]

Behind the scenes, this creates a PolicyAutomation RHOCP object. Alternatively, you can use the CLI to create this with the following command:

oc apply -f \
https://raw.githubusercontent.com/open-cluster-management/grc-ansible-integration-blog/main/openshift/policy-automation.yml

When you examine the file being applied, notice that the PolicyAutomation RHOCP object creates an AnsibleJob RHOCP object when initiated. This object is what is picked up by the Ansible Automation Platform Resource operator to initiate the Ansible job in Ansible Tower:

---
apiVersion: policy.open-cluster-management.io/v1beta1
kind: PolicyAutomation
metadata:
  name: ansible-example-certificatepolicy-policy-automation
  namespace: acm-grc-ansible-example
spec:
  automationDef:
    extra_vars:
      policy_name: ansible-example-certificatepolicy
      target_namespace: acm-grc-ansible-example
    name: GRC Ansible Integration Blog
    secret: ansible-tower
    type: AnsibleJob
  mode: once
  policyRef: ansible-example-certificatepolicy

Examining the Automation Job

At this point, you can verify that the policy violation caused an AnsibleJob object to be created in the RHOCP namespace where the Policy object was created. Run the following commands to view the AnsibleJob, replacing cdw4g in ansible-example-certificatepolicy-policy-automation-once-cdw4g with the value from your environment:

❯ oc -n acm-grc-ansible-example get AnsibleJob
NAME AGE
ansible-example-certificatepolicy-policy-automation-once-cdw4g 101s
❯ oc describe -n acm-grc-ansible-example AnsibleJob ansible-example-certificatepolicy-policy-automation-once-cdw4g
Name: ansible-example-certificatepolicy-policy-automation-once-cdw4g
Namespace: acm-grc-ansible-example
Labels: tower_job_id=1760
Annotations: <none>
API Version: tower.ansible.com/v1alpha1
Kind: AnsibleJob
...
Spec:
extra_vars:
target_clusters:
- local-cluster
policy_name: ansible-example-certificatepolicy
target_namespace: acm-grc-ansible-example
job_template_name: GRC Ansible Integration Blog
tower_auth_secret: ansible-tower
Status:
Ansible Job Result:
Changed: true
Elapsed: 22.781
Failed: false
Finished: 2021-07-15T19:37:33.371986Z
Started: 2021-07-15T19:37:10.590709Z
Status: successful
URL: https://ansible-tower.example.com/#/jobs/playbook/1760
...
k8sJob:
Created: true
Env:
Secret Namespaced Name: default/ansible-tower
Template Name: GRC Ansible Integration Blog
Verify SSL: false
Message: Monitor the job.batch status for more details with the following commands:
'kubectl -n default get job.batch/ansible-example-certificatepolicy-policy-automation-once-cdw4g'
'kubectl -n default describe job.batch/ansible-example-certificatepolicy-policy-automation-once-cdw4g'
'kubectl -n default logs -f job.batch/ansible-example-certificatepolicy-policy-automation-once-cdw4g'

When you examine the output, there are a few interesting things to note:

  • The first is that the URL to the initiated Ansible Tower job is shown.
  • The extra_vars section contains the policy_name and target_namespace extra variables that were previously configured, along with an additional variable, target_clusters. This variable is automatically supplied by RHACM, and it contains the names of the clusters that violate the configured policy.
  • Lastly, there is a message that explains that an RHOCP Job object was created to actually run the Ansible Tower job template and wait for it. Run the following command to check the logs and view the progress, replacing cdw4g in ansible-example-certificatepolicy-policy-automation-once-cdw4g with the value retrieved from the previous commands:
❯ oc -n default logs -f job.batch/ansible-example-certificatepolicy-policy-automation-once-cdw4g

PLAY [localhost] ***************************************************************

TASK [job_runner : Read AnsibleJob Specs] **************************************
ok: [localhost]

TASK [job_runner : awx.awx.tower_job_launch] ***********************************
changed: [localhost]

TASK [job_runner : Update AnsibleJob definition with Tower job id] *************
changed: [localhost]

TASK [job_runner : Update AnsibleJob status with Tower job status and url] *****
changed: [localhost]

TASK [job_runner : tower_job_wait] *********************************************
ok: [localhost]

TASK [job_runner : Update AnsibleJob status with Tower job result] *************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost : ok=6 changed=4 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
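For scripted checks, the job result can also be read from the AnsibleJob object rather than from the logs. The sketch below assumes the status is stored under `.status.ansibleJobResult`, which is inferred from the "Ansible Job Result" block in the `oc describe` output above and not confirmed against the resource schema; the helper name `ansiblejob_field` is hypothetical.

```shell
# Hypothetical helper: print one field of the job result for an AnsibleJob.
#   $1 = namespace, $2 = AnsibleJob name, $3 = field name (for example: status, url)
# The .status.ansibleJobResult path is an assumption based on the describe output.
ansiblejob_field() {
  oc -n "$1" get ansiblejob "$2" -o jsonpath="{.status.ansibleJobResult.$3}"
}
```

For example, `ansiblejob_field acm-grc-ansible-example <ansiblejob-name> status` would be expected to print `successful` for the job shown above, where `<ansiblejob-name>` is the name returned by `oc get AnsibleJob`.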

You can also verify that the job was initiated successfully from Ansible Tower:

[Image: successful job in Ansible Tower]

If the Ansible Tower job fails due to a Python import error, you may need to use an Ansible Tower virtual environment that uses Python 3. Another alternative is to use an Ansible Tower container execution environment. If you do not have access to configure the Ansible Tower instance that you are using, you can set the extra variable python_path_override: /var/lib/awx/venv/awx/lib64/python3.6 on the job template, but this is not a recommended solution.

Note: Be sure to use the correct Python version for your environment.

The ServiceNow incident is now created. The Description field includes the text from the extra variables (policy_name, target_namespace, and target_clusters) supplied by RHACM to Ansible Tower.

[Image: ServiceNow incident created by the Ansible Tower job]

Next Steps

At this point, imagine that the incident is assigned to one or more maintainers of the web application to renew the SSL/TLS certificate, and update the RHOCP secret. Note that after the Ansible Tower job template is initiated, the automation mode associated with the configured policy is set to disabled. Therefore, after the RHOCP secret is updated with the renewed SSL/TLS certificate, the automation mode must be reset to once by an RHACM administrator or using GitOps for the next time that the certificate nears expiration.
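As one way to reset the mode from the CLI, the sketch below patches the spec.mode field of the PolicyAutomation object back to once, using the field shown in the PolicyAutomation definition earlier. The helper name `reset_automation_mode` is hypothetical, and the command assumes an `oc` session on the hub cluster with permission to modify the object.

```shell
# Hypothetical helper: set spec.mode of a PolicyAutomation back to "once".
#   $1 = namespace, $2 = PolicyAutomation name
reset_automation_mode() {
  oc -n "$1" patch policyautomation "$2" --type merge -p '{"spec":{"mode":"once"}}'
}
```

For this demo, the call would be `reset_automation_mode acm-grc-ansible-example ansible-example-certificatepolicy-policy-automation`.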

Cleanup

To clean up the demo web application, policy, and Ansible Tower credential in RHOCP, run the following command to delete the RHOCP namespace that was created as part of this blog:

oc delete ns acm-grc-ansible-example

Other Use Cases

Although this demo showcased Ansible automation being initiated from a CertificatePolicy, this can be done with any policy type. For example, you can create a ServiceNow incident for an IamPolicy violation when the number of cluster administrators exceeds the expected number. The steps are similar to what is outlined in this blog, except that you use a different policy and a different description value in the playbook. View the following example of what the difference may look like:

diff --git a/ansible/playbooks/create_ticket.yml b/ansible/playbooks/create_ticket.yml
index 9f4bd2f..4afeb1a 100644
--- a/ansible/playbooks/create_ticket.yml
+++ b/ansible/playbooks/create_ticket.yml
@@ -54,8 +54,7 @@
caller_id: "{{ snow_username }}"
short_description: "ACM {{ policy_name }} violation"
description:
- "{{ policy_name }} violation: one or more certificates are expiring soon in a secret in
- the namespace {{ target_namespace }} on the clusters:
+ "The number of cluster admins exceeds the expected amount on the clusters:
{{ target_clusters | join(', ') }}"
priority: 2
severity: 2

Additionally, you can choose to perform any automation that Ansible is capable of instead of creating a ServiceNow incident. This flexibility enables many automation scenarios to fit your needs.