Introduction

Red Hat Advanced Cluster Management (RHACM) provides a policy framework that lets users check the compliance of their managed clusters, with the option to automatically remediate many violations. In previous RHACM versions, users could check which policies were generating violations by viewing the status of each policy template. In RHACM 2.4, violations are collected and aggregated by a PolicyReport, a custom resource that is created on every managed cluster. PolicyReports were introduced in RHACM 2.3 to store violations created by the insights client, and they now also pull in governance violations automatically. Policy reports provide a quick overview of all violations found in a managed cluster, and also produce metrics and alerts that can be used to send those violations to incident management systems.

PolicyReport Integration with RHACM Governance

Let's take a look at a sample PolicyReport that is created on a cluster with one policy that has a violation. In the following example, the policy-pod policy is configured to look for an nginx-pod pod that is not present on the cluster. The policy is available as a template in the Specifications drop-down menu on the policy creation page:

apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  name: local-cluster-policyreport
  namespace: local-cluster
results:
- category: PR.PT Protective Technology
  message: 'NonCompliant; violation - pods not found: [nginx-pod] in namespace default missing'
  policy: default.policy-pod
  properties:
    created_at: "2021-10-07T17:37:13Z"
    total_risk: "1"
  result: fail
  source: grc
  timestamp:
    nanos: 1666061482
    seconds: 1633628707
scope:
  kind: cluster
  name: local-cluster
  namespace: local-cluster
summary:
  error: 0
  fail: 1
  pass: 0
  skip: 0
  warn: 0

Every violation on the cluster generates an item in the results list, as seen in the example. There is one entry for each non-compliant policy on the cluster; compliant policies are omitted to avoid cluttering the PolicyReport. Let's dive deeper into the specific fields of the result object:

  • category: The category specified in the original parent policy where the violation is coming from.
  • message: The violation that is causing the policy to be flagged as non-compliant.
  • policy: The namespace and name of the parent policy that generated the violation, joined by a period (default.policy-pod in the example).
  • properties.created_at: The creation timestamp of the policy, not the time when the violation occurred.
  • properties.total_risk: This field is used by the PolicyReport metrics collector to determine the severity of a policy, which can be set with the following values:
    • low severity: total_risk=1
    • medium severity: total_risk=2
    • high severity: total_risk=3
    • critical severity: total_risk=4
  • result: This field can be ignored. Because the PolicyReport only picks up policies that are generating violations, it is always expected to be set to fail.
  • source: This is used to distinguish whether a violation is coming from Insights (expected value is insights) or RHACM Governance (expected value is grc).
  • timestamp: The time when the violation was added to the PolicyReport.
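
To show where two of these fields originate, the following is a minimal sketch of a parent policy, loosely modeled on the policy-pod template. It assumes that category is taken from the policy.open-cluster-management.io/categories annotation and that total_risk is derived from the severity set on the policy template. The template name policy-pod-example and the container image tag are illustrative and are not taken from the report above.

```yaml
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: policy-pod
  namespace: default
  annotations:
    # Assumed source of the "category" field in the PolicyReport result
    policy.open-cluster-management.io/categories: PR.PT Protective Technology
spec:
  remediationAction: inform
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: policy-pod-example   # illustrative name
        spec:
          remediationAction: inform
          # Assumed source of total_risk: low -> "1", critical -> "4"
          severity: low
          namespaceSelector:
            include: ["default"]
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: v1
                kind: Pod
                metadata:
                  name: nginx-pod
                spec:
                  containers:
                    - name: nginx
                      image: nginx:1.18.0   # illustrative image tag
```

With a policy like this in the default namespace and no nginx-pod present, the result entry would be expected to carry policy: default.policy-pod and total_risk: "1", matching the sample report.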

By default, the insights client is set to poll for violations every 30 minutes, so policy violations may take up to 30 minutes to appear in the PolicyReport results field. If you want the violations for a cluster to be processed more frequently, you can specify an integer in the POLL_INTERVAL environment variable in the insights client deployment (policyreport-xxxxx-insights-client).
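
As a rough sketch, the relevant part of the insights client deployment might look like the following fragment. The container name and the value "10" are hypothetical, and the unit is assumed to be minutes to match the 30-minute default; check the deployment on your cluster before relying on either.

```yaml
# Fragment of the insights client deployment (policyreport-xxxxx-insights-client)
spec:
  template:
    spec:
      containers:
        - name: insights-client     # hypothetical container name
          env:
            - name: POLL_INTERVAL
              # Environment variable values must be strings, so the integer is quoted.
              # Assumed to be in minutes; the value is illustrative.
              value: "10"
```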

Configuring Violations to be Sent to Incident Management Systems

Description of policyreport_info metric

In addition to viewing the PolicyReport object from the CLI, you can view the items in the results list through a metric called policyreport_info, which the insights client exposes. This metric can be viewed from the Metrics tab in the OpenShift console for a cluster, and in Prometheus. View the following policyreport_info metric sample:

[Image: policyreport_info metric sample]

The insights client passes the following fields to the metric:

  • managed_cluster_id: The name of the managed cluster where the violation is reported.
  • category: The category of the policy.
  • policy: The name of the policy that reported the violation.
  • result: As with the result field of the PolicyReport, the value is always fail.
  • severity: This is mapped to the total_risk field in the PolicyReport results.

Integration with RHACM Observability and Incident Management Systems

The RHACM observability component is already set up to process PolicyReports; the alert feature for PolicyReports was initially added in RHACM 2.3. With the integration of the governance framework and PolicyReports in 2.4, users can now configure alerts on policy violations and send them to incident management systems, such as Slack. To alert on any violations produced by a policy, set the severity of the policy to critical and follow the steps outlined in the alerting blog to set up alerting on your cluster.
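
If you want a rule of your own in addition to the built-in alert, one possible approach is a custom rule for RHACM observability. The sketch below assumes the thanos-ruler-custom-rules ConfigMap mechanism in the open-cluster-management-observability namespace, and it assumes that the severity label of policyreport_info carries the value critical for critical policies. The group name, alert name, and the 5m duration are illustrative.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-ruler-custom-rules
  namespace: open-cluster-management-observability
data:
  custom_rules.yaml: |
    groups:
      - name: policyreport-violations          # illustrative group name
        rules:
          - alert: CriticalPolicyViolation     # illustrative alert name
            # Fires while any critical violation is reported for a cluster.
            # The label value "critical" is an assumption; check the metric first.
            expr: sum by (managed_cluster_id, policy, category) (policyreport_info{severity="critical"}) > 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Policy {{ $labels.policy }} reports a violation on cluster {{ $labels.managed_cluster_id }}"
```

Routing the resulting alert to a receiver such as Slack is then a matter of the Alertmanager configuration described in the alerting blog.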

Conclusion

In this blog, we explored how to view governance violations in PolicyReports, and how those PolicyReport violations generate metrics that can be picked up by an Alertmanager. With the information in this blog and alerting set up on your cluster, you can create policies and have their violations sent to incident management systems.