Introduction
Red Hat Advanced Cluster Management (RHACM) provides a policy framework for checking the compliance of managed clusters, with the option to automatically remediate many violations. In previous RHACM versions, users could check which policies were generating violations by viewing the status of each policy template. In RHACM 2.4, violations are collected and aggregated in a PolicyReport, a custom resource that is created on every managed cluster. PolicyReports were introduced in RHACM 2.3 to store violations created by the insights client, and they now automatically pull in governance violations as well. You can view a policy report for a quick overview of all violations found on a managed cluster, and the reports also produce metrics and alerts that can be used to send those violations to incident management systems.
PolicyReport Integration with RHACM Governance
Let's take a look at a sample PolicyReport that is created on a cluster with one policy that has a violation. In the following example, the policy-pod policy is configured to look for an nginx-pod pod that is not present on the cluster. The policy is available as a template in the Specifications drop-down menu on the policy creation page:
apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  name: local-cluster-policyreport
  namespace: local-cluster
results:
- category: PR.PT Protective Technology
  message: 'NonCompliant; violation - pods not found: [nginx-pod] in namespace default missing'
  policy: default.policy-pod
  properties:
    created_at: "2021-10-07T17:37:13Z"
    total_risk: "1"
  result: fail
  source: grc
  timestamp:
    nanos: 1666061482
    seconds: 1633628707
scope:
  kind: cluster
  name: local-cluster
  namespace: local-cluster
summary:
  error: 0
  fail: 1
  pass: 0
  skip: 0
  warn: 0
All violations on the cluster generate an item in the results list, as seen in the example. There is one entry for each non-compliant policy on the cluster; compliant policies are left out to avoid cluttering the PolicyReport. Let's dive deeper into the specific fields of the result object:
category: The category specified in the parent policy that the violation comes from.
message: The violation that causes the policy to be flagged as non-compliant.
policy: The namespace and name of the parent policy that generated the violation (for example, default.policy-pod).
properties.created_at: The creation timestamp of the policy, not the time when the violation occurred.
properties.total_risk: Used by the PolicyReport metrics collector to determine the severity of a policy. It takes one of the following values:
  low severity: total_risk=1
  medium severity: total_risk=2
  high severity: total_risk=3
  critical severity: total_risk=4
result: This field can be ignored. Because the PolicyReport only picks up policies that are generating violations, it is always expected to be set to fail.
source: Distinguishes whether a violation comes from Insights (expected value insights) or RHACM Governance (expected value grc).
timestamp: The time when the violation was added to the PolicyReport.
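To make the field descriptions concrete, here is a minimal Python sketch that reads a PolicyReport exported as JSON (for example with oc get policyreport local-cluster-policyreport -n local-cluster -o json) and keeps only the governance entries, translating total_risk back into a severity name. The function name governance_violations and its output keys are hypothetical, and the sketch assumes the policy field always has the namespace.name form shown in the sample.

```python
import json

# Severity names for the total_risk values listed above.
# total_risk is stored as a string in the report, e.g. "1".
RISK_TO_SEVERITY = {"1": "low", "2": "medium", "3": "high", "4": "critical"}


def governance_violations(report: dict) -> list:
    """Summarize the RHACM Governance (source "grc") entries of a PolicyReport."""
    violations = []
    for item in report.get("results", []):
        # Skip entries produced by Insights rather than Governance.
        if item.get("source") != "grc":
            continue
        # Assumption: policy is "<namespace>.<name>". Namespaces cannot contain
        # dots, so splitting at the first dot keeps dotted policy names intact.
        namespace, _, name = item.get("policy", "").partition(".")
        risk = item.get("properties", {}).get("total_risk")
        violations.append(
            {
                "namespace": namespace,
                "policy": name,
                "category": item.get("category"),
                "severity": RISK_TO_SEVERITY.get(risk, "unknown"),
                "message": item.get("message"),
            }
        )
    return violations


if __name__ == "__main__":
    # Trimmed JSON form of the sample PolicyReport shown earlier.
    sample = json.loads(
        """
        {
          "results": [
            {
              "category": "PR.PT Protective Technology",
              "message": "NonCompliant; violation - pods not found: [nginx-pod] in namespace default missing",
              "policy": "default.policy-pod",
              "properties": {"created_at": "2021-10-07T17:37:13Z", "total_risk": "1"},
              "result": "fail",
              "source": "grc"
            }
          ]
        }
        """
    )
    for violation in governance_violations(sample):
        print(violation["namespace"], violation["policy"], violation["severity"])
    # prints: default policy-pod low
```

Filtering on source keeps Insights findings out of the governance view, since both kinds of violation share the same results list.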
By default, the insights client polls for violations every 30 minutes, so policy violations can take up to 30 minutes to appear in the PolicyReport results field. If you want the violations for a cluster to be processed more frequently, set the POLL_INTERVAL environment variable in the insights client deployment (policyreport-xxxxx-insights-client) to a smaller integer value.
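As a sketch of that change, the following Python function builds a strategic merge patch that sets POLL_INTERVAL on one container of the insights client deployment. The function name and the container name are hypothetical placeholders, and the value is assumed to be in minutes because the default interval is described as 30 minutes; check the deployment in your cluster before applying anything.

```python
import json


def poll_interval_patch(container_name: str, interval: int) -> dict:
    """Build a strategic merge patch that sets POLL_INTERVAL on one container."""
    if interval <= 0:
        raise ValueError("interval must be a positive integer")
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Containers are merged by name, so only this one changes.
                            "name": container_name,
                            # Env entries are merged by name; values must be strings.
                            # Assumption: the insights client reads the value as minutes.
                            "env": [{"name": "POLL_INTERVAL", "value": str(interval)}],
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical container name; look it up in the actual deployment first.
    print(json.dumps(poll_interval_patch("insights-client", 10)))
```

The printed JSON could then be passed to oc patch deployment <deployment-name> -n <namespace> --type strategic -p '<json>'. Because a strategic merge patch matches containers and environment variables by name, other containers and variables in the deployment are left untouched.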
Configuring Violations to be Sent to Incident Management Systems
Description of policyreport_info metric
In addition to the PolicyReport object that you can view from the CLI, the insights client exposes the items in the results list as a metric called policyreport_info. This metric can be viewed in Prometheus and from the Metrics tab of the OpenShift console for a cluster. View the following policy report sample:
The insights client passes the following fields to the metric:
managed_cluster_id: The name of the managed cluster where the violation is reported.
category: The category of the policy.
policy: The name of the policy that reported the violation.
result: As with the result field in the PolicyReport results, the value is always fail.
severity: Mapped from the total_risk field in the PolicyReport results.
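To query these labels, for example from the Metrics tab, you might build a PromQL selector from a PolicyReport result. The following Python sketch approximates the label set described above and renders an exact-match selector. It assumes that the policy label reuses the report's policy field verbatim and that the severity label holds the severity name (low through critical) corresponding to total_risk; both are assumptions to verify against the metric in your cluster. The function names metric_labels and promql_selector are hypothetical.

```python
# Assumption: the severity label holds the name mapped from total_risk.
RISK_TO_SEVERITY = {"1": "low", "2": "medium", "3": "high", "4": "critical"}


def metric_labels(cluster: str, result: dict) -> dict:
    """Approximate the policyreport_info labels for one PolicyReport result."""
    risk = result.get("properties", {}).get("total_risk")
    return {
        "managed_cluster_id": cluster,
        "category": result.get("category", ""),
        # Assumption: the policy label reuses the report's policy field as is.
        "policy": result.get("policy", ""),
        "result": result.get("result", "fail"),
        "severity": RISK_TO_SEVERITY.get(risk, "unknown"),
    }


def promql_selector(metric: str, labels: dict) -> str:
    """Render an exact-match PromQL selector, escaping backslashes and quotes."""

    def escape(value: str) -> str:
        return value.replace("\\", "\\\\").replace('"', '\\"')

    matchers = ",".join(f'{key}="{escape(val)}"' for key, val in sorted(labels.items()))
    return f"{metric}{{{matchers}}}"


if __name__ == "__main__":
    result = {
        "category": "PR.PT Protective Technology",
        "policy": "default.policy-pod",
        "result": "fail",
        "properties": {"total_risk": "1"},
    }
    labels = metric_labels("local-cluster", result)
    # Select every low-severity violation on this cluster.
    print(promql_selector("policyreport_info", {
        "managed_cluster_id": labels["managed_cluster_id"],
        "severity": labels["severity"],
    }))
    # prints: policyreport_info{managed_cluster_id="local-cluster",severity="low"}
```

Passing only a subset of labels widens the query; for example, promql_selector("policyreport_info", {"severity": "critical"}) yields policyreport_info{severity="critical"}, which matches critical violations on every cluster.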
Integration with RHACM Observability and Incident Management Systems
The RHACM observability component is already set up to process PolicyReports; the PolicyReport alert feature was first added in RHACM 2.3. With the integration of the governance framework and PolicyReports in RHACM 2.4, you can now configure alerts on policy violations and send them to incident management systems, such as Slack. To alert on any violation produced by a policy, set the severity of the policy to critical and follow the steps outlined in the alerting blog to set up alerting on your cluster.
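The severity change can also be scripted. The following Python sketch returns a copy of a Policy object with the severity of every template set to critical. It assumes the severity lives at spec.policy-templates[].objectDefinition.spec.severity, as in the ConfigurationPolicy templates used by RHACM policies; verify this against the policy schema of your RHACM version. The function name with_severity and the template name in the example are hypothetical.

```python
import copy
import json

# The four severity levels from the total_risk mapping.
VALID_SEVERITIES = ("low", "medium", "high", "critical")


def with_severity(policy: dict, severity: str = "critical") -> dict:
    """Return a copy of a Policy whose templates all use the given severity."""
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    updated = copy.deepcopy(policy)  # leave the caller's object untouched
    # Assumption: severity is set per template at objectDefinition.spec.severity.
    for template in updated.get("spec", {}).get("policy-templates", []):
        spec = template.setdefault("objectDefinition", {}).setdefault("spec", {})
        spec["severity"] = severity
    return updated


if __name__ == "__main__":
    # Skeleton of the policy-pod example; most fields are omitted and the
    # template name is a hypothetical placeholder.
    policy = {
        "apiVersion": "policy.open-cluster-management.io/v1",
        "kind": "Policy",
        "metadata": {"name": "policy-pod", "namespace": "default"},
        "spec": {
            "policy-templates": [
                {
                    "objectDefinition": {
                        "apiVersion": "policy.open-cluster-management.io/v1",
                        "kind": "ConfigurationPolicy",
                        "metadata": {"name": "policy-pod-example"},
                        "spec": {"remediationAction": "inform", "severity": "low"},
                    }
                }
            ]
        },
    }
    print(json.dumps(with_severity(policy), indent=2))
```

Returning a copy keeps the original manifest intact, so the updated JSON can be reviewed before it is applied with oc apply -f, which accepts JSON manifests as well as YAML.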
Conclusion
In this blog, we explored how to view governance violations in PolicyReports and how those violations generate metrics that can be picked up by Alertmanager. With this information and alerting set up on your cluster, you can create policies and have their violations sent to incident management systems.