Using the AWS Cloud Watch agent to publish metrics to CloudWatch in ROSA
This content is authored by Red Hat experts, but has not yet been tested on every supported configuration.
This document shows how you can use the AWS CloudWatch Agent to scrape Prometheus endpoints and publish metrics to CloudWatch in a Red Hat OpenShift Service on AWS (ROSA) cluster.
It pulls from the AWS documentation for installing the CloudWatch Agent to Kubernetes and publishes metrics for the Kubernetes API Server and provides a simple dashboard to view the results.
Currently the AWS CloudWatch Agent does not support pulling all metrics from the Prometheus federated endpoint, but the hope is that when it does we can ship all cluster and user workload metrics to AWS CloudWatch.
Prerequisites
- A Red Hat OpenShift Service on AWS (ROSA) cluster
- The OpenShift CLI (
oc
) - The
jq
command-line interface (CLI) - The Amazon Web Services (AWS) CLI (
aws
)
Setting up your environment
Ensure you are logged into your cluster with the OpenShift CLI (
oc
) and your AWS account with the AWS CLI (aws
).Configure the following environment variables:
export ROSA_CLUSTER_NAME=$(oc get infrastructure cluster -o=jsonpath="{.status.infrastructureName}" | sed 's/-[a-z0-9]\{5\}$//') export REGION=$(rosa describe cluster -c ${ROSA_CLUSTER_NAME} --output json | jq -r .region.id) export OIDC_ENDPOINT=$(oc get authentication.config.openshift.io cluster -o json | jq -r .spec.serviceAccountIssuer | sed 's|^https://||') export AWS_ACCOUNT_ID=`aws sts get-caller-identity --query Account --output text` export AWS_PAGER="" export SCRATCH="/tmp/${ROSA_CLUSTER_NAME}/cloudwatch-agent-metrics" mkdir -p ${SCRATCH}
Ensure all fields output correctly before moving to the next section:
echo "Cluster: ${ROSA_CLUSTER_NAME}, Region: ${REGION}, OIDC Endpoint: ${OIDC_ENDPOINT}, AWS Account ID: ${AWS_ACCOUNT_ID}"
Preparing your AWS account
Create an IAM role trust policy for the CloudWatch Agent service account to use:
cat <<EOF > ${SCRATCH}/trust-policy.json { "Version": "2012-10-17", "Statement": [{ "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT}" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "${OIDC_ENDPOINT}:sub": "system:serviceaccount:amazon-cloudwatch:cwagent-prometheus" } } }] } EOF
Create an IAM role for the CloudWatch Agent to assume:
ROLE_ARN=$(aws iam create-role --role-name "${ROSA_CLUSTER_NAME}-cloudwatch-agent" \ --assume-role-policy-document file://${SCRATCH}/trust-policy.json \ --query Role.Arn --output text) echo ${ROLE_ARN}
Attach the AWS-managed
CloudWatchAgentServerPolicy
IAM policy to the IAM role:aws iam attach-role-policy --role-name "${ROSA_CLUSTER_NAME}-cloudwatch-agent" --policy-arn "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
Deploy the AWS CloudWatch Agent
Create a project for the AWS CloudWatch Agent:
oc new-project amazon-cloudwatch
Create a ConfigMap with the Prometheus CloudWatch Agent config:
cat << EOF | oc apply -f - apiVersion: v1 kind: ConfigMap metadata: name: prometheus-cwagentconfig namespace: amazon-cloudwatch data: cwagentconfig.json: | { "agent": { "region": "${REGION}", "debug": true }, "logs": { "metrics_collected": { "prometheus": { "cluster_name": "${ROSA_CLUSTER_NAME}", "log_group_name": "/aws/containerinsights/${ROSA_CLUSTER_NAME}/prometheus", "prometheus_config_path": "/etc/prometheusconfig/prometheus.yaml", "emf_processor": { "metric_declaration": [ {"source_labels": ["job", "resource"], "label_matcher": "^kubernetes-apiservers;(services|daemonsets.apps|deployments.apps|configmaps|endpoints|secrets|serviceaccounts|replicasets.apps)", "dimensions": [["ClusterName","Service","resource"]], "metric_selectors": [ "^etcd_object_counts$" ] }, {"source_labels": ["job", "name"], "label_matcher": "^kubernetes-apiservers;APIServiceRegistrationController$", "dimensions": [["ClusterName","Service","name"]], "metric_selectors": [ "^workqueue_depth$", "^workqueue_adds_total$", "^workqueue_retries_total$" ] }, {"source_labels": ["job","code"], "label_matcher": "^kubernetes-apiservers;2[0-9]{2}$", "dimensions": [["ClusterName","Service","code"]], "metric_selectors": [ "^apiserver_request_total$" ] }, {"source_labels": ["job"], "label_matcher": "^kubernetes-apiservers", "dimensions": [["ClusterName","Service"]], "metric_selectors": [ "^apiserver_request_total$" ] } ] } } }, "force_flush_interval": 5 } } EOF
Create a ConfigMap for the Prometheus scrape config:
cat << EOF | oc apply -f - apiVersion: v1 kind: ConfigMap metadata: name: prometheus-config namespace: amazon-cloudwatch data: # prometheus config prometheus.yaml: | global: scrape_interval: 1m scrape_timeout: 10s scrape_configs: - job_name: 'kubernetes-apiservers' kubernetes_sd_configs: - role: endpoints namespaces: names: - default scheme: https tls_config: ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt insecure_skip_verify: true bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token relabel_configs: - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name] action: keep regex: kubernetes;https - action: replace source_labels: - __meta_kubernetes_namespace target_label: Namespace - action: replace source_labels: - __meta_kubernetes_service_name target_label: Service EOF
Create a service account for the CloudWatch Agent to use and annotate it with the IAM role we created earlier:
cat << EOF | oc apply -f - apiVersion: v1 kind: ServiceAccount metadata: name: cwagent-prometheus namespace: amazon-cloudwatch annotations: eks.amazonaws.com/role-arn: "${ROLE_ARN}" EOF
Create a cluster role and role binding for the service account:
cat << EOF | oc apply -f - apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: cwagent-prometheus-role rules: - apiGroups: [""] resources: - nodes - nodes/proxy - services - endpoints - pods verbs: ["get", "list", "watch"] - apiGroups: - extensions resources: - ingresses verbs: ["get", "list", "watch"] - nonResourceURLs: ["/metrics"] verbs: ["get"] --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: cwagent-prometheus-role-binding subjects: - kind: ServiceAccount name: cwagent-prometheus namespace: amazon-cloudwatch roleRef: kind: ClusterRole name: cwagent-prometheus-role apiGroup: rbac.authorization.k8s.io EOF
Allow the CloudWatch Agent to run with the
anyuid
security context constraint:oc -n amazon-cloudwatch adm policy add-scc-to-user anyuid -z cwagent-prometheus
Deploy the CloudWatch Agent pod:
cat << EOF | oc apply -f - apiVersion: apps/v1 kind: Deployment metadata: name: cwagent-prometheus namespace: amazon-cloudwatch spec: replicas: 1 selector: matchLabels: app: cwagent-prometheus template: metadata: labels: app: cwagent-prometheus spec: containers: - name: cloudwatch-agent image: amazon/cloudwatch-agent:1.300040.0b650 imagePullPolicy: Always resources: limits: cpu: 1000m memory: 1000Mi requests: cpu: 200m memory: 200Mi env: - name: CI_VERSION value: "k8s/1.3.23" - name: RUN_WITH_IRSA value: "True" volumeMounts: - name: prometheus-cwagentconfig mountPath: /etc/cwagentconfig - name: prometheus-config mountPath: /etc/prometheusconfig volumes: - name: prometheus-cwagentconfig configMap: name: prometheus-cwagentconfig - name: prometheus-config configMap: name: prometheus-config terminationGracePeriodSeconds: 60 serviceAccountName: cwagent-prometheus EOF
Verify the CloudWatch Agent pod is
Running
:oc get pods -n amazon-cloudwatch
Example output
NAME READY STATUS RESTARTS AGE cwagent-prometheus-54cd498c9c-btmjm 1/1 Running 0 60m
Create Sample Dashboard in AWS CloudWatch
Download the Sample Dashboard
wget -O ${SCRATCH}/dashboard.json https://raw.githubusercontent.com/rh-mobb/documentation/main/content/rosa/metrics-to-cloudwatch-agent/dashboard.json
Update the Sample Dashboard
sed -i .bak "s/__CLUSTER_NAME__/${ROSA_CLUSTER_NAME}/g" ${SCRATCH}/dashboard.json sed -i .bak "s/__REGION_NAME__/${REGION}/g" ${SCRATCH}/dashboard.json
Create a Dashboard, and name it “Kubernetes API Server”
Click Actions and View/edit source
Run the following command and copy the JSON output into the text area:
cat ${SCRATCH}/dashboard.json
After 5-10 minutes, view the dashboard and see the data flowing into CloudWatch: