Red Hat blog

Observability at the Edge with Red Hat Advanced Cluster Management for Kubernetes

October 27, 2022 | Joydeep Banerjee, Scott Berens

As edge devices proliferate and AppSRE and admin teams grow to support them, gaining visibility into the health of these environments becomes a new challenge. As the scale grows, so does the complexity of administering the fleet and viewing it holistically, from the data center to the edge.

First, let's start with a bit of table-setting

With Red Hat Advanced Cluster Management for Kubernetes (RHACM) version 2.4 and later, Red Hat provides centralized observability of the fleet, focused primarily on cluster health metrics that describe control plane health, cluster optimization, and cluster utilization. For example, admins can see API latency across the fleet and compare clusters to spot CPU and memory underutilization.

In addition, alerts are configured for centralized management, so responders are engaged directly in the tools they already use, such as Slack and PagerDuty. Specific alert rules can be put in place so that only critical alerts fire into the appropriate channels.

These capabilities are the starting point for your Observability experience in RHACM. While that starting point is robust and feature-rich, many customers have asked for the ability to build their own dashboards (we provided), customize the allowList for cluster metrics (we provided), and expose Service Level Objectives (we provided). We would be remiss not to mention that these capabilities also extend to OpenShift 3.11 and OCP 4.x, along with Amazon EKS, Google Cloud GKE, Microsoft Azure AKS, and IBM Cloud IKS.

We didn't want to stop there

Customers are moving workloads closer to their users and taking advantage of edge computing to enrich their customers' experiences and drive higher satisfaction. Our customers also started to narrow in on a specific use case: monitoring a single node OpenShift (SNO) cluster that runs a highly specialized container workload, and doing all of that with a single monitoring instance. See Meet single node OpenShift for more information.

Edge devices are generally resource constrained and do not have the same access to elastic compute and memory that might be available, for example, in a public or private cloud. With that in mind, customers do not want to give up any additional compute and memory to the infrastructure operators running within the cluster. We now offer the ability to surface both platform and user workload metrics with one monitoring stack.

Taking note of this requirement, RHACM and OpenShift Monitoring worked together so that our customers can monitor their workloads on single node OpenShift by leveraging the on-cluster Prometheus and a ServiceMonitor to push the necessary workload metrics through the platform. Admins and developers can take advantage of the custom allow list to centrally collect these metrics at the hub, leverage PromQL to explore them, and even make use of a sandbox Grafana environment to build custom visualizations.
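As a rough illustration of the custom allow list step, the sketch below builds a ConfigMap manifest in Python and prints it as JSON, which can be applied to the hub like any other manifest. It assumes the ConfigMap name observability-metrics-custom-allowlist, the namespace open-cluster-management-observability, and the data key metrics_list.yaml; these reflect our reading of the RHACM observability documentation and should be verified against your version. The helper function and the metric name my_app_requests_total are hypothetical.

```python
import json

# Assumed values; check them against the RHACM documentation for your release.
ALLOWLIST_NAME = "observability-metrics-custom-allowlist"
ALLOWLIST_NAMESPACE = "open-cluster-management-observability"
ALLOWLIST_KEY = "metrics_list.yaml"


def build_allowlist_configmap(metric_names, namespace=ALLOWLIST_NAMESPACE):
    """Return a ConfigMap manifest (as a dict) listing extra metrics to collect.

    Hypothetical helper: it only assembles the manifest and does not talk
    to any cluster.
    """
    # The allow list itself is a small YAML document stored as a string.
    yaml_lines = ["names:"] + [f"  - {name}" for name in metric_names]
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": ALLOWLIST_NAME, "namespace": namespace},
        "data": {ALLOWLIST_KEY: "\n".join(yaml_lines) + "\n"},
    }


if __name__ == "__main__":
    # "my_app_requests_total" is a made-up workload metric name.
    manifest = build_allowlist_configmap(["my_app_requests_total"])
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied on the hub cluster. Once the metric arrives at the hub, it can be explored with PromQL or charted in Grafana like any platform metric.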

But wait, there's more!

Starting with the release of RHACM 2.5, the following features were introduced to further enrich the edge observability experience:

Dynamic metrics for SNO clusters: Dynamic metrics collection supports automatic metric collection based on certain conditions. By default, a SNO cluster does not collect pod and container resource metrics. Once a SNO cluster reaches a specific level of resource consumption, the defined granular metrics are collected dynamically. When the cluster resource consumption is consistently less than the threshold for a period of time, granular metric collection stops.
Export metrics to external endpoints: Export monitoring metrics into existing corporate monitoring tools. By exporting Kubernetes cluster metrics from RHACM into streaming platforms like Kafka, and combining them with other system metrics such as event management logs, audit logs, and SNMP logs, customers get a complete picture for security and troubleshooting applications.
Arm (ARM, aarch64, Advanced RISC Machines) support: Did we mention you can do it all for Arm64 managed clusters, too? Even the hub is capable of running on Arm, further extending the management reach into lower-power and lower-cost consumption models.
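The start and stop behavior described for dynamic metrics collection can be pictured as a simple threshold with hysteresis. The sketch below is a toy model in Python, not RHACM's implementation: the class name, the 70 percent threshold, and the sample counts are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DynamicCollector:
    """Toy model of threshold-driven metric collection (hypothetical)."""

    threshold: float = 70.0  # assumed resource usage threshold, in percent
    start_after: int = 2     # consecutive samples above threshold before starting
    stop_after: int = 3      # consecutive samples at or below threshold before stopping
    collecting: bool = False
    _above: int = 0
    _below: int = 0

    def observe(self, usage: float) -> bool:
        """Record one usage sample and return whether granular metrics are collected."""
        if usage > self.threshold:
            self._above += 1
            self._below = 0
        else:
            self._below += 1
            self._above = 0

        if not self.collecting and self._above >= self.start_after:
            # Sustained high consumption: begin collecting granular metrics.
            self.collecting = True
        elif self.collecting and self._below >= self.stop_after:
            # Consumption stayed low for long enough: stop collecting.
            self.collecting = False
        return self.collecting


if __name__ == "__main__":
    collector = DynamicCollector()
    for sample in [50, 80, 85, 60, 90, 40, 40, 40]:
        print(sample, collector.observe(sample))
```

Requiring several consecutive low samples before stopping keeps a brief dip in usage from switching collection off and on repeatedly, which matches the "consistently less than the threshold for a period of time" condition above.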

What's next?

As we continue to iterate on these features and fine-tune their capacity and scale, expect us to deliver enriched capabilities that help admins and developers establish "right sized" requests and limits for their applications, so that edge workloads use only what they need in these ever more constrained environments. Monitoring for Hosted Control Planes (HyperShift) and additional deliverables in User Workload Monitoring will help you and your teams successfully observe and manage the platforms and applications in your growing estate of clusters.
