In my previous blog, Set up an Istio Multicluster Service Mesh with Submariner in Red Hat Advanced Cluster Management for Kubernetes, I discussed Istio multicluster service mesh with a central control plane, how it could be set up with Submariner in Red Hat Advanced Cluster Management for Kubernetes (RHACM), and how to build a central management entrance for the whole mesh.

With an Istio multicluster service mesh in RHACM, Istio can be extended further. When user workloads are injected with Istio sidecars, those sidecars act as network proxies that intercept the traffic between the services within the mesh and generate detailed telemetry data, or golden signals, about service behavior, e.g. service latency, traffic, and errors. If this data can be gathered, you can create a global view of the whole mesh. The problem is that the data lives in different managed clusters, so how can we centralize it?

That's exactly what the metrics-collector does. The metrics-collector is an important component of the Observability service in RHACM. It is deployed into each managed cluster that has the observability add-on enabled, and it collects metrics from the local Prometheus instance at a configured interval and pushes them back to Thanos on the hub cluster.

By leveraging the observability stack in RHACM, you can create an end-to-end view of traffic flow and monitoring for all services in the whole service mesh across multiple managed clusters, which empowers operators to troubleshoot, maintain, and optimize their applications. Even better, you can get almost all of this instrumentation without requiring application changes.

Before moving forward with the installation steps in this blog, let's first take a look at the architecture of observability for the multicluster service mesh:

acm-istio-multicluster-obs-integration

Prerequisites

Make sure you read the previous blog and set up a multicluster service mesh by following the instructions in that blog.

You also need to follow these instructions to enable the RHACM Observability service before you begin the installation.

Installation

After you enable the observability service in RHACM, the Thanos stack and Grafana are deployed on the hub cluster, and the metrics-collector is deployed in each managed cluster in the service mesh. Verify that the observability service is enabled by using the following commands:

$ oc --context=${CTX_HUB_CLUSTER} -n open-cluster-management-observability get sts,deployment
NAME READY AGE
observability-alertmanager 3/3 20m
observability-grafana 1/1 20m
observability-thanos-compact 1/1 20m
observability-thanos-query-frontend-memcached 3/3 20m
observability-thanos-receive-default 3/3 20m
observability-thanos-rule 3/3 20m
observability-thanos-store-memcached 3/3 20m
observability-thanos-store-shard-0 1/1 20m
observability-thanos-store-shard-1 1/1 20m
observability-thanos-store-shard-2 1/1 20m
$ oc --context=${CTX_MC1_CLUSTER} -n open-cluster-management-addon-observability get pod -l component=metrics-collector
NAME READY STATUS RESTARTS AGE
metrics-collector-deployment-9496686fc-q9v87 1/1 Running 0 20m
$ oc --context=${CTX_MC2_CLUSTER} -n open-cluster-management-addon-observability get pod -l component=metrics-collector
NAME READY STATUS RESTARTS AGE
metrics-collector-deployment-765f486b47-dnfs8 1/1 Running 0 20m

Now, let's begin the installation for this blog.

Install Istio Add-on

In this step, install the Istio add-ons onto the hub cluster so that you can get a central view of different aspects of the multicluster service mesh. Given that the observability service in RHACM already installs Grafana on the hub cluster, it can be reused to visualize service mesh metrics, so you only need to install Jaeger and Kiali. Jaeger is a distributed tracing system to monitor and troubleshoot application transactions across clusters, while Kiali is a console for the Istio service mesh that manages, visualizes, validates, and troubleshoots the mesh by monitoring traffic flow to infer the topology and report errors. Complete the following steps:

  1. Deploy Jaeger and Kiali on the hub cluster and create OpenShift Route resources for their services so that they can be accessed externally. You also need to export the Jaeger service with a ServiceExport so that it can be accessed from other managed clusters:

    oc --context=${CTX_HUB_CLUSTER} -n istio-system apply \
    -f https://raw.githubusercontent.com/istio/istio/release-1.11/samples/addons/jaeger.yaml
    oc --context=${CTX_HUB_CLUSTER} -n istio-system apply \
    -f https://raw.githubusercontent.com/istio/istio/release-1.11/samples/addons/kiali.yaml
    oc --context=${CTX_HUB_CLUSTER} -n istio-system expose svc/tracing --port http-query
    oc --context=${CTX_HUB_CLUSTER} -n istio-system expose svc/kiali --port http
    cat << EOF | oc --context=${CTX_HUB_CLUSTER} apply -n istio-system -f -
    apiVersion: multicluster.x-k8s.io/v1alpha1
    kind: ServiceExport
    metadata:
      name: zipkin
      namespace: istio-system
    EOF
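
    Note: Optionally, confirm that the exported service is reachable from a managed cluster over Submariner. The following check is only a sketch; it assumes Submariner service discovery is working and uses a temporary pod with the curlimages/curl image (any HTTP response code indicates that the cluster-set DNS name resolves and the connection succeeds):

    oc --context=${CTX_MC1_CLUSTER} -n default run zipkin-check -i --rm --restart=Never \
      --image=curlimages/curl --command -- \
      curl -s -o /dev/null -w "%{http_code}\n" http://zipkin.istio-system.svc.clusterset.local:9411/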
  2. Verify that the pod and service for Jaeger and Kiali are up and running:

    $ oc --context=${CTX_HUB_CLUSTER} -n istio-system get pod,svc
    NAME READY STATUS RESTARTS AGE
    pod/istio-ingressgateway-86464c97f5-tp7mr 1/1 Running 0 15m
    pod/istiod-98d586c48-zgt46 1/1 Running 0 15m
    pod/jaeger-5d44bc5c5d-wrx2j 1/1 Running 0 1m
    pod/kiali-fd9f88575-xn8zz 1/1 Running 0 1m

    NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
    service/istio-ingressgateway LoadBalancer 172.30.251.10 <pending> 15021:32440/TCP,80:31976/TCP,443:32119/TCP 15m
    service/istiod ClusterIP 172.30.93.244 <none> 15010/TCP,15012/TCP,443/TCP,15014/TCP 15m
    service/jaeger-collector ClusterIP 172.30.205.229 <none> 14268/TCP,14250/TCP,9411/TCP 1m
    service/kiali ClusterIP 172.30.140.130 <none> 20001/TCP,9090/TCP 1m
    service/tracing ClusterIP 172.30.23.103 <none> 80/TCP,16685/TCP 1m
    service/zipkin ClusterIP 172.30.75.236 <none> 9411/TCP 1m

Enable OpenShift Monitoring for Istio Traffic

By default, Istio sidecars generate metrics about the application traffic, but the metrics data is not scraped. In order for OpenShift monitoring to scrape the metrics, you need to create extra RBAC and PodMonitor resources with the following steps:

  1. Create an RHACM policy on the hub cluster. The policy creates the extra RBAC and PodMonitor resources needed to scrape the metrics in each managed cluster.
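
    The exact policy content depends on your environment. The following is only a minimal sketch: it assumes the Bookinfo workloads run in the istio-apps namespace, that OpenShift user workload monitoring is enabled on the managed clusters so that the PodMonitor is scraped, and that a PlacementRule named istio-clusters (hypothetical) already selects the managed clusters in the mesh. Adjust the names, selectors, placement, and any additional RBAC to match your setup:

    cat << EOF | oc --context=${CTX_HUB_CLUSTER} apply -n default -f -
    apiVersion: policy.open-cluster-management.io/v1
    kind: Policy
    metadata:
      name: istio-pod-monitor
    spec:
      disabled: false
      remediationAction: enforce
      policy-templates:
        - objectDefinition:
            apiVersion: policy.open-cluster-management.io/v1
            kind: ConfigurationPolicy
            metadata:
              name: istio-pod-monitor
            spec:
              remediationAction: enforce
              severity: low
              object-templates:
                - complianceType: musthave
                  objectDefinition:
                    apiVersion: monitoring.coreos.com/v1
                    kind: PodMonitor
                    metadata:
                      name: istio-proxies-monitor
                      namespace: istio-apps
                    spec:
                      # select pods that have an injected Istio sidecar
                      selector:
                        matchExpressions:
                          - key: security.istio.io/tlsMode
                            operator: Exists
                      podMetricsEndpoints:
                        - port: http-monitoring
                          path: /stats/prometheus
                          interval: 30s
    ---
    apiVersion: policy.open-cluster-management.io/v1
    kind: PlacementBinding
    metadata:
      name: istio-pod-monitor
    placementRef:
      name: istio-clusters
      kind: PlacementRule
      apiGroup: apps.open-cluster-management.io
    subjects:
      - name: istio-pod-monitor
        kind: Policy
        apiGroup: policy.open-cluster-management.io
    EOF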

  2. By default, the Istio sidecar container does not expose the http-monitoring port, which is used to serve Istio metrics. You need to edit the istio-sidecar-injector ConfigMap in the istio-system namespace of the hub cluster to add the http-monitoring port for Istio sidecars. Add the following containerPort entries to the istio-proxy container template in the istio-sidecar-injector ConfigMap:

    oc --context=${CTX_HUB_CLUSTER} -n istio-system edit cm istio-sidecar-injector
    ...
    containers:
    - name: istio-proxy
      ...
      ports:
      # add the following section in ports
      - containerPort: 15020
        protocol: TCP
        name: http-monitoring
      - containerPort: 15090
        protocol: TCP
        name: http-envoy-prom

    Note: There are two places in the istio-sidecar-injector ConfigMap that contain the istio-proxy container template. Make sure the container ports are added in both places.

  3. Restart the application pods in each managed cluster with the following commands, so that the new istio-proxy container configuration can be injected:

    oc --context=${CTX_MC1_CLUSTER} -n istio-apps delete pod --all
    oc --context=${CTX_MC2_CLUSTER} -n istio-apps delete pod --all
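
Optionally, spot-check one of the restarted pods to confirm that the istio-proxy container now exposes the http-monitoring port. This check assumes the Bookinfo productpage workload carries the standard app=productpage label:

oc --context=${CTX_MC1_CLUSTER} -n istio-apps get pod -l app=productpage -o yaml \
  | grep -B1 -A2 'name: http-monitoring'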

Enable Metrics Collector to Collect Istio Metrics

In order for the metrics-collector to collect the Istio metrics and push them back to Thanos on the hub cluster, you need to create a ConfigMap that contains a custom metrics allowlist in the hub cluster by using the following command:

cat << EOF | oc --context=${CTX_HUB_CLUSTER} apply -n open-cluster-management-observability -f -
kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
data:
  metrics_list.yaml: |
    names:
      - istio_request_bytes_bucket
      - istio_request_bytes_count
      - istio_request_bytes_sum
      - istio_request_duration_milliseconds_bucket
      - istio_request_duration_milliseconds_count
      - istio_request_duration_milliseconds_sum
      - istio_requests_total
      - istio_response_bytes_bucket
      - istio_response_bytes_count
      - istio_response_bytes_sum
      - istio_tcp_connections_closed_total
      - istio_tcp_connections_opened_total
      - istio_tcp_received_bytes_total
      - istio_tcp_sent_bytes_total
EOF
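
After the next collection interval, the Istio metrics should be available on the hub cluster. As a quick check, you can query the Thanos query frontend directly; the following is only a sketch that assumes a local port-forward to the service:

oc --context=${CTX_HUB_CLUSTER} -n open-cluster-management-observability \
  port-forward svc/observability-thanos-query-frontend 9090:9090 &
sleep 5   # give the port-forward a moment to establish
# the result should contain istio_requests_total series from both managed clusters
curl -s 'http://localhost:9090/api/v1/query?query=istio_requests_total' | head -c 300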

Create Grafana Dashboards to Visualize Istio Metrics

Given that the focus here is on the metrics of Istio applications, create the following three dashboards:

  • Istio Mesh Dashboard - This dashboard gives the global view of the mesh along with services and workloads in the mesh.
  • Istio Service Dashboard - This dashboard gives details about metrics for the service.
  • Istio Workload Dashboard - This dashboard gives details about metrics for each workload.

Complete the following steps to create Grafana dashboards:

  1. Create the three Grafana dashboards on the hub cluster to visualize different aspects of the service mesh.
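
    In RHACM, a custom Grafana dashboard is loaded from a ConfigMap in the open-cluster-management-observability namespace that carries the grafana-custom-dashboard: "true" label and contains the dashboard JSON. The following is a minimal sketch for one dashboard; repeat it for each of the three dashboards and replace the placeholder JSON with the full dashboard definition:

    cat << EOF | oc --context=${CTX_HUB_CLUSTER} apply -n open-cluster-management-observability -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: istio-mesh-dashboard
      labels:
        grafana-custom-dashboard: "true"
    data:
      istio-mesh-dashboard.json: |
        {
          "title": "Istio Mesh Dashboard",
          "uid": "istio-mesh-dashboard",
          "panels": []
        }
    EOF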

  2. Retrieve the Grafana console address with the following command:

    MULTICLOUD_CONSOLE=$(oc --context=${CTX_HUB_CLUSTER} -n open-cluster-management get route multicloud-console -o jsonpath="{.spec.host}")
    echo "https://${MULTICLOUD_CONSOLE}/grafana/dashboards"
  3. Verify that the Grafana dashboards are created successfully by accessing the address returned in the previous step. The page should look similar to the following screen capture:

     

    istio-grafana-dashboards

Update Kiali Configuration

To make Kiali render the mesh graph with metrics data from Thanos, update the Kiali configuration by using the following command:

cat << EOF | oc --context=${CTX_HUB_CLUSTER} apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: kiali
    app.kubernetes.io/instance: kiali
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: kiali
    app.kubernetes.io/part-of: kiali
    app.kubernetes.io/version: v1.38.0
    helm.sh/chart: kiali-server-1.38.0
    version: v1.38.0
  name: kiali
  namespace: istio-system
data:
  config.yaml: |
    auth:
      openid: {}
      openshift:
        client_id_prefix: kiali
      strategy: anonymous
    deployment:
      accessible_namespaces:
      - '**'
      additional_service_yaml: {}
      affinity:
        node: {}
        pod: {}
        pod_anti: {}
      hpa:
        api_version: autoscaling/v2beta2
        spec: {}
      image_name: quay.io/kiali/kiali
      image_pull_policy: Always
      image_pull_secrets: []
      image_version: v1.38
      ingress_enabled: false
      instance_name: kiali
      logger:
        log_format: text
        log_level: info
        sampler_rate: "1"
        time_field_format: 2006-01-02T15:04:05Z07:00
      namespace: istio-system
      node_selector: {}
      override_ingress_yaml:
        metadata: {}
      pod_annotations:
        sidecar.istio.io/inject: "false"
      pod_labels: {}
      priority_class_name: ""
      replicas: 1
      resources: {}
      secret_name: kiali
      service_annotations: {}
      service_type: ""
      tolerations: []
      version_label: v1.38.0
      view_only_mode: false
    external_services:
      prometheus:
        thanos_proxy:
          enabled: true
        url: "http://observability-thanos-query-frontend.open-cluster-management-observability.svc.cluster.local:9090"
      grafana:
        in_cluster_url: "http://grafana.open-cluster-management-observability.svc.cluster.local:3001"
        url: "http://grafana.open-cluster-management-observability.svc.cluster.local:3001"
      tracing:
        in_cluster_url: "http://tracing.istio-system:16685/jaeger"
      custom_dashboards:
        enabled: true
    identity:
      cert_file: ""
      private_key_file: ""
    istio_namespace: istio-system
    login_token:
      signing_key: CHANGEME
    server:
      metrics_enabled: true
      metrics_port: 9090
      port: 20001
      web_root: /kiali
EOF

Then restart the Kiali pod for the new configuration to take effect:

oc --context=${CTX_HUB_CLUSTER} -n istio-system delete pod -l app=kiali
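
You can verify that the Kiali pod came back up cleanly with the new configuration, for example:

oc --context=${CTX_HUB_CLUSTER} -n istio-system get pod -l app=kiali
oc --context=${CTX_HUB_CLUSTER} -n istio-system logs -l app=kiali --tail=20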

Visualizing Service Mesh Metrics with Grafana

  1. Send traffic to the Bookinfo application so that metrics are available in Grafana. Refresh the Bookinfo application page a few times or run the following command to generate a small amount of traffic:

    $ export GATEWAY_URL=$(oc --context=${CTX_HUB_CLUSTER} \
    -n istio-system get route istio-ingressgateway \
    -o jsonpath="{.spec.host}")
    $ for i in $(seq 1 100); do curl -s -o /dev/null "http://${GATEWAY_URL}/productpage"; done
  2. Refresh the Grafana dashboards page again at https://${MULTICLOUD_CONSOLE}/grafana/dashboards. Click Istio Mesh Dashboard to open the global view of the service mesh along with the services and workloads in the mesh. View the following screen capture:

    istio-mesh-dashboard

  3. To view details about services and workloads, navigate to their specific dashboards. For example, to view the metric details for a service, its client workloads (workloads that call the service), and its service workloads (workloads that provide the service), navigate to the Istio Service Dashboard. View the following screen capture:

    istio-service-dashboard

  4. To view the metric details for a workload, its inbound workloads (workloads that send requests to this workload), and its outbound workloads (workloads to which this workload sends requests), navigate to the Istio Workload Dashboard. It should appear similar to the following screen capture:

    istio-workload-dashboard (1)

Verifying Service Mesh Traces with Jaeger

  1. Send traffic to the Bookinfo application to verify traces in the Jaeger console. To view a trace, you need to send a number of requests that depends on the Istio sampling rate. The default sampling rate is 1%, which means you have to send at least 100 requests before the first trace is visible. To send 100 requests to the Bookinfo application, run the following command:

    for i in $(seq 1 100); do curl -s -o /dev/null "http://${GATEWAY_URL}/productpage"; done
  2. Access the Jaeger console from your browser by using the following address:

    JAEGER_HOST=$(oc --context=${CTX_HUB_CLUSTER} -n istio-system get route tracing -o jsonpath="{.spec.host}")
    echo "http://${JAEGER_HOST}"
  3. From the navigation panel of the page, select productpage.istio-apps from the Service drop-down list and click Find Traces. View the following screen capture:

     

    jeager-find-tracing
  4. Click one of the found traces to view the details corresponding to the request to the Bookinfo application:

     

    jeager-tracing-details
  5. The whole trace is composed of a set of spans, where each span corresponds to a Bookinfo service that is invoked during the request.

Visualizing Service Mesh Graph with Kiali

In this section, use Kiali to view the service graph of the entire mesh across clusters. The graph represents traffic flowing through the service mesh for a period of time. It is generated using Istio traffic metrics. Complete the following steps:

  1. Send traffic to the Bookinfo application to create enough traffic metrics for the Kiali graph. Refresh the Bookinfo application page a few times or run the following command to generate a small amount of traffic:

    for i in $(seq 1 100); do curl -s -o /dev/null "http://${GATEWAY_URL}/productpage"; done
  2. Access the Kiali console with the following address from your browser:

    KIALI_HOST=$(oc --context=${CTX_HUB_CLUSTER} -n istio-system get route kiali -o jsonpath="{.spec.host}")
    echo "http://${KIALI_HOST}"
  3. To view a namespace graph, select the Graph option in the navigation menu, and then select istio-apps from the Namespace drop-down menu. If an empty graph is shown, click the Display Idle Nodes button and make sure the correct time period is selected; the Bookinfo graph should then appear. The page should look similar to the following screen capture:

    kiali-graph

  4. To view the service mesh using different graph types, select a graph type from the Graph Type drop-down menu. To view which cluster each service is deployed in, select Cluster Boxes from the Display drop-down menu. The page should look similar to the following image:

    kiali-graph-cluster-boxes

As you can see from the previous image, the graph represents the traffic flowing through the Bookinfo application during the selected time period, and the cluster boxes show which cluster each Bookinfo service is running in.

Summary

By leveraging Istio add-ons (Jaeger and Kiali) and the observability service in RHACM, you can get global views of different aspects of the whole multicluster service mesh, which empowers operators to troubleshoot, maintain, and optimize their applications.