Application Introduction

Prometheus is a free software application used for event monitoring and alerting. It records real-time metrics in a time-series database built using an HTTP pull model, with flexible queries and real-time alerting. (Source: Wikipedia)

The main strengths of this software are that it stores metrics very efficiently and that it is easy to run and maintain, even in large deployments. With the PromQL query language, it is moreover just as powerful as InfluxDB.

Prometheus is not meant for long-term storage, though. Other projects can collect Prometheus data via the so-called remote write functionality and retain it for longer.
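As a minimal sketch of how that is wired up, remote write is configured in prometheus.yml; the endpoint URL below is a hypothetical placeholder, not part of this test setup:

```yaml
# prometheus.yml -- minimal sketch of a remote write configuration.
# The endpoint URL is a hypothetical placeholder.
remote_write:
  - url: "https://long-term-storage.example.com/api/v1/write"
```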

Prometheus Deployment Options and Trade-offs

By default, OpenShift deploys two Prometheus instances, which work independently of each other and are based on EmptyDir volumes. Because both instances scrape the same targets, data is theoretically not lost when one Prometheus instance is down: it can still be queried from the other instance.

The EmptyDir volumes store the Pod's data on the local disk of the node for as long as the Pod exists. If the Pod is deleted or the node is lost, the collected metrics are lost with it, and Prometheus starts over with an empty data set. It is possible to change the backend volume to a memory-based EmptyDir, which theoretically improves performance, but this costs considerably more, and the data is lost on every OpenShift node reboot. What we saw in this test is that the performance of the memory-based EmptyDir is the same as that of the regular EmptyDir. The remaining alternative is to place the Prometheus Time Series Database (TSDB) volume on OCS-backed storage, which offers performance characteristics similar to the regular EmptyDir while improving resilience.
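The three backends differ only in how the Prometheus data volume is declared. The following is an illustrative sketch; the volume and claim names are assumptions, not taken from the test manifests:

```yaml
# 1) Regular EmptyDir -- node-local disk; data is lost with the Pod or node
volumes:
  - name: prometheus-data
    emptyDir: {}

# 2) Memory-based EmptyDir -- tmpfs; data is lost on every node reboot
volumes:
  - name: prometheus-data
    emptyDir:
      medium: Memory

# 3) OCS-backed PersistentVolumeClaim -- survives Pod restarts and node loss
volumes:
  - name: prometheus-data
    persistentVolumeClaim:
      claimName: prometheus-data   # hypothetical claim bound to an OCS storage class
```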


| Storage backend | Simple Query | Query with one PromQL Function | Multiple PromQL Functions | Summary |
| --- | --- | --- | --- | --- |
| Test configuration | 1000 queries, 100 in parallel | 100 queries, 10 in parallel | 600 queries, 100 in parallel | |
| OpenShift Container Storage | 65.22 req/s, 15.33 ms mean per request | 1.65 req/s, 604.56 ms mean per request | 0.77 req/s, 1,294.13 ms mean per request | Performance: 👍 · Resilience: 👍👍👍 · Cost: 👍 |
| EmptyDir | 71.41 req/s, 14.00 ms mean per request | 1.76 req/s, 569.56 ms mean per request | 0.86 req/s, 1,165.30 ms mean per request | Performance: 👍👍 · Resilience: 👍 · Cost: 👍👍 |
| EmptyDir based on ramdisk | 70.68 req/s, 14.15 ms mean per request | 1.69 req/s, 590.02 ms mean per request | 0.83 req/s, 1,209.35 ms mean per request | Performance: 👍👍 · Resilience: 👎 · Cost: 👎 |

 

Key Measures of Performance and Resilience for Prometheus

We captured the following key measures of performance and resilience to inform this brief:

  • Query performance with a simple query of node_load1
  • Query performance with one PromQL function
  • Query performance with a complex interaction of multiple PromQL functions

Workload Benchmarking Results Summary

Key observations of Prometheus performance are summarized in the table at the beginning of this brief.

 

Appendix

Benchmark Overview

For the automatic provisioning of Prometheus, we used three different deployments, each deploying a single Prometheus instance with one of the three storage backends.

To measure the performance of the Prometheus TSDB, we used the ApacheBench software. ApacheBench was developed to measure the performance of web servers and has many features that are useful for this purpose. Since Prometheus has an HTTP API, we pointed ApacheBench at prepared URLs that trigger a TSDB lookup; ApacheBench then reports how long each lookup took. To minimize networking effects, we ran ApacheBench in Pods in the same OpenShift cluster and connected to Prometheus via the OpenShift service address.
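A minimal sketch of such a run is shown below. The service address, timestamps, and query string are assumptions for illustration; the request count and concurrency match the "Simple Query" test configuration:

```sh
# -n: total number of requests, -c: number of requests run in parallel.
# The URL triggers a range query against the Prometheus HTTP API
# (hypothetical service address; start/end span a 9-day window, step=10m).
ab -n 1000 -c 100 \
  'http://prometheus.monitoring.svc:9090/api/v1/query_range?query=node_load1&start=1572566400&end=1573344000&step=10m'
```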

For every database, some queries are cheap to run and some are expensive. For our test run we prepared three different queries and asked Prometheus to return all matching metrics in a 9-day time window. We had to increase the query step (resolution) to 10 minutes to stay below the limit on the number of data points a single Prometheus query may return. To compensate, we increased the total number of queries and the number of parallel queries to put enough stress on Prometheus and the underlying storage.
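The actual queries are part of the raw data linked below. Purely as an illustration of the three complexity classes, they could look like the following; only node_load1 is named in this brief, the other two queries are assumptions:

```promql
# Simple query: a raw gauge
node_load1

# One PromQL function: a per-second rate over a counter
rate(node_cpu_seconds_total[5m])

# Multiple PromQL functions and operators combined
sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))
```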

Benchmark Environment Summary

| Component | Details |
| --- | --- |
| OCP Version | v4.2 |
| OCP Infra | VMware |
| Master Nodes | 3 x |
| Compute Nodes | 3 x 16 vCPU & 64 GB RAM |
| OCS Storage Nodes | 3 x 16 vCPU & 64 GB RAM |
| OCS Storage Devices | 3 x 1 TB vSAN-based PVCs on NVMe devices |
| OCS Version | v4.2 |

Table 1: OCP and OCS Infra Details

 

| Software | Version |
| --- | --- |
| Prometheus | 2.14.0 (container image prom/prometheus:latest) |
| ApacheBench | 2.3 (container image jordi/ab) |

Table 2: Deployed versions details

Measurements:

The raw data is available here: https://gist.github.com/mulbc/33d25cfd3b31fff307c7ce23352f1efd

Additional Resources

Feedback

To find out more about OpenShift Container Storage or to take a test drive, visit https://www.openshift.com/products/container-storage/.

If you would like to learn more about what the OpenShift Container Storage team is up to or provide feedback on any of the new 4.3 features, take this brief 3-minute survey.