Introduction:

This document captures OpenShift Data Foundation (ODF) RADOS block device (RBD) performance characteristics when used as back-end storage on local disks and consumed by Container-Native Virtualization (CNV) virtual machines.

In this document, I will demonstrate the trade-off to expect and the impact on effective storage capacity across various workloads, depending on the number of cores per OSD, with two of the most popular compression algorithms: Snappy and ZSTD.

Terminology:

  • Hyper-converged infrastructure - a software-defined, unified system that combines all the elements of a traditional data center: storage, compute, networking, and management.

  • ODF - OpenShift Data Foundation. Running as a Kubernetes service, ODF provides persistent storage for OpenShift.

  • OSD - object storage daemon for the distributed storage underlying ODF. It is responsible for storing objects on a local file system and providing access to them over the network. When deploying ODF, each physical disk (or partition) used within ODF is managed by its own OSD.

  • RBD - RADOS block device; simple block devices that are striped over objects and stored in a RADOS object store.

The trade-off:

Data compression is a trade-off of performance for storage capacity. The job will still get done; it will only take longer to get there. Sacrificing CPU for compression might result in lower overall performance, depending on the type of workload, but with more effective disk utilization.

Why use compression:


When the storage capacity in a cluster starts to run out, things become a bit more tricky, because physical capacity has to be added in one of these ways:

  1. Add disks (if there are open slots) or replace existing disks with larger ones, which usually involves downtime.

  2. Add new nodes to the cluster, which does not involve any downtime but is more costly.

Compression pushes out the need for either option by fitting more data onto the existing disks. In addition, when data has to be moved across the network, compressed data has great potential to increase the transfer rate and reduce bandwidth usage.

The defaults:

When deploying ODF on OpenShift, OSDs are deployed with a default of 2 cores per OSD. There is a common misconception that ODF OSDs should remain at that limit in order to save those cores for other processes.

Note that for ODF to benefit from additional OSD cores, the underlying disks must be able to support the corresponding increase in throughput/IOPS.
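
For reference, here is one way the OSD CPU allocation could be raised. This is a minimal sketch only: it assumes the default ODF naming (a StorageCluster named ocs-storagecluster in the openshift-storage namespace) and assumes the OSD resources live under the first storageDeviceSets entry, so verify the paths against your own cluster before applying anything similar.

# Hedged sketch: raise the OSD CPU request/limit to 4 cores.
# The memory values are placeholders - reuse whatever your device set already requests.
oc -n openshift-storage patch storagecluster ocs-storagecluster --type json \
  -p '[{"op": "replace", "path": "/spec/storageDeviceSets/0/resources", "value": {"requests": {"cpu": "4", "memory": "8Gi"}, "limits": {"cpu": "4", "memory": "8Gi"}}}]'

Once the operator reconciles the change, the OSD pods should restart with the new CPU allocation.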

Hardware:

Node hardware:

Server - Dell R740xd.

RAM - 12 * 16GB DDR4 2666 MT/s (192 GiB total).

Disk - Dell Express Flash PM1725a 6.4TB AIC (partitioned into 4 partitions).

CPU - Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz.

 

VM configuration:

CPU - 8 cores.

RAM - 16 GiB.

 

ODF config:

4 OSDs per node.

Replica = 2.

38 TB XFS filesystem.

Compression algorithms:

The BlueStore OSD back end allows us to use "on-the-fly" data compression.
Compression can be enabled or disabled on each Ceph pool created by ODF.

The compression algorithm and mode can be changed at any time; however, any data written to the pool before compression was enabled will not be compressed.

Snappy settings used:

ceph osd pool set ocs-storagecluster-cephblockpool compression_required_ratio .875
ceph osd pool set ocs-storagecluster-cephblockpool compression_algorithm snappy
ceph osd pool set ocs-storagecluster-cephblockpool compression_mode aggressive
ceph osd pool set ocs-storagecluster-cephblockpool compression_min_blob_size 4096
ceph osd pool set ocs-storagecluster-cephblockpool compression_max_blob_size 131072

ZSTD settings used:

ceph osd pool set ocs-storagecluster-cephblockpool compression_required_ratio .875
ceph osd pool set ocs-storagecluster-cephblockpool compression_algorithm zstd
ceph osd pool set ocs-storagecluster-cephblockpool compression_mode aggressive
ceph osd pool set ocs-storagecluster-cephblockpool compression_min_blob_size 4096
ceph osd pool set ocs-storagecluster-cephblockpool compression_max_blob_size 131072
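
To confirm that the settings took effect and to see how much data actually ends up compressed, the standard Ceph tooling can be queried (for example from the rook-ceph toolbox, or any host with admin access to the cluster):

# Verify the pool's compression settings.
ceph osd pool get ocs-storagecluster-cephblockpool compression_algorithm
ceph osd pool get ocs-storagecluster-cephblockpool compression_mode

# Per-pool compression statistics (USED COMPR / UNDER COMPR columns).
ceph df detail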

Snappy

Snappy, an algorithm formerly known as Zippy, is based on the LZ77 compression algorithm. It is a popular compression algorithm used by various databases, such as MariaDB, Hadoop, InfluxDB, and more. Though it offers relatively low compression, its impact on performance is also light to medium and can be completely negated with additional CPU. An additional downside is that it does not compress small blocks particularly well.

ZSTD

Zstandard (ZSTD) offers excellent compression, but at a heavy cost in performance. It is less effective on blocks smaller than 32 KiB, but the performance impact on those blocks is correspondingly smaller.

Testing overview:

Testing setup:

  • Running on a 6-node bare-metal cluster - 3 master plus 3 worker nodes.
  • Masters are schedulable.
  • Using 24 VMs - 4 per node - simulating various workloads.
  • The workload runs simultaneously on all 24 VMs.
  • Dataset size is 24 * 120GB = 2.88 TB.
  • The generated IO is 50% compressible; more info about the workload that I am running and how to run it can be found HERE. (A rough fio sketch follows this list.)
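
The exact workload definitions are the ones linked above; purely to illustrate the 50% compressibility aspect, a generic fio invocation along these lines produces a similar mix of half-compressible random I/O. Every parameter value here is hypothetical, not the one used in the tests:

# Hypothetical illustration only - random read/write I/O whose write buffers
# are roughly 50% compressible.
fio --name=compressible-oltp-sim \
    --filename=/path/inside/the/vm/testfile \
    --size=100g --rw=randrw --rwmixread=70 --bs=8k \
    --ioengine=libaio --direct=1 --iodepth=16 \
    --buffer_compress_percentage=50 --refill_buffers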

Testing flow:

I started by measuring a baseline for every workload and configuration, which meant:

  1. Running the workload with compression disabled on the Ceph RBD pool.
  2. Rerunning the workload with compression enabled.

After each test, I deleted all VMs and started the next test with “clean” VMs, after making sure Ceph had deleted all objects related to the previous run.
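
A quick way to check that nothing is left behind is to watch the block pool's object count and stored bytes drop back to their baseline before starting the next run; the grep pattern below simply matches the pool name used throughout this document:

# Objects and stored bytes per pool; the block pool should return to its
# pre-test baseline once the deleted VM disks have been cleaned up.
ceph df detail | grep ocs-storagecluster-cephblockpool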

The testing duration was limited by the amount of generated data: each test ended once 100 GiB of total data had been generated per VM.

Testing metrics:

  • Latency - the change in latency relative to the baseline, expressed as a percentage; a higher value is better.
  • IOPS - the change in IOPS relative to the baseline, expressed as a percentage; a higher value is better (a small sketch of the calculation follows this list).
  • Capacity saved - disk capacity saved, expressed as a percentage; a higher value is better.
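
For clarity, the IOPS percentages in the tables below are the relative difference against the no-compression baseline measured in the first run. The numbers in this sketch are made up purely for illustration; only the formula matters:

# Hypothetical numbers: 20000 IOPS without compression, 15000 IOPS with it.
awk -v baseline=20000 -v compressed=15000 \
    'BEGIN { printf "%.2f%%\n", (compressed - baseline) / baseline * 100 }'
# prints -25.00%, i.e. a 25% IOPS degradation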

Snappy results:

2 Cores per OSD:

The Snappy algorithm offers the least compression, with 16-23% of saved capacity, but it also has the lowest performance impact, with 12-38% performance degradation depending on the application:

Application type    Latency     IOPS        Capacity Saved
OLTP1               -11.20%     -12.05%     16.00%
OLTP2               -34.08%     -30.96%     20.00%
OLTPHW              -29.37%     -29.14%     23.08%
ODSS2               -36.16%     -38.38%     23.08%
ODSS128             -37.98%     -38.16%     23.08%

3 Cores per OSD:

With 3 cores per OSD, the performance penalty is gone for patterns that use mostly small blocks (<= 8KB), and the additional available CPU boosts performance by 5-33%, while patterns that use big blocks (> 32KB) still show about 7% degradation:

Application type    Latency     IOPS        Capacity Saved
OLTP1               37.39%      +33.62%     16.00%
OLTP2               -0.77%      +3.33%      20.00%
OLTPHW              6.18%       +5.63%      23.08%
ODSS2               -0.52%      -7.59%      23.08%
ODSS128             -4.10%      -7.13%      23.08%

4 Cores per OSD:

With 4 cores per OSD, the performance penalty is eliminated for all patterns, and once again the additional CPU boosts performance even further, with 21-74% performance gains:

Application type    Latency     IOPS        Capacity Saved
OLTP1               81.44%      +74.40%     16.00%
OLTP2               29.88%      +34.69%     20.00%
OLTPHW              40.67%      +38.96%     23.08%
ODSS2               26.27%      +21.34%     23.08%
ODSS128             23.09%      +23.21%     23.08%

ZSTD results:

2 Cores per OSD:

The ZSTD algorithm offers the best compression, with 36-50% of saved capacity, but it also has a hefty performance impact, with 21-66% performance degradation depending on the application:

Application type    Latency     IOPS        Capacity Saved
OLTP1               -20.09%     -21.07%     36.00%
OLTP2               -62.47%     -61.01%     44.00%
OLTPHW              -63.05%     -61.80%     50.00%
ODSS2               -60.17%     -64.31%     50.00%
ODSS128             -64.67%     -65.97%     50.00%

3 Cores per OSD:

With 3 cores per OSD, the performance penalty is gone for OLTP1, since it uses small blocks (4KB), and the additional available CPU boosts its performance by 18%, while the impact on the other patterns is reduced to 41-49%, down from 61-65% with 2 cores:

Application type    Latency     IOPS        Capacity Saved
OLTP1               21.97%      +18.64%     36.00%
OLTP2               -44.98%     -41.89%     44.00%
OLTPHW              -44.39%     -42.77%     50.00%
ODSS2               -41.29%     -47.71%     50.00%
ODSS128             -49.61%     -49.39%     50.00%

4 Cores per OSD:

With 4 cores per OSD, OLTP1 gains 59%, while the penalty for the other workloads is reduced further to 21-32%:

Application type    Latency     IOPS        Capacity Saved
OLTP1               65.76%      +59.58%     36.00%
OLTP2               -26.59%     -21.88%     44.00%
OLTPHW              -25.24%     -23.26%     50.00%
ODSS2               -24.16%     -31.13%     50.00%
ODSS128             -32.16%     -32.64%     50.00%

Conclusion:

The best strategy will always be to separate different data types into their respective pools.

Latency-sensitive applications can be placed on pools without compression.
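
In practice, using the same per-pool commands shown earlier, compression can simply be left disabled (or switched off) on the pool that backs the latency-sensitive workloads; the pool name below is a placeholder:

# Hypothetical pool name - substitute the pool serving the latency-sensitive data.
ceph osd pool set <latency-sensitive-pool> compression_mode none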

However, if that data can be compressed efficiently, it might be a good option to increase the number of cores per OSD so that the data is compressed with minimal impact on performance.

Even if some applications do not consume as much storage as others, it is always a good idea to understand how compressible their data is and what capacity growth to expect over time, in order to plan ahead.

Finally, when data has to be moved across the network, compressed data has great potential to increase the transfer rate and reduce bandwidth usage.