Preface
When evaluating storage performance, whether for local disk or OpenShift Container Storage, on hyper-converged infrastructure or traditional storage, these are the two most common types of performance benchmarking:
- Application performance - a mix of different IO operations that can contain a variety of block sizes
- Generic performance - a single block size with a single, specific operation
Just for clarification, by IO operation, I mean read/write/mix operations with a random/sequential/cache hit IO pattern. Before approaching any type of benchmarking, we must know the capabilities of our hardware first:
Hardware
- Disk speed, number of disks, and RAID level, if any (RAID penalty)
- CPU and RAM clock speeds and architectures
- Number of lanes and bus speed, which limit your overall possible bandwidth
- BIOS version and DIMM layout, configured according to vendor best practices
- NIC speeds and latency.
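As a rough illustration of the RAID penalty mentioned above, here is a minimal sketch that estimates front-end IOPS from raw disk IOPS. The penalty values are the commonly quoted rules of thumb (not measurements from my setup), and the function name and example numbers are hypothetical.

```python
# Commonly quoted RAID write penalties: back-end IOs needed per front-end write.
# These are rules of thumb, not vendor-specific figures.
RAID_WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_iops(disks: int, iops_per_disk: float,
                   read_fraction: float, raid_level: str) -> float:
    """Estimate front-end IOPS for a given read/write mix and RAID level."""
    penalty = RAID_WRITE_PENALTY[raid_level]
    raw_iops = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    # Each read costs one back-end IO; each write costs `penalty` back-end IOs.
    return raw_iops / (read_fraction + write_fraction * penalty)


if __name__ == "__main__":
    # Hypothetical array: 8 disks at 200 IOPS each, 70% reads, RAID 5.
    print(round(effective_iops(8, 200, 0.7, "raid5")))
```

The estimate ignores controller caches and stripe-aligned writes, so treat it as an upper-level sanity check rather than a prediction.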
Application
- How many CPU cores does it consume while idle/peak/bursts?
- Is it NUMA aware (QPI traffic)?
- How much RAM does it consume?
- How efficient is the application itself?
- How much throughput/IOPS does it generate?
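Throughput and IOPS are two views of the same load, tied together by the block size. A minimal sketch of the conversion, with hypothetical function names and example numbers:

```python
def throughput_mib_s(iops: float, block_kib: float) -> float:
    """Throughput in MiB/s produced by `iops` operations of `block_kib` KiB each."""
    return iops * block_kib / 1024.0


def iops_for_throughput(mib_s: float, block_kib: float) -> float:
    """IOPS needed to sustain `mib_s` MiB/s at a block size of `block_kib` KiB."""
    return mib_s * 1024.0 / block_kib


if __name__ == "__main__":
    # Hypothetical application: 10,000 IOPS at 4 KiB per operation.
    print(throughput_mib_s(10_000, 4))
```

This is why a small-block workload can saturate IOPS long before it comes close to the bandwidth limit, and a large-block workload does the opposite.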
Common Example for Hardware and Application Interaction
You just upgraded your hardware to the latest-generation CPUs. The board has two sockets, and your application's performance dropped by 30%. Although the new CPUs are much faster than the old ones, calls to the remote socket are still much more expensive (up to 7x) than local ones on current architectures. Just by pinning your application to a specific NUMA node, you may greatly increase application performance and turn that -30% into +30% (assuming the CPU is the bottleneck).
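To make the pinning step concrete, here is a minimal Linux-only sketch. It assumes the kernel exposes each node's CPUs under /sys/devices/system/node/nodeN/cpulist and uses Python's os.sched_setaffinity; the helper names are hypothetical. It restricts CPU placement only. Binding memory to the same node needs a tool such as numactl (--cpunodebind and --membind).

```python
import os
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_numa_node(node: int, pid: int = 0) -> set[int]:
    """Restrict `pid` (0 means the calling process) to the CPUs of one NUMA node."""
    cpulist = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text()
    cpus = parse_cpulist(cpulist)
    os.sched_setaffinity(pid, cpus)  # Linux-only call
    return cpus
```

Calling pin_to_numa_node(0) early in a process keeps its threads on node 0, since child threads inherit the affinity mask.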
Application Modeling
I strongly believe that modeling your application workloads and then scaling them is a better methodology than classic performance testing alone. In this document, I will demonstrate how I modeled different application workloads using VDbench. That said, classic workloads still have their place: they are a great tool when you need to find corner cases.
Tooling
VDbench, similar to FIO, is a well-known IO generator within the storage community. However, VDbench has a lot of useful features that make it ideal for modeling applications. It also supports a variety of operating systems, which makes it a great tool for an apples-to-apples comparison across them. It is a free tool, and it can be downloaded here.
Databases and Application Pattern Modeling
Different applications issue IO with different block sizes and different operations. Therefore, profiling those patterns correctly is a key factor in modeling your application. There are various ways to do application profiling, but that is outside the scope of this document. The table below shows the breakdown of the application patterns I am currently using. These are not tied to a specific application; they are more of a common ground between the various databases and applications that I have profiled over the years:
| Database/App type | Used in | IO size | Random/sequential | Read/write | Percentage |
|---|---|---|---|---|---|
| OLTP1 | Mail applications | 4KB | random | read hit | 10 |
| | | 4KB | random | read | 35 |
| | | 4KB | random | write | 35 |
| | | 4KB | sequential | read | 5 |
| | | 4KB | sequential | write | 15 |
| OLTP2 | Small Oracle applications; small-weight transactions | 8KB | random | read hit | 20 |
| | | 8KB | random | read | 45 |
| | | 8KB | random | write | 15 |
| | | 64KB | sequential | read | 10 |
| | | 64KB | sequential | write | 10 |
| OLTPHW | Large Oracle applications; heavyweight transactions | 8KB | random | read hit | 10 |
| | | 8KB | random | read | 35 |
| | | 8KB | random | write | 35 |
| | | 64KB | sequential | read | 5 |
| | | 64KB | sequential | write | 15 |
| ODSS2 | Data warehouse applications | 4KB | random | read | 15 |
| | | 4KB | random | write | 5 |
| | | 64KB | sequential | read | 70 |
| | | 64KB | sequential | write | 10 |
| ODSS128 | Streaming applications; backup applications | 64KB | random | read hit | 18 |
| | | 64KB | random | read | 18 |
| | | 64KB | random | write | 4 |
| | | 128KB | sequential | read | 48 |
| | | 128KB | sequential | write | 12 |
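The patterns in the table above are easy to keep as plain data, which makes it simple to sanity-check that each mix adds up to 100% and to derive summary figures. The sketch below is only a convenience of mine; the structure and function names are hypothetical, and the numbers are copied from the table.

```python
# Each stream: (IO size in KB, access pattern, operation, percentage of the mix).
WORKLOADS = {
    "OLTP1": [(4, "random", "read hit", 10), (4, "random", "read", 35),
              (4, "random", "write", 35), (4, "sequential", "read", 5),
              (4, "sequential", "write", 15)],
    "OLTP2": [(8, "random", "read hit", 20), (8, "random", "read", 45),
              (8, "random", "write", 15), (64, "sequential", "read", 10),
              (64, "sequential", "write", 10)],
    "OLTPHW": [(8, "random", "read hit", 10), (8, "random", "read", 35),
               (8, "random", "write", 35), (64, "sequential", "read", 5),
               (64, "sequential", "write", 15)],
    "ODSS2": [(4, "random", "read", 15), (4, "random", "write", 5),
              (64, "sequential", "read", 70), (64, "sequential", "write", 10)],
    "ODSS128": [(64, "random", "read hit", 18), (64, "random", "read", 18),
                (64, "random", "write", 4), (128, "sequential", "read", 48),
                (128, "sequential", "write", 12)],
}


def total_percentage(name: str) -> int:
    """Sum of all stream percentages; should be 100 for a complete pattern."""
    return sum(pct for _, _, _, pct in WORKLOADS[name])


def average_io_size_kb(name: str) -> float:
    """Average IO size in KB, weighted by each stream's share of operations."""
    return sum(size * pct for size, _, _, pct in WORKLOADS[name]) / 100


def read_percentage(name: str) -> int:
    """Share of operations that are reads, including read hits."""
    return sum(pct for _, _, op, pct in WORKLOADS[name] if op.startswith("read"))


if __name__ == "__main__":
    for name in WORKLOADS:
        print(name, total_percentage(name), average_io_size_kb(name),
              read_percentage(name))
```

The weighted average is a quick way to see how different two patterns really are: OLTP1 stays at 4 KB per operation, while ODSS128 averages over 100 KB.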
Short Disclaimer:
Note that in my modeling, I used only the dominant block sizes.
For example, for OLTP1, I used five streams of the 4KB block. In fact, when profiling workloads such as OLTP1, you will see many more block sizes, such as 8, 16, 28, 64, 120, and 512 KB, as well as fractional sizes such as 0.2, 0.44, and 0.68 KB. But the occurrence of those blocks is inconsistent and extremely varied, so to get repeatable, consistent results, I use only the main blocks in play. The same applies to every other application pattern in the table above.
The block sizes I decided to use make up the majority of the workload. For example, in OLTP1, 4KB operations accounted for ~90% of the workload in total. Real application data will also be compressible and dedupable to some degree, which will affect your performance if you use either of those features. But that is a completely different topic. I will just point out that VDbench supports compressible and dedupable data generation.
Simple Application Pattern Modeling
Databases are not the only thing that can be profiled and simulated. Sometimes you will have many users running desktop applications, such as Microsoft Office. Since these are simple applications, we can also accurately predict the amount of RAM and CPU they will consume on average. In my case, I calculated the following consumption:
Each Microsoft Excel instance consumes 1 core and 200 MiB + (3 * file size) of RAM. For example, a user working on a 7 MiB file will consume 1 core and 221 MiB of RAM.
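The consumption rule above is simple enough to put into a small helper for sizing. This is a minimal sketch; the function name is hypothetical, and scaling to several instances assumes every user works on a file of the same size.

```python
def excel_footprint(file_size_mib: float, instances: int = 1) -> tuple[int, float]:
    """Return (cores, RAM in MiB) for `instances` Excel users.

    Rule from the text: 1 core and 200 MiB + (3 * file size) per instance.
    Assumption: every instance works on a file of the same size.
    """
    cores = instances
    ram_mib = instances * (200 + 3 * file_size_mib)
    return cores, ram_mib


if __name__ == "__main__":
    print(excel_footprint(7))                # one user on a 7 MiB file
    print(excel_footprint(7, instances=50))  # hypothetical 50-user estimate
```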
Here is an example of a profiled Microsoft Excel pattern:
| Application | Used in | IO size | Random/sequential | Read/write | Percentage |
|---|---|---|---|---|---|
| Microsoft Excel | User’s desktops | 52KB | random | write | 55 |
| | | 64KB | random | write | 40 |
| | | 6MiB | random | read | 5 |
Config Files Examples
VDbench Databases Config Files for Filesystem
These config files are currently set to run on Windows. To run them on a UNIX/Linux-based OS, just modify the paths in the hd (host definition) and fsd flags.
A few notes:
- The first test that runs is actually a “fillup,” which fills the files with random data. That is done so the application patterns have real data to read.
- All application patterns are set to run with threads=1, which on my setup keeps the queue depth low and therefore yields the lowest latency.
The above workload examples can be found here.
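The reason threads=1 keeps latency low follows from Little's Law: the number of outstanding IOs equals IOPS multiplied by latency. A minimal sketch, assuming each thread issues one synchronous IO at a time (function names and numbers are hypothetical):

```python
def max_iops(threads: int, latency_ms: float) -> float:
    """Upper bound on IOPS when each thread waits for its IO to complete."""
    return threads * 1000.0 / latency_ms


def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's Law: average number of IOs in flight (the queue depth)."""
    return iops * latency_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical device answering in 0.5 ms with a single thread.
    iops = max_iops(1, 0.5)
    print(iops, outstanding_ios(iops, 0.5))
```

Adding threads raises the IOPS ceiling, but once the device is saturated the extra IOs only queue up and latency grows with them.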
VDbench Databases Config Files for RAW Disk
Note that RAW config files are extremely different from the filesystem config. The above workload examples can be found here.
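To give a feel for what a raw-disk parameter file looks like, here is a minimal sketch that renders the OLTP1 pattern as VDbench sd/wd/rd lines. These are not the linked config files. The parameter names (lun, threads, xfersize, rdpct, seekpct, rhpct, skew, iorate, elapsed, interval) are my reading of the VDbench user guide, so verify them against your version. The device path, IO rate, and run length are placeholder assumptions.

```python
# OLTP1 streams from the pattern table: (size in KB, access, operation, percent).
OLTP1 = [
    (4, "random", "read hit", 10),
    (4, "random", "read", 35),
    (4, "random", "write", 35),
    (4, "sequential", "read", 5),
    (4, "sequential", "write", 15),
]


def vdbench_raw_lines(lun: str, streams: list, iorate: int = 1000,
                      elapsed: int = 600) -> list[str]:
    """Render one storage definition, one workload per stream, and one run."""
    lines = [f"sd=sd1,lun={lun},threads=1"]
    for index, (size_kb, access, operation, percent) in enumerate(streams, 1):
        seekpct = 100 if access == "random" else 0   # 100 = random, 0 = sequential
        rdpct = 0 if operation == "write" else 100   # each stream is pure read or write
        parts = [f"wd=wd{index}", "sd=sd1", f"xfersize={size_kb}k",
                 f"rdpct={rdpct}", f"seekpct={seekpct}"]
        if operation == "read hit":
            parts.append("rhpct=100")                # serve these reads as cache hits
        parts.append(f"skew={percent}")              # share of the total IO rate
        lines.append(",".join(parts))
    lines.append(f"rd=rd1,wd=wd*,iorate={iorate},elapsed={elapsed},interval=1")
    return lines


if __name__ == "__main__":
    # "/dev/sdX" is a placeholder; point it at a disk you can safely overwrite.
    print("\n".join(vdbench_raw_lines("/dev/sdX", OLTP1)))
```

Raw runs write directly to the device, so any data on the target disk will be destroyed.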
VDbench Microsoft Excel Config File for Filesystem
Note that the fwdrate annotation (filesystem operations per second) is set to 1, meaning that I am simulating a single user. Also, the fsd annotation (filesystem storage definition name) is set to 7, meaning our user is working on 7 different files, each 40 MiB in size. The above workload examples can be found here.
VDbench Generic Config Files for Filesystem
A few examples of generic performance runs, for both block-specific and mixed workloads, can be found here.
FIO Databases Config Files
The database workloads above can also be run with fio. Note, however, that to replicate the same percentage ratio of different blocks within each pattern, I used the “flow” flag, which is a bit buggy and currently does not work properly. The above workload examples can be found here.