Storage workload testing with fio

In my day-to-day activities in Technical Marketing, I’m often asked to help with storage performance benchmarking and workload testing. I’ve been doing this for over a decade at multiple companies, and I’ve found a wide variety of tools and techniques for the job, along with several standards and standards organizations contributing to the subject.

The wide variety of solutions available can actually add complexity to the otherwise simple task of estimating “how much” a particular platform or configuration might be capable of providing.

I like to keep things simple. That is why, most of the time, I choose the fio tool to run any basic storage testing I find interesting. There are a number of resources out there to help you get started.

To make this a bit easier, a couple of us at Datrium have configured a simple CentOS 7 Linux VM to help with workload generation against target storage devices and wrapped it up in an easy-to-deploy OVA file. For now, please contact your nearest Datrium team member for access to this OVA.

The workload VM has been configured to run the fio workload tool with predefined scripts contained in the /home/datrium directory. Log in as root with password “datrium#1”.

There are two easy ways to run this workload VM:

  1. automatically through crontab
  2. manually with CLI

Method 1

To run the workload VM on a continuous basis, simply verify (or add) this line in the /etc/crontab file, where the initial “*/5” field indicates a restart every 5 minutes. Longer or shorter runs are possible with easy edits to the crontab entry and the corresponding runtime script value as described below.

*/5 *  *  *  * root /home/datrium/do-work

With the automatic crontab-driven approach, simply start the VM and wait; the first IO pass will start within 5 minutes. The VM will then run the script over and over until it is powered off.
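As a sketch of the kind of edit involved, here is how the interval could be changed to 10 minutes. The commands below operate on local sample copies so they can be tried anywhere; on the workload VM the real targets would be /etc/crontab and the runtime value in the fio script (the path shown in the comment is an assumption based on the VM layout described above):

```shell
# Demonstrated on local sample copies; on the VM, edit /etc/crontab and
# the runtime value in the fio script under /home/datrium instead.
printf '*/5 *  *  *  * root /home/datrium/do-work\n' > crontab.sample
printf 'runtime=5M\n' > runtime.sample

# Change the cron interval from every 5 minutes to every 10 minutes...
sed -i 's|^\*/5|*/10|' crontab.sample
# ...and keep the fio runtime in step with it.
sed -i 's/^runtime=5M/runtime=10M/' runtime.sample

cat crontab.sample runtime.sample
```

Whatever interval you choose, keep the cron step and the fio runtime matched so one pass finishes before the next one starts.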

Method 2

To run the workload manually, first remove or comment out the Datrium-specific entry in /etc/crontab. This keeps any secondary fio jobs from running during the manually invoked runs.

From the root user’s home directory, simply type the following command:

./do-work

This runs the control script, which in turn calls the fio tool with the prescribed workload. Note that there will be no output to the CLI until the job has finished. The job is run with the “--minimal” option to produce terse output in the file fio.txt, which can be post-processed with the simple perl script listed later.
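For a quick look without the perl script, awk can pull individual fields straight out of the terse output. The field positions below assume fio’s terse version 3 format (field 3 = job name, 8 = read IOPS, 49 = write IOPS); a synthetic 60-field line stands in for fio.txt here so the command can be tried anywhere:

```shell
# Synthetic stand-in for fio.txt: each field is just its own position number.
seq 60 | paste -sd';' - > fio.sample.txt

# Print job name (field 3), read IOPS (field 8) and write IOPS (field 49).
# On the workload VM, point this at fio.txt instead of the sample file.
awk -F';' '{print "job: " $3 ", read IOPS: " $8 ", write IOPS: " $49}' fio.sample.txt
```

The same one-liner can be extended with any other terse fields of interest once you have the column positions for your fio version.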

The details of the fio script, “worker-config”, are shown here:

[global]
#
# general setup parameters - typically unchanged
#
direct=1
ioengine=libaio
group_reporting
time_based
numjobs=1
filename=/dev/sdb

#
# data variation control
# using something other than the defaults
#
buffer_compress_percentage=50
dedupe_percentage=50
randseed=50

#
# these parameters can be easily modified
#

# this needs to be less than the size of the 
# vmdk attached to the VM
filesize=100G

# greater queue depth may lead to higher latencies
iodepth=24

# match this to the crontab interval for automatic runs
# or to the desired test length if manual run
runtime=5M

[worker-config]
#
# IO workload profile parameters
# match these to your test objective
#
bs=8K          # blocksize
rw=randrw
rwmixread=70   # read percentage
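Only the [worker-config] stanza needs to change to model a different profile. As an illustration (these values are just an example, not part of the shipped script), a 4K 100% random read test would look like:

```ini
[worker-config]
#
# example alternate profile: 4K, 100% random read
#
bs=4K          # blocksize
rw=randread    # pure random read; rwmixread is not needed here
```

The global section, including the data variation and runtime settings, can stay exactly as shipped.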

The following perl script is included as an example of post-processing the fio output. Save it under any name you like, for example fio-post.pl, and run it as perl fio-post.pl fio.txt > fio.csv.

#
# simple perl script to extract IOPS, throughput (MB/s) and latency values from fio minimal output data
# Mike McLaughlin (@storageidealist) 12/17
#

use strict;
# row counter for csv output control and calculations
my $row=1; 

#
# this script takes a file name, 
# opens the file, and 
# prints the selected contents in csv format
#
if ($#ARGV != 0) {
    print STDERR "You must specify exactly one argument.\n";
    exit 4;
}

# Open the file
open(INFILE, $ARGV[0]) or die "Cannot open $ARGV[0]: $!.\n";

# print csv header row
print "Job name, MB/s, IOPS, R-lat(ms), W-lat(ms), Read BW(KB/s), Read IOPS, Read C-lat(usec), Write BW(KB/s), Write IOPS, Write C-lat(usec)\n";

while (my $line = <INFILE>) {
    # split the line into fields
    my @fields = split /;/, $line;

    # increment row counter used in cell calculations
    ++$row;

    # notes:
    # converting KB/s to MB/s for total throughput sum
    # converting usec to ms time unit for latency display
    print "$fields[2],=((f$row/1024)+(i$row/1024)),=(g$row+j$row),=h$row/1000,=k$row/1000, $fields[6], $fields[7], $fields[15], $fields[47], $fields[48], $fields[56]\n";
}

close INFILE;
# end of script
