Apache 2 mod_deflate Benchmark

Introduction

Background

Are you ready to take a look at a fairly new technology that promises to save you bandwidth? Maybe you're even more interested when the promised savings range from 50% to 80%? Jump in, and take the ride to see if it works out as well as promised. I'm going to take a walk down Apache 2 server lane and benchmark mod_deflate in a real-life situation instead of a synthetic setup.

This is a real story: I got inspired by reading lots of good news about mod_deflate. We'll take a look at a bit of the background, activate the server module, benchmark the savings and draw conclusions.

HTTP Compression

Short history

HTTP protocol version 1.1 brought a new property of the HTTP request called Accept-Encoding. See this RFC at the W3 for detailed information. Basically, this property allows a client's browser to let the webserver know whether or not it supports compressed data transfers.

Under the previous protocol version, 1.0, transfers would always be uncompressed, which simply means the server's reply, the requested page, would be sent one-to-one over the wire. Since the main function of a webserver is to serve (hyper)text content, most requested pages are in a human-readable form, as HTML-formatted documents.

HTML documents can easily be compressed using standard algorithms employed by compression utilities like zip, bzip2 and gzip. To get back to the original point: the mod_deflate server module for Apache enables HTTP/1.1 compression so that all requests sent back to compliant clients will be compressed transparently.

The initially promised bandwidth savings come simply from the fact that the compressed version of an HTML page takes up less bandwidth than the uncompressed original. The savings mentioned, fifty to eighty percent, are the compression ratios achievable on HTML documents, and these now translate directly into a decrease in network usage.
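
If you want a feel for those ratios before touching Apache at all, you can compress a page by hand. A minimal sketch, assuming gzip is installed and using a hypothetical index.html as the test subject:

# Compare the raw and gzipped sizes of a local HTML document.
# index.html is just a placeholder; point it at any page of your own site.
ORIGINAL=$(wc -c < index.html)
COMPRESSED=$(gzip -9 -c index.html | wc -c)
echo "original: $ORIGINAL bytes, gzipped: $COMPRESSED bytes"
echo "compressed size: $((100 * COMPRESSED / ORIGINAL)) pct. of the original"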

Great and all, but why use it?

Bandwidth saving is a great thing, but it might not be directly obvious why. There are two common scenarios in which bandwidth savings can make a huge difference. The first, most obvious one: if you have to pay a bill for the amount of data transferred, compression will clearly bring that bill down. However, if your total amount of transferred data isn't a problem, there's a second, more subtle advantage.

Besides a limit on the amount of data, there's usually also a limit on the speed of your transfers, expressed in kilobits or megabits per second. Compression can help you push out more pages simultaneously because you won't saturate your network link as early. Suppose compression cuts your page sizes in half: you can then push twice as many pages down the pipe in a given time period as before.
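
To make that concrete with a small, hypothetical example: a 512 kbit per second uplink moves at most 64 kilobytes per second. With pages of 20 kilobytes, that is about 3 pages per second; if compression halves them to 10 kilobytes, the same link pushes out roughly 6 pages per second.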

Before we start

Compression takes CPU power to apply, so it doesn't come for free for most dynamic sites. You will need to find a balance between available CPU time and network speed. In my case, bandwidth is severely limited and CPU time is plentiful. I've made this very simple table giving you a little advice on what to do based on your situation:

                      Little CPU time       Plenty of CPU time
 Little bandwidth     Minimal compression   Maximal compression
 Plenty of bandwidth  No compression        Minimal compression

Of course you are free to deviate from this scheme if you desire.

Activating Apache 2 mod_deflate

Bring out the text editor, 'cause GUI is going bye-bye

Getting Apache 2 installed shouldn't pose too much of a problem: either use your distribution's package manager, or compile it from source. Make sure you compile in the gzip support, otherwise mod_deflate won't be available for use. This might also require enabling the loading of shared object libraries; consult your documentation if you are unsure how to do this.
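
If you're not sure whether your build includes the module, you can ask the server binary itself. A quick check, assuming the Gentoo-style apache2 binary name and module directory (yours may be httpd and a different path):

# List the modules compiled statically into the server binary.
apache2 -l
# If mod_deflate was built as a shared object instead, it won't show up
# above; look for the module file itself in that case:
ls /usr/lib/apache2/modules/ | grep deflate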

Keep in mind that I was in the position to dive directly into the task at hand, update the server configuration and restart Apache. Please be careful executing each and every one of the following commands, especially if you are trying this on a production server. If you're able to do so, test the changes you're about to make in a controlled environment before uploading them to the actual server.

The first step is to find your Apache 2 configuration file. I'm using a Gentoo Linux installation, on which the path is:

cd /etc/apache2/conf

What I'm going to do is apply compression in the site-wide configuration. My server hosts several domains and websites, but I want them all compressed in the same way because of my limited bandwidth. If you prefer, you can limit compression to a narrower scope, such as individual virtual hosts.

The complete configuration story

Read this section in the Apache manual for an explanation of the options we're going to use. It's important you acquaint yourself a bit with the possible options and exceptions of dynamically compressing your content. But don't worry, I'll provide a working example right after this sentence, so you're not on your own.

I fired up vi to edit the configuration and jumped right down to the bottom to add the mod_deflate directives. So, if you want site-wide compression enabled, add the following lines to your apache2.conf:

AddOutputFilterByType DEFLATE text/html text/plain text/xml
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
Header append Vary User-Agent env=!dont-vary

If you followed my advice, these lines will look familiar, as they're taken straight out of the Apache documentation. Let's skim over each line to see what it does:

  • AddOutputFilterByType ..., this line enables mod_deflate compression for the three given content types: text/html, text/plain and text/xml. It also keeps images, documents and multimedia files, which are usually already compressed, from being compressed again by mod_deflate when requested.
  • BrowserMatch ..., these three similar lines prevent mod_deflate from sending compressed content to a few known browsers which do not handle compression correctly.
  • Header append ..., this line makes sure that the compression will work together with proxies without any problems.

After adding these lines, compression will already work. But don't restart your Apache just yet, let's add some icing on the cake.

Knowledge is power

Restarting Apache now would mean the content gets sent in compressed form, but how will we ever know what savings we achieved? This really interesting aspect forms the basis for this article, so let's look into adding a logfile which will give us the mod_deflate compression ratio. Add the following three lines to the Apache 2 configuration file:

DeflateFilterNote deflate_ratio
LogFormat "%v %h %l %u %t \"%r\" %>s %b mod_deflate: %{deflate_ratio}n pct." vhost_with_deflate_info
CustomLog logs/deflate_access_log vhost_with_deflate_info

It's entirely possible one of the lines above is wrapped, please make sure you only add three lines, beginning with DeflateFilterNote, LogFormat and CustomLog.

Let's again take a look at what each line does:

  • DeflateFilterNote ..., this signals mod_deflate to store its compression ratio in the variable deflate_ratio, which we use in the logfile.
  • LogFormat ..., you should match this line to your normal access_log format and append the "mod_deflate: %{deflate_ratio}n pct." part to it. This defines the format for the lines in the extra logfile.
  • CustomLog ..., this line adds a new logfile in the Apache log directory, containing the compression statistics.

After adding these lines, you should be able to safely restart Apache. But check the configuration first by issuing the command (the # indicates you should run the command as the superuser):

#apache2ctl -t
Syntax OK

If you do not get the "Syntax OK" reply, please review the changes you made to the configuration files. Once everything's set, restart the Apache server by using one of these commands:

/etc/init.d/apache2 restart (Gentoo Linux)
service httpd restart (Redhat Linux)

Congratulations, you've enabled mod_deflate compression!
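
You can verify that compression actually kicks in without even opening a browser. A quick check, assuming curl is installed and the server answers on localhost:

# Request a page while advertising gzip support, discard the body and
# print only the interesting response headers.
curl -s -o /dev/null -D - -H "Accept-Encoding: gzip" http://localhost/ \
  | grep -iE "content-encoding|vary"

If everything works, you should see a "Content-Encoding: gzip" line, plus the Vary header we appended in the configuration. Repeating the command with -A "Mozilla/4.08" added should show no Content-Encoding at all, demonstrating the BrowserMatch exceptions at work.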

Benchmarking Savings

Initial findings

Use a decent browser, like Firefox, to open pages on your webserver, and closely watch the deflate_access_log logfile. You will see HTTP/1.1 requests streaming in with a compression ratio alongside them. The interesting part of the logfile entries is the last few characters, after the HTTP/1.1 mark, like in these examples:

HTTP/1.1" 200 8125 mod_deflate: 34 pct.
HTTP/1.1" 200 5665 mod_deflate: 35 pct.
HTTP/1.1" 200 24477 mod_deflate: - pct.

The 200 status code indicates a successful request, the number after the 200 indicates the number of bytes sent to the client, and the mod_deflate percentage indicates how large the compressed data was compared to the original data. The first line thus indicates that 8125 bytes were sent to the client, which was 34% of the original size, so we saved (100 - 34 =) 66% on that document.

The last example, however, was not compressed; it shows a dash as its ratio. The client probably did not send an Accept-Encoding header, so we couldn't send a compressed version of that page.
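
To watch these lines arrive in real time, you can simply follow the logfile from a console. A one-liner, assuming the Gentoo log location used throughout this article:

# Print new deflate log entries as they are written.
tail -f /var/log/apache2/deflate_access_log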

Counting back

So, after a while, you'll have a nice logfile containing the compression ratios achieved by mod_deflate. It also reports the pages which were not compressed, so we can construct a total overview of savings based on this single logfile. Before I present a nice script to automate the process, let's take a quick look at some of the math involved.

One of the most interesting things is to see not only what percentage of compression we achieved on individual pages, but also the total number of bytes saved. Given the size of the compressed page and the compression ratio, we can calculate the number of bytes saved using a simple formula:

Bytes uncompressed = Bytes compressed x ( 100 / Ratio )

We only need to round this number down, because you can't really have a non-integer amount of bytes in the original page.
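
As a quick sanity check, take the first log line shown earlier: 8125 bytes were sent at a ratio of 34 pct., so:

Bytes uncompressed = 8125 x ( 100 / 34 ) = 23897 (rounded down)

In other words, roughly 23897 - 8125 = 15772 bytes were saved on that single request.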

Stage one: filtering logfile data

To automate the process of gathering mod_deflate data, I wrote two scripts, one in Bash and one in PHP. The Bash script filters the logfile data and saves an intermediate temporary file which contains the page sizes and compression ratios. The PHP script then aggregates this data and presents it on the system console. The Bash script calls the PHP script, so analyzing your logfile is as simple as starting a single script.

Here's the first script for Bash, download it here or copy-n-paste it (not recommended since the formatting is lost):

#!/bin/bash
# Logfile produced by the CustomLog directive above.
LOG="/var/log/apache2/deflate_access_log"
# Temporary file handed over to the PHP script.
TMP="savings.tmp"
# Local subnet to exclude from the statistics.
LOCALNET="10.0.0."
# Run everything at the lowest scheduling priority.
NICE="nice -n 19"

# Drop local requests, isolate the "<bytes> mod_deflate: <ratio>" part
# of each line and reduce it to "<bytes> <ratio>" for the PHP script.
$NICE grep -v "$LOCALNET" "$LOG" | \
grep -oaE "[^ ]+ mod_deflate: [^ ]+" | \
cut -f 1,3 -d " " > "$TMP"
$NICE php -q calcsavings.php "$TMP"
$NICE rm -f "$TMP"

The variables you'll want to change are at the top of the script, in uppercase. LOG points to the deflate logfile to scan, TMP is a temporary file used to pass data between the Bash and PHP scripts, LOCALNET is the subnet to filter out of the statistics, and NICE makes the script run only in idle time, which is advised as it consumes quite a lot of resources.

Stage two: calculating the statistics

The PHP script which does the actual calculation can be downloaded here, or copy-n-paste it (not recommended since the formatting is lost):

<?php
// Running totals; $base counts the bytes of uncompressed requests.
$base = 0;
$totalold = 0;
$totalnew = 0;
$requestsbase = 0;
$requestsnew = 0;

$fp = fopen($argv[1], 'r');
while (!feof($fp)) {
    $in = explode(' ', trim(fgets($fp)));

    // Skip empty or malformed lines.
    if (count($in) < 2) {
        continue;
    }

    $newsize = $in[0];
    $percent = $in[1];

    if ($percent == '-') {
        // Request was served uncompressed.
        $base += $newsize;
        $requestsbase++;
    } else if ($percent != '') {
        // Reconstruct the original size from the compression ratio.
        $oldsize = $newsize * (100 / $percent);
        $totalold += $oldsize;
        $totalnew += $newsize;
        $requestsnew++;
    }
}
fclose($fp);

$totalold = floor($totalold);
$totalnew = floor($totalnew);
$saved = $totalold - $totalnew;
$savedperc = floor(100 - 100 * ($totalnew / $totalold));
$newperc = 100 - $savedperc;

// The same numbers, with the uncompressed requests included.
$totaloldbase = $base + $totalold;
$totalnewbase = $base + $totalnew;
$savedpercbase = floor(100 - 100 * ($totalnewbase / $totaloldbase));
$newpercbase = 100 - $savedpercbase;

$requeststotal = $requestsbase + $requestsnew;
$reqnewperc = floor(100 * $requestsnew / $requeststotal);

echo "Uncompressed Base Included, $requeststotal request(s)\n";
echo "Original: $totaloldbase\n";
echo "Compressed: $totalnewbase ($newpercbase%)\n";
echo "Saved: $saved ($savedpercbase%)\n";
echo "\n";
echo "Uncompressed Base Excluded, $requestsnew request(s) ($reqnewperc%)\n";
echo "Original: $totalold\n";
echo "Compressed: $totalnew ($newperc%)\n";
echo "Saved: $saved ($savedperc%)\n";
?>

If you downloaded the files, do not forget to unpack them and make the Bash script executable:

tar xvfz calcsavings.tar.gz
chmod +x calcsavings.sh
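
If you'd like to test the PHP part in isolation before pointing it at a real logfile, you can feed it a hand-made sample. A small sketch, reusing the example log values from earlier; each "<bytes> <ratio>" line mimics what the Bash script's cut stage produces:

# Two compressed requests and one uncompressed one.
printf '8125 34\n5665 35\n24477 -\n' > savings.tmp
php -q calcsavings.php savings.tmp
rm -f savings.tmp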

Running the script

So, now that all scripts are in place, let's run it! Start calcsavings.sh by issuing this command:

./calcsavings.sh

The script will run for a moment (initially well within a second) and output some statistics. To make this article more interesting, I waited a while before measuring, so the numbers have more substance. The output on my server after a couple of hours of running mod_deflate looked like this:

Uncompressed Base Included, 19179 request(s)
Original: 158956883
Compressed: 143494640 (91%)
Saved: 15462243 (9%)

Uncompressed Base Excluded, 2430 request(s) (12%)
Original: 21849897
Compressed: 6387654 (30%)
Saved: 15462243 (70%)

There are two portions in the output: the top half covers the full logfile (including uncompressed content), the bottom half covers only the compressed content. Let's see what each portion means.

Output dissection: total logfile

Uncompressed Base Included, 19179 request(s)
Original: 158956883
Compressed: 143494640 (91%)
Saved: 15462243 (9%)

In the top half, the first line shows the total number of requests in the deflate logfile, in this case 19179 requests. The second line shows the amount of data that would have been transferred if mod_deflate wasn't used, and the third line shows the actual amount of transferred data. Lastly, the fourth line shows the savings gained by using mod_deflate. The third and fourth lines also show these amounts as percentages of the uncompressed total on the second line.

Applying this knowledge to the numbers given, in total I saved 9% of bandwidth by using mod_deflate. That's not quite what I was promised. But let's look a bit further, at the bottom half of the output.

Output dissection: compressed content

Uncompressed Base Excluded, 2430 request(s) (12%)
Original: 21849897
Compressed: 6387654 (30%)
Saved: 15462243 (70%)

This part of the output only contains statistics on the compressed pages. The first line shows the number of requests that were compressed, 2430 in this example, which is 12% of the total number of requests. The second line shows how many bytes would have been sent without compression, and the third line shows the number of bytes actually transferred. Lastly, the fourth line again shows the number of bytes saved by mod_deflate. The third and fourth numbers are also expressed as a percentage of the uncompressed amount on the second line.

Looking at these numbers, we see that mod_deflate indeed offers 70% savings on the content it actually processes, so the promises weren't false, but they weren't the whole truth either.

Conclusion

All the promises we break

Analyzing the numbers showed that in my case there wasn't a miraculous saving going on, but as usual, your mileage may vary. The upside is that I don't lose anything besides some spare CPU time, and I still gain roughly 10% in network performance. If you're tight on network speed, for instance limited to 512 kbit per second upstream, you instantly have a "virtual" increase to about 564 kbit per second.

An interesting thing to note is that some web search engines, like Google's GoogleBot, support HTTP/1.1 compression. This saves a lot of network utilization when GoogleBot comes around. I noticed that my compression rates went up during GoogleBot's visits, purely because it requests compressed pages. This also means that regular visitors will be able to request your pages faster, since GoogleBot doesn't take away as much of their bandwidth.

Take a look at a re-run of the script at a later time, shortly after GoogleBot had been around for a while:

Uncompressed Base Included, 38118 request(s)
Original: 308738797
Compressed: 277508802 (90%)
Saved: 31229995 (10%)

Uncompressed Base Excluded, 5451 request(s) (14%)
Original: 44900254
Compressed: 13670259 (31%)
Saved: 31229995 (69%)

Another point is of course the format of your source material. I tend to serve up quite a lot of screenshots, which are already compressed using image compression techniques. Serving text content makes mod_deflate much more efficient and useful, since text compresses by around 70% on average, as seen in this article.

Second opinion

When I initially wrote this article, I tested compression for all sites hosted on the server, which range from graphics-intensive pages to some plain-text ones. In February 2006, a simple new plain-text site I maintain grew bigger and bigger, and I wanted to relieve my network connection from the load of traffic it generated.

I had disabled mod_deflate shortly after my initial analysis showed a meager 10% savings, but I decided to apply it again, just for this single site, which only has plain-text content. The results were amazing. I ran the calculation script after a few hours and it presented me with this:

Uncompressed Base Included, 63899 request(s)
Original: 764353147
Compressed: 215481248 (29%)
Saved: 548871899 (71%)

Uncompressed Base Excluded, 50601 request(s) (79%)
Original: 663856300
Compressed: 114984401 (18%)
Saved: 548871899 (82%)

Wow, just look at that. Of the total traffic, only 29% was actually sent over the wire, already saving more than 500 megabytes of traffic. Using mod_deflate, I only had to upload about 215 MB of data where previously it would have been 764 MB. This means I can serve more than three times as many pages in the same time frame.

As you can see, this situation really benefits from compression and I'm bound to leave it enabled.

What's the story

There's no definite conclusion available, but if you've got the idle CPU time, just go ahead and enable mod_deflate; you will not regret it. Using the configuration given in this article, it will not make the situation worse for any of your site's visitors.

About this article

This article was added to the site on the 23rd of October 2005 and updated on the 4th of March 2006, adding the paragraph "Second opinion".
