Thursday, January 22, 2009

Introduction to mod_gzip

Every day millions of web servers around the world handle billions of bytes of network traffic. Internet connections get faster every year, and hosting providers offer attractive tariffs. It may seem that mankind is about to forget the problem of saving traffic for good. But even though hard drives keep growing, users still haven’t given up on archivers, and the same reasoning applies to web traffic. You may say: “I have a 10 Mbit unlimited connection, so saving traffic isn’t my problem”. Sure, 10 Mbit is nice. But what would you say if you learned that it is possible to save more than 60% of the traffic? First, lots of users have connections slower than 10 Mbit. Second, the growing popularity of mobile devices makes traffic saving more important than ever: a lot of PDA, cellphone and smartphone users would say ‘thank you’ if your web server saved them traffic and money. To sum up, traffic saving remains a timely and important part of modern web server technology.

Until recently HeliconTech had one specialized solution for content compression: HeliconJet. Because compression is so important, we have decided to include this functionality in our new product, Helicon Ape. Since Ape stands for APache Emulation, it’s very important not to invent new syntax and directives but to reuse existing Apache assets.

There are two popular compression modules: the conventional mod_deflate and mod_gzip. The latter is written by a third-party developer and is not shipped with Apache. We have decided to implement both modules because they are used about equally often. At the moment only basic mod_gzip functionality is implemented, but we are planning to extend it in the near future. Technically, Ape will have one compression module that supports both mod_gzip and mod_deflate syntax. Our primary goal is to let you reuse an existing Apache configuration without any changes.

Let’s have a look at the basic principles of content compression and at how mod_gzip operates. The module produces the GZIP format, which uses the Deflate compression algorithm. It is based on a .NET version of the popular zlib library. Please note that Helicon Ape is written in managed code only!
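To make the GZIP/Deflate relationship a bit more tangible, here is a minimal sketch in Python. It is not Helicon Ape code (Ape is a .NET product); it only uses Python's standard gzip module, and the sample page is made up for illustration. The actual saving depends heavily on the content, so treat the printed numbers as an illustration only.

```python
import gzip

# Hypothetical sample page: repetitive markup, which compresses well.
html = ("<html><body>" + "<p>Hello, mod_gzip!</p>" * 200 + "</body></html>").encode("utf-8")

# Wrap the Deflate-compressed data in a GZIP container.
compressed = gzip.compress(html)

# Every GZIP stream starts with the magic bytes 0x1f 0x8b;
# the third byte (8) identifies Deflate as the compression method.
magic = compressed[:2]
method = compressed[2]

saving = 1 - len(compressed) / len(html)
print(f"original: {len(html)} bytes, compressed: {len(compressed)} bytes, saved: {saving:.0%}")

# The client reverses the process and gets the original bytes back.
restored = gzip.decompress(compressed)
```

Real pages are less repetitive than this sample, so they will usually compress less dramatically.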

A web client (browser) exchanges technical information, the so-called HTTP headers, with the web server. These headers carry important information that helps the client and the server understand each other. The client can state which data types it accepts and which content it needs. Taking the client’s capabilities into account, the server prepares and sends the content. After that, the headers in the response tell the client what to do with it.

But we are not going to dive deep into the subtleties of the HTTP protocol, as there is plenty of information on this topic on the Internet. Let’s get back to mod_gzip. The general scheme of its operation is given below:



As you can see, the server is not the only party involved in deciding whether to compress content. That is easy to understand: if the browser can’t decompress GZIP, all of mod_gzip’s work is pointless and the user gets rubbish. The web client must send an Accept-Encoding header with a gzip, x-gzip or deflate value to let mod_gzip know that it supports compression.

In turn, if the module decides to compress content, it sets the Content-Encoding: gzip header to tell the client that GZIP decompression must be applied. So each link in the scheme above plays an important role.
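The client side of this handshake can be sketched in a few lines of Python. This is only an illustration of the check described above, not the actual mod_gzip or Helicon Ape implementation; the function name client_accepts_gzip is made up, and the sketch deliberately ignores quality values such as q=0.

```python
def client_accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding value lists gzip, x-gzip or deflate.

    Simplifying assumption: quality parameters (";q=...") are stripped
    and otherwise ignored.
    """
    if not accept_encoding:
        return False
    tokens = set()
    for part in accept_encoding.split(","):
        # Drop optional parameters such as ";q=0.8" and normalise case.
        token = part.split(";")[0].strip().lower()
        if token:
            tokens.add(token)
    return bool(tokens & {"gzip", "x-gzip", "deflate"})


# A typical browser header passes the check; a missing header does not.
print(client_accepts_gzip("gzip, deflate"))
print(client_accepts_gzip(None))
```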

But to better understand mod_gzip logic, please have a look at this flowchart:



mod_gzip follows this sequence to decide whether or not to compress. Here is a brief explanation of each stage:

  • When a request comes to the server, mod_gzip (if it’s ON) can start its “dirty”
    work.
  • First, the module determines whether the content is already compressed. If it is, mod_gzip
    leaves things as they are.
  • If it’s not, the module analyzes the request headers sent by the client. mod_gzip
    can move on only if there’s an Accept-Encoding header with a gzip, x-gzip
    or deflate value.
  • Next, the module performs the checks set by
    specific directives in the configuration files. Based on the results of these
    checks, the decision about content compression is made.
  • If GZIP is to be used, the module sets the Content-Encoding: gzip header, because
    otherwise the client may fail to process the server response correctly.
  • Besides, there’s a special Vary header in which mod_gzip specifies what its actions depend on
    (Vary: Accept-Encoding). This header is used for caching, so its detailed description will appear in
    upcoming articles.
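The stages above can be put together in a rough Python sketch. It is an illustration under several assumptions, not the module's real code: the function maybe_compress, the COMPRESSIBLE_TYPES set (a stand-in for the configuration directives) and the plain dictionaries used for headers are all hypothetical, and header names are matched with exact case for brevity.

```python
import gzip

# Hypothetical stand-in for the configuration checks: compress only these types.
COMPRESSIBLE_TYPES = {"text/html", "text/css", "text/plain"}
ACCEPTED_TOKENS = {"gzip", "x-gzip", "deflate"}


def maybe_compress(request_headers, response_headers, body, enabled=True):
    """Return (headers, body); the body is compressed only if every check passes."""
    headers = dict(response_headers)

    # Stage 1: the module must be switched on.
    if not enabled:
        return headers, body

    # Stage 2: content that is already compressed is left as it is.
    if "Content-Encoding" in headers:
        return headers, body

    # Stage 3: the client must announce gzip, x-gzip or deflate support.
    raw = request_headers.get("Accept-Encoding", "")
    offered = {part.split(";")[0].strip().lower() for part in raw.split(",")}
    if not offered & ACCEPTED_TOKENS:
        return headers, body

    # Stage 4: configuration checks (here reduced to a MIME type whitelist).
    content_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if content_type not in COMPRESSIBLE_TYPES:
        return headers, body

    # Stages 5 and 6: compress, then tell the client and any caches about it.
    headers["Content-Encoding"] = "gzip"
    headers["Vary"] = "Accept-Encoding"
    return headers, gzip.compress(body)


page = b"<p>hi</p>" * 100
headers, payload = maybe_compress(
    {"Accept-Encoding": "gzip, deflate"},
    {"Content-Type": "text/html; charset=utf-8"},
    page,
)
print(headers, len(page), len(payload))
```

Returning early at each failed check mirrors the way every stage can stop the process and send the response uncompressed.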

It’s possible that future versions will have slightly different logic, but we’ll surely inform you about that.

Summary

This article is just a brief introduction to the Helicon Ape mod_gzip module.
We are planning to write much more material on this and other topics to help you use our little agile monkey (Ape) easily and efficiently.

Best wishes,
HeliconTech Team
