Attention! Helicon Tech Blog has moved to www.helicontech.com/articles/

Wednesday, January 28, 2009

HotlinkBlocker 1.0.0.63 Released

The new build of HotlinkBlocker adds one important feature: you can now exclude the client IP address from signature generation.

This prevents issues with the AOL browser (and not only with it), which sends requests for HTML pages and images from different IP addresses.

Thursday, January 22, 2009

Introduction to mod_gzip

Every day millions of web servers around the world handle billions of bytes of network traffic. Internet connections get faster every year, and hosting providers offer attractive plans. It may seem that saving traffic is a problem we can forget about. But hard drives keep growing too, and users still haven’t given up archivers. The same reasoning applies to web traffic. You may say: “I have an unlimited 10 Mbit connection, so saving traffic isn’t my problem.” 10 Mbit is fine, but what if you learn that more than 60% of that traffic can be saved? First, many users have connections slower than 10 Mbit. Second, the growing popularity of mobile devices makes traffic saving even more important: PDA, cellphone and smartphone users will thank you if your web server saves their traffic and money. In short, saving traffic remains a timely and important task for modern web servers.

Until recently HeliconTech had one specialized solution for content compression: HeliconJet. Given its importance, we decided to include this functionality in our new product, Helicon Ape. Since Ape stands for APache Emulation, it’s important not to invent new syntax and directives but to reuse the existing Apache ones.

There are two popular compression modules: the conventional mod_deflate and mod_gzip. The latter is written by a third-party developer and is not shipped with Apache. We decided to implement both because they are used about equally. At the moment only basic mod_gzip functionality is implemented, but we are planning to extend it in the near future. Technically, Ape will have a single compression module that supports both mod_gzip and mod_deflate syntax. Our primary goal is to let you use your existing Apache configuration without any changes.

Let’s have a look at the basic principles of content compression and at how mod_gzip operates. The module produces the GZIP format, which uses the Deflate compression algorithm. It is based on a .NET version of the popular zlib library. Please note that Helicon Ape is written in managed code only!
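To make the relationship between GZIP and Deflate more tangible, here is a minimal sketch. It uses Python’s standard gzip module rather than the .NET library Helicon Ape is built on, and the sample HTML is made up for illustration.

```python
import gzip

# Hypothetical sample page: repetitive markup compresses well.
html = b'<li class="item">Hello, Helicon Ape!</li>\n' * 200

compressed = gzip.compress(html)

# A GZIP stream starts with the magic bytes 1f 8b. The third byte is the
# compression method, and the value 8 stands for Deflate.
magic = compressed[:2]
method = compressed[2]

# Share of the original size that no longer has to travel over the wire.
saved = 1 - len(compressed) / len(html)
print(f"{len(html)} -> {len(compressed)} bytes, saved {saved:.0%}")
```

The exact saving depends heavily on the content: repetitive text such as HTML shrinks a lot, while images that are already compressed barely shrink at all.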

A web client (browser) exchanges technical information, the so-called HTTP headers, with the web server. These headers help the client and the server understand each other. The client can state which data types it accepts and which content it needs. The server takes the client’s capabilities into account when it prepares and sends the content. The response headers then tell the client what to do with the server response.

We won’t dive deep into the subtleties of the HTTP protocol, as there is plenty of information on this topic on the Internet. Let’s return to mod_gzip. The general scheme of its operation is given below:



As you can see, the server is not the only party involved in deciding whether to compress content. The reason is simple: if the browser can’t decompress GZIP, all the work done by mod_gzip is pointless and the user gets garbage. The web client must send an Accept-Encoding header with a gzip, x-gzip or deflate value to let mod_gzip know that it supports compression.

In turn, if the module decides to compress the content, it sets the Content-Encoding: gzip header to tell the client that GZIP decompression must be used. So each link in the scheme above plays an important role.

But to better understand mod_gzip logic, please have a look at this flowchart:



mod_gzip follows this sequence to decide whether to compress. Here is a brief explanation of each stage:

  • When a request comes to the server, mod_gzip (if it’s ON) can start its “dirty”
    work.
  • First, the module determines whether the content is already compressed. If it is, mod_gzip
    leaves it as is.
  • If it’s not, the module analyzes the request headers sent by the client. mod_gzip
    can move on only if there’s an Accept-Encoding header with a gzip, x-gzip
    or deflate value.
  • Next, the module performs the checks defined by
    specific directives in the configuration files. The decision about content
    compression is based on the results of these checks.
  • If GZIP is to be used, the module sets the Content-Encoding: gzip header, because
    otherwise the client may fail to process the server response correctly.
  • In addition, mod_gzip uses the special Vary header to state what its actions depend on
    (Vary: Accept-Encoding). This header is used for caching, so its detailed description will appear in
    upcoming articles.
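The stages above can be sketched in a few lines of Python. This is only an illustration of the decision sequence, not Helicon Ape code: the function names, the plain-dict headers and the config_allows flag (standing in for the directive checks) are all assumptions, and quality values such as gzip;q=0 are not handled.

```python
import gzip

ACCEPTED_TOKENS = {"gzip", "x-gzip", "deflate"}

def client_accepts_compression(accept_encoding):
    """True if the Accept-Encoding value lists gzip, x-gzip or deflate."""
    tokens = {part.split(";")[0].strip().lower()
              for part in accept_encoding.split(",")}
    return bool(tokens & ACCEPTED_TOKENS)

def maybe_compress(request_headers, response_headers, body, config_allows=True):
    """Return (headers, body), compressed only if every stage agrees."""
    # Stage 1: content that is already encoded is left as is.
    if "Content-Encoding" in response_headers:
        return response_headers, body
    # Stage 2: the client must announce that it supports compression.
    accept = request_headers.get("Accept-Encoding", "")
    if not client_accepts_compression(accept):
        return response_headers, body
    # Stage 3: checks defined by configuration directives (hypothetical flag).
    if not config_allows:
        return response_headers, body
    # Stage 4: compress and tell the client (and caches) what happened.
    headers = dict(response_headers)
    headers["Content-Encoding"] = "gzip"
    headers["Vary"] = "Accept-Encoding"
    return headers, gzip.compress(body)

headers, data = maybe_compress(
    {"Accept-Encoding": "gzip, deflate"},
    {"Content-Type": "text/html"},
    b"<p>hello</p>" * 100,
)
```

A client that sends no Accept-Encoding header gets the original body back untouched, which is exactly why the browser’s part in the exchange matters.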

It’s possible that future versions will have slightly different logic, but we’ll be sure to let you know.

Summary

This article is just a brief introduction to the Helicon Ape mod_gzip module.
We plan to write much more on this and other topics to help you use our agile little monkey (Ape) easily and efficiently.

Best wishes,
HeliconTech Team

Wednesday, January 21, 2009

Guide: Example of mod_cache application

In the previous articles we explained what a cache is and how it works in Helicon Ape. Now it’s time to put that knowledge into practice. Today we are going to apply caching to qdig, a PHP application that helps organize a web image gallery. To learn how to register PHP on IIS7, see our article about WordPress.

Creating online photo album

Let’s create a photos folder in the site root and fill it with our photos. Next, download qdig. To keep things simple, we’ll extract only the index.php file and put it into the same folder.


The gallery is already working: http://localhost/photos/index.php

Measuring performance

To measure the request rate, we’ll use the ab.exe (ApacheBench) tool:
ab.exe -n 200 -c 2 "http://localhost/photos/index.php?Qwd=.&Qif=DSC00410.JPG&Qiv=name&Qis=M"
The result is a bit more than 16 requests per second.

Switching on mod_cache and mod_expires

To enable the necessary modules, uncomment the following lines in the Helicon Ape httpd.conf file:
LoadModule expires_module    modules/mod_expires.so
LoadModule cache_module modules/mod_cache.so

Analyzing cached request

To make mod_cache cache only unique requests rather than every request, let’s figure out what the qdig request parameters mean and how request uniqueness depends on them:
  • Qwd - folder where the image files reside - AFFECTS request uniqueness;
  • Qif - file name - AFFECTS request uniqueness;
  • Qiv - file name display mode - AFFECTS request uniqueness;
  • Qis - image size - DOESN’T AFFECT request uniqueness;
  • Qtmp - representation mode - DOESN’T AFFECT request uniqueness.
Thus, the cache key will use only the Qwd, Qif and Qiv parameters.
The mod_cache part of the config will look like this:
<Files index.php>
CacheEnable mem
CacheVaryByParams Qwd Qif Qiv
</Files>

Expiration time

The index.php script does not set the Cache-Control and Expires headers, but, as we already know, they are essential for successful caching. So we’ll set these headers ourselves using mod_expires:
ExpiresActive On
ExpiresByType text/html "access 1 hour"
The directives above set the expiration time to 1 hour.
The resulting .htaccess consists of the mod_cache and mod_expires fragments shown above.

Measuring performance once again

ab.exe -n 200 -c 2 "http://localhost/photos/index.php?Qwd=.&Qif=DSC00410.JPG&Qiv=name&Qis=M"
And now the result is about 94 requests per second!

That’s all it takes to achieve a nearly sixfold performance gain (from about 16 to about 94 requests per second).
This example clearly demonstrates how easy and effective Helicon Ape caching is.

Monday, January 19, 2009

Improvements in Helicon Ape release version

For those still hesitating whether to try the release version of Helicon Ape, here are the main differences between the release and the beta.

So, the release (unlike the beta) includes:
  • mod_cache module to reduce traffic by caching HTTP content
  • mod_gzip module to compress HTTP responses to further minimize server load
  • complete docs
  • enhanced Helicon Ape Manager
  • important bug fixes
The release version of Helicon Ape now boasts 19 Apache-compatible modules and an easy-to-use manager for comfortable Apache emulation on IIS7.

Friday, January 16, 2009

Helicon Ape released!

It has happened! Helicon Ape has been released and is now available for download and purchase!
Enjoy all Apache advantages on your IIS servers!

Best wishes,
HeliconTech Team

Monday, January 12, 2009

How does mod_cache work?

The Helicon Ape release (coming very soon) will contain the mod_cache module. As promised in our previous article, here is a more thorough description of how mod_cache operates.

mod_cache starts working

mod_cache comes on the scene after the authentication/authorization events but before the request handler runs. At this stage the module does the following:
  • checks whether a cached response may be used for the current request
  • if so, generates a key and looks up the cached response by this key
  • if the response is found in the cache, returns it to the client and request processing ends; the request handler is not invoked.

Cacheable or not cacheable: request check


A response may be cached if the request meets the following requirements:
  • the request method is GET
  • the request does not contain an Authorization header
  • the Cache-Control request header is not no-cache. This condition is ignored if CacheIgnoreCacheControl On is used
  • the Pragma request header is not no-cache. This condition is ignored if CacheIgnoreCacheControl On is used
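The request check above can be sketched as a small Python function. The function name and the dict-based header representation are assumptions made for illustration, and the substring test for no-cache is a simplification of real header parsing.

```python
def request_allows_cache(method, headers, ignore_cache_control=False):
    """Apply the request-side checks listed above.

    headers is a dict of request header names to values (hypothetical shape).
    ignore_cache_control mirrors `CacheIgnoreCacheControl On`.
    """
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}

    if method.upper() != "GET":
        return False
    if "authorization" in lowered:
        return False
    if not ignore_cache_control:
        if "no-cache" in lowered.get("cache-control", ""):
            return False
        if "no-cache" in lowered.get("pragma", ""):
            return False
    return True

plain_get = request_allows_cache("GET", {"Accept": "text/html"})
forced_reload = request_allows_cache("GET", {"Cache-Control": "no-cache"})
```

Here plain_get is True, while forced_reload is False unless ignore_cache_control is switched on.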

mod_cache attempts to save response

mod_cache takes over again once the request handler has completed its job and all defined filters have been applied to the response. At this stage the module does the following:
  • determines whether the response can be cached
  • checks whether CacheEnable is set for this request
  • generates the cache key
  • determines how long to keep the response in the cache (absolute expiration time)
  • saves the response in the cache under the key

Cacheable or not cacheable: response check


A response is considered cacheable only if all of the following conditions are met:
  • the request method is GET
  • the response status is 200 (Apache also accepts 203, 300, 301 and 410)
  • the Expires response header, if present, contains a valid date in the future
  • the response contains an expiration time (i.e. an Expires or Cache-Control: max-age=XX header), an ETag header or a Last-Modified header. This condition is ignored if CacheIgnoreNoLastMod is used
    • if the request has a QueryString, only responses containing an expiration time (i.e. an Expires or Cache-Control: max-age=XX header) are cached. This condition is ignored if CacheIgnoreQueryString On is used
  • the Cache-Control header must not be no-store. This condition is ignored if CacheStoreNoStore On is used
  • the Cache-Control response header must not be private. This condition is ignored if CacheStorePrivate On is used
  • the request does not contain an Authorization header (Apache makes an exception if Cache-Control contains s-maxage, must-revalidate or public)
  • the Vary response header does not contain "*".

Cache key generation

The response is saved in the cache under a key. This key includes:
  • the normalized (canonical) request URI without the QueryString or, for a proxy request, the normalized proxy request URL;
  • all QueryString parameters and their values in alphabetical order (default behavior)
    • the CacheIgnoreQueryString On directive excludes request parameters from the cache key
    • the CacheVaryByParams param1 param2 ... directive defines which parameters are included in the cache key
  • all request headers specified in the CacheVaryByHeaders header1 header2 ... directive. By default, headers are not included in the cache key.
  • if the response contains a Vary header, all request headers listed in it are included in the cache key.
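Here is a minimal Python sketch of how such a key could be assembled. The function, its argument names and the "|" separator are illustrative assumptions, not the real mod_cache key format. URI normalization is reduced to taking the path, and header names from a Vary response header would simply be passed in through vary_by_headers.

```python
from urllib.parse import urlsplit, parse_qsl

def cache_key(url, request_headers=None, vary_by_params=None,
              ignore_query_string=False, vary_by_headers=()):
    """Build an illustrative cache key from the parts listed above."""
    parts = urlsplit(url)
    key = [parts.path]  # request URI without the QueryString

    if not ignore_query_string:
        params = parse_qsl(parts.query, keep_blank_values=True)
        if vary_by_params is not None:
            # Like CacheVaryByParams: keep only the listed parameters.
            params = [(k, v) for k, v in params if k in vary_by_params]
        # Alphabetical order makes the key independent of parameter order.
        key.extend(f"{k}={v}" for k, v in sorted(params))

    headers = {n.lower(): v for n, v in (request_headers or {}).items()}
    for name in sorted(h.lower() for h in vary_by_headers):
        key.append(f"{name}:{headers.get(name, '')}")

    return "|".join(key)

base = "http://localhost/photos/index.php?Qwd=.&Qif=DSC00410.JPG&Qiv=name&Qis="
medium = cache_key(base + "M", vary_by_params={"Qwd", "Qif", "Qiv"})
large = cache_key(base + "L", vary_by_params={"Qwd", "Qif", "Qiv"})
```

With the parameter list restricted this way, medium and large are the same key, so both requests would share one cached response. Without vary_by_params, the differing Qis value would produce two separate keys.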

When cached response dies

An HTTP response is stored in the cache for a specific period of time, computed as follows:
  • If the response contains an Expires header whose value is valid and does not refer to the past, the cached response is stored until the time specified in it.
  • If the response contains a Cache-Control header with either max-age=X or s-maxage=X, the cached response is stored for X seconds.
  • If the response contains a Last-Modified header, the cached response is stored until: expiry date = date + min((date - lastmod) * factor, maxexpire), where date is the current date, lastmod is the value of the Last-Modified header, factor is a float value set via the CacheLastModifiedFactor directive (default value = 0.1), and maxexpire is the value set via the CacheMaxExpire directive (default value = 86400 seconds = 1 day).
  • If mod_cache is unable to calculate the expiration date using any of the methods above (this is possible if the response has no Expires, Cache-Control or Last-Modified header BUT has an ETag header), the storage period is set to the default value of 1 hour, which can be changed using the CacheDefaultExpire directive.
This wall of text may look a little unclear at first glance, but in reality it is a well-designed and highly efficient scheme. Our upcoming article will convince you of that.
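The expiration rules above can be sketched in Python as follows. The function name and arguments are assumptions made for illustration: header values are expected to be parsed already, and the rules are assumed to apply in the order they are listed.

```python
from datetime import datetime, timedelta, timezone

def expiry_time(now, expires=None, max_age=None, last_modified=None,
                factor=0.1, max_expire=86400, default_expire=3600):
    """Return the moment until which a response would stay in the cache.

    factor, max_expire and default_expire mirror the defaults of
    CacheLastModifiedFactor, CacheMaxExpire and CacheDefaultExpire.
    """
    # Rule 1: a valid Expires date that is not in the past.
    if expires is not None and expires > now:
        return expires
    # Rule 2: Cache-Control max-age / s-maxage, in seconds.
    if max_age is not None:
        return now + timedelta(seconds=max_age)
    # Rule 3: expiry date = date + min((date - lastmod) * factor, maxexpire)
    if last_modified is not None:
        age = (now - last_modified).total_seconds()
        return now + timedelta(seconds=min(age * factor, max_expire))
    # Rule 4: nothing usable, fall back to the default period.
    return now + timedelta(seconds=default_expire)

now = datetime(2009, 1, 12, 12, 0, tzinfo=timezone.utc)

# Modified 10 hours ago: 36000 s * 0.1 = 3600 s, so it is kept for 1 hour.
recent = expiry_time(now, last_modified=now - timedelta(hours=10))

# Modified 30 days ago: 10% of the age exceeds maxexpire, so it is capped at 1 day.
old = expiry_time(now, last_modified=now - timedelta(days=30))
```

The Last-Modified rule is a heuristic: a resource that has not changed for a long time is assumed to stay unchanged for a while longer, and maxexpire keeps that guess from growing without limit.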

Friday, January 9, 2009

Web Caching: what is it?

What is it and what is it for?

A web cache is a vital tool for building lightning-fast web apps. It stores HTTP responses that can be returned to the user without the request being processed by the server again, i.e. without running ASP/PHP scripts or database queries. And that’s cool!
Web caching substantially reduces response time (the time the server needs to produce a response), because reading from the cache is much faster than processing the request with a PHP handler.
Web caching minimizes traffic: with intermediate caches (gateway or proxy caches), the request never reaches the origin server, and the response is returned by the intermediate caching server.

Cache breeds

Server cache

This cache works on the origin server. Applications and the server itself use it to store parts of responses (e.g. web pages) or complete responses. A server cache may be used at the application level (e.g. memcached + PHP or HttpRuntime.Cache + ASP.NET) or at the HTTP server level (e.g. mod_cache in Apache, OutputCache in IIS7).

Proxy cache

It sits between clients and origin servers and may only store public representations, i.e. those that do not require authorization (unlike private representations). Proxy caches are widely used by providers to reduce traffic.

Browser cache

It lives in the browser and can store private data. The browser cache is used, for example, to make the Back button work.

How does Server Cache work?

Cacheless configuration

A cacheless configuration forces the server to process each incoming request and generate a new response even if the same resource is requested several times in a row. This wastes time and resources and puts excessive load on the server.

First request to cache-enabled server

When a resource is requested from the server for the first time, the caching system checks whether the response may be cached, then looks for the response in the cache and fails to find it. The request moves further along the server pipeline, triggering the necessary handlers and filters. When the response is ready, the caching system saves it to the cache before sending it to the client.

Subsequent requests to cache-enabled server

On subsequent requests to this resource, the caching system checks whether the response may be cached, then looks for the response in the cache and this time finds it! The response is retrieved from the cache and sent to the client. And that’s it! No server handlers or filters are executed.
Responses are stored in the cache for a certain period of time. When this time elapses, the cached response is marked as invalid and removed from the cache. The next request to the same resource is processed as if it were the first one (see “First request to cache-enabled server”).
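The whole cycle (first request, subsequent requests, expiration) can be illustrated with a toy in-memory cache in Python. The class and method names are made up for this sketch, time is passed in explicitly to keep the example deterministic, and a real server cache does much more than this.

```python
class ServerCache:
    """Toy server cache: key -> (response, expiration time in seconds)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}

    def handle(self, key, handler, now):
        entry = self.store.get(key)
        if entry is not None:
            response, expires_at = entry
            if now < expires_at:
                # Cache hit: the handler is not executed at all.
                return response
            # The entry has expired: remove it and start over.
            del self.store[key]
        # Cache miss: run the handler, then save the response for later.
        response = handler()
        self.store[key] = (response, now + self.ttl)
        return response

calls = []

def slow_page():
    # Stands in for a PHP/ASP handler plus database queries.
    calls.append(1)
    return "<html>gallery</html>"

cache = ServerCache(ttl_seconds=3600)
cache.handle("/photos/index.php", slow_page, now=0)     # first request: miss
cache.handle("/photos/index.php", slow_page, now=10)    # served from cache
cache.handle("/photos/index.php", slow_page, now=4000)  # expired: handler runs again
print(len(calls))
```

Three requests were served, but the handler ran only twice: once for the first request and once after the cached response expired.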

Conclusion

As you can see, a server cache means lower server load and faster response times. In the next article on caching we’ll explain this process in more detail and illustrate it with examples.