Attention! Helicon Tech Blog has moved to www.helicontech.com/articles/

Monday, August 31, 2009

Helicon Ape mod_proxy: proxy-server inside IIS

What is a proxy-server?

A proxy-server is a network service that lets clients make indirect requests to other network services. In other words, it acts as an intermediary. In brief, a proxy-server works as follows:

  • the client connects to the proxy-server (the front-end server)
  • the client asks the proxy-server for a resource located on another server
  • the proxy-server connects to the specified server (the back-end server)
  • the proxy-server gets the requested resource
  • the proxy-server returns the resource to the client

The client may not even be aware that the requested resource was delivered from another server.

What is an HTTP-proxy?

An HTTP-proxy is an implementation of the proxy service for the HTTP protocol. An HTTP-proxy may be either reverse or forward.

A reverse HTTP-proxy usually sits between an external network and an internal one. It maps the external namespace onto the internal one and acts as a barrier between external clients and the live web servers on the intranet (an example is given below). A reverse HTTP-proxy is used to hide the internal network infrastructure, balance load among back-end servers, cache responses, and compress HTTP responses. As a rule, external clients have no idea that the response comes from a reverse proxy server.

A forward HTTP-proxy (aka Web-proxy) sits between an internal network and an external one (the Internet). It is used to restrict access to specific HTTP resources, cache HTTP responses, and give internal clients access to the web. To use a forward proxy, the client must explicitly specify its address (e.g. in the browser settings). An HTTP request to a forward proxy looks like this:
GET http://example.com/ HTTP/1.1
Host: example.com
Accept: */*
User-Agent: Mozilla

Note! Unlike a direct request, a forward proxy request carries a fully qualified URL (including the protocol and host) after GET (or any other HTTP method), not just the local path to the destination (starting with /).
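The difference between the two request forms can be sketched in a few lines of Python. This is only an illustration: the request_line helper and its via_forward_proxy parameter are made up for this example and are not part of Helicon Ape.

```python
from urllib.parse import urlsplit


def request_line(url, via_forward_proxy, method="GET"):
    """Build the first line of an HTTP/1.1 request for the given URL."""
    parts = urlsplit(url)
    if via_forward_proxy:
        # A forward proxy needs the full URL to know which server to contact.
        target = url
    else:
        # A direct request carries only the local path (plus query string).
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return f"{method} {target} HTTP/1.1"


direct = request_line("http://example.com/", via_forward_proxy=False)
proxied = request_line("http://example.com/", via_forward_proxy=True)
print(direct)   # GET / HTTP/1.1
print(proxied)  # GET http://example.com/ HTTP/1.1
```

In both cases the Host header stays the same; only the request target after the method changes.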

Helicon Ape mod_proxy

Helicon Ape includes a mod_proxy module that implements both reverse and forward proxy functionality. All basic aspects of this module, along with examples, may be found in the docs.

Forward proxy in Helicon Ape is enabled by the ProxyRequests On directive. Before enabling it, secure your server so that only authorized users can access the proxy.

Reverse proxy is enabled by the ProxyPass directive. For example:

ProxyPass /app/ http://backend.domain.com/

or (the first parameter may be omitted when the directive is used inside a <Location> section or in .htaccess):

<Location /app/>
  ProxyPass http://backend.domain.com/
</Location>

The above config proxies all requests starting with /app/ to backend.domain.com, stripping the /app prefix from the path first:
/app/item/33/ -> http://backend.domain.com/item/33/.
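The path mapping above can be sketched as a small Python function. It is a simplified model of prefix replacement only, not the module's implementation; the proxy_pass function is a hypothetical name used for illustration.

```python
def proxy_pass(path, prefix="/app/", backend="http://backend.domain.com/"):
    """Map a front-end path to a back-end URL, or return None if the prefix does not match."""
    if not path.startswith(prefix):
        return None  # this ProxyPass rule does not apply to the request
    # Replace the matched prefix with the back-end base URL.
    return backend + path[len(prefix):]


mapped = proxy_pass("/app/item/33/")
print(mapped)  # http://backend.domain.com/item/33/
```

Requests whose path does not start with /app/ are left alone and get served by the front-end server as usual.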

To rewrite HTTP response headers when reverse proxying (e.g. the Location header on a redirect), use the ProxyPassReverse directive. To rewrite domain names and paths in cookies, use the ProxyPassReverseCookieDomain and ProxyPassReverseCookiePath directives.

Now let's look at a non-trivial proxy application.

Example: load balancing

Given: front-end server example.com visible from external network.

Goal: Set up load balancing among three back-end application servers, taking their performance into account, and between two back-end servers storing static files (images, CSS, etc.). Say the second and the third application servers are twice as powerful as the first one, and the second static server is three times as powerful as the first one.

Solution. The reverse proxy configuration in httpd.conf will be:

<VirtualHost *:80>

ProxyPass /static/ balancer://cluster-static/ lbmethod=bytraffic

<Proxy balancer://cluster-static>
  BalancerMember http://static1.example.com/ loadfactor=1
  BalancerMember http://static2.example.com/ loadfactor=3
</Proxy>

ProxyPass / balancer://cluster-app/ lbmethod=byrequests

<Proxy balancer://cluster-app>
  BalancerMember http://app1.example.com/ loadfactor=1
  BalancerMember http://app2.example.com/ loadfactor=2
  BalancerMember http://app3.example.com/ loadfactor=2
</Proxy>

</VirtualHost>

ProxyPass directives are checked against the current request sequentially, in the order they appear, so directives with shorter matching patterns should be placed lower in the config. The balancer: protocol in a ProxyPass directive means that requests will be forwarded to the URLs specified in the BalancerMember directives of the corresponding <Proxy> section. The loadfactor parameter sets each member's relative share of the load. The lbmethod=byrequests parameter means that balancing is based on the number of requests sent to each back-end server; the bytraffic value means that balancing depends on the number of bytes transferred from the back-end.
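To get a feel for what weighting by request count means, here is a minimal Python sketch of a weighted round-robin scheduler using the load factors from the config above. The algorithm shown (the so-called smooth weighted round-robin) is our assumption, chosen for illustration; the article does not describe how Helicon Ape picks a member internally.

```python
from collections import Counter


def weighted_round_robin(members, count):
    """Pick `count` back-ends so that each gets a share proportional to its load factor."""
    current = {name: 0 for name in members}
    total = sum(members.values())
    picks = []
    for _ in range(count):
        # Every member earns credit equal to its load factor...
        for name, weight in members.items():
            current[name] += weight
        # ...and the member with the most credit serves the request.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


cluster_app = {"app1": 1, "app2": 2, "app3": 2}
counts = Counter(weighted_round_robin(cluster_app, 10))
print(counts["app1"], counts["app2"], counts["app3"])  # 2 4 4
```

Out of every ten requests, app1 serves two while app2 and app3 serve four each, which matches the 1:2:2 ratio of the load factors.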

Compression and caching

To speed up your proxy-server, responses from the back-end may be compressed and cached. To do that, we add the following lines to the VirtualHost section of our httpd.conf:

# enable compression
SetEnv gzip 

# enable caching
CacheEnable mem http://app1.example.com/
CacheEnable mem http://app2.example.com/
CacheEnable mem http://app3.example.com/

Please note that caching only works if the response from the back-end contains expiration headers, e.g. Cache-Control: max-age=60.
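The check a cache has to make can be sketched roughly as follows. This Python snippet is a deliberately simplified assumption about what counts as an expiration header (max-age or Expires, and no no-store); real caches apply many more rules, and the function names are made up for this example.

```python
import re


def max_age_seconds(headers):
    """Return the max-age value from Cache-Control, or None if it is absent."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    return int(match.group(1)) if match else None


def is_cacheable(headers):
    """Simplified rule: cache only responses that say how long they stay fresh."""
    if "no-store" in headers.get("Cache-Control", ""):
        return False
    return max_age_seconds(headers) is not None or "Expires" in headers


fresh = {"Cache-Control": "max-age=60"}
plain = {"Content-Type": "text/html"}
print(is_cacheable(fresh), max_age_seconds(fresh))  # True 60
print(is_cacheable(plain))                          # False
```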

Conclusion

As you can see, the Helicon Ape mod_proxy module offers full-fledged proxy functionality that can satisfy the most exacting needs.

Best wishes,
HeliconTech Team

Tuesday, August 18, 2009

Go for SEO with Helicon Ape mod_linkfreeze

Intro

SEO, SE-friendly, search engine marketing: these words are driving lots of people mad today. Everyone wants to be SE-friendly. According to Wikipedia, search engine optimization (SEO) is the process of improving the volume or quality of traffic to a web site from search engines via «natural» («organic» or «algorithmic») search results. In other words, SEO simplifies the search robots' job and thus helps bring the web site to the top of the search results. Having got the idea, you would probably like to use that technique everywhere. Any new web site of yours will be optimized for search engines. And that's pretty good, but not good enough.

We've missed something here, huh? What about existing sites? What if they are really huge and require loads of code modifications? Are they doomed to have ugly links like index.php?id=123? Do you need to spend hours tinkering with the source code? Definitely NO! Here we are going to show you an easy and powerful way to bring SEO to your server without significant effort. Luckily, Helicon Ape has all the features needed.

mod_linkfreeze

Like its elder brother, mod_linkfreeze provides an extended toolset for changing links on pages to an SE-friendly format. In general, the «freezing» idea is based on HTML content modification. Once you've written special rules, the module carefully modifies every hyperlink on each page of the web site if the link matches the rule pattern(s). In a word, mod_linkfreeze turns dynamic links into static ones. This is the primary idea of the module and at the same time a good SEO practice, since search robots work much better with static references. And the most interesting thing here: you don't need to modify any part of the existing code. The basic concept is depicted in this scheme:
The scheme reflects the whole life cycle of a request. As you can see, the user goes to a web site and IIS serves the request through the web site engine. It doesn't matter what the engine is. It can be WordPress, CakePHP or even plain HTML. The important part is that the engine returns an HTML response with hyperlinks, and then mod_linkfreeze catches and processes the response according to its rules. Dynamic links become static and the user finally gets the requested page. Without mod_linkfreeze, the step inside the orange rectangle wouldn't exist.

You may be wondering why we expect only IIS7 on the server (see the scheme). Bad news for Windows 2003 owners: mod_linkfreeze doesn't work under IIS6. It's just technically impossible, and we hope you will enjoy mod_linkfreeze on Windows Server 2008.

Well, enough theory. Let's move on and see the module dealing with a real web application.

Freezing phpBB — the online forum engine

We decided to play with a forum engine because forums usually contain tons of dynamic links. Although we've taken phpBB, you may use another web application, since the article describes mod_linkfreeze in general.

The forum links right after installation look like this:

As you can see, there are dynamic links to PHP scripts which we want to make static. Let's enable mod_linkfreeze. First of all we have to make sure the following line is uncommented in the server configuration (httpd.conf):
LoadModule linkfreeze_module   modules/mod_linkfreeze.so
Then we should enable the linkfreeze filter. The easiest way is to write the following in httpd.conf:
SetOutputFilter linkfreeze
You may try mod_mime as well, setting up the filter on a specific extension only:
AddOutputFilter linkfreeze .php
Now we should edit the .htaccess file in the root folder of phpBB and add the following:
LinkFreezeEngine on
Actually, that wasn't strictly required, since LinkFreezeEngine is switched on by default. However, you may use this directive to disable the module in a specific context. For example, you may enable mod_linkfreeze for several locations only:
LinkFreezeEngine Off
<Location /foo/>
        LinkFreezeEngine on
        ...
</Location>

<Location /bar/>
        LinkFreezeEngine on
        ...
</Location>
LinkFreezeRule is the magic and power of mod_linkfreeze. This directive controls the whole «freezing» process. The basic syntax is described in the documentation. Let's see what happens if we try this:
LinkFreezeEngine on
LinkFreezeRule --- php=html

Wow, great! It works. Two simple lines and we have static links. They don't look pretty enough yet, but first we'll sort out what's going on and then try to get a better result.

So there are three hyphens straight after LinkFreezeRule. Recall that dynamic links usually contain three separators: the question mark (?) separates the query string from the path, the ampersand (&) separates query string arguments, and the equals sign (=) separates argument names from their values. The three hyphens replace these separators in the following order: ?&=. We decided to use hyphens, but that wasn't required.

Note! It is advisable to use rarely used symbols as replacement characters, otherwise conflicts with the same characters in the URLs are inevitable. We recommend the following combinations: ---, ~~~, !!!, |||, ===, ///. The characters may also be combined, e.g.: -=-, !/=, etc.

The next part of the rule is php=html. Obviously, it means replacing the ‘php’ extension with ‘html’. You might decide to use something funnier:
LinkFreezeEngine on
LinkFreezeRule --- php=aspx
The links are static but look very weird: query string arguments go right after the extension. Moreover, we want to be sure all links are in lower case. For these cases LinkFreezeRule supports optional flags. MoveExt moves the file extension to the end of the link, and LowerCase converts the link to lower case. Let's try it:
LinkFreezeEngine on
LinkFreezeRule --- php=aspx [MoveExt, LowerCase]
Oh no. Our session expired while we were writing the previous paragraph, and phpBB added a strange argument, ‘sid’. It went away after the second page reload, but we should expect the argument to appear again. What can we do? Luckily, mod_linkfreeze has another useful flag for us: Params. It lets you specify which query string arguments should be «frozen». In our case it would look like this:
LinkFreezeEngine on
LinkFreezeRule --- php=html [MoveExt, LowerCase, Params="u|g|f|p|mode|id|search_id"]
See, we have only two lines of code. We didn't modify any part of phpBB and we didn't tweak IIS. We just wrote a few magic words. If at some point you realize you don't need mod_linkfreeze anymore, you can just replace on with off. Huge effort, huh? :)
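To make the «freezing» idea more tangible, here is a rough Python model of what a rule like the ones above does to a single link. It is only a sketch under our own assumptions: the freeze_link function is hypothetical, the exact output format of mod_linkfreeze may differ, and the Params and Redirect flags are not modelled.

```python
def freeze_link(url, separators="---", ext_map=("php", "html"),
                move_ext=False, lower_case=False):
    """Turn a dynamic link into a static-looking one (simplified model)."""
    q_sep, amp_sep, eq_sep = separators  # replacements for ?, & and =
    old_ext, new_ext = ext_map
    path, _, query = url.partition("?")
    if not path.endswith("." + old_ext):
        return url  # the rule does not apply to this link
    stem = path[: -(len(old_ext) + 1)]
    frozen_query = ""
    if query:
        frozen_query = q_sep + query.replace("&", amp_sep).replace("=", eq_sep)
    if move_ext:
        # MoveExt: the extension goes after the frozen arguments.
        result = stem + frozen_query + "." + new_ext
    else:
        result = stem + "." + new_ext + frozen_query
    return result.lower() if lower_case else result


print(freeze_link("viewforum.php?f=2"))                     # viewforum.html-f-2
print(freeze_link("viewtopic.php?f=2&t=1", move_ext=True))  # viewtopic-f-2-t-1.html
```

The sketch also shows why rarely used replacement characters matter: if an argument value already contained a hyphen, the frozen link could not be unambiguously turned back into the original one.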

We've finished with phpBB. Our site is configured and ready to go live.

The last tricks

When you set up mod_linkfreeze on your server, some users will still have old bookmarks. Now that the links have become SE-friendly, you would probably like to redirect the old links to the new ones. This is beneficial for SEO, as it eliminates the duplicate content problem and the consequent penalty in search engines. To enable redirection, use the Redirect flag:
LinkFreezeEngine on
LinkFreezeRule --- php=html [Redirect, MoveExt, LowerCase, Params="u|g|f|p|mode|id|search_id"]
The left screenshot shows the response without the Redirect flag: the user gets 200. When we add the flag, the user gets 301 (right screenshot).

The last thing we want to show is performance tuning.
The LinkFreezePageSizeLimit directive restricts the maximum size of pages to process. mod_linkfreeze won't do anything with the part of a page exceeding the LinkFreezePageSizeLimit value. By the way, Google doesn't enjoy parsing huge pages to the end either. The value of the directive is specified in kilobytes:
LinkFreezePageSizeLimit 4096
And finally, the NoCheckFile flag tells mod_linkfreeze not to check whether the requested file exists. For example, if you go to http://site.com/static-link.html, by default mod_linkfreeze checks whether static-link.html exists on disk; if it does, the module doesn't de-freeze the link but returns the file content instead. Omitting these checks is a good way to boost performance, BUT only if you're sure it won't harm you.

Summary

Well, we've done a lot with very little effort. We described how to enable mod_linkfreeze and turn dynamic links into static ones. We've also explained the basic idea of «freezing» and shown you some useful tricks.
We hope you will enjoy Helicon Ape and mod_linkfreeze!

Yours sincerely,
HeliconTech Team

Wednesday, August 12, 2009

Guide: Building FarCry CMS permalinks with Helicon Ape on IIS7

FarCry CMS is a popular content management solution built with FarCry Core (a web application framework based on the ColdFusion language). As this software is quite popular these days, we want to show those interested how to set up SEO-friendly URLs in FarCry with the help of Helicon Ape.
Prerequisites: Windows 2008/Vista, IIS7, MySQL, ColdFusion, Helicon Ape.

Step 1. MySQL

After installing MySQL, run the MySQL Command Line Client and execute the following command:
create database farcrydatabase;

Step 2. ColdFusion

Install Adobe ColdFusion. Make sure ColdFusion is registered in IIS:
  • open IIS Manager
  • open Handler Mappings snap-in
  • check if required handlers were added

Step 3. FarCry CMS

Download the latest version of FarCry CMS.

Unzip the installation package to C:\inetpub\wwwroot\farcry.
Create your project.
Specify the project name, project folder and locale.
Select the FarCry CMS Datasource previously created in the ColdFusion Administrator area.
Note: create your Datasource in the ColdFusion Administrator area as shown below:
A standard FarCry URL looks like this: index.cfm?objectid=E689D722-06DF-6D24-56726E44740068B5. Not a friendly URL at all…
After installation, log in to the FarCry Administrator area, choose the Site section and open the Current Friendly URLs tab in the right-side menu.
Click the Manage button and add your SEO URL. Set «Type of redirection» to «None» and «Redirect to» to «To the default FU», because we don't need a redirect; we only need to show the content of the real page.
We've just added a friendly URL for our Support page, but FarCry CMS says it is still not working.
We can check this by entering our SEO-friendly URL into the browser's address bar:
Now we'll eliminate this problem using Helicon Ape. Download and install the latest version of Helicon Ape. Launch Helicon Ape and add these lines to the .htaccess file in the root of your site:

RewriteEngine On
RewriteCond %{REQUEST_URI} !(^/farcry|^/webtop|^/flex2gateway|^/flashservices|^/cfide)($|/)
RewriteRule ^([^.]+)$ index.cfm?furl=/$1 [L,NC,QSA]

Save the config.
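If the rewrite rules look cryptic, the following Python sketch models what they do to an incoming request. It assumes standard Apache mod_rewrite semantics (in .htaccess the rule pattern is matched against the path without its leading slash, and the condition is case-sensitive because it carries no NC flag); the rewrite function itself is a hypothetical helper written for this illustration.

```python
import re

# Paths that must reach FarCry and ColdFusion untouched (from the RewriteCond).
EXCLUDED = re.compile(r"(^/farcry|^/webtop|^/flex2gateway|^/flashservices|^/cfide)($|/)")
# The RewriteRule pattern: any non-empty path without a dot; NC makes it case-insensitive.
PATTERN = re.compile(r"^([^.]+)$", re.IGNORECASE)


def rewrite(request_uri, query=""):
    """Return the internal target for a friendly URL, or None if the rule is skipped."""
    if EXCLUDED.search(request_uri):
        return None  # the RewriteCond fails, so the rule is not applied
    local_path = request_uri[1:] if request_uri.startswith("/") else request_uri
    match = PATTERN.match(local_path)
    if match is None:
        return None  # e.g. real files such as /css/main.css contain a dot
    target = "index.cfm?furl=/" + match.group(1)
    if query:
        target += "&" + query  # QSA: keep the original query string
    return target


print(rewrite("/support"))        # index.cfm?furl=/support
print(rewrite("/webtop/login"))   # None
print(rewrite("/css/main.css"))   # None
```

So a friendly URL such as /support is silently served by index.cfm, while the administrative folders and static files keep working as before.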
For FarCry CMS to apply these rules, the ColdFusion server needs to be restarted. Go to the bin folder inside the ColdFusion installation folder (C:\ColdFusion8\) and launch the cfstop.bat file. Then start your ColdFusion server again by launching the cfstart.bat file.
Now in the Administrator area we can see that FarCry CMS has applied the SEO-friendly URLs.
Visit your site to make sure that this is really true.
If the result resembles the one above, accept our congratulations: you've just set up SEO-friendly URLs for your FarCry site.

Regards, 
HeliconTech Team

Tuesday, August 4, 2009

Protecting image gallery with Helicon Ape mod_hotlink

What is mod_hotlink? It's a Helicon Ape module you'll undoubtedly enjoy. Why? 'Cause it'll spare you the headache of dealing with traffic leechers. It will do it all for you. And now we'll illustrate this ingenious process taking the Gallery2 photo gallery as an example. To start, we need the following ingredients:
  • IIS7-driven website (we use www.helicon_test.com)
  • Gallery2
  • Helicon Ape
Now we are ready to start cooking our healing soup. First, you need to prepare (download and install) the Gallery2 product. Make sure it's fresh (working fine).
Next, we need to create a link to this gallery on our IIS7-driven site (we put it on the main page). That's the component we'll experiment with today.
Take a spice mix called Helicon Ape from the shelf and be ready to use it in a moment (install Helicon Ape). Take a pinch of the mod_hotlink spice and add it to the pot (just uncomment one line).
Stir it all slowly. To reveal the whole bouquet of the dish (protect only the necessary folder, in our case /gallery2/), we add
HotlinkProtect /gallery2/
and
SetOutputFilter hotlink
to feel the most delicate notes of taste (to enable links replacing mechanism).
After that the aroma (link to Gallery2) will change for the better (a dynamically generated signature will be appended).
Now all links from our site pointing to the Gallery2 folder are dynamically signed, and the signature is unique for each client (an individual approach to each person is the key to success!), i.e. there's no way to get the content without this signature or to fabricate it (the recipe is our top secret!).
Everyone who behaves badly and is not allowed to taste the dish, or who tries to guess its components (everyone who attempts to access a protected resource with an incorrect signature), will get a 403 Forbidden response or will be turned out, i.e. redirected to the specified URL (the RedirectURL parameter).
That is, mod_hotlink makes sure the user obtained the link from our site only (an authentic and inimitable one, made with his preferences in mind). And we don't have to touch the site: all links on pages are transformed automatically on the fly (as if prepared in a microwave oven)!
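For the curious, here is one way a per-client link signature could be cooked. The real mod_hotlink recipe is not described in this article, so everything in this Python sketch is an assumption made for illustration: the HMAC scheme, the sig parameter name, the secret and the use of the client address as the client identity.

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical secret known only to the server


def sign(path, client_id):
    """Derive a short signature from the protected path and the client identity."""
    message = f"{client_id}|{path}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()[:16]


def signed_link(path, client_id):
    """Append the signature to a link, as the output filter would do on the fly."""
    return f"{path}?sig={sign(path, client_id)}"


def status_for(path, signature, client_id):
    """Return 200 for a valid signature and 403 for a missing or forged one."""
    expected = sign(path, client_id)
    return 200 if hmac.compare_digest(expected, signature) else 403


link = signed_link("/gallery2/photo.jpg", "203.0.113.7")
signature = link.split("sig=")[1]
print(status_for("/gallery2/photo.jpg", signature, "203.0.113.7"))   # 200
print(status_for("/gallery2/photo.jpg", signature, "198.51.100.9"))  # 403
```

A link copied to another site fails the check for any other visitor, because the signature was made for a different client.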

Ok, let me see... Mmmm... Today our mod_hotlink-based dish is particularly delicious.
Bon appetit!