
Tuesday, August 18, 2009

Go for SEO with Helicon Ape mod_linkfreeze

Intro

SEO, SE-friendly, search engine marketing: these words are driving lots of people mad today. Everyone wants to be SE-friendly. According to Wikipedia, search engine optimization (SEO) is the process of improving the volume or quality of traffic to a web site from search engines via «natural» («organic» or «algorithmic») search results. In other words, SEO makes a search robot's job easier, which helps the web site rank higher in the search results. Once you have the idea, you will probably want to use the technique everywhere, and any new web site of yours will be optimized for search engines. That's pretty good, but not good enough. We've missed something here, huh? What about existing sites? What if they are really huge and would require loads of code modifications? Are they doomed to have ugly links like index.php?id=123? Do you need to spend hours tinkering with the source code? Definitely NO! Here we are going to show you an easy and powerful way to bring SEO to your server without significant effort. Luckily, Helicon Ape has all the features you need.

mod_linkfreeze

Like its elder brother, mod_linkfreeze provides an extended toolset for changing the links on your pages to an SE-friendly format. The «freezing» idea is based on modifying HTML content: once you have written your rules, the module modifies every hyperlink on every page of the web site that matches the rule pattern(s). In a word, mod_linkfreeze turns dynamic links into static ones. That is the primary idea of the module and, at the same time, good SEO practice, since search robots work much better with static references. And the most interesting thing is that you don't need to modify any part of the existing code. The basic concept is depicted in this scheme:
The scheme reflects the whole life cycle of a request. As you can see, the user goes to a web site and IIS serves the request through the web-site engine. It doesn't matter what the engine is: it can be WordPress, CakePHP or even plain HTML. The important part is that the engine returns an HTML response containing hyperlinks, and mod_linkfreeze then catches and processes that response according to its rules. Dynamic links become static and the user finally gets the requested page. Without mod_linkfreeze, the step inside the orange rectangle would not exist.
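
To make the idea more tangible, here is a rough Python sketch of what "catching the response and rewriting its hyperlinks" means. It is not how mod_linkfreeze is implemented; the names rewrite_links and demo_transform and the regular expression are our own, made up for the illustration, and the transform is a simple stand-in for real rules.

```python
import re

# Matches href="..." or href='...' and captures the quote and the URL.
HREF_RE = re.compile(r'''href=(["'])(.*?)\1''', re.IGNORECASE)

def rewrite_links(html, transform):
    """Return html with every href value passed through transform()."""
    def replace(match):
        quote, url = match.group(1), match.group(2)
        return 'href=' + quote + transform(url) + quote
    return HREF_RE.sub(replace, html)

def demo_transform(url):
    # Hypothetical stand-in for real rules: only touch links to .php scripts.
    return url.replace('.php', '.html') if '.php' in url else url

page = '<a href="index.php">Home</a> <a href="style.css">CSS</a>'
print(rewrite_links(page, demo_transform))
# <a href="index.html">Home</a> <a href="style.css">CSS</a>
```

The point is that the engine's own code never changes: only the HTML it has already produced is touched on its way out.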

You may be wondering why we expect only IIS7 on the server (see the scheme). Bad news for Windows Server 2003 owners: mod_linkfreeze doesn't work under IIS6. It's just technically impossible, and we hope you will enjoy mod_linkfreeze on Windows Server 2008.

Well, enough theory. Let's do something real and demonstrate the module working with a real web application.

Freezing phpBB — the online forum engine

We decided to play with a forum engine because forums usually contain tons of dynamic links. Although we've taken phpBB, you may use any other web application, since the article describes mod_linkfreeze in general.

The forum links right after installation look like this:

As you can see, there are dynamic links to PHP scripts which we want to make static. Let's enable mod_linkfreeze. First of all we have to make sure the following line is uncommented in the server configuration (httpd.conf):
LoadModule linkfreeze_module   modules/mod_linkfreeze.so
Then we should enable the linkfreeze filter. The easiest way is to write the following in httpd.conf:
SetOutputFilter linkfreeze
You may also use mod_mime to apply the filter to a specific extension only:
AddOutputFilter linkfreeze .php
Now we should edit the .htaccess file in the root folder of phpBB and update it as follows:
LinkFreezeEngine on
Actually, that wasn't required: LinkFreezeEngine is switched on by default. However, you may use this directive to disable the module in a specific context. For example, you may enable mod_linkfreeze for several locations only:
LinkFreezeEngine Off
<Location /foo/>
        LinkFreezeEngine on
        ...
</Location>

<Location /bar/>
        LinkFreezeEngine on
        ...
</Location>
LinkFreezeRule is the magic and power of mod_linkfreeze. This directive controls the whole «freezing» process. The basic syntax is described in the documentation. Let's see what happens if we try this:
LinkFreezeEngine on
LinkFreezeRule --- php=html

Wow, great! It works. Two simple lines and we have static links. They don't look pretty enough yet, so first we will sort out what's going on and then try to get a better result.

So, there are three hyphens straight after LinkFreezeRule. Remember that dynamic links usually have three separators: the question mark (?) starts the query string, the ampersand (&) separates query string arguments, and the equals sign (=) separates argument names from their values. The three hyphens replace these separators in the following order: ?&=. We decided to use hyphens, but other characters would work as well.
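
If you prefer to see that as code, here is a small Python sketch of what the rule above does to a single link. It only mirrors the description in this article (three replacement characters for ?, & and =, plus the extension swap); the function name freeze and its arguments are ours, and the real module may handle details differently, for instance ampersands written as &amp; in HTML.

```python
def freeze(url, separators='---', old_ext='php', new_ext='html'):
    """Sketch of 'LinkFreezeRule --- php=html' applied to one link."""
    q, amp, eq = separators          # replacements for ?, & and =, in that order
    path, has_query, query = url.partition('?')
    if not path.endswith('.' + old_ext):
        return url                   # the rule does not match: leave the link alone
    path = path[:-len(old_ext)] + new_ext
    if not has_query:
        return path
    return path + q + query.replace('&', amp).replace('=', eq)

print(freeze('viewtopic.php?f=2&t=1'))                     # viewtopic.html-f-2-t-1
print(freeze('viewtopic.php?f=2&t=1', separators='~~~'))   # viewtopic.html~f~2~t~1
print(freeze('style.css'))                                 # style.css
```

Note how the arguments end up right after the extension, which is why the result does not look pretty yet.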

Note! It is advisable to use rarely used symbols as replacement characters, otherwise conflicts with the same characters in the URLs are inevitable. We recommend the following combinations: ---, ~~~, !!!, |||, ===, ///. The characters may also be combined, e.g.: -=-, !/=, etc.

The next part of the rule is php=html. Obviously, it means replacing the extension ‘php’ with ‘html’. You might decide to use something funnier:
LinkFreezeEngine on
LinkFreezeRule --- php=aspx
The links are static but look very weird: the query string arguments go right after the extension. Moreover, we want to be sure all references are in lower case. For these cases LinkFreezeRule supports optional flags. MoveExt moves the file extension to the end of the link and LowerCase converts the link to lower case. Let's try it:
LinkFreezeEngine on
LinkFreezeRule --- php=aspx [MoveExt, LowerCase]
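
For the curious, here is how the two flags could be modelled in a few lines of Python. This is only a sketch under our own assumptions (the function name freeze and the keyword arguments move_ext and lower_case are invented for the example, and we assume MoveExt places the extension at the very end of the link); check the documentation for the module's exact behaviour.

```python
def freeze(url, separators='---', old_ext='php', new_ext='aspx',
           move_ext=False, lower_case=False):
    """Sketch of 'LinkFreezeRule --- php=aspx [MoveExt, LowerCase]'."""
    q, amp, eq = separators
    path, has_query, query = url.partition('?')
    if not path.endswith('.' + old_ext):
        return url                                # rule does not match
    stem = path[:-len(old_ext) - 1]               # path without ".php"
    args = q + query.replace('&', amp).replace('=', eq) if has_query else ''
    if move_ext:
        frozen = stem + args + '.' + new_ext      # assumption: extension goes last
    else:
        frozen = stem + '.' + new_ext + args      # arguments follow the extension
    return frozen.lower() if lower_case else frozen

print(freeze('ViewForum.php?f=2'))
# ViewForum.aspx-f-2
print(freeze('ViewForum.php?f=2', move_ext=True, lower_case=True))
# viewforum-f-2.aspx
```
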
Oh no. Our session expired while we were writing the previous paragraph, and phpBB added a strange ‘sid’ argument. It disappeared after the second page reload, but we should expect the argument to show up again. What can we do? Happily, mod_linkfreeze has another useful flag for us: Params. It lets you specify which query string arguments are subject to «freezing». In our case it would look like this:
LinkFreezeEngine on
LinkFreezeRule --- php=html [MoveExt, LowerCase, Params="u|g|f|p|mode|id|search_id"]
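
Here is a tiny Python sketch of the selection step. Be warned that it rests on an assumption of ours: we suppose that arguments missing from the list (such as sid) simply stay behind as an ordinary query string. The article does not spell that out, so treat the documentation as the authority. The helper name split_params is made up for the example.

```python
def split_params(query, allowed):
    """Split 'a=1&b=2' into (pairs to freeze, pairs left as a query string)."""
    frozen, remaining = [], []
    for pair in query.split('&'):
        if not pair:
            continue                          # skip empty pieces such as in 'a=1&&b=2'
        name = pair.split('=', 1)[0]
        # Assumption: unlisted arguments are left alone rather than dropped.
        (frozen if name in allowed else remaining).append(pair)
    return frozen, remaining

# The same list as in the Params flag above.
allowed = set('u|g|f|p|mode|id|search_id'.split('|'))

frozen, remaining = split_params('f=2&sid=0123abcd&p=7', allowed)
print(frozen)      # ['f=2', 'p=7']
print(remaining)   # ['sid=0123abcd']
```
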
As you can see, we have only two lines of configuration. We didn't modify any part of phpBB and we didn't tweak IIS. We just wrote a few magic words. If you ever realize you don't need mod_linkfreeze anymore, just replace on with off. Huge effort, huh? :)

We've finished with phpBB. Our site is configured and ready to go live.

The last tricks

Once you set up mod_linkfreeze on your server, some users will still have old bookmarks. Since the links have become SE-friendly, you will probably want to redirect the old references to the new ones. This is beneficial for SEO because it eliminates the duplicate content problem and the consequent penalty in search engines. To enable redirection, use the Redirect flag:
LinkFreezeEngine on
LinkFreezeRule --- php=html [Redirect, MoveExt, LowerCase, Params="u|g|f|p|mode|id|search_id"]
The left-hand screenshot shows the response without the Redirect flag: the user gets 200. When we add the flag, the user gets 301 (right-hand screenshot).

The last thing we want to show is performance tweaking.
The LinkFreezePageSizeLimit directive restricts the maximum size of the pages to process. mod_linkfreeze won't touch the part of a page that exceeds the LinkFreezePageSizeLimit value. By the way, Google doesn't enjoy parsing huge pages to the end either. The value of the directive is specified in kilobytes:
LinkFreezePageSizeLimit 4096
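
In other words, only the beginning of a big page gets processed and the rest passes through untouched. A minimal Python sketch of that behaviour follows; the function name apply_with_limit is ours, and the sketch ignores what happens to a link that straddles the boundary.

```python
def apply_with_limit(body, transform, limit_kb=4096):
    """Run transform() on the first limit_kb kilobytes only; pass the rest through."""
    limit = limit_kb * 1024
    head, tail = body[:limit], body[limit:]
    return transform(head) + tail

# Toy demonstration with a 1 KB limit and upper-casing as the "processing".
body = b'a' * 2048
result = apply_with_limit(body, bytes.upper, limit_kb=1)
print(result[:1024] == b'A' * 1024)   # True: the first kilobyte was processed
print(result[1024:] == b'a' * 1024)   # True: the rest was left as is
```
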
And finally, the NoCheckFile flag tells mod_linkfreeze not to check whether the requested file exists. For example, if you go to http://site.com/static-link.html, by default mod_linkfreeze checks whether static-link.html exists on disk and, if it does, the module won't de-freeze the link but will return the file content instead. Omitting these checks is a good way to boost performance, BUT only if you're sure it won't harm you.
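
The decision described above can be sketched in Python like this. The function name resolve and its return values are purely illustrative, not anything the module exposes.

```python
import os
import tempfile

def resolve(doc_root, request_path, no_check_file=False):
    """Decide what to do with an incoming static-looking request."""
    on_disk = os.path.join(doc_root, request_path.lstrip('/'))
    if not no_check_file and os.path.isfile(on_disk):
        return 'serve-file'    # a real static-link.html exists: return it as is
    return 'de-freeze'         # otherwise map the link back to the dynamic URL

# Demonstration in a temporary "site root" containing one real HTML file.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, 'static-link.html'), 'w') as f:
        f.write('<html></html>')
    print(resolve(root, '/static-link.html'))                      # serve-file
    print(resolve(root, '/static-link.html', no_check_file=True))  # de-freeze
    print(resolve(root, '/viewtopic-f-2.html'))                    # de-freeze
```

The second call shows the risk: with the check skipped, a genuine static file that happens to match a rule is no longer served directly.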

Summary

Well, we've done a lot with tiny effort. We described how to enable mod_linkfreeze and turn dynamic links into static ones. We've also explained the basic idea of «freezing» and shown you some useful tricks.
We hope you will enjoy Helicon Ape and mod_linkfreeze!

Yours sincerely,
HeliconTech Team
