
Friday, December 19, 2008

Guide: URL-rewriting basics and map-files application

A lack of understanding of basic URL-rewriting concepts often leads to problems when writing rules. So we decided to give a brief and simple explanation of some general concepts.

URL rewriting lets you replace real (often ugly) URLs with pretty ones and expose those to users as well as to search engines. The idea is that the user requests, for example, http://www.site.com/pretty_file.htm (this is the link indexed by search engines), while in reality the browser is shown the content of, say, http://www.site.com/index.aspx?id=123 (the real physical file on your server).

Regular expressions empower you to create more complex and efficient rules and add conditions to gain better flexibility and performance.

There are several ways of writing rules; the choice depends on the specific situation. For example, if you have these pages:

Real URLs            Rewritten (pretty) URLs
/index.php?q=444  => /page.html
/index.php?q=345  => /another-page.html
/index.php?q=999  => /about.html
You may EITHER use the following rules to implement rewriting functionality:
RewriteRule ^/page\.html$         /index.php?q=444 [NC,L]
RewriteRule ^/another-page\.html$ /index.php?q=345 [NC,L]
RewriteRule ^/about\.html$        /index.php?q=999 [NC,L]
OR
you may use map-files (which are preferable in this case):

Info: A map-file is a .txt file containing pairs of values written in two columns (as shown below). The first (left) column holds the key against which the value matched by the RewriteRule is compared, and the corresponding value in the right column is the result that is placed into the substitution URL.

In our example we’ll create a text document (e.g. map.txt) in the same folder as .htaccess and put the following in it:

page           444
another-page   345
about          999
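
Conceptually, a map-file of this shape is just a key/value table. The Python sketch below is only an illustration of that idea, not how Helicon Ape reads the file internally; the names parse_map and MAP_TXT are hypothetical, and it assumes the two columns are separated by whitespace.

```python
def parse_map(text):
    """Parse two-column map-file text into a dict of key -> value."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        key, value = parts[0], parts[1]
        mapping[key] = value
    return mapping


# Same content as the example map.txt above.
MAP_TXT = """\
page           444
another-page   345
about          999
"""

url_map = parse_map(MAP_TXT)
print(url_map["another-page"])  # prints 345
```

Each key in the left column resolves to exactly one value from the right column, which is what the ${map:...} lookup in the configuration relies on.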

And our configuration file (.htaccess) will look like this:

# Set a variable ("map") to access map.txt from config
RewriteMap map txt:map.txt

# Use tolower function to convert string to lowercase
RewriteMap lower int:tolower

# Get requested file name
RewriteCond %{REQUEST_URI} ^/([^/.]+)\.html$ [NC]

# Seek file name in map-file
RewriteCond ${map:${lower:%1}|NOT_FOUND} !NOT_FOUND

# Perform rewriting if the record was found in map-file
RewriteRule .? /index.php?q=${map:${lower:%1}} [NC,L]

Note! Map-files are case-SENSITIVE. So, “Page” will not match “page”. That is why it is advisable to use the tolower function, which converts the matched part to lowercase before it is compared with the map-file entries. Don’t forget that in this case all map-file keys must also be lowercase.
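
To make the flow of the rules above easier to follow, here is a rough Python simulation of the three steps: match the requested file name, look it up in lowercase, and rewrite only if a record exists. It is a sketch for illustration, not the rewrite engine itself; URL_MAP, PATTERN and rewrite are hypothetical names, and returning None stands for "leave the request unchanged".

```python
import re

# Hypothetical in-memory copy of the example map.txt (all keys lowercase).
URL_MAP = {"page": "444", "another-page": "345", "about": "999"}

# Same pattern as the first RewriteCond; re.IGNORECASE plays the role of [NC].
PATTERN = re.compile(r"^/([^/.]+)\.html$", re.IGNORECASE)


def rewrite(request_uri, url_map=URL_MAP):
    """Return the rewritten URL, or None if the request is left untouched."""
    match = PATTERN.match(request_uri)
    if match is None:
        return None  # not a top-level *.html request
    key = match.group(1).lower()  # the ${lower:%1} step
    value = url_map.get(key)      # the ${map:...|NOT_FOUND} step
    if value is None:
        return None  # no record in the map: no rewriting
    return "/index.php?q=" + value


print(rewrite("/page.html"))     # prints /index.php?q=444
print(rewrite("/About.HTML"))    # prints /index.php?q=999
print(rewrite("/missing.html"))  # prints None
print(URL_MAP.get("Page"))       # prints None: lookups are case-sensitive
```

The last line shows why the lowercase step matters: without it, a request for /Page.html would not find the "page" record.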

Map-files are particularly advantageous when you have to rewrite a large number of URLs that follow a similar pattern. The first benefit is that a map-file may be of virtually unlimited size (up to several gigabytes); the second is that parsing a large map-file is much faster than processing a huge .htaccess (or httpd.conf). We don’t recommend using configuration files with more than 100-150 rules.

We hope this info made it easier for you to grasp the idea of URL rewriting.

Looking forward to your comments and suggestions.

Sincerely Yours,
HeliconTech Team.

2 comments:

  1. Does the order of the map files have any effect on the performance of the ISAPI? Would a sorted list be faster than an unsorted list (in the case of strings)?

    Thanks - BTW..good stuff!

  2. The order doesn't matter. Mapfiles are processed really fast in any case.
