Attention! Helicon Tech Blog has moved to www.helicontech.com/articles/

Friday, February 20, 2009

ISAPI_Rewrite FAQ

Not so long ago we posted this FAQ on our forum, and quite a lot of our clients found it helpful. This compilation covers some of the trickiest questions concerning ISAPI_Rewrite3 (and, in fact, the Helicon Ape mod_rewrite module). We reckon that publishing it here will put help at hand for even more people.

Configuration file not loaded or not altered

It may happen that you get the “File not loaded” message in error.log or changes to the configuration file are not applied. Generally the issue in this case is with NTFS permissions.

How to make sure ISAPI_Rewrite is working at all?

  1. Check that there is a green up arrow next to the ISAPI_Rewrite3 filter in IIS - Web sites properties - ISAPI filters tab.
  2. Put the following rule into the httpd.conf file:
    RewriteEngine on
    RewriteRule .? - [F]

Make any request to the site. If the result is "403 Forbidden", ISAPI_Rewrite works OK. If you get "404 Page not found", something might be wrong.

Logging issues

  1. Log file not created or is empty

    If you encounter this problem, enable logging by putting the following lines into the httpd.conf file (which resides in the ISAPI_Rewrite installation folder):

    #enabling rewrite.log
    RewriteLogLevel 9
    #enabling error.log
    LogLevel debug
  2. Log not recreated after deletion

    To resolve the issue you should either

    • grant the Everyone group Write permission for the newly created log file (either rewrite.log or error.log), or
    • grant Create & Write permissions on the folder where the rewrite and error logs are supposed to be created to all users running application pools (by default, the IIS_WPG group).

Warning! Plesk creates a new user for each application pool, so it is necessary to grant Create & Write permissions to each of them.

Don't forget to add logging directives (see clause 1 above) into your httpd.conf file.

Note: do not enable logging on a live server; it is intended only for debugging purposes.

Images and CSS on pages corrupted

If this is your case, you are probably using page-relative paths for your images and CSS, and the problem occurred because the base path was altered.

Please consider changing the paths to root-relative (e.g. <img src="/image.jpg">) or absolute form. Alternatively, specify the correct base path (e.g. <base href="/">).

Initial query string is appended to rewritten url

For example, suppose you use a rule like:

RewriteEngine on
RewriteRule ^index\.php$ default.aspx [NC,R]

When you request a URL with a query string, such as http://www.site.com/index.php?param=value, you get http://www.site.com/default.aspx?param=value instead of the desired http://www.site.com/default.aspx (without the initial query string).

This happens because by default ISAPI_Rewrite appends the initial query string to the rewritten URL (if the rewritten URL doesn't have its own query string). To avoid this, add a question mark at the end of the substitution:

RewriteEngine on
RewriteRule ^index\.php$ default.aspx? [NC,R]

URLs with question mark don’t work

The problem is that you put the query string part of the URL into the RewriteRule statement, whereas ISAPI_Rewrite 3 processes the query string separately from the rest of the URL, in a RewriteCond %{QUERY_STRING} statement.

So this won’t work:

RewriteEngine on
RewriteRule ^index\.php\?param=(\d+)$ default.asp?param2=$1? [NC,L]

And this will work:

RewriteEngine on
RewriteCond %{QUERY_STRING} ^param=(\d+)$ [NC]
RewriteRule ^index\.php$ default.asp?param2=%1? [NC,L]
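
The split between path and query string can be illustrated outside the server. The following Python sketch is only a model: it assumes the rule pattern is tested against the path without its leading slash and the condition against the query string, and the helper names split_request and rewrite are made up for this illustration.

```python
import re
from urllib.parse import urlsplit

# Patterns taken from the rules above.
BROKEN_RULE = re.compile(r"^index\.php\?param=(\d+)$", re.IGNORECASE)
RULE = re.compile(r"^index\.php$", re.IGNORECASE)
COND = re.compile(r"^param=(\d+)$", re.IGNORECASE)

def split_request(url):
    """Split a URL the way this sketch assumes the rules see it:
    the path (without the leading slash) and the query string, separately."""
    parts = urlsplit(url)
    return parts.path.lstrip("/"), parts.query

def rewrite(url):
    """Return the rewritten target, or None when the rule does not apply."""
    path, query = split_request(url)
    cond = COND.search(query)
    if RULE.search(path) and cond:
        return "default.asp?param2=" + cond.group(1)
    return None
```

Because the path part never contains "?param=...", the pattern from the non-working rule has nothing to match, while the condition sees the query string on its own.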

Dealing with optional parameters

Sometimes you don't know the exact number of parameters to be passed and want a universal rule that handles all cases. Here's a simple example for a situation when you may have either one or two parameters:

RewriteEngine on
RewriteRule ^folder/(\d+)(?:/(\d+))?/?$ index.asp?param1=$1?2&param2=$2 [NC,L]

This rule will accept requests like http://www.site.com/folder/111/ and http://www.site.com/folder/111/222/ (with or without trailing slash) and direct them respectively to http://www.site.com/index.asp?param1=111 and http://www.site.com/index.asp?param1=111&param2=222.
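
To check what the match pattern captures, it can be tried in any regex engine. Here is a rough Python sketch; the rewrite helper is hypothetical and builds the target by hand instead of using the conditional substitution syntax of the rule.

```python
import re

# Match pattern copied from the rule above.
PATTERN = re.compile(r"^folder/(\d+)(?:/(\d+))?/?$", re.IGNORECASE)

def rewrite(path):
    """Return the rewritten URL for a path, or None if the rule does not match."""
    m = PATTERN.match(path)
    if m is None:
        return None
    target = "index.asp?param1=" + m.group(1)
    if m.group(2) is not None:  # the second parameter is optional
        target += "&param2=" + m.group(2)
    return target
```

The second group is simply empty when only one parameter is present, which is why the param2 part is added conditionally.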

Excluding specific folders from being rewritten

If it is necessary to exclude some folders from being processed by ISAPI_Rewrite and to redirect all other requests to, for example, index.asp, the following piece of code will be helpful:

RewriteEngine on
RewriteBase /
RewriteRule ^(?!(?:exfolder1|exfolder2|etc)/.*).+$ index.asp [NC,R=301,L]
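
The negative lookahead can be tried out in Python, whose regex syntax supports the same construct. This is only a sketch of the pattern's behaviour, with paths given without the leading slash (an assumption about how the rule sees them) and a made-up helper name.

```python
import re

# Match pattern copied from the rule above.
EXCLUDE = re.compile(r"^(?!(?:exfolder1|exfolder2|etc)/.*).+$", re.IGNORECASE)

def is_redirected(path):
    """True when the rule would match the path and therefore redirect it."""
    return EXCLUDE.match(path) is not None
```

Note that the lookahead requires a slash after the folder name, so exfolder1/page.asp is excluded, while exfolder10/page.asp and a bare exfolder1 are still matched by the rule.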

Comprehensive map files insight

  1. How to lowercase matched string before comparison with map file entries?
    RewriteEngine on
    RewriteBase /
    RewriteMap mapfile txt:mapfile.txt
    RewriteMap lower int:tolower
    RewriteRule ^products/([^?/]+)\.asp$ productpage.asp?productID=${mapfile:${lower:$1}}

    Note: entries in your mapfile.txt must also be lowercase.

    Note: In builds 3.1.0.62 and higher it's possible to make case-insensitive comparison by simply adding [NC] flag after mapfile definition:

    RewriteMap mapfile txt:mapfile.txt [NC]
  2. How to check if the value is present in map file prior to processing the URL to avoid fruitless actions?

    Please add the following condition before the rule dealing with map file to accomplish this:

    RewriteCond ${mapfile:$1|NOT_FOUND} !NOT_FOUND
  3. Applying map file only if specific pattern is matched

    The following set of rules accomplishes this:

    RewriteMap mymap txt:map.txt
    RewriteMap mylower int:tolower
    RewriteCond %{REQUEST_URI} ^/([^/]+)/?$
    RewriteCond ${mymap:${mylower:%1}|NOT_FOUND} !NOT_FOUND
    RewriteRule .? ${mymap:${mylower:%1}} [NC,L]

    Line-by-line explanation:
    Line 1: we declare the map 'mymap', which reads its content from the map.txt file.
    Line 2: we declare the map 'mylower', which provides access to the pre-defined internal function tolower. It converts the input string to lower case.
    Line 3: we catch request URIs of the specified pattern.
    Line 4: we try to get a value from the map file, using the requested URI in lower case as a key.
    Line 5: if the value is found, RewriteRule fires.

  4. Using 2 map files to redirect old "ugly" URLs to pretty ones but still show their original content

    More and more people are asking how to implement the following behavior:

    Requested URL: http://www.site.com/index.asp?param1=value&param2=value2
    Address bar: http://www.site.com/keyword1
    Content shown: http://www.site.com/index.asp?param1=value&param2=value2

    Surely one usually needs to rewrite a large number of URLs of similar pattern, hence map files are most suitable for this situation. We'll need 2 map files with reverse content:

    mapfile.txt
    param1=value&param2=value2 keyword1
    etc.

    revmapfile.txt
    keyword1 param1=value&param2=value2
    etc.

    And the rules resolving the issue look like:

    RewriteEngine on
    RewriteBase /
    RewriteMap mapfile txt:mapfile.txt [NC]
    RewriteMap revmapfile txt:revmapfile.txt [NC]
    RewriteCond %{QUERY_STRING} (.+)
    RewriteRule ^index\.asp$ ${mapfile:%1}? [NC,R=301,L]
    RewriteCond ${revmapfile:$1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^([^/]+)$ index.asp?${revmapfile:$1} [NC,L]
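
The interplay of the two map files may be easier to follow as plain lookups. The Python sketch below is a simplified model, not the way ISAPI_Rewrite is implemented: the dictionaries stand in for mapfile.txt and revmapfile.txt, the handle function is hypothetical, and lowercasing the keys models the [NC] flag on the maps.

```python
# Hypothetical in-memory stand-ins for mapfile.txt and revmapfile.txt.
MAPFILE = {"param1=value&param2=value2": "keyword1"}
REVMAPFILE = {keyword: query for query, keyword in MAPFILE.items()}

def handle(path, query=""):
    """Return (action, target) for a request, mimicking the two rules above."""
    # First rule: index.asp with a known query string -> 301 to the pretty URL.
    if path.lower() == "index.asp" and query.lower() in MAPFILE:
        return ("redirect 301", "/" + MAPFILE[query.lower()])
    # Second rule: a single path segment found in the reverse map
    # -> internal rewrite back to the original URL.
    if "/" not in path and path.lower() in REVMAPFILE:
        return ("rewrite", "/index.asp?" + REVMAPFILE[path.lower()])
    # Anything else is left untouched (the NOT_FOUND case).
    return ("pass", "/" + path)
```

A request for the old URL is answered with a redirect to /keyword1, and the follow-up request for /keyword1 is silently served by index.asp with the original parameters.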

(Un)installation problems

It's not likely, but if you encounter some troubles (un)installing ISAPI_Rewrite 3, firstly, please try to re-download the installation package (it could be corrupted during downloading) and start installation again. If this doesn't help, please run installation from the command line with the following keys to generate (un)installation log:

For installation:

msiexec /i rewriteXXX.msi /l* log.txt

For uninstallation:

msiexec /x rewriteXXX.msi /l* log.txt

This log will be helpful for investigation of your errors.

How to block specific IP ranges using ISAPI_Rewrite3?

It is often necessary to prevent your site from being accessed from specific IP addresses or ranges of IP addresses. Here's an example of blocking two ranges:

203.207.64.0 - 203.208.19.255
203.208.32.0 - 203.208.63.255

ISAPI_Rewrite code is:

RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^203\.20(?:7\.(?:6[4-9]|[7-9][0-9]|\d{3})|8\.(?:1?[0-9]|3[2-9]|[45][0-9]|6[0-3]))\.\d{1,3}$
RewriteRule .? - [F]
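
Regexes for numeric ranges are easy to get subtly wrong, so it is worth checking them against the intended ranges before deploying. Below is a small Python sketch of such a check. The regex shown is one possible way to express the two ranges (an assumption of this sketch), and the helper names are made up.

```python
import ipaddress
import re

# One possible regex for the two ranges (an assumption of this sketch).
BLOCKED = re.compile(
    r"^203\.20(?:7\.(?:6[4-9]|[7-9][0-9]|\d{3})"
    r"|8\.(?:1?[0-9]|3[2-9]|[45][0-9]|6[0-3]))\.\d{1,3}$"
)

# The same two ranges expressed numerically.
RANGES = [
    (ipaddress.ip_address("203.207.64.0"), ipaddress.ip_address("203.208.19.255")),
    (ipaddress.ip_address("203.208.32.0"), ipaddress.ip_address("203.208.63.255")),
]

def in_ranges(addr):
    """True when the address falls into one of the numeric ranges."""
    ip = ipaddress.ip_address(addr)
    return any(low <= ip <= high for low, high in RANGES)

def mismatches():
    """Return sample addresses where the regex and the ranges disagree."""
    bad = []
    for second in (206, 207, 208, 209):
        for third in range(256):
            for fourth in (0, 128, 255):
                addr = f"203.{second}.{third}.{fourth}"
                if bool(BLOCKED.match(addr)) != in_ranges(addr):
                    bad.append(addr)
    return bad
```

An empty list from mismatches() means the regex and the numeric ranges agree on every sampled address.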

ISAPI_Rewrite not working under Visual Studio 2005

The issue occurs because by default Visual Studio uses its internal web server, not IIS. To resolve the issue, right-click on your project, go to Property pages -> Start Options -> Server, select Use custom server and specify the path to your site in the Base URL field.

Why don't ISAPI_Rewrite and IISPassword get on well?

The issue is that IISPassword and ISAPI_Rewrite use the same configuration file name - .htaccess, but unlike ISAPI_Rewrite, IISPassword fails if any unknown directives are found in it. So to avoid conflicts you need to change configuration file name for either product. To change configuration file name for ISAPI_Rewrite use AccessFileName directive in httpd.conf file.

How to block web-spiders using ISAPI_Rewrite3?

Here's the solution to get rid of lots of web-spiders:

RewriteEngine on
RewriteCond %{HTTP:User-Agent} (?:WebBandit|2icommerce|Accoona|ActiveTouristBot|\
adressendeutschland|aipbot|Alexibot|Alligator|AllSubmitter|\
almaden|anarchie|Anonymous|Apexoo|Aqua_Products|asterias|ASSORT|\
ATHENS|AtHome|Atomz|attache|autoemailspider|autohttp|b2w|bew|\
BackDoorBot|Badass|Baiduspider|Baiduspider+|BecomeBot|berts|\
Bitacle|Biz360|Black.Hole|BlackWidow|bladderfusion|BlogChecker|\
BlogPeople|BlogsharesSpiders|Bloodhound|BlowFish|BoardBot|\
Bookmarksearchtool|BotALot|BotRightHere|Botmailto:craftbot@yahoo.com|\
Bropwers|Browsezilla|BuiltBotTough|Bullseye|BunnySlippers|Cegbfeieh|\
CFNetwork|CheeseBot|CherryPicker|Crescent|charlotte/|ChinaClaw|\
Convera|Copernic|CopyRightCheck|cosmos|Crescent|c-spider|curl|\
Custo|Cyberz|DataCha0s|Daum|Deweb|Digger|Digimarc|digout4uagent|\
DIIbot|DISCo|DittoSpyder|DnloadMage|Download|dragonfly|DreamPassport|\
DSurf|DTSAgent|dumbot|DynaWeb|e-collector|EasyDL|EBrowse|eCatch|ecollector|\
edgeio|efp@gmx.net|EirGrabber|EmailExtractor|EmailCollector|EmailSiphon|\
EmailWolf|EmeraldShield|Enterprise_Search|EroCrawler|ESurf|Eval|\
Everest-Vulcan|Exabot|Express|Extractor|ExtractorPro|EyeNetIE|FairAd|\
fastlwspider|fetch|FEZhead|FileHound|findlinks|FlamingAttackBot|\
FlashGet|FlickBot|Foobot|Forex|FranklinLocator|FreshDownload|\
FrontPage|FSurf|Gaisbot|Gamespy_Arcade|genieBot|GetBot|Getleft|GetRight|\
GetWeb!|Go!Zilla|Go-Ahead-Got-It|GOFORITBOT|GrabNet|Grafula|grub|\
Harvest|HatenaAntenna|heritrix|HLoader|HMView|holmes|HooWWWer|\
HouxouCrawler|HTTPGet|httplib|HTTPRetriever|HTTrack|humanlinks|\
IBM_Planetwide|iCCrawler|ichiro|iGetter|ImageStripper|\
ImageSucker|imagefetch|imds_monitor|IncyWincy|IndustryProgram|\
Indy|InetURL|InfoNaviRobot|InstallShieldDigitalWizard|InterGET|\
IRLbot|Iron33|ISSpider|IUPUIResearchBot|Jakarta|java/|JBHAgent|\
JennyBot|JetCar|jeteye|jeteyebot|JoBo|JOCWebSpider|Kapere|Kenjin|\
KeywordDensity|KRetrieve|ksoap|KWebGet|LapozzBot|larbin|leech|LeechFTP|\
LeechGet|leipzig.de|LexiBot|libWeb|libwww-FM|libwww-perl|LightningDownload|\
LinkextractorPro|Linkie|LinkScan|linktiger|LinkWalker|lmcrawler|\
LNSpiderguy|LocalcomBot|looksmart|LWP|MacFinder|MailSweeper|mark.blonin|\
MaSagool|Mass|MataHari|MCspider|MetaProductsDownloadExpress|\
MicrosoftDataAccess|MicrosoftURLControl|MIDown|MIIxpc|Mirror|Missauga|\
MissouriCollegeBrowse|Mister|Monster|mkdb|moget|Moreoverbot|\
mothra/netscan|MovableType|Mozi!|Mozilla/22|Mozilla/3.0(compatible)|\
Mozilla/5.0(compatible;MSIE5.0)|MSIE_6.0|MSIECrawler|MSProxy|MVAClient|\
MyFamilyBot|MyGetRight|nameprotect|NASASearch|Naver|Navroad|\
NearSite|NetAnts|netattache|NetCarta|NetMechanic|NetResearchServer|\
NetSpider|NetZIP|NetVampire|NEWTActiveX|Nextopia|NICErsPRO|ninja|\
NimbleCrawler|noxtrumbot|NPBot|Octopus|Offline|OKMozilla|OmniExplorer|\
OpaL|Openbot|Openfind|OpenTextSiteCrawler|OracleUltraSearch|OutfoxBot|\
P3P|PackRat|PageGrabber|PagmIEDownload|panscient|PapaFoto|pavuk|\
pcBrowser|perl|PerMan|PersonaPilot|PHPversion|PlantyNet_WebRobot|\
playstarmusic|Plucker|PortHuron|ProgramShareware|ProgressiveDownload|\
ProPowerBot|prospector|ProWebWalker|Prozilla|psbot|psycheclone|puf|\
PushSite|PussyCat|PuxaRapido|Python-urllib|QuepasaCreep|QueryN|\
Radiation|RealDownload|RedCarpet|RedKernel|ReGet|relevantnoise|\
RepoMonkey|RMA|Rover|Rsync|RTG30|Rufus|SAPO|SBIder|scooter|ScoutAbout|\
script|searchpreview|searchterms|Seekbot|Serious|Shai|shelob|\
Shim-Crawler|SickleBot|sitecheck|SiteSnagger|SlurpyVerifier|SlySearch|\
SmartDownload|sna-|snagger|Snoopy|sogou|sootle|So-net"bat_bot|\
SpankBot"bat_bot|spanner"bat_bot|SpeedDownload|\
Spegla|Sphere|Sphider|SpiderBot|sproose|SQWebscanner|Sqworm|\
Stamina|Stanford|studybot|SuperBot|SuperHTTP|Surfbot|SurfWalker|\
suzuran|Szukacz|tAkeOut|TALWinHttpClient|tarspider|Teleport|Telesoft|\
Templeton|TestBED|TheIntraformant|TheNomad|TightTwatBot|\
Titan|toCrawl/UrlDispatcher|True_Robot|turingos|TurnitinBot|\
TwistedPageGetter|UCmore|UdmSearch|UMBC|UniversalFeedParser|\
URLControl|URLGetFile|URLyWarning|URL_Spider_Pro|UtilMind|vayala|\
vobsub|VCI|VoidEYE|VoilaBot|voyager|w3mir|WebImageCollector|\
WebSucker|Web2WAP|WebaltBot|WebAuto|WebBandit|WebCapture|\
webcollage|WebCopier|WebCopy|WebEMailExtrac|WebEnhancer|\
WebFetch|WebFilter|WebFountain|WebGo|WebLeacher|WebMiner|\
WebMirror|WebReaper|WebSauger|WebSnake|Website|WebStripper|WebVac|\
webwalk|WebWhacker|WebZIP|WellsSearch|WEPSearch00|WeRelateBot|\
Wget|WhosTalking|Widow|WildsoftSurfer|WinHttpRequest|WinHTTrack|\
WUMPUS|WWWOFFLE|wwwster|WWW-Collector|Xaldon|Xenu's|Xenus|XGET|\
Y!TunnelPro|YahooYSMcm|YaDirectBot|Yeti|Zade|ZBot|zerxbot|Zeus|ZyBorg) [NC]
RewriteRule .? - [F]

We hope this article has answered some of your questions and helped you master some of the basic features of ISAPI_Rewrite. We are going to add more non-obvious issues to this list to help you enjoy every rule you write.

Best wishes,
HeliconTech Team

Wednesday, February 18, 2009

HTTP Authentication and Authorization

Let's start with definitions:
  • Authentication or authenticity check is a comparison of person's real credentials with the ones he enters (e.g. login and password).
  • Authorization is the process of granting rights to a user (or a group of users) to perform specific actions based on evaluation of necessary parameters.
Helicon Ape (as well as Apache) uses the following authorization and authentication modules:
For authentication types (AuthType directive):
  • mod_auth_basic (Basic authentication)
  • mod_auth_digest (Digest authentication)
For authentication providers (login/password verification):
  • mod_authn_alias
  • mod_authn_anon
  • mod_authn_dbd
  • mod_authn_dbm
  • mod_authn_default
  • mod_authn_file
  • mod_authnz_ldap
For authorization (Require directive; verifies whether the authenticated user is allowed access):
  • mod_authnz_ldap
  • mod_authz_dbm
  • mod_authz_default
  • mod_authz_groupfile
  • mod_authz_owner
  • mod_authz_user

Authentication and authorization - how do they work?

The authentication/authorization process happens in 3 steps:
  1. Receipt of authentication data. At this stage mod_auth_basic or mod_auth_digest is operating. They read the Authorization request header and retrieve the authentication credentials. For Basic authentication it's just a username:password pair in base64 encoding. For Digest authentication it's an MD5 digest of the username, password, authname and other parameters that we'll mention later in more detail.
  2. Authentication (verification of authentication data). At this step the mod_authn_*** modules verify the authentication data. The mod_authn_file module, for instance, looks for the username:password pair in a text file. The result may be: authenticated successfully (AUTH_GRANTED), access denied (AUTH_DENIED) or user not found (AUTH_USER_NOT_FOUND).
  3. Authorization (rights granting). At this final stage the mod_authz_*** modules verify whether the authenticated user may access the resource. For example, with Require user tomas specified in the config, mod_authz_user will grant access only to the user tomas and will deny it to anyone else. If Require valid-user is set, mod_authz_user will grant access to all successfully authenticated users.
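
The last two steps can be modelled in a few lines. The following Python sketch is a simplified illustration, not the actual module code: passwords are kept in plain text, and the names file_provider, authenticate and authorize are hypothetical.

```python
AUTH_GRANTED = "AUTH_GRANTED"
AUTH_DENIED = "AUTH_DENIED"
AUTH_USER_NOT_FOUND = "AUTH_USER_NOT_FOUND"

def file_provider(passwords):
    """Build a provider that checks credentials against a dict,
    standing in for a password file (plain text, for illustration only)."""
    def check(user, password):
        if user not in passwords:
            return AUTH_USER_NOT_FOUND
        return AUTH_GRANTED if passwords[user] == password else AUTH_DENIED
    return check

def authenticate(user, password, providers):
    """Ask each provider in turn; stop at the first one that knows the user."""
    result = AUTH_USER_NOT_FOUND
    for check in providers:
        result = check(user, password)
        if result != AUTH_USER_NOT_FOUND:
            break
    return result

def authorize(user, require):
    """require is ("valid-user",) or ("user", name1, name2, ...)."""
    if require[0] == "valid-user":
        return True
    return require[0] == "user" and user in require[1:]
```

With a provider built from {"tomas": "secret"}, the user tomas is authenticated and then authorized by ("user", "tomas"), while any other authenticated user passes only a ("valid-user",) requirement.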

Basic authentication: mod_auth_basic

Now we'll look more closely at mod_auth_basic module. Here's the content of .htaccess in c:\inetpub\wwwroot\private\ corresponding to URI /private/:
# Authentication type
AuthType Basic
# Name of zone authentication will be used for (aka realm)
AuthName "private zone"

# Authentication provider. Here - mod_authn_file
AuthBasicProvider file
# Info for mod_authn_file - path to password file
AuthUserFile c:\inetpub\secured\.htpasswds
# Access will be granted to authenticated user john,
# i.e. only john will be authorized
Require user john
Here comes a request to /private/. During request processing mod_auth_basic verifies whether the requested resource may be accessed by this user. It looks for the Authorization header in the request. If there's no such header, the module stops request processing and the server returns a "401 Unauthorized" response with the WWW-Authenticate: Basic realm="private zone" header. Having received such a response, the browser prompts for a username and password for 'private zone'. After the data is entered, the browser sends the same request again with the Authorization header. The username and password are encoded using base64 and look as follows:
base64encode('john:secret') -> 'am9objpzZWNyZXQ='
The Authorization header looks like:
Authorization: Basic am9objpzZWNyZXQ=
Now during request processing mod_auth_basic will retrieve the authentication data from the Authorization header:
base64decode('am9objpzZWNyZXQ=') -> 'john:secret'
The username and password are then passed by mod_auth_basic to the authentication provider (in the above example it's mod_authn_file) for verification. If verification succeeds, request processing goes on; if it fails, mod_auth_basic stops request processing and the server returns 401 Unauthorized.
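
The encoding step is easy to reproduce. Here is a minimal Python sketch using the standard base64 module; the function names are made up for this illustration, and the header name is the standard Authorization request header used by Basic authentication.

```python
import base64

def basic_auth_header(username, password):
    """Build the request header a browser sends for Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Authorization: Basic " + base64.b64encode(raw).decode("ascii")

def decode_basic(token):
    """Recover (username, password) from the base64 token, as the server does."""
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password
```

Since base64 is an encoding and not encryption, anyone who can see the header can recover the password, which is why Basic authentication is normally combined with HTTPS.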

Authentication provider: mod_authn_file

Let's now have a look at mod_authn_file. This module is an authentication provider for mod_auth_basic and mod_auth_digest. mod_authn_file searches for the username:password pair in a text file. The file may be created manually or using the Password utility included in Helicon Ape Manager (Options -> Insert user password...).
[Screenshots: Helicon Ape Manager; mod_auth_basic password generation; Helicon Ape .htpasswd]
To enable authentication via mod_authn_file module you should specify
# for Basic
AuthBasicProvider file
# or for Digest
AuthDigestProvider file
and path to password file
AuthUserFile c:\inetpub\secured\.htpasswd
Note! Path may be absolute or .htaccess-relative.
This authentication provider is the most used one as it's fast and easy to use. Besides, password file may be edited manually (e.g. comment out some user using # character). The drawback is slow processing of large password files. For security reasons it is not advisable to put password file to the root of the site.

Authorization: mod_authz_user

mod_authz_user is used to authorize authenticated user. In other words the user that was successfully authenticated (username:password matched) is granted or prohibited access to the requested resource.
This module performs a check of Require directive. The line
Require valid-user
means that the module will authorize (grant access) all authenticated users. The line
Require user john tom
says that the module will only authorize john and tom users.

Host-dependent authorization: mod_authz_host

mod_authz_host module stands detached from other authorization modules. During request processing this module is invoked earlier than other authentication/authorization modules and is used to control access based on client host data (host name, address) and request parameters (via environment variables). This is probably the most popular Apache module for access control.
The module uses 3 directives: Order, Allow, Deny. Order directive defines the sequence of rules validation:
Order allow,deny
means that Allow rules will be checked first and Deny rules will be checked after. If no rules are specified, the default action is Deny all. Directive
Order deny,allow
means the opposite. Deny rules will be checked first and Allow rules will be checked after. Default action is Allow all. Allow and Deny directives define the rules for the check:
# allow all clients from  .org zone
Allow from .org
# three identical rules: allow from 192.168 subnet
Allow from 192.168
Allow from 192.168.0.0/16
Allow from 192.168.0.0/255.255.0.0
# deny from the following IPv6 address
Deny from 2001:db8::a00:20ff:fea7:ccea
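A rule matches if it corresponds to the client info. In Apache's mod_authz_host, the Order directive also decides the outcome when rules of both kinds match: with Order allow,deny a client that matches both an Allow and a Deny rule is denied, while with Order deny,allow such a client is allowed.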
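
The equivalence of the prefix-length and netmask notations shown above can be checked with Python's standard ipaddress module. This is just an illustrative sketch (the allowed helper is hypothetical); the partial form 192.168 is specific to the Allow/Deny syntax and is not parsed here.

```python
import ipaddress

# The same subnet written with a prefix length and with a netmask.
CIDR = ipaddress.ip_network("192.168.0.0/16")
MASKED = ipaddress.ip_network("192.168.0.0/255.255.0.0")

def allowed(client):
    """True when the client address belongs to the 192.168 subnet."""
    return ipaddress.ip_address(client) in CIDR
```

Both notations describe the same 65536 addresses, so CIDR == MASKED holds and a client such as 192.168.10.5 is matched by either rule.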

User group authorization: mod_authz_groupfile

The mod_authz_groupfile module provides authorization of an authenticated user based on the user's membership in some group. Example:
Require group developers managers
Groups and their members are defined in a plain text file: each line starts with a group name followed by a colon, and then a space- or tab-separated list of group members. Example:
# file c:\inetpub\secured\groupfile.txt
testers: tom tony
developers: jack john
managers: jane bill
File path is set by AuthGroupFile directive:
AuthGroupFile c:\inetpub\secured\groupfile.txt
The Require group directive above is equivalent to listing the members of both groups explicitly:
Require user jack john jane bill
Using mod_authz_groupfile lets you group users and then apply group-based access policies.
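
To see which users a group-based requirement covers, the group file can be expanded by hand or with a few lines of code. The Python sketch below parses the sample file shown above; parse_groups and users_for are hypothetical helpers, not part of any module.

```python
GROUP_FILE = """\
# file c:\\inetpub\\secured\\groupfile.txt
testers: tom tony
developers: jack john
managers: jane bill
"""

def parse_groups(text):
    """Parse 'group: member member ...' lines into a dict of lists."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, members = line.partition(":")
        groups[name.strip()] = members.split()
    return groups

def users_for(required_groups, groups):
    """List the users covered by 'Require group g1 g2 ...', without duplicates."""
    users = []
    for group in required_groups:
        for user in groups.get(group, []):
            if user not in users:
                users.append(user)
    return users
```

For the groups developers and managers this yields jack, john, jane and bill; members of testers are not included.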

Anonymous authentication: mod_authn_anon

mod_authn_anon provides authentication of anonymous users. The username is usually represented by anonymous (but may be different) and password is user's email. This email may be saved to log. Together with other authentication providers (e.g., mod_authn_file) mod_authn_anon makes it possible to monitor access of registered users and have the site opened for non-registered users as well. Here's an example of mod_authn_anon config:
AuthName "Protected area"
# authentication type
AuthType Basic
# list of authentication providers, applied sequentially
AuthBasicProvider file anon
# path to password file for mod_authn_file
AuthUserFile c:\inetpub\secured\.htpasswd
# mod_authn_anon parameters
# can the name field be empty (on or off)
Anonymous_NoUserID off
# must an email be given as the password (on or off)
Anonymous_MustGiveEmail on
# check if the password is email (on or off)
Anonymous_VerifyEmail on
# log email (on or off)
Anonymous_LogEmail on
# list of anonymous users
Anonymous anonymous guest www test welcome
Require valid-user
Note! mod_authn_anon can work only with Basic authentication. The email check (enabled by the Anonymous_VerifyEmail on directive) is rather trivial: the string must contain '@' and '.' characters.

Authentication and Authorization fallbacks

The mod_authn_default module is the last, fallback module in the authentication process. If no authentication module is configured for the requested resource (e.g., mod_auth_basic with AuthType Basic etc.), mod_authn_default simply rejects any authentication data and terminates request processing with the result 401 Authorization Required. This may happen if mod_auth_basic is not authoritative (AuthBasicAuthoritative Off - see next chapter) and cannot authenticate the user.
The mod_authz_default module is the last, fallback module in the authorization process. If no authorization module fired for the request (e.g., mod_authz_user with Require user john), which is possible if Require contains some unrecognized values (e.g., Require unknown requirement), mod_authz_default simply terminates request processing with the result 401 Authorization Required.

Authoritative modules

Authentication (mod_auth_basic, mod_auth_digest) and authorization (mod_authz_user, mod_authz_groupfile) modules have directives that define whether they are authoritative: AuthBasicAuthoritative On|Off, AuthzUserAuthoritative On|Off, AuthzGroupFileAuthoritative On|Off. These directives determine whether subsequent modules are allowed to continue the authentication/authorization process. By default the modules are authoritative - Auth*Authoritative On.
In mod_auth_basic this works in the following manner. As a rule, each authentication provider listed in the AuthBasicProvider directive attempts to authenticate the user, and if the user isn't found by any of them, access is denied with 401 Authorization Required. Making the module non-authoritative - AuthBasicAuthoritative Off - gives other modules (e.g. third-party modules that cannot be listed in the AuthBasicProvider directive) a chance to authenticate the user instead of denying access immediately. The request processing order for these modules is not configurable and is defined in their source code.
Making mod_authz_user non-authoritative - AuthzUserAuthoritative Off - allows other modules (e.g., mod_authz_groupfile) to continue authorization if mod_authz_user failed to find information about the authenticated user. Example:
...
AuthzUserAuthoritative Off
Require user john
Require group developers
mod_authz_user can't authorize tom user (as only john user is mentioned in Require statement) but as it's not authoritative, it gives mod_authz_groupfile an opportunity to check whether tom belongs to developers group.
If none of the modules (being non-authoritative) has managed to authorize the user, the last in the queue will be mod_authz_default that will give out 401 Authorization Required.

Conclusion

In this article we tried to cover all the basic aspects of the authentication and authorization processes, to help you gain a clear understanding of these matters with easy-to-grasp examples, and to point out some non-obvious issues. If you have found at least a few bytes of helpful info in our article, our efforts were not in vain, because it's a great pleasure to be able to do something for you.

Thursday, February 12, 2009

ISAPI_Rewrite 3.1.0.60

We are happy to announce the release of new build of ISAPI_Rewrite!

We have fixed some bugs related to:

  • conversion of RewriteProxy directive from v2 syntax
  • multiple Set-Cookie: headers processing in proxy extension
  • PathInfo processing in proxy extension, that earlier caused problems with some AJAX apps and web services

and added some new features:

  • now error.log and rewrite.log correctly display Unicode characters
  • an option to set or unset the Hidden attribute on .htaccess files

Notice that latest builds of ISAPI_Rewrite operate on Windows Server 2008 R2 (IIS7).

In this build we attempted to meet some of our customers’ frequently requested features. If you see any other possibilities to make our products better, please don’t hesitate to write proposals into Comments.

Wednesday, February 11, 2009

Helicon Ape Manager - 10 helpful tips

Although we tried to make Helicon Ape Manager as easy-to-use, friendly and intuitive as possible, some features may not be obvious and are worth mentioning.

1. Password generation utility

Create an empty file. Go to Options -> Insert user password... and you are at the right place.

This utility allows you to generate passwords for Basic and Digest authentication types using MD5 or SHA1 hashing or plain text.

If you want to create several passwords, you may use Ctrl+Enter hotkey to prevent Password generator from closing after the password is inserted.

2. Regex testing utility

When you put the cursor anywhere on a RewriteRule line, you can test the rule using the RegexTest utility. Just press F4 or go to Options -> RewriteRule tester...

3. Magnet

This small but sometimes really useful button (next to the Options menu item) lets you open another configuration file (from another location) in a tab next to the currently opened config (not instead of it).

4. Directives Autocompletion

Helicon Ape Manager supports autocompletion of directives, which is invoked using the standard Ctrl+Space hotkey.

5. Errors highlighting

If you type a directive that is unsupported in the current context, you will see it painted in red.

6. Open selected file

This feature lets you instantly open an existing file whose name and path (absolute or relative to the current folder) are written in the config. To do so, either right-click on the file name and choose "Open selected file" or use the File -> Open selected file... (Ctrl+Shift+O) menu item.

7. Open site in browser from Manager

Helicon Ape Manager lets you open any location of the website in the default browser by simply right-clicking on the desired resource and choosing one of the supported protocols (ports).

8. Web.config editing

Apart from its direct purpose of making the editing of Helicon Ape configs fast and handy, Helicon Ape Manager lets you edit web.config files as well.

9. Marked folders with .htaccess or web.config

To make it easier for you to know which folders contain .htaccess files, which contain web.configs and which both, we've used this color scheme:

  1. with .htaccess - folder name in black
  2. with web.config - folder name with light orange background
  3. with both - folder name in black with light orange background
  4. without either - folder name in grey

10. Multiple lines comment

This feature may look obvious to those using Microsoft Visual Studio, but that does not diminish its convenience. You may comment/uncomment multiple lines in Helicon Ape Manager in one click.

Some of the above features may not have been as obvious as we thought, but we want you to use them not to complicate simple things, that's why we've written this small but hopefully helpful article. So, enjoy!

Tuesday, February 3, 2009

Exploding myths about mod_rewrite. Part I

For many years mod_rewrite was considered enigmatic and incomprehensible voodoo art. We want to shatter this myth and illustrate that mod_rewrite can be easy and handy instrument for everyone who is at least slightly acquainted with regular expressions (more info on regular expressions may be obtained here http://www.regular-expressions.info/).

What can and what cannot be done with mod_rewrite

[Figure: The beginning]

mod_rewrite processes (verifies, changes, adds and deletes) any incoming (request) headers (highlighted in yellow). Usually most of the work is done on the URL part of the request. The browser splits the URL into parts that are then sent to the server separately (in the first line of the request and in the Host: header). That's why you are unlikely to see the whole URL in mod_rewrite directives.

Despite its versatility mod_rewrite is NOT capable of processing response headers and response body.

mod_rewrite directives and order of processing

The mod_rewrite module has 10 directives, but 2 of them are used much more often than the others: RewriteRule and RewriteCond.

Whenever you want to put RewriteRule directives into some context, you must start your config with the following directive:

RewriteEngine on

All mod_rewrite directives but RewriteCond and RewriteRule (and their extended alternatives RewriteHeader and RewriteProxy) may be written in any order and even several times, in such case the module will accept only the last (close to the bottom of the config) value.

RewriteRule directives are processed from top to bottom; RewriteCond directives apply only to the one RewriteRule that follows them (see details below).

Simple example

As an example we'll take small .htaccess in the root of the site:

RewriteEngine on
RewriteRule robots\.txt robots.asp [NC]

and robots.asp:

<H1>Asp function "Now"</H1>
<%=now()%>

[Figure: Simple Rule Scheme]

This rule replaces requested robots.txt (this file may not even exist on the server) with real dynamic robots.asp file. [NC] flag makes strings comparison case-insensitive. Clients requesting robots.txt have no idea of what’s happening on server, i.e. they don’t know that instead of static robots.txt they will get dynamically generated response (e.g. as on the picture below).

[Image: Simple rule - result]

Another simple example

To understand the logic of config processing, let’s make it a little more complicated and add a rule rewriting default.htm to default.asp at the end of .htaccess. How will this config be processed?

RewriteEngine on
RewriteRule robots\.txt robots.asp [NC]
RewriteRule default\.htm default.asp [NC]

[Image: Simple rule]

For requests to any file in the root of the site the sequence will be the following:

  1. If “default.htm” is requested, the initial URL is compared with “robots.txt”. Strings don’t match, no rewriting occurs.
  2. “default.htm” is compared with “default.htm”. Strings match, URL is rewritten to “default.asp”.

or

  1. If “robots.txt” is requested, the initial URL is compared with “robots.txt”. Strings match, URL is rewritten to robots.asp.
  2. Then the rewritten URL, robots.asp, is compared with default.htm. Strings don’t match. (But this last comparison is obviously unnecessary!)

Processing stops.

In other words: every requested URL is compared with all rules regardless of whether one of the rules has already matched. This leads to unnecessary work. That’s why we’ll add the [L] flag, which terminates rule processing if the current rule matched.

RewriteEngine on
RewriteRule robots\.txt robots.asp [NC,L]
RewriteRule default\.htm default.asp [NC,L]

  1. If “default.htm” is requested, the sequence is the same as in the previous scenario because the rule that matches the requested URL is the last rule.
  2. If “robots.txt” is requested, the initial URL is compared with “robots.txt”. Strings match, URL is rewritten to robots.asp, BUT the second rule is ignored and processing stops.

[Image: Rule with flag L]

That’s why it’s better to place frequently used rules at the top of the config.

 Important note!

We’ll wander off the topic a little and explain one subtle thing about string comparison in regular expressions. If the ^ character is not specified at the beginning of the match pattern, the regexp engine will look for the substring starting from every possible position in the string.

Example:

We request default.htm and regular expression is robots\.txt. Regexp mechanism compares:

  1. default.htm ≠ robots\.txt
  2. efault.htm ≠ robots\.txt
  3. fault.htm ≠ robots\.txt
  4. ault.htm ≠ robots\.txt
  5. ult.htm ≠ robots\.txt
  6. lt.htm ≠ robots\.txt
  7. t.htm ≠ robots\.txt
  8. .htm ≠ robots\.txt
  9. htm ≠ robots\.txt
  10. tm ≠ robots\.txt
  11. m ≠ robots\.txt

and only after that regexp mechanism will inform you that the rule was not matched!

But if we add ^ at the beginning and $ at the end of each RewriteRule,

RewriteEngine on
RewriteRule ^robots\.txt$ robots.asp [NC,L]
RewriteRule ^default\.htm$ default.asp [NC,L]

[Image: Rule with flag L with right rule]

comparison will only occur once:

  1. default.htm ≠ robots\.txt

which is much faster. We strongly recommend adding ^ wherever possible.
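
Anchors also affect what gets matched, not only how fast the comparison is. The sketch below contrasts the two forms of the robots.txt rule; the extra file names mentioned in the comments are made up for illustration.

```apache
RewriteEngine on

# Unanchored pattern (commented out): it would also match requests such as
# my-robots.txt, robots.txt.bak or archive/robots.txt
# RewriteRule robots\.txt robots.asp [NC,L]

# Anchored pattern: matches a request for robots.txt only
RewriteRule ^robots\.txt$ robots.asp [NC,L]
```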

Using conditions

It’s often necessary to apply additional conditions to a rule. The RewriteCond directive is intended for this purpose.

Say we need to dynamically generate gif and jpg images and there are two scripts for that: render_gif.asp and render_jpg.asp. The mod_rewrite config will look like this:

RewriteEngine on
RewriteRule ^(.*)\.gif$ render_gif.asp?file=$1 [NC,L]
RewriteRule ^(.*)\.jpg$ render_jpg.asp?file=$1 [NC,L]

Now we’ll add a condition: if requested gif or jpg file physically exists on the disk, return real file. This check may be performed using the following directive:

RewriteCond %{REQUEST_FILENAME} !-f

And the config will become:

RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)\.gif$ render_gif.asp?file=$1 [NC,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)\.jpg$ render_jpg.asp?file=$1 [NC,L]

[Image: Rule with conds 2]

We want to draw your attention to the following:

  1. One or several consecutive RewriteCond conditions preceding a RewriteRule (RewriteProxy, RewriteHeader) directive affect only that directive. This means that if you have 2 rules and you want to apply the same conditions to both of them, the conditions must be put before each rule (see the last piece of code above).
  2. RewriteCond directives are processed only if the first (left) part of RewriteRule matched.
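
When two rules differ only in a part that the pattern can capture, one way to avoid repeating the condition is to merge them into a single rule. The following is a sketch under the assumption that the script names follow the render_<extension>.asp scheme used above; note that with [NC] the captured extension keeps the case of the request (e.g. render_GIF.asp), which is harmless on a case-insensitive Windows file system.

```apache
RewriteEngine on
# Skip the rewrite when the requested image really exists on disk
RewriteCond %{REQUEST_FILENAME} !-f
# $1 is the file name without extension, $2 is the extension (gif or jpg)
RewriteRule ^(.*)\.(gif|jpg)$ render_$2.asp?file=$1 [NC,L]
```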

Order of processing RewriteRule with RewriteCond

Let’s take a config that prevents hotlinking. It blocks requests for gif and jpg files whose referrer is present but doesn’t start with http://www.example.net (requests with an empty referrer are allowed):

RewriteEngine on
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://www\.example\.net [NC]
RewriteRule \.(jpe?g|gif)$ - [F]

[Image: Rule with conds 4]

The config is processed in the following order:

  1. RewriteRule checks whether a gif or jpg file is requested using the regular expression \.(jpe?g|gif)$.
  2. The first RewriteCond checks whether the Referer: header has a value. To be exact, it makes sure the header contains at least one character. If it does, processing moves on.
  3. The second RewriteCond checks whether the Referer value starts with http://www.example.net. If it doesn’t, the condition is met and control goes back to RewriteRule; if it does, the rule is not applied and the user gets the requested resource.
  4. If all directives matched, the image was requested from some unwanted site and RewriteRule returns “403 – Forbidden”.

Comments:

  • RewriteCond directives are processed sequentially from top to bottom, but only AFTER the first part of RewriteRule has matched the requested resource.
  • The second part of RewriteRule is executed only if ALL RewriteCond directives match.
  • All flags (except [NC]) are taken into account and executed after RewriteRule execution. If the flags do not prohibit further processing ([L], [F], [G], [P], etc.), the next RewriteRule is processed.

The following picture illustrates the result of the above rule. As you can see, for the request /test.jpg the referrer did not start with http://www.example.net and the result was “403 - Forbidden”.

[Image: Result of rule with conds - F]
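
In practice the allowed referrer often comes in several forms. A possible variation of the hotlinking config, assuming the site should accept its own pages both with and without www and over both http and https, might look like this:

```apache
RewriteEngine on
# Requests with an empty Referer are still allowed
RewriteCond %{HTTP_REFERER} .
# Allow example.net with or without www, over http or https;
# (/|$) keeps hosts such as example.net.evil.example from passing
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.net(/|$) [NC]
RewriteRule \.(jpe?g|gif)$ - [F]
```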

Using variables in RewriteCond’s and format string

Let’s take an example that emulates several sites on one IIS site by placing them into different directories:

RewriteEngine on
RewriteCond %{HTTP:Host} ^(www\.)?(.+)
RewriteRule (.*) /%2$1 [L]

Say the request is http://example.net/index.php

  1. /index.php is matched against (.*) and is saved in $1 variable
  2. %{HTTP:Host} represents example.net
  3. example.net is compared with ^(www\.)?(.+), matches it and is saved in %2
  4. /%2$1 represents /example.net/index.php

Now all files requested on http://example.net/ will be searched not in C:\inetpub\wwwroot\ but in C:\inetpub\wwwroot\example.net\ folder.

[Image: Rule with conds 5 (2)]

In green areas you may and should use the following types of variables:

  • %{ServerVariable}
  • header sent in the HTTP request %{HTTP:header}
  • map file values ${mapname:key|default}
  • environment variables %{ENV:variable}
  • back references $0-$9 (to groups matched in the first part of RewriteRule) and %1-%9 (to groups matched in the second part of RewriteCond). Notice that %n may be used only in the left (green) part of subsequent RewriteCond’s or the right (green) part of RewriteRule; $n may be used anywhere inside green areas.

Yellow areas require regular expressions. $n and %n are not possible.
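
To make the map file variable less abstract, here is a hedged sketch of a lookup. The map name products, the file mapfile.txt (lines of "key value" pairs) and the products.asp script are all hypothetical; the map file path and the contexts where RewriteMap is allowed differ between Apache and ISAPI_Rewrite, so check the documentation of the module you use.

```apache
RewriteEngine on
# "products" and mapfile.txt are illustrative names, not part of the original article
RewriteMap products txt:mapfile.txt
# Look up the captured name in the map; fall back to 0 when the key is absent
RewriteRule ^/products/([^/]+)$ /products.asp?id=${products:$1|0} [NC,L]
```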

Conditional operators in format string

(extended functionality, don’t use in Apache)

Here’s the config allowing redirection of non-www requests to www:

RewriteEngine on
RewriteCond %{HTTPS} (on)?
RewriteCond %{HTTP:Host} ^(?!www\.)(.+)$ [NC]
RewriteCond %{REQUEST_URI} (.+)
RewriteRule .? http(?%1s)://www.%2%3 [R=301,L]

[Image: Rule with optional conds]

  1. RewriteRule matches any requested resource.
  2. The first RewriteCond checks whether HTTPS is switched on for this request. If yes, the value on is saved in the %1 variable; if no, %1 remains empty.
  3. The second RewriteCond checks that the Host: header does not start with www. If it doesn’t, the host name is saved in the %2 variable.
  4. The third RewriteCond simply saves the URI of the requested resource in the %3 variable.
  5. If all RewriteCond directives matched, RewriteRule builds the substitution string.

a. The (?%1s) part checks whether %1 matched and, if it did, adds the “s” character: http(?%1s) -> https

Please pay attention! The syntax for checking whether a group matched depends on the directive the group belongs to:

  • RewriteRule: ?Ntrue_string:false_string
  • RewriteCond: ?%Ntrue_string:false_string

b. Then https://www.%2%3 -> https://www.example.com%3

c. Then https://www.example.com%3 -> https://www.example.com/index.php
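
Where the conditional operator is not available (as noted, it is an extension and not part of Apache), roughly the same redirect can be sketched with two plain rules, one per scheme. This is an alternative formulation rather than the config described above; it refers to the Host header and REQUEST_URI directly instead of %2 and %3.

```apache
RewriteEngine on

# Plain HTTP request to a non-www host
RewriteCond %{HTTPS} !=on
RewriteCond %{HTTP:Host} !^www\. [NC]
RewriteRule .? http://www.%{HTTP:Host}%{REQUEST_URI} [R=301,L]

# HTTPS request to a non-www host
RewriteCond %{HTTPS} =on
RewriteCond %{HTTP:Host} !^www\. [NC]
RewriteRule .? https://www.%{HTTP:Host}%{REQUEST_URI} [R=301,L]
```

The conditions have to be repeated before each rule, for the reason explained in the section on using conditions.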

Dealing with QueryString

RewriteRule directive deals only with the part of the request AFTER host name and BEFORE QueryString.

E.g. for the request http://localhost/test?param=foo only the /test part is matched against the RewriteRule pattern; param=foo is the QueryString.

There are 4 variants of passing the initial QueryString:

  1. By default, if no new QueryString is specified in substitution string, initial QueryString will be added to the rewritten URL. For
    RewriteRule ^/test$ /test.asp

    the result is /test.asp?param=foo

  2. If new QueryString is specified in substitution string, initial QueryString will NOT be added to the rewritten URL. So, for
    RewriteRule ^/test$ /test.asp?bar=foo

    the result is /test.asp?bar=foo.

  3. If [QSA] flag is put after the rule, new and initial QueryStrings will be joined. For
    RewriteRule ^/test$ /test.asp?bar=foo [QSA]

    the result is /test.asp?bar=foo&param=foo.

  4. If it’s necessary not to add initial QueryString to rewritten URL when no new QueryString is specified, you should add “?” character at the end of the substitution string:
    RewriteRule ^/test$ /test.asp?

    And the result will be: /test.asp.

If you need to work with QueryString parameters more selectively, use the RewriteCond directive and the %{QUERY_STRING} variable (remember that RewriteRule does not match against the QueryString).

Example

Redirect /index.php?id=123 to /index/123. The config is:

RewriteEngine on
RewriteCond %{QUERY_STRING} ^id=(.*)$
RewriteRule ^/index\.php$ /index/%1? [NC,R=301,L]

  1. RewriteRule checks if the requested file is index.php.
  2. RewriteCond retrieves the QueryString value (the part of the request after “?”) from the %{QUERY_STRING} server variable. In our case it’s “id=123”.
  3. RewriteCond applies the ^id=(.*)$ regular expression to the “id=123” string and saves the value 123 in the %1 variable.
  4. %1 is substituted with 123 in the substitution string: /index/%1? -> /index/123. As there’s a “?” character at the end of the substitution string, the initial QueryString is not added.
  5. The [R=301] flag is processed. If an absolute address is not set for the redirect, by default mod_rewrite adds http:// + requested_host + requested_port/. So, /index/123 -> http://example.com/index/123.
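
The pattern ^id=(.*)$ accepts any value, including an empty one. If only numeric ids are expected (an assumption made for this sketch, not something the example above requires), the condition can be tightened so that other requests are left untouched:

```apache
RewriteEngine on
# \d+ accepts one or more digits, so id=abc or an empty id is not redirected
RewriteCond %{QUERY_STRING} ^id=(\d+)$
RewriteRule ^/index\.php$ /index/%1? [NC,R=301,L]
```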

Small remark about # character

Please notice that the browser does not send anchor information (everything that comes after the # character) to the server. That’s why it’s impossible to write a rule that uses this information. Nevertheless, the browser will correctly process relative links, because rewriting is absolutely transparent to it.

[Image: Request with params and #]

Conclusion

We hope we have convinced you that mod_rewrite is much easier to use than you thought. We are happy if we managed to shed some light on this scary-looking subject and gave you a better understanding of it. The next article about mod_rewrite will tell you the story of context merging and distributed configurations.

Best wishes, HeliconTech Team