Friday, February 20, 2009

ISAPI_Rewrite FAQ

Not so long ago we posted this FAQ on our forum, and quite a lot of our clients found it helpful. This compilation covers some of the trickiest questions concerning ISAPI_Rewrite 3 (and, in fact, the Helicon Ape mod_rewrite module). We reckon that having it here will put help at hand for even more people.

Configuration file not loaded or not altered

You may get the “File not loaded” message in error.log, or find that changes to the configuration file are not applied. In most cases the cause is NTFS permissions on the configuration file.

How to make sure ISAPI_Rewrite is working at all?

  1. Check that there is a green up arrow next to the ISAPI_Rewrite3 filter in IIS - Web sites properties - ISAPI filters tab.
  2. Put the following rule into the httpd.conf file:
    RewriteEngine on
    RewriteRule .? - [F]

Make any request to the site. If the result is "403 Forbidden", ISAPI_Rewrite is working. If you get "404 Page not found" instead, something is probably wrong. Remember to remove this rule afterwards, since it blocks every request.

Logging issues

  1. Log file not created or is empty

    If you encounter this problem, you need to enable logging by putting the following lines into the httpd.conf file (which resides in the ISAPI_Rewrite installation folder):

    #enabling rewrite.log
    RewriteLogLevel 9
    #enabling error.log
    LogLevel debug
  2. Log not recreated after deletion

    To resolve the issue, do one of the following:

    • grant the Everyone group Write permission on the newly created log file (either rewrite.log or error.log);
    • grant Create & Write permissions on the folder where the rewrite and error logs are supposed to be created to all users running application pools (by default this is the IIS_WPG group).

Warning! Plesk creates a new user for each application pool, so you need to grant Create & Write permissions to each of them.

Don't forget to add the logging directives (see clause 1 above) to your httpd.conf file.

Note: do not leave logging enabled on a live server; it is intended for debugging purposes only.

Images and CSS on pages corrupted

If this is your case, you are probably using page-relative paths for your images and CSS, and the problem occurs because the browser resolves them against the rewritten (pretty) URL, whose base path differs from the real one.

Please consider changing the paths to root-relative form (e.g. <img src="/image.jpg">) or absolute form. Alternatively, specify the correct base path (e.g. <base href="/">).
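As a small illustration of why the base path matters, here is a Python sketch that resolves the same image reference the way a browser would. The page URL is a hypothetical pretty URL chosen for the example; only the standard library is used.

```python
from urllib.parse import urljoin

# Hypothetical pretty URL shown in the browser's address bar.
page_url = "http://www.site.com/folder/111/"

# Page-relative path: resolved against the pretty URL, not the real script.
relative = urljoin(page_url, "image.jpg")

# Root-relative path: resolved against the site root, whatever the page URL is.
root_relative = urljoin(page_url, "/image.jpg")

print(relative)       # http://www.site.com/folder/111/image.jpg
print(root_relative)  # http://www.site.com/image.jpg
```

The first result points at a folder that does not really exist on disk, which is why the image appears broken.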

Initial query string is appended to rewritten url

For example, suppose you use a rule like this:

RewriteEngine on
RewriteRule ^index\.php$ default.aspx [NC,R]

When you request a URL with a query string, such as http://www.site.com/index.php?param=value, you get http://www.site.com/default.aspx?param=value instead of the desired http://www.site.com/default.aspx (without the original query string).

This happens because, by default, ISAPI_Rewrite appends the original query string to the rewritten URL (unless the rewritten URL has its own query string). To avoid this, add a question mark at the end of the substitution string:

RewriteEngine on
RewriteRule ^index\.php$ default.aspx? [NC,R]
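The behavior described above can be modelled in a few lines of Python. This is only a toy model of the rule as stated in this FAQ, not ISAPI_Rewrite code, and the function name `apply_query_string` is made up for the sketch.

```python
def apply_query_string(substitution: str, original_query: str) -> str:
    """Toy model: decide which query string the rewritten URL ends up with."""
    path, has_mark, own_query = substitution.partition("?")
    if has_mark:
        # The substitution carries its own '?': the original query is dropped.
        # A bare trailing '?' therefore yields no query string at all.
        return f"{path}?{own_query}" if own_query else path
    # No '?' in the substitution: the original query string is appended.
    return f"{path}?{original_query}" if original_query else path


print(apply_query_string("default.aspx", "param=value"))   # default.aspx?param=value
print(apply_query_string("default.aspx?", "param=value"))  # default.aspx
```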

URLs with question mark don’t work

The problem is that the query string is included in the RewriteRule pattern, whereas ISAPI_Rewrite 3 matches RewriteRule patterns against the URL path only. The query string is processed separately and must be matched with a RewriteCond %{QUERY_STRING} statement.

So this won’t work:

RewriteEngine on
RewriteRule ^index\.php\?param=(\d+)$ default.asp?param2=$1? [NC,L]

And this will work:

RewriteEngine on
RewriteCond %{QUERY_STRING} ^param=(\d+)$ [NC]
RewriteRule ^index\.php$ default.asp?param2=%1? [NC,L]
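To see why the first variant can never match, the following Python sketch splits a sample URL into path and query string and tries the patterns against each part. It assumes Python's `re` module behaves like the ISAPI_Rewrite regex engine for these simple expressions; the sample URL is hypothetical.

```python
import re
from urllib.parse import urlsplit

url = "http://www.site.com/index.php?param=42"
parts = urlsplit(url)
path = parts.path.lstrip("/")  # "index.php"  (what RewriteRule sees)
query = parts.query            # "param=42"   (what %{QUERY_STRING} holds)

# Pattern that includes the query string: the path never contains '?'.
broken = re.search(r"^index\.php\?param=(\d+)$", path, re.IGNORECASE)

# Working approach: match the path and the query string separately.
rule = re.search(r"^index\.php$", path, re.IGNORECASE)
cond = re.search(r"^param=(\d+)$", query, re.IGNORECASE)
target = f"default.asp?param2={cond.group(1)}" if rule and cond else None

print(broken)  # None
print(target)  # default.asp?param2=42
```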

Dealing with optional parameters

Sometimes you don’t know the exact number of parameters that will be passed and want one universal rule to handle all cases. Here’s a simple example for a situation where you may have either one or two parameters:

RewriteEngine on
RewriteRule ^folder/(\d+)(?:/(\d+))?/?$ index.asp?param1=$1?2&param2=$2 [NC,L]

This rule will accept requests like http://www.site.com/folder/111/ and http://www.site.com/folder/111/222/ (with or without trailing slash) and direct them respectively to http://www.site.com/index.asp?param1=111 and http://www.site.com/index.asp?param1=111&param2=222.
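The matching side of this rule can be tried out in Python. The sketch below reuses the same pattern and emulates the conditional part of the substitution with an ordinary `if`; the function name `rewrite` is made up for the example, and Python's `re` is assumed to treat this pattern the same way ISAPI_Rewrite does.

```python
import re

PATTERN = re.compile(r"^folder/(\d+)(?:/(\d+))?/?$", re.IGNORECASE)


def rewrite(path):
    """Return the rewritten target for a path, or None if the rule does not match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    target = f"index.asp?param1={match.group(1)}"
    # The second group is optional: add param2 only when it actually matched.
    if match.group(2) is not None:
        target += f"&param2={match.group(2)}"
    return target


for sample in ("folder/111/", "folder/111/222", "folder/111/222/", "folder/abc"):
    print(sample, "->", rewrite(sample))
```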

Excluding specific folders from being rewritten

If you need to exclude some folders from processing by ISAPI_Rewrite and redirect all other requests to, for example, index.asp, the following piece of code will be helpful:

RewriteEngine on
RewriteBase /
RewriteRule ^(?!(?:exfolder1|exfolder2|etc)/.*).+$ index.asp [NC,R=301,L]
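The negative lookahead in this pattern can be checked quickly in Python, assuming `re` handles it the same way as the ISAPI_Rewrite engine. The helper name `is_redirected` and the sample paths are made up for the sketch.

```python
import re

PATTERN = re.compile(r"^(?!(?:exfolder1|exfolder2|etc)/.*).+$", re.IGNORECASE)


def is_redirected(path):
    """True if the rule would fire for this path (leading slash already removed)."""
    return PATTERN.match(path) is not None


samples = ["exfolder1/page.asp", "EXFOLDER2/x", "other/page.asp",
           "exfolder10/page.asp", "exfolder1", "index.asp"]
for sample in samples:
    print(f"{sample:22} redirected={is_redirected(sample)}")
```

Two things stand out in the output: a folder name without a trailing slash (`exfolder1`) is not treated as excluded, and `index.asp` itself matches the pattern, so it may be worth checking whether the redirect target needs its own exclusion on your site.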

Comprehensive map files insight

  1. How to lowercase matched string before comparison with map file entries?
    RewriteEngine on
    RewriteBase /
    RewriteMap mapfile txt:mapfile.txt
    RewriteMap lower int:tolower
    RewriteRule ^products/([^?/]+)\.asp$ productpage.asp?productID=${mapfile:${lower:$1}}

    Note: entries in your mapfile.txt must also be lowercase.

    Note: In builds 3.1.0.62 and higher you can make the comparison case-insensitive simply by adding the [NC] flag after the map file definition:

    RewriteMap mapfile txt:mapfile.txt [NC]
  2. How to check if the value is present in map file prior to processing the URL to avoid fruitless actions?

    Please add the following condition before the rule dealing with map file to accomplish this:

    RewriteCond ${mapfile:$1|NOT_FOUND} !NOT_FOUND
  3. Applying map file only if specific pattern is matched

    The following set of rules applies the map file only to requests that match a specific pattern:

    RewriteMap mymap txt:map.txt
    RewriteMap mylower int:tolower
    RewriteCond %{REQUEST_URI} ^/([^/]+)/?$
    RewriteCond ${mymap:${mylower:%1}|NOT_FOUND} !NOT_FOUND
    RewriteRule .? ${mymap:${mylower:%1}} [NC,L]

    Line-by-line explanation:
    Line 1: we declare the map 'mymap', which reads its content from the map.txt file.
    Line 2: we declare the map 'mylower', which provides access to the predefined internal function tolower. It converts the input string to lower case.
    Line 3: we catch request URIs that match the specified pattern and capture the matched segment.
    Line 4: we try to get a value from the map file, using the captured segment in lower case as the key.
    Line 5: if the value is found, the RewriteRule fires.

  4. Using 2 map files to redirect old "ugly" URLs to pretty ones but still show their original content

    More and more people are asking how to implement the following behavior:

    Requested URL: http://www.site.com/index.asp?param1=value&param2=value2
    Address bar: http://www.site.com/keyword1
    Content shown: http://www.site.com/index.asp?param1=value&param2=value2

    Usually a large number of URLs of a similar pattern need to be rewritten this way, so map files are the most suitable tool for this situation. We'll need 2 map files with reversed content:

    mapfile.txt
    param1=value&param2=value2 keyword1
    etc.

    revmapfile.txt
    keyword1 param1=value&param2=value2
    etc.

    And the rules resolving the issue look like:

    RewriteEngine on
    RewriteBase /
    RewriteMap mapfile txt:mapfile.txt [NC]
    RewriteMap revmapfile txt:revmapfile.txt [NC]
    RewriteCond %{QUERY_STRING} (.+)
    RewriteRule ^index\.asp$ ${mapfile:%1}? [NC,R=301,L]
    RewriteCond ${revmapfile:$1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^([^/]+)$ index.asp?${revmapfile:$1} [NC,L]
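Keeping two mirrored map files in sync by hand is error-prone, so here is a minimal Python sketch that parses a forward map, derives the reverse map from it, and performs a lower-cased lookup with a NOT_FOUND default, similar in spirit to `${mymap:${mylower:%1}|NOT_FOUND}`. It assumes the simple "key value" line format shown above; all function names are made up for the example and are not part of ISAPI_Rewrite.

```python
def parse_map(text):
    """Parse 'key value' lines into a dict; blank lines are skipped."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, value = line.split(None, 1)
        entries[key] = value.strip()
    return entries


def reverse_map(entries):
    """Build the text of the reverse map file: 'value key' per line."""
    return "\n".join(f"{value} {key}" for key, value in entries.items())


def lookup(entries, key, default="NOT_FOUND"):
    """Look up a lower-cased key, returning the default when it is missing."""
    return entries.get(key.lower(), default)


mapfile_text = (
    "param1=value&param2=value2 keyword1\n"
    "param1=other&param2=value3 keyword2\n"
)
forward = parse_map(mapfile_text)
revmapfile_text = reverse_map(forward)
backward = parse_map(revmapfile_text)

print(revmapfile_text)
print(lookup(backward, "KEYWORD1"))  # param1=value&param2=value2
print(lookup(backward, "missing"))   # NOT_FOUND
```

Because the lookup lower-cases the key, the keys in the map must be lower case as well, which mirrors the note about mapfile.txt entries above.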

(Un)installation problems

It's not likely, but if you run into trouble (un)installing ISAPI_Rewrite 3, first try re-downloading the installation package (it could have been corrupted during download) and start the installation again. If this doesn't help, please run the installation from the command line with the following keys to generate an (un)installation log:

For installation:

msiexec /i rewriteXXX.msi /l* log.txt

For uninstallation:

msiexec /x rewriteXXX.msi /l* log.txt

This log will be helpful for investigation of your errors.

How to block specific IP ranges using ISAPI_Rewrite3?

It is often necessary to block access to your site from specific IP addresses or ranges of IP addresses. Here's an example of blocking two ranges:

203.207.64.0 - 203.208.19.255
203.208.32.0 - 203.208.63.255

ISAPI_Rewrite code is:

RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^203\.20(?:7\.(?:6[4-9]|[7-9]\d|\d{3})|8\.(?:1?\d|3[2-9]|[45]\d|6[0-3]))\.\d{1,3}$
RewriteRule .? - [F]
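Range expressions like this are easy to get subtly wrong, so it can help to check a candidate pattern against the intended ranges before deploying it. The Python sketch below does that with the standard `ipaddress` module. The pattern in it is written out for this sketch (paste your own expression to test it), the helper names are made up, and Python's `re` is assumed to behave like the ISAPI_Rewrite engine for this expression.

```python
import ipaddress
import re

# The two ranges that are supposed to be blocked.
RANGES = [
    (ipaddress.ip_address("203.207.64.0"), ipaddress.ip_address("203.208.19.255")),
    (ipaddress.ip_address("203.208.32.0"), ipaddress.ip_address("203.208.63.255")),
]

# Candidate pattern: third octet 64-255 under 203.207, 0-19 and 32-63 under 203.208.
PATTERN = re.compile(
    r"^203\.20(?:7\.(?:6[4-9]|[7-9]\d|\d{3})|8\.(?:1?\d|3[2-9]|[45]\d|6[0-3]))\.\d{1,3}$"
)


def in_ranges(addr):
    """True if the address falls into one of the blocked ranges."""
    ip = ipaddress.ip_address(addr)
    return any(low <= ip <= high for low, high in RANGES)


def mismatches(pattern):
    """List sample addresses where the pattern disagrees with the ranges."""
    bad = []
    for second in (206, 207, 208, 209):
        for third in range(256):
            addr = f"203.{second}.{third}.1"  # one sample host per /24
            if bool(pattern.match(addr)) != in_ranges(addr):
                bad.append(addr)
    return bad


print(mismatches(PATTERN))  # an empty list means pattern and ranges agree
```

The check samples only one host per third-octet value, which is enough here because both ranges start and end on whole /24 boundaries.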

ISAPI_Rewrite not working under Visual Studio 2005

The issue occurs because by default Visual Studio uses its internal web server, not IIS. To resolve the issue, right-click your project, go to Property pages -> Start Options -> Server, select Use custom server, and specify the path to your site in the Base URL field.

Why don't ISAPI_Rewrite and IISPassword get on well?

The issue is that IISPassword and ISAPI_Rewrite use the same configuration file name, .htaccess, but unlike ISAPI_Rewrite, IISPassword fails if it finds any unknown directives in that file. To avoid conflicts you need to change the configuration file name for one of the products. To change the configuration file name for ISAPI_Rewrite, use the AccessFileName directive in the httpd.conf file.

How to block web-spiders using ISAPI_Rewrite3?

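Here's a solution to get rid of lots of web-spiders. The RewriteCond is wrapped here for readability only; in your configuration file it must be entered as a single line, without line breaks or extra spaces: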

RewriteEngine on
RewriteCond %{HTTP:User-Agent} (?:WebBandit|2icommerce|Accoona|ActiveTouristBot|
adressendeutschland|aipbot|Alexibot|Alligator|AllSubmitter|
almaden|anarchie|Anonymous|Apexoo|Aqua_Products|asterias|ASSORT|
ATHENS|AtHome|Atomz|attache|autoemailspider|autohttp|b2w|bew|
BackDoorBot|Badass|Baiduspider|Baiduspider+|BecomeBot|berts|
Bitacle|Biz360|Black.Hole|BlackWidow|bladderfusion|BlogChecker|
BlogPeople|BlogsharesSpiders|Bloodhound|BlowFish|BoardBot|
Bookmarksearchtool|BotALot|BotRightHere|Botmailto:craftbot@yahoo.com|
Bropwers|Browsezilla|BuiltBotTough|Bullseye|BunnySlippers|Cegbfeieh|
CFNetwork|CheeseBot|CherryPicker|Crescent|charlotte/|ChinaClaw|
Convera|Copernic|CopyRightCheck|cosmos|Crescent|c-spider|curl|
Custo|Cyberz|DataCha0s|Daum|Deweb|Digger|Digimarc|digout4uagent|
DIIbot|DISCo|DittoSpyder|DnloadMage|Download|dragonfly|DreamPassport|
DSurf|DTSAgent|dumbot|DynaWeb|e-collector|EasyDL|EBrowse|eCatch|ecollector|
edgeio|efp@gmx.net|EirGrabber|EmailExtractor|EmailCollector|EmailSiphon|
EmailWolf|EmeraldShield|Enterprise_Search|EroCrawler|ESurf|Eval|
Everest-Vulcan|Exabot|Express|Extractor|ExtractorPro|EyeNetIE|FairAd|
fastlwspider|fetch|FEZhead|FileHound|findlinks|FlamingAttackBot|
FlashGet|FlickBot|Foobot|Forex|FranklinLocator|FreshDownload|
FrontPage|FSurf|Gaisbot|Gamespy_Arcade|genieBot|GetBot|Getleft|GetRight|
GetWeb!|Go!Zilla|Go-Ahead-Got-It|GOFORITBOT|GrabNet|Grafula|grub|
Harvest|HatenaAntenna|heritrix|HLoader|HMView|holmes|HooWWWer|
HouxouCrawler|HTTPGet|httplib|HTTPRetriever|HTTrack|humanlinks|
IBM_Planetwide|iCCrawler|ichiro|iGetter|ImageStripper|
ImageSucker|imagefetch|imds_monitor|IncyWincy|IndustryProgram|
Indy|InetURL|InfoNaviRobot|InstallShieldDigitalWizard|InterGET|
IRLbot|Iron33|ISSpider|IUPUIResearchBot|Jakarta|java/|JBHAgent|
JennyBot|JetCar|jeteye|jeteyebot|JoBo|JOCWebSpider|Kapere|Kenjin|
KeywordDensity|KRetrieve|ksoap|KWebGet|LapozzBot|larbin|leech|LeechFTP|
LeechGet|leipzig.de|LexiBot|libWeb|libwww-FM|libwww-perl|LightningDownload|
LinkextractorPro|Linkie|LinkScan|linktiger|LinkWalker|lmcrawler|
LNSpiderguy|LocalcomBot|looksmart|LWP|MacFinder|MailSweeper|mark.blonin|
MaSagool|Mass|MataHari|MCspider|MetaProductsDownloadExpress|
MicrosoftDataAccess|MicrosoftURLControl|MIDown|MIIxpc|Mirror|Missauga|
MissouriCollegeBrowse|Mister|Monster|mkdb|moget|Moreoverbot|
mothra/netscan|MovableType|Mozi!|Mozilla/22|Mozilla/3.0(compatible)|
Mozilla/5.0(compatible;MSIE5.0)|MSIE_6.0|MSIECrawler|MSProxy|MVAClient|
MyFamilyBot|MyGetRight|nameprotect|NASASearch|Naver|Navroad|
NearSite|NetAnts|netattache|NetCarta|NetMechanic|NetResearchServer|
NetSpider|NetZIP|NetVampire|NEWTActiveX|Nextopia|NICErsPRO|ninja|
NimbleCrawler|noxtrumbot|NPBot|Octopus|Offline|OKMozilla|OmniExplorer|
OpaL|Openbot|Openfind|OpenTextSiteCrawler|OracleUltraSearch|OutfoxBot|
P3P|PackRat|PageGrabber|PagmIEDownload|panscient|PapaFoto|pavuk|
pcBrowser|perl|PerMan|PersonaPilot|PHPversion|PlantyNet_WebRobot|
playstarmusic|Plucker|PortHuron|ProgramShareware|ProgressiveDownload|
ProPowerBot|prospector|ProWebWalker|Prozilla|psbot|psycheclone|puf|
PushSite|PussyCat|PuxaRapido|Python-urllib|QuepasaCreep|QueryN|
Radiation|RealDownload|RedCarpet|RedKernel|ReGet|relevantnoise|
RepoMonkey|RMA|Rover|Rsync|RTG30|Rufus|SAPO|SBIder|scooter|ScoutAbout|
script|searchpreview|searchterms|Seekbot|Serious|Shai|shelob|
Shim-Crawler|SickleBot|sitecheck|SiteSnagger|SlurpyVerifier|SlySearch|
SmartDownload|sna-|snagger|Snoopy|sogou|sootle|So-net"bat_bot|
SpankBot"bat_bot|spanner"bat_bot|SpeedDownload|
Spegla|Sphere|Sphider|SpiderBot|sproose|SQWebscanner|Sqworm|
Stamina|Stanford|studybot|SuperBot|SuperHTTP|Surfbot|SurfWalker|
suzuran|Szukacz|tAkeOut|TALWinHttpClient|tarspider|Teleport|Telesoft|
Templeton|TestBED|TheIntraformant|TheNomad|TightTwatBot|
Titan|toCrawl/UrlDispatcher|True_Robot|turingos|TurnitinBot|
TwistedPageGetter|UCmore|UdmSearch|UMBC|UniversalFeedParser|
URLControl|URLGetFile|URLyWarning|URL_Spider_Pro|UtilMind|vayala|
vobsub|VCI|VoidEYE|VoilaBot|voyager|w3mir|WebImageCollector|
WebSucker|Web2WAP|WebaltBot|WebAuto|WebBandit|WebCapture|
webcollage|WebCopier|WebCopy|WebEMailExtrac|WebEnhancer|
WebFetch|WebFilter|WebFountain|WebGo|WebLeacher|WebMiner|
WebMirror|WebReaper|WebSauger|WebSnake|Website|WebStripper|WebVac|
webwalk|WebWhacker|WebZIP|WellsSearch|WEPSearch00|WeRelateBot|
Wget|WhosTalking|Widow|WildsoftSurfer|WinHttpRequest|WinHTTrack|
WUMPUS|WWWOFFLE|wwwster|WWW-Collector|Xaldon|Xenu's|Xenus|XGET|
Y!TunnelPro|YahooYSMcm|YaDirectBot|Yeti|Zade|ZBot|zerxbot|Zeus|ZyBorg) [NC]
RewriteRule .? - [F]
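A list of this size is easier to maintain as one name per line and to turn into the directive with a small script. The Python sketch below is one possible way to do that: it removes duplicates, escapes regex metacharacters, and emits the condition as a single line. The function name is made up, and the sketch assumes that the escaping produced by Python's `re.escape` is also accepted by the ISAPI_Rewrite regex engine and that no name contains spaces.

```python
import re


def build_user_agent_condition(names):
    """Build a single-line RewriteCond that matches any of the given names."""
    # Drop blanks and duplicates, then sort case-insensitively for stable output.
    unique = sorted({n.strip() for n in names if n.strip()}, key=str.lower)
    # Escape regex metacharacters so that, e.g., '.' matches a literal dot only.
    alternation = "|".join(re.escape(n) for n in unique)
    return f"RewriteCond %{{HTTP:User-Agent}} (?:{alternation}) [NC]"


condition = build_user_agent_condition(["Wget", "curl", "HTTrack", "Wget", ""])
print(condition)
print("RewriteRule .? - [F]")
```

Generating the line this way avoids stray line breaks and spaces inside the expression.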

We hope this article has answered some of your questions and helped you master some of the basic features of ISAPI_Rewrite. We are going to add more non-obvious issues to this list to help you enjoy every rule you write.

Best wishes,
HeliconTech Team

4 comments:

  1. very useful, unfortunately the end of the code lines is cut off, can you enable line wrapping?

  2. I removed all the carriage returns on the bot list and still got "unknown expression on line #18"

  3. Please make sure you've removed all unnecessary spaces. Also please show your "line #18" (better on our forum)
