Attention! Helicon Tech Blog has moved to www.helicontech.com/articles/

Monday, December 8, 2008

Subtleties of Unicode transformation in IIS6

Today we want to give a brief overview of what gets encoded in IIS6, and how.

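When IIS gets a request (e.g. GET /smth), each character of "smth" is expected to be encoded in UTF-8, with every resulting byte escaped as %xx (in our case: GET /%73%6d%74%68). In practice, plain ASCII letters like these are usually sent unescaped; the escaping matters mostly for non-ASCII characters.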
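As a rough illustration of the %xx form, here is a minimal Python sketch that escapes every byte of a UTF-8 encoded string. The function name `escape_every_byte` is made up for this example and has nothing to do with IIS itself; it only reproduces the escaped form shown above.

```python
def escape_every_byte(text: str) -> str:
    """Encode text as UTF-8 and escape every byte as %xx (lowercase hex)."""
    # Hypothetical helper for illustration only; IIS does not expose this.
    return "".join(f"%{byte:02x}" for byte in text.encode("utf-8"))


if __name__ == "__main__":
    print("GET /" + escape_every_byte("smth"))  # GET /%73%6d%74%68
```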

However, IIS also recognizes unescaped Unicode text (UTF-8, UTF-16, etc.), e.g. GET /Şмŧĥ, which it then converts to its UTF-8 equivalent.

In turn, the native encoding of the Windows API is UTF-16, so the request is converted once again (from UTF-8 to UTF-16). That is why an ISAPI module also operates on UTF-16.
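The UTF-8 to UTF-16 step can be sketched in Python as follows. This is only an illustration of the conversion, not the code IIS or any ISAPI module actually runs; the function name `escaped_url_to_wide` is an assumption for the example, and little-endian UTF-16 is used because that is the byte order of Windows wide strings.

```python
from urllib.parse import unquote_to_bytes


def escaped_url_to_wide(url_path: str) -> bytes:
    """Turn a %xx-escaped UTF-8 path into UTF-16LE bytes."""
    # Hypothetical helper for illustration only.
    utf8_bytes = unquote_to_bytes(url_path)  # undo the %xx escaping
    text = utf8_bytes.decode("utf-8")        # interpret the bytes as UTF-8
    return text.encode("utf-16-le")          # re-encode as UTF-16, little-endian


if __name__ == "__main__":
    wide = escaped_url_to_wide("/%C5%9E%D0%BC%C5%A7%C4%A5")
    print(wide.decode("utf-16-le"))  # /Şмŧĥ
    print(len(wide))                 # 10 (five characters, two bytes each)
```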

To return the result to IIS, the rewritten request must be converted back to UTF-8 and escaped as %xx, because ISAPI does not provide a native Unicode API.
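The return path can be sketched the same way: take the UTF-16 string, convert it to UTF-8, and escape the bytes as %xx. Again, this is a hedged Python illustration rather than actual ISAPI code; `wide_to_escaped_url` is a made-up name, and keeping "/" unescaped is a choice made for this example.

```python
from urllib.parse import quote


def wide_to_escaped_url(wide: bytes) -> str:
    """Turn UTF-16LE bytes into a UTF-8 path with non-ASCII bytes escaped as %XX."""
    # Hypothetical helper for illustration only.
    text = wide.decode("utf-16-le")       # read the Windows-style wide string
    utf8_bytes = text.encode("utf-8")     # convert back to UTF-8
    return quote(utf8_bytes, safe="/")    # escape bytes as %XX, keep "/" as-is


if __name__ == "__main__":
    rewritten = "/Şмŧĥ".encode("utf-16-le")
    print(wide_to_escaped_url(rewritten))  # /%C5%9E%D0%BC%C5%A7%C4%A5
```

Note that `quote` leaves unreserved ASCII characters such as letters and digits unescaped, so "/smth" would come back unchanged.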
