Understanding The IIS Mod-Rewrite Log File And Work Flow

Posted September 3, 2009 at 2:33 PM

Tags: ColdFusion, Search Engine Optimization

As I have just started experimenting with URL rewriting in ColdFusion using IIS Mod-Rewrite, I have a lot of learning to do. While some of my experiments are working, I am getting some very unexpected (and frustrating) results which clearly indicate that I need to get a stronger hold on the work flow executing behind each URL rewrite. To gather some insight, I turned on logging for the rewrite process. This has to be done in the main (server-wide) configuration file, not in the override configuration files:

 Launch code in new window » Download code as text file »

  • # Define the location of the log file.
  • # NOTE: This location has to be one that the windows IIS user
  • # has write access to.
  •  
  • RewriteLog "C:\Inetpub\wwwroot\...\iis_mod_rewrite2\rewrite.txt"
  •  
  • # Define the level of detail that should be made available
  • # in the log file.
  • # Most detail: 9
  • # Least detail: 0 (no logging)
  •  
  • RewriteLogLevel 9

In order for IIS to be able to write to this log file, I needed to grant the IIS Windows user "write permissions" on the directory in which the log was be stored. Because I am running IIS6, this user showed up as IIS_WPG:


 
 
 

 
IIS Mod-Rewrite IIS User Permissions For Loggin Data.  
 
 
 

Once I had IIS Mod-Rewrite successfully logging web requests (which is, ALL requests on the server, by the way), I started to try different RewriteRule's to see how they were operating under the hood. The first one I set up was a simple rule that rewrites all incoming request to point to index.cfm:

 Launch code in new window » Download code as text file »

  • # IIS Mod-Rewrite configuration file
  • # Turn on the rewrite rules for this access file.
  • # This will will handle all requests based off of
  • # this directory.
  •  
  • RewriteEngine On
  •  
  • # Rewrite all requests to the index.cfm file and append
  • # original request as a query string parameters: original.
  •  
  • RewriteRule ^(.+)$ index.cfm?original=$1 [L,QSA]

As you can see, no matter what request comes from the outside, all traffic will be directed to index.cfm. Furthermore, the requested sub-file path will be appended to the request as the query string parameter, "original." I have also included the QSA (Query String Append) flag such that any query string values made in the original request will also be included in the rewritten URL.

With this rule in place, I called the following URL:

./foo/bar.cfm

Now, before we go into what happens, let me tell you what I would expect to happen as someone new to URL rewriting. I would expect that I'd execute the index.cfm page and my URL scope would contain the following name-value pair:

original: foo/bar.cfm

Now, here's what really happens: Yes, the index.cfm file is executed, but my URL scope has the following unexpected name-value pair:

original: index.cfm,foo/bar.cfm

Funky! Not only am I getting the original URL in the query string, I'm also getting the rewrite URL as well (or so I think).

So what happened? If you look at my regular expression and my rewrite, it would make you think that the entire string "index.cfm,foo/bar.cfm" is what is being captured; but that can't be as index.cfm was not part of the original request! To find out more, I took a look at the log file. Here is the relevant excerpt from it (with unnecessary data removed):

init rewrite engine
with requested uri

applying pattern
'/\.htaccess$' to uri '/iis_mod_rewrite2/foo/bar.cfm'

pass through
/iis_mod_rewrite2/foo/bar.cfm

init rewrite engine
with requested uri

applying pattern
'^(.+)$' to uri 'foo/bar.cfm'

rewrite
'foo/bar.cfm' -> 'index.cfm?original=foo/bar.cfm'

split
uri=index.cfm?original=foo/bar.cfm -> uri=index.cfm, args=original=foo/bar.cfm

internal redirect
with /iis_mod_rewrite2/index.cfm [INTERNAL REDIRECT]

init rewrite engine
with requested uri

applying pattern
'^(.+)$' to uri 'index.cfm'

rewrite
'index.cfm' -> 'index.cfm?original=index.cfm'

split
uri=index.cfm?original=index.cfm -> uri=index.cfm, args=original=index.cfm

internal redirect
with /iis_mod_rewrite2/index.cfm [INTERNAL REDIRECT]

Having never seen a log file like this before, it took me a good hour (and plenty of Twitter help from Emmet McGovern) to really wrap my head around what was going on. Immediately, however, I noticed that there were two calls to something marked [INTERNAL REDIRECT]. That probably had something to do with my funky resultant query string, but how? After all, I only had one rewrite rule - how could two redirects be executing? I was especially perplexed considering that I was using the "L" (Last Rule) flag on my rewrite condition.

What I came to realize was that each rewrite performed by IIS Mod-Rewrite gets reprocessed using the rewritten URL. As such, the first rewrite rule was executed on the original URL:

foo/bar.cfm

At the end of the first pass, it was rewritten to the URL:

index.cfm?original=foo/bar.cfm

At this point, the newly rewritten URL was reprocessed (sub-requested??) by the rules engine and ends up creating this new URL:

index.cfm?original=index.cfm

While this URL only has the one, "original," query string parameter, because I am using the QSA (Query String Append) flag, the query string created in the first URL rewrite (original=foo/bar.cfm) is now appended to the query string created by the second rewrite rule such that I end up with this URL:

index.cfm?original=index.cfm&original=foo/bar.cfm

Now, because I have a query parameter named, "original," showing up twice in the resultant query string, the two query parameter values get collapsed into a single, comma-delimited list:

original: index.cfm,foo/bar.cfm

As a note, if I removed the QSA (Query String Append) flag, I ended up with the name-value pair:

original: index.cfm

This happens because only the final query string created by the second rewrite rule is kept.

Ok, so tracing the logged rewrites helps us understand where the unexpected query string value comes from; but, it doesn't really help us understand why there are specifically two redirects being executed. After all, if there are no RewriteCond (rewrite conditions) on the RewriteRule, why execute 2 times, but not 3 times - or 4 times - or N times?

After some more reading and some back and forth on Twitter, I came up with a theory as to what was going on. Normally, with a URL rewriting system, something like this would create an endless loop of rewriting. To stop this from happening, you might set a limit to the number of rewrites that can take place using the MaxRewrites condition. From what I can gather, however, IIS Mod-Rewrite does not support this rewrite option; as such, I have to assume that it has built its rewrite engine with enough intelligence to see that it was about to enter an endless loop (rewrite result matches incoming URL) and halts the process. That would explain the two rewrite executions, but not a third.

In my experimentation, I tried using the NS (Do Not allow internal Subrequests) flag to prevent the first rewritten rule from being re-evaluated; but, from the IIS Mod-Rewrite documentation, it would appear that ISAPI does not support custom control flow at this level. If that interpretation is correct, then it makes even more sense as to why the rewrite engine was built to One) Always re-evaluate and, Two) Halt immediately at the beginning of an endless loop.

I am simultaneously new to all of the URL rewriting concepts as well as the IIS Mod-Rewrite product, so forgive me if there are any blatant misunderstandings in this explanation of the rewrite log file. It took me a number of hours and a lot of help to even get this far, so any constructive criticism would be greatly appreciated. From what I have concluded thus far, however, I have been able to more accurately predict other rewrite outcomes, which I'll address in later blog posts.

NOTE: Always remember to turn Off logging when you are not using it. Since every single web request (including IMG, CSS, and SCRIPT files) gets logged to this text file, it gets very big, very fast.

Download Code Snippet ZIP File

Post Comment  |  Ask Ben  |  Permalink  |  Other Searches  |  Print Page




Learning ColdFusion 9 - ColdFusion 9 tutorials, samples, examples, demos

Reader Comments

Sep 3, 2009 at 2:50 PM // reply »
5 Comments

You look at coldcourse at all, Ben?


Sep 3, 2009 at 2:57 PM // reply »
6,516 Comments

@SIGEPJEDI,

What is ColdCourse?


Sep 3, 2009 at 3:10 PM // reply »
79 Comments

@Ben

ColdCourse is an initial stab at a url-routing system for ColdFusion. It is the basis of various url-routing systems in various ColdFusion frameworks.

Unfortunately, it is not up to par with Django's or Rails' url-routing systems. It is also rather unRESTful, preferring the convenient familiarity of query-string parameters re-encoded as extra path info (the part of the path after /index.cfm) a la "/index.cfm?var1=val1&var2=val2" -> "/index.cfm/var1/val1/var2/val2".

To achieve a complex, resource-oriented website, you will probably need to roll out your own url-routing system, or help build one modelling it after the one in Rails or Django. However, this may not apply to most simple web apps.

Justice


Sep 3, 2009 at 3:14 PM // reply »
6,516 Comments

@Justice,

Ah ok. Yeah, I'm gonna shy away from using path_info to get at the rewrite stuff. That's just a personal opinion. Something about it just never sat right with me.


Sep 3, 2009 at 3:24 PM // reply »
79 Comments

@Ben,

Can you put a finger on it?

Justice


Sep 3, 2009 at 3:27 PM // reply »
6,516 Comments

@Justice,

Ha ha, no, can't put a finger on it :)


Sep 3, 2009 at 9:02 PM // reply »
45 Comments

@Ben,

Just a stab in the dark...

Can't you get the RegEx to exclude the index.cfm match? There is probably a more elegant solution, however if it works it could be used until such solution presents itself.


Sep 4, 2009 at 8:04 AM // reply »
6,516 Comments

@Andrew,

Yeah, I think are correct. I wasn't worrying about the rule, per-say, I just wanted to see what was going on the hood. I'll write some more on this topic with more rule-based thoughts.


Sep 4, 2009 at 11:39 AM // reply »
6 Comments

Small typo in post title: Understating -> should be Understanding? Or perhaps I don't understand :)


Sep 4, 2009 at 11:41 AM // reply »
6,516 Comments

@Ciaran,

Ooops, thanks for catching that! Definitely do not want to understate :)


Post Comment  |  Ask Ben

Recent Blog Comments
Nov 21, 2009 at 6:47 PM
Hal Helms - Real World Object Oriented Development, Sarasota - Day Five
@charlie griefer, Thank you.. ... read »
Nov 21, 2009 at 5:15 PM
Using ColdFusion Structures To Remove Duplicate List Values
@Jose Galdamez, Oh heh yeah I didn't paste the whole code. I should have defined the vars -- my bad. It's fixed thou. Thanks. ... read »
Nov 21, 2009 at 4:49 PM
Styling The ColdFusion 8 WriteToBrowser CFImage Output
Great work yet again Ben! Whilst I didn't use this whole code, I copied some of your regex code for a similar problem with the lack of an alt attribute and unescaped ampersands in CFIMAGE for Railo 3 ... read »
Nov 21, 2009 at 1:13 PM
My First ColdFusion Builder Extension - Encrypting And Decrypting CFM / CFC Files
@Ben, Because I am pedantic, I just want to make sure that everyone knows there is absolutely no encryption going on. There is only encoding and obfuscation. The cfencode tool only obfuscates your C ... read »
Nov 21, 2009 at 12:28 PM
Using ColdFusion Structures To Remove Duplicate List Values
@Jody I can't seem to get your code sample to work. If you are still having problems, try this code out and see if it gets you what you wanted. <!--- Comma delimited list with various duplicates ... read »
Nov 21, 2009 at 11:03 AM
Groovy Operator Overloading Does Not Work In The ColdFusion Context
Hi Ben, Thanks for this informative post. Now I am reading ur old posts too ... read »
Nov 21, 2009 at 10:56 AM
HostMySite.com Has The Best ColdFusion Hosting
@Mehul, Yes very nice people, however several downtimes per day which was not acceptable. Hence we had to move out. I am glad you are having good luck with them so far. ... read »