Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.

Using IIS URL Rewriting And CGI.PATH_INFO With IIS MOD-Rewrite

By Ben Nadel
Tags: ColdFusion

Previously, I explored the concept of using URL rewriting with IIS and IIS MOD-Rewrite in order to make ColdFusion's OnMissingTemplate() event handler more effective. This worked fine with some finagling, but Justice suggested that I take a look at using PATH_INFO. I've only briefly looked into PATH_INFO before, but one thing that I do like about it is that it can be used both with and without URL rewriting. As a quick overview of PATH_INFO, if you have the following URL:

index.cfm/foo/bar/

... the value that comes after the index.cfm (/foo/bar/) is the extra PATH_INFO. Now, in ColdFusion on IIS (I say that since I cannot test it on other systems and I am told that it varies), CGI.SCRIPT_NAME and CGI.PATH_INFO are the same value unless the extra path information is provided. So, for example, if you go to the following URL:

index.cfm

... CGI will report the following values:

script_name: /index.cfm
path_info: /index.cfm

As you can see, when you access a given path, the script and the path are the same. However, if you request the following URL:

index.cfm/foo/bar/

... then CGI will report the following values:

script_name: /index.cfm
path_info: /foo/bar/

As you can see now, the two values are different. While this is a bit odd, it at least gives us a way to determine when PATH_INFO is being used (when its value is different from the SCRIPT_NAME).
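
A minimal sketch of that check, assuming the IIS behavior described above. The function name hasExtraPathInfo is hypothetical, and the empty-string guard is an assumption for servers that report no PATH_INFO at all when no extra path is given:

```cfml
<cffunction
	name="hasExtraPathInfo"
	access="public"
	returntype="boolean"
	output="false"
	hint="I determine if the request carries extra path information.">

	<!--- Define arguments. --->
	<cfargument name="scriptName" type="string" required="true" />
	<cfargument name="pathInfo" type="string" required="true" />

	<!---
		On IIS (per the values above), PATH_INFO mirrors
		SCRIPT_NAME when no extra path is supplied. The empty
		check is an assumption for servers that report nothing.
	--->
	<cfreturn (
		( len( arguments.pathInfo ) gt 0 ) and
		( arguments.pathInfo neq arguments.scriptName )
		) />
</cffunction>
```

Called as hasExtraPathInfo( cgi.script_name, cgi.path_info ), this returns false for the index.cfm request and true for the index.cfm/foo/bar/ request shown above.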

That said, approaching this URL rewriting with PATH_INFO in mind, I created the following IIS MOD-Rewrite configuration file:

  • # IIS Mod-Rewrite configuration file
  • # Turn on the rewrite rules for this access file. This will
  • # handle all requests based off of this directory.
  •  
  • RewriteEngine On
  •  
  •  
  • # If the given file or directory exists, then don't do any
  • # redirects - simply pass the request on to the file system.
  •  
  • RewriteCond %{REQUEST_FILENAME} -f [OR]
  • RewriteCond %{REQUEST_FILENAME} -d
  • RewriteRule .? - [L]
  •  
  •  
  • # If the given file does not exist, then rewrite the request
  • # to use the front controller with the given file set as the
  • # path info for the script. We cannot make any assumptions
  • # about the file name as it might be purely a directory path
  • # without any file extension.
  • #
  • # NOTE: Because there is going to be a one-directory difference
  • # in browser path perception depending on whether or not the
  • # path info is entered manually, we have to flag that this was
  • # a rewrite for proper path calculations.
  •  
  • RewriteCond %{REQUEST_FILENAME} !-f
  • RewriteCond %{REQUEST_FILENAME} !-d
  • RewriteRule ^(.+)$ index.cfm/$1?_rewrite [NC,L,QSA]

The first rule just states that if the given file or directory exists, let it pass through. That way, if we get past the first rule, then we know that the given file doesn't exist. At that point, we are going to rewrite the request to the application's front controller (index.cfm) and append the requested script as the PATH_INFO.

When we perform this rewrite, we need to tack on a URL flag indicating that the path info was created during a rewrite process. This is necessary because PATH_INFO can be added manually by the user directly in the URL. Meaning, the following requests can and must result in the same outcome:

/foo/bar/
/index.cfm/foo/bar/

In this case, the first would be handled by the rewrite engine and the second would be manually determined by the user. Now, if you look at the two URLs, you'll notice that the big difference between them is that the first path is two directories deep while the second path is three directories deep. Although the PATH_INFO values are the same, under the hood, we will need to provide two different relative web root values. This is why the rewrite engine needs to append a flag - so that our application framework understands how to create the web root.
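
To make the depth difference concrete, here is a minimal sketch of just the path-info portion of the web root calculation. The function name getPathInfoWebRoot and its wasRewritten argument are hypothetical, and the sketch assumes the front controller sits in the application root so that the script itself contributes no depth:

```cfml
<cffunction
	name="getPathInfoWebRoot"
	access="public"
	returntype="string"
	output="false"
	hint="I build the ../ prefix contributed by the path info.">

	<!--- Define arguments. --->
	<cfargument name="pathInfo" type="string" required="true" />
	<cfargument name="wasRewritten" type="boolean" required="true" />

	<!---
		Count the segments the browser sees. Appending "-"
		makes a trailing slash count as one more level.
	--->
	<cfset var depth = listLen( ( arguments.pathInfo & "-" ), "\/" ) />

	<!---
		A rewritten request never shows "index.cfm" in the
		browser's address, so it is one level shallower.
	--->
	<cfif arguments.wasRewritten>
		<cfset depth = ( depth - 1 ) />
	</cfif>

	<cfreturn repeatString( "../", depth ) />
</cffunction>
```

With a PATH_INFO of /foo/bar/, getPathInfoWebRoot( "/foo/bar/", true ) yields "../../" for the rewritten /foo/bar/ request, while getPathInfoWebRoot( "/foo/bar/", false ) yields "../../../" for the manually entered /index.cfm/foo/bar/ request.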

With that in mind, let's take a quick look at the Application.cfc file:

Application.cfc

  • <cfcomponent
  • output="false"
  • hint="I define the application settings and event handlers.">
  •  
  • <!--- Define the application. --->
  • <cfset this.name = hash( getCurrentTemplatePath() ) />
  • <cfset this.applicationTimeout = createTimeSpan( 0, 0, 5, 0 ) />
  •  
  • <!--- Define page request settings. --->
  • <cfsetting
  • requesttimeout="10"
  • showdebugoutput="false"
  • />
  •  
  •  
  • <cffunction
  • name="onApplicationStart"
  • access="public"
  • returntype="boolean"
  • output="false"
  • hint="I initialize the application.">
  •  
  • <!--- Define the local scope. --->
  • <cfset var local = {} />
  •  
  • <!---
  • As part of the application initialization, we
  • want to figure out some constants surrounding our
  • application location:
  •  
  • RootDirectory
  • The root directory of our application.
  •  
  • RootScript
  • The root script path of our application. This is in
  • the case where our application lives below the web
  • root of the server.
  •  
  • RootUrl
  • The root URL of our application.
  • --->
  •  
  • <!---
  • Determining the root path is easy - we always know
  • that it is this directory (the one containing the
  • Application.cfc component).
  • --->
  • <cfset application.rootDirectory = getDirectoryFromPath(
  • getCurrentTemplatePath()
  • ) />
  •  
  • <!---
  • To find the Root Script, we have to do a bit more
  • calculation; we need to figure out the difference
  • in the length between the root directory and
  • requested directory and then subtract that depth
  • from the requested script.
  • --->
  •  
  • <!---
  • Start off with the current script directory as the
  • root directory.
  • --->
  • <cfset application.rootScript = getDirectoryFromPath(
  • cgi.script_name
  • ) />
  •  
  • <!---
  • Comparing the expanded root script to the root
  • directory, we can now figure out how many directories
  • below the application root we are.
  • --->
  • <cfset local.scriptDepth = (
  • listLen( expandPath( application.rootScript ), "\/" ) -
  • listLen( application.rootDirectory, "\/" )
  • ) />
  •  
  • <!---
  • Based on the script depth, we can now move up the path
  • the corresponding number of steps.
  • --->
  • <cfset application.rootScript = reReplace(
  • application.rootScript,
  • "([^\\/]+[\\/]){#local.scriptDepth#}$",
  • "",
  • "one"
  • ) />
  •  
  • <!---
  • Now that we have our root script, we can easily find
  • our root URL. The only special case we need to worry
  • about is when the root script is "/". In that case,
  • we are in the root of the web directory and don't need
  • to append the script.
  • --->
  • <cfset application.rootUrl = (
  • "http://" &
  • cgi.server_name
  • ) />
  •  
  • <!---
  • Check to see if we have a script name worth appending
  • to the URL.
  • --->
  • <cfif !reFind( "^[\\/]$", application.rootScript )>
  •  
  • <!--- Append root script to URL. --->
  • <cfset application.rootUrl &= application.rootScript />
  •  
  • </cfif>
  •  
  • <!--- Return true so the page request can process. --->
  • <cfreturn true />
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="onRequestStart"
  • access="public"
  • returntype="boolean"
  • output="false"
  • hint="I initialize the page request.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="script"
  • type="string"
  • required="true"
  • hint="I am the requested script name."
  • />
  •  
  • <!--- Define the local scope. --->
  • <cfset var local = {} />
  •  
  • <!---
  • Combine the form and url scopes into a common request
  • attributes collection so that we don't have to know
  • what scope a variable came from.
  • --->
  • <cfset request.attributes = duplicate( url ) />
  • <cfset structAppend( request.attributes, form ) />
  •  
  • <!---
  • Param the default action variable - this will be
  • what the front-controller (and sub-controllers) use
  • to figure out what scripts to execute.
  • --->
  • <cfparam
  • name="request.attributes.do"
  • type="string"
  • default=""
  • />
  •  
  • <!---
  • Split the request variable into an array such that
  • we can examine the parts of it in front-controller
  • control flow. We are going to assume that the raw
  • action variable is a dot-delimited list of actions.
  • --->
  • <cfset request.do = listToArray(
  • request.attributes.do,
  • "."
  • ) />
  •  
  •  
  • <!---
  • Now that we have our action variable set up and based
  • off the query string, let's check to see if the current
  • page request is actually a URL Rewriting as determined
  • by a PATH_INFO that is different from the SCRIPT_NAME
  • available in the CGI object (normally, these two are
  • the same value unless path info is used explicitly).
  •  
  • If so, we might have to translate the PATH_INFO into
  • a new action AND a set of query string parameters.
  • --->
  • <cfif (cgi.script_name neq cgi.path_info)>
  •  
  • <!---
  • Create a normalized version of the script as taken
  • from the path_info variable. Essentially, we are
  • removing any leading or trailing slashes.
  • --->
  • <cfset local.script = reReplace(
  • cgi.path_info,
  • "^[\\/]+|[\\/]+$",
  • "",
  • "all"
  • ) />
  •  
  • <!---
  • Now that we have our script name normalized, let's
  • use some regular expression pattern matching to
  • see if we need to update our action variable or
  • any other URL variables.
  • --->
  • <cfif reFind( "^contact\b", local.script )>
  •  
  • <!--- Routing to contact section. --->
  • <cfset request.do = [ "contact" ] />
  •  
  • <cfelseif reFind( "^about\b", local.script )>
  •  
  • <!--- Routing to about section. --->
  • <cfset request.do = [ "about" ] />
  •  
  • <cfelseif reFind( "^blog/\d+", local.script )>
  •  
  • <!--- Routing to blog section. --->
  • <cfset request.do = [ "blog" ] />
  •  
  • <!--- Get ID of blog post. --->
  • <cfset request.attributes.id = listGetAt(
  • local.script,
  • 2,
  • "/"
  • ) />
  •  
  • <cfelseif reFind( "^blog\b", local.script )>
  •  
  • <!--- Routing to blog section. --->
  • <cfset request.do = [ "blog" ] />
  •  
  • <cfelse>
  •  
  • <!---
  • We could not match the requested URL against
  • any of our SES patterns. As such, this is
  • truly an invalid file request. As such, let's
  • return a true 404 error.
  • --->
  • <cfheader
  • statuscode="404"
  • statustext="Page Not Found"
  • />
  •  
  • <!---
  • Return out with false so the request of the
  • page will not get processed.
  • --->
  • <!--- <cfreturn false /> --->
  •  
  • </cfif>
  •  
  • </cfif>
  •  
  •  
  • <!---
  • Get the relative web root path from our current page
  • (this will allow our traversal path to always be
  • relative, rather than a hard-coded root path, which
  • is the ultra lame... like really really lame).
  •  
  • When calculating this path, we need to take into
  • account BOTH the current file as well as any PATH_INFO
  • value since the browser views them both as adding to
  • the depth of the page request.
  •  
  • Get the initial web root based only on the requested
  • page template.
  • --->
  • <cfset request.webRoot = repeatString(
  • "../",
  • (
  • listLen( getDirectoryFromPath( expandPath( arguments.script ) ), "\\/" ) -
  • listLen( application.rootDirectory, "\\/" )
  • )) />
  •  
  • <!---
  • Now that we have the base web root from the requested
  • template, we have to see if there is any extra pathing
  • being used. As before, we will determine this to be
  • true if the script name and the path info are different
  • values.
  • --->
  • <cfif (cgi.script_name neq cgi.path_info)>
  •  
  • <!---
  • Because there will be a one directory difference
  • in browser perception if the PATH_INFO was entered
  • manually, versus if this was a rewrite, we have to
  • check for the rewrite flag.
  • --->
  • <cfif structKeyExists( url, "_rewrite" )>
  •  
  • <!---
  • There will be an offset required when using
  • the PATH_INFO for calculation.
  • --->
  • <cfset local.webRootOffset = 1 />
  •  
  • <!---
  • Delete the rewrite flag as it will not be
  • needed for anything else in this request.
  • --->
  • <cfset structDelete( url, "_rewrite" ) />
  • <cfset structDelete( request.attributes, "_rewrite" ) />
  •  
  • <cfelse>
  •  
  • <!---
  • The PATH_INFO was entered manually. As such,
  • there will be no offset needed for the web
  • root.
  • --->
  • <cfset local.webRootOffset = 0 />
  •  
  • </cfif>
  •  
  • <!---
  • We are using extra PATH_INFO. Each segment of the
  • path info adds depth to the request as the browser
  • sees it; the offset calculated above accounts for the
  • one-directory difference when the request was rewritten.
  • --->
  • <cfset request.webRoot &= repeatString(
  • "../",
  • (
  • listLen( (cgi.path_info & "-" ), "\/" ) -
  • local.webRootOffset
  • )) />
  •  
  • </cfif>
  •  
  •  
  • <!--- Return true so that the page can be processed. --->
  • <cfreturn true />
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="onRequest"
  • access="public"
  • returntype="void"
  • output="true"
  • hint="I execute the page request.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="script"
  • type="string"
  • required="true"
  • hint="I am the requested script name."
  • />
  •  
  • <!--- Include the requested page. --->
  • <cfinclude template="#arguments.script#" />
  •  
  • <!--- Return out. --->
  • <cfreturn />
  • </cffunction>
  •  
  • </cfcomponent>

I won't go into too much detail since I've talked about this before AND I need to start working. The pattern matching works just as it did in my previous posts - the difference being that I'm pulling my information out of the CGI.PATH_INFO value. The real gotcha in this approach is that the relative web root becomes a bit more complicated. Not only can the pathing come from either URL rewriting or a URL entered manually by the user, but the relative web root also needs to take into account both the requested script and the path_info. Hopefully, the code comments make it clear what I am doing.
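
For readers who want the pattern matching in isolation, here is a minimal sketch that pulls the same routing rules into a standalone function. The name resolveRoute and the shape of the returned struct (found, do, id) are hypothetical, chosen only for illustration; this is not part of the Application.cfc above:

```cfml
<cffunction
	name="resolveRoute"
	access="public"
	returntype="struct"
	output="false"
	hint="I translate a PATH_INFO value into an action and ID.">

	<!--- Define arguments. --->
	<cfargument name="pathInfo" type="string" required="true" />

	<!--- Define the local variables. --->
	<cfset var route = {} />

	<!--- Strip any leading or trailing slashes. --->
	<cfset var script = reReplace(
		arguments.pathInfo,
		"^[\\/]+|[\\/]+$",
		"",
		"all"
		) />

	<!--- Set up the default (matched, but empty) route. --->
	<cfset route.found = true />
	<cfset route.do = [] />
	<cfset route.id = "" />

	<!--- Check the most specific pattern (blog + ID) first. --->
	<cfif reFind( "^blog/\d+", script )>

		<cfset route.do = [ "blog" ] />
		<cfset route.id = listGetAt( script, 2, "/" ) />

	<cfelseif reFind( "^blog\b", script )>

		<cfset route.do = [ "blog" ] />

	<cfelseif reFind( "^contact\b", script )>

		<cfset route.do = [ "contact" ] />

	<cfelseif reFind( "^about\b", script )>

		<cfset route.do = [ "about" ] />

	<cfelse>

		<!--- No pattern matched; the caller should send a 404. --->
		<cfset route.found = false />

	</cfif>

	<cfreturn route />
</cffunction>
```

Calling resolveRoute( "/blog/1744/" ) gives a route whose do array holds "blog" and whose id is "1744", while resolveRoute( "/nope/" ) comes back with found set to false, which is the point at which the 404 header would be sent.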

In the end, I have to say that I rather like the PATH_INFO approach, if for no other reason than that it can be done without any URL rewriting at all. Thanks, Justice, for the suggestion.




Reader Comments

Excellent post, Ben.

Very interesting to see the built-in handler methods used to help control the rewrite rules.

Time to have a play with this myself :)


@Matt,

Yeah, I think I might want to convert my blog over to using this approach. I use a similar technique, but all 404 powered (as thrown by IIS). This seems much cleaner - and that I can simulate it *without* any rewriting as well (just straight up path_info) is rather awesome.


@Ben,

Thanks for the shoutout! And good job getting this kind of scenario up and running, and then writing a good post about it.

I see you decided to try to tackle a thorny problem that I was thinking about too for a while. In the end, I decided just to sidestep the whole problem. This is the approach I took:

Any path that matches with '/media/.*' gets passed through unaltered. Obviously, that means you put all CSS/JavaScript/images somewhere in a folder 'media' which is itself directly in the webroot.

Next, *rewrite every single path* from '/foo/bar' to '/index.cfm/foo/bar'. No questions asked. Rewrite will always be performed. No more wondering whether this is *really* a rewrite. (Recall that all of the static file paths already got matched by the previous rule and don't get rewritten by this rule.)

To create a more portable app, you would actually want to support the following two scenarios: *no* paths are rewritten throughout the whole application, and *all* paths are rewritten throughout the whole application (except for paths in the '/media' folder). But this setting can be set statically somewhere, depending on how the app is deployed, and it is a global setting, rather than a per-page-request setting.

I wouldn't mix-and-match rewriting and non-rewriting within the same app and within the same deployment of that app, because that tends to make things ... complicated. My approach certainly seems more restrictive at the outset, but I think that it trades away the ability to do something you don't really really need to do anyway for a little bit of extra simplicity in your life. Supporting both '/foo/bar' and '/index.cfm/foo/bar' at the same time is tricky for the reasons you mentioned (what if it's just '/index.cfm', what about relative paths, etc.), but in the end I don't think you really need to go through all that complexity because I think that should be a global, app-wide setting, not a per-page-request setting.

Happy coding!

Cheers,
Justice


@Justice,

The one thing, though, that I can't quite figure out is how to reconcile what the web browser sees with what the app engine sees. After all, the rewriting happens at the server level, not the browser level (unless you do a hard redirect). As such, even if I were to rewrite *all* the URLs, the browser would still be fooled by its own path_info.

For example, let's say someone does type in:

index.cfm/foo/bar/

... even if I rewrite on the server, the browser still sees the user as 3 levels deep, which will necessitate a web root of "../../../".

Even with a rewrite-everything rule, how do you deal with that?


@Ben,

The web browser should always see '/foo/bar'. All of your URLs in the HTML should say '/foo/bar'. If you have a link in your HTML to, say '../diz', then that would be resolved to '/foo/bar/../diz' = '/foo/diz'. Behind the scenes, there would be an '/index.cfm' involved in processing everything, but that is invisible to the browser.

I typically like to use absolute paths in my links. So in the HTML for '/foo/bar' I might have a link to '/foo/diz' rather than a link to '../diz'.

It should be impossible for the user to access '/index.cfm' directly. In other words, if the user typed in '/index.cfm', then the rewrite engine would kick in and then ColdFusion would actually see '/index.cfm/index.cfm'. Likewise, if the user typed in '/default.php', then IIS would do an internal redirect and would see '/index.cfm/default.php' and would send the request to CF instead of PHP. Likewise, if the user types in '/index.cfm/foo/bar' then IIS does an internal redirect and calls the '/index.cfm' template with a PATH_INFO of '/index.cfm/foo/bar'.

Ultimately, *no* templates are directly accessible from the browser. Of course if someone tries to access '/foo/bar/baz.cfm' from the browser, then IIS does an internal rewrite to '/index.cfm/foo/bar/baz.cfm' and then your '/index.cfm' file is allowed to check if the PATH_INFO value of '/foo/bar/baz.cfm' exists as a real path on the server and then cfinclude that template. I actually consider this to be a benefit.

Cheers
Justice


@Justice,

OK, I see what you're saying. So you rewrite every file, whether it exists or not. I hadn't thought of that. I think I was in the world of physical files for so long that it never occurred to me to rewrite for files that exist (only for ones that don't).

I think there is a certain amount of sense to what you are doing - everything goes through the front controller, whether you like it or not (outside your Media folder of course).

My only concern with this approach would be that you'd have to really work that angle from the get-go, otherwise, you could quite easily cripple a site. My blog, for example, has all kinds of random things on it (presentations, demos, sample apps) that exist outside the main framework. As such, there is no logic in my front controller that knows how to handle that.

Of course, I could always just add additional exceptions for that (such as nothing in the "resources" folder gets re-written).


@Ben,

Exactly. In my head, rewriting is the rule, and the /media/ and /resources/ dirs are the exception.

One thing that you can easily do within a pre-existing app is to have a special folder like /go/ where only stuff like '/go/foo/bar' gets rewritten to '/go/index.cfm/foo/bar'. So that makes the rewriting the exception and leave-it-alone the rule.

So, purely as an example for your blog, instead of browsing to '/index.cfm?dax=blog:1744.view', we would browse to '/blog/posts/1744'. This would get rewritten internally to '/blog/index.cfm/posts/1744' (and that path would be inaccessible from the browser). But only paths that already start with '/blog/' would get rewritten in this way - all other paths would be left alone, leaving the way most of your site works intact.

Cheers,
Justice


@Justice,

Thanks for the clarification. I suppose that makes sense; I always default to creating some sort of flexibility into my code. But, I think that desire is completely arbitrary - not dictated by an actual need.

