I wrote a series of blog posts a while back that discussed ColdFusion session management and spiders. In those posts, I disabled session management for users that were believed to be spiders or bots, a technique I originally learned from Michael Dinowitz. Some time later, in another discussion with Michael, he explained to me that he no longer did this. Instead, he used a slightly altered technique in which all users get session management, with the caveat that spiders and bots get a much smaller session timeout (around 2 seconds).
When some users get session management and others do not, your pages need additional logic everywhere they touch the SESSION object; certain code has to be excluded from execution if a user has no session. This new technique lets every page execute without those special cases, while still accounting for the "pseudo memory leak" caused by heavy, cookieless spider traffic. Because spiders typically don't return session cookies, every request they make creates a brand new session, and each of those sessions sits in memory until it times out.
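To make the difference concrete, here is a sketch of the kind of guard the old approach forces onto every page. It assumes a hypothetical REQUEST.HasSession flag that Application.cfm would set to false for spiders (nothing in this post defines it) and a hypothetical SESSION.PageCount counter:

```cfml
<!---
    OLD APPROACH (sketch): spiders get no session, so any code that
    touches SESSION has to be wrapped in a check. REQUEST.HasSession
    is a hypothetical flag; it is defaulted here so the fragment
    can run on its own.
--->
<cfparam name="REQUEST.HasSession" type="boolean" default="true" />

<cfif REQUEST.HasSession>
    <cfparam name="SESSION.PageCount" type="numeric" default="0" />
    <cfset SESSION.PageCount = (SESSION.PageCount + 1) />
</cfif>

<!---
    NEW APPROACH (sketch): every request has a SESSION (spiders just
    get a 2-second one), so the same logic runs with no guard. This
    is an alternative to the block above; a real page would use one
    or the other, not both.
--->
<cfparam name="SESSION.PageCount" type="numeric" default="0" />
<cfset SESSION.PageCount = (SESSION.PageCount + 1) />
```

The guard itself is small, but it has to be repeated in every template that reads or writes SESSION.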
The entries I have posted recently about sessions that expire mid-page were in preparation for this post. I wanted to make sure that giving a user a very short session timeout would not cause problems on pages with longer than usual execution times. Since those tests showed that the SESSION object remains available for the entire request, even if the session times out partway through, I now feel it is safe to introduce this code.
While I use Application.cfc almost exclusively, I have decided to demonstrate this using Application.cfm. Some people have asked to see my original session management posts done with the CFApplication tag, and since this technique is along the same lines, I figured I would downgrade the example to cover more bases.
All the important logic here takes place in the Application.cfm where the ColdFusion application is defined:
<!---
    Check to see if we have a standard user or a spider. We can
    test the user agent for commonly known spider bots. First,
    let's get the user agent in lower case so that we can do
    faster non-case-sensitive testing.
--->
<cfset REQUEST.UserAgent = LCase( CGI.http_user_agent ) />

<!---
    Now, let's check for a spider and set the session timeout
    accordingly (we will single out the session timeout so that
    we can define the application in a single place). In
    addition, we are adding a special diagnostic test so that
    the user can test the short session (since it's more
    complicated to spoof a spider).
--->
<cfif (
    <!--- Run diagnostic test. --->
    StructKeyExists( URL, "TestShortSession" ) OR

    <!--- Test user agents. --->
    (NOT Len( REQUEST.UserAgent )) OR
    REFind( "bot\b", REQUEST.UserAgent ) OR
    Find( "crawl", REQUEST.UserAgent ) OR
    REFind( "\brss", REQUEST.UserAgent ) OR
    Find( "feed", REQUEST.UserAgent ) OR
    Find( "news", REQUEST.UserAgent ) OR
    Find( "blog", REQUEST.UserAgent ) OR
    Find( "reader", REQUEST.UserAgent ) OR
    Find( "syndication", REQUEST.UserAgent ) OR
    Find( "coldfusion", REQUEST.UserAgent ) OR
    Find( "slurp", REQUEST.UserAgent ) OR
    Find( "google", REQUEST.UserAgent ) OR
    Find( "zyborg", REQUEST.UserAgent ) OR
    Find( "emonitor", REQUEST.UserAgent ) OR
    Find( "jeeves", REQUEST.UserAgent )
    )>

    <!---
        This is a spider, so set a really small timeout. In
        this case, we are going to go with 2 seconds.
    --->
    <cfset REQUEST.SessionTimeout = CreateTimeSpan( 0, 0, 0, 2 ) />

<cfelse>

    <!---
        This is a standard web user, so allocate a standard
        session timeout of 20 minutes.
    --->
    <cfset REQUEST.SessionTimeout = CreateTimeSpan( 0, 0, 20, 0 ) />

</cfif>

<!---
    ASSERT: At this point, no matter what type of user is
    visiting the site, we know what session timeout to supply.
--->

<!--- Define the application with given session timeout. --->
<cfapplication
    name="SessionTesting"
    applicationtimeout="#CreateTimeSpan( 0, 1, 0, 0 )#"
    sessionmanagement="true"
    sessiontimeout="#REQUEST.SessionTimeout#"
    />

<!--- Set page request settings. --->
<cfsetting
    showdebugoutput="false"
    requesttimeout="20"
    />
When it comes to defining the CFApplication tag with our goal in mind, the only difference between spiders and standard users is the SessionTimeout property. Everything else about the CFApplication tag is exactly the same. As such, we use our logic to store the session timeout in a variable (REQUEST.SessionTimeout) and then define the application in one place with a single CFApplication tag.
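For Application.cfc, the same single-definition idea might look roughly like the following. This is an untested translation rather than code from this post: code in the component body (the pseudo-constructor) runs on every request, so THIS.SessionTimeout can be chosen from the user agent before the application settings are read. Collapsing the Find() calls into one case-insensitive regular expression is a simplification made for this sketch.

```cfml
<!---
    Application.cfc sketch (an assumed translation of the
    Application.cfm above). The component body runs on every
    request, so the timeout can be picked per user agent.
--->
<cfcomponent output="false">

    <!--- Settings that are the same for every user. --->
    <cfset THIS.Name = "SessionTesting" />
    <cfset THIS.ApplicationTimeout = CreateTimeSpan( 0, 1, 0, 0 ) />
    <cfset THIS.SessionManagement = true />

    <!---
        Same tests as the Application.cfm CFIF, collapsed into one
        case-insensitive regular expression (a simplification).
    --->
    <cfif (
        StructKeyExists( URL, "TestShortSession" ) OR
        (NOT Len( CGI.http_user_agent )) OR
        REFindNoCase( "bot\b|crawl|\brss|feed|news|blog|reader|syndication|coldfusion|slurp|google|zyborg|emonitor|jeeves", CGI.http_user_agent )
        )>

        <!--- Spider or bot: 2-second session. --->
        <cfset THIS.SessionTimeout = CreateTimeSpan( 0, 0, 0, 2 ) />

    <cfelse>

        <!--- Standard user: 20-minute session. --->
        <cfset THIS.SessionTimeout = CreateTimeSpan( 0, 0, 20, 0 ) />

    </cfif>

</cfcomponent>
```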
To set the proper session timeout, we test the user agent making the page request. As discussed in my previous posts on ColdFusion session management, many popular spiders, bots, and RSS feed readers send distinctive user agents that set them apart from standard Firefox, Safari, and IE users. By testing for these markers, we can tell, with reasonable accuracy, which requests come from spiders.
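As the list of user agents grows, the long chain of OR conditions gets harder to maintain. One possible refactoring, sketched below, moves the same patterns into a list and loops over it. IsSpider() is a hypothetical helper name, not a built-in function, and the sketch swaps the LCase()/Find() combination for REFindNoCase(), which should match the same user agents without lower-casing first:

```cfml
<!---
    Hypothetical helper: returns true if the user agent looks like
    a spider, bot, or feed reader. Uses the same patterns as the
    CFIF in Application.cfm.
--->
<cffunction name="IsSpider" returntype="boolean" output="false">
    <cfargument name="UserAgent" type="string" required="true" />

    <!--- Comma-delimited list of regular expression patterns. --->
    <cfset var PatternList = "bot\b,crawl,\brss,feed,news,blog,reader,syndication,coldfusion,slurp,google,zyborg,emonitor,jeeves" />
    <cfset var Pattern = "" />

    <!--- An empty user agent counts as a spider, as in the original test. --->
    <cfif NOT Len( ARGUMENTS.UserAgent )>
        <cfreturn true />
    </cfif>

    <!--- REFindNoCase() makes the separate LCase() step unnecessary. --->
    <cfloop index="Pattern" list="#PatternList#">
        <cfif REFindNoCase( Pattern, ARGUMENTS.UserAgent )>
            <cfreturn true />
        </cfif>
    </cfloop>

    <cfreturn false />
</cffunction>
```

With a helper like this, the Application.cfm condition could shrink to StructKeyExists( URL, "TestShortSession" ) OR IsSpider( CGI.http_user_agent ), which also keeps the diagnostic hook visibly separate from the user agent test.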
In addition to testing user agents, you will notice that the first line of my CFIF statement checks for the TestShortSession key in the URL scope. This is a hook for developers to test page requests that have short sessions without having to spoof a spider's user agent.
Then, to make sure this was working, I set up a simple index.cfm page that CFDumps the application settings using the undocumented GetApplicationSettings() method of the APPLICATION scope:
<!---
    Display the application settings using the undocumented
    function, GetApplicationSettings(), which gives us access
    to all the application properties.
--->
<cfdump
    var="#APPLICATION.GetApplicationSettings()#"
    label="Application Settings"
    />
Now, running the page as a standard user, we get the following CFDump output:
Notice that SessionTimeout has the value 1200. This is the number of seconds allocated for the session timeout (20 minutes * 60 seconds per minute = 1200 seconds), which is just what we want for a standard user.
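The 1200 traces back to the CreateTimeSpan() call. A ColdFusion timespan is stored as a fractional number of days, so multiplying it by 86,400 (the number of seconds in a day) converts it to seconds. Here is a quick sketch that checks the arithmetic for both timeouts (the variable names are just for illustration):

```cfml
<!--- The two timeouts used in Application.cfm. --->
<cfset StandardTimeout = CreateTimeSpan( 0, 0, 20, 0 ) />
<cfset SpiderTimeout = CreateTimeSpan( 0, 0, 0, 2 ) />

<!---
    Each timespan is a fraction of a day; multiply by 86,400
    seconds per day and round away floating-point noise to get
    whole seconds.
--->
<cfset StandardSeconds = Round( StandardTimeout * 86400 ) />
<cfset SpiderSeconds = Round( SpiderTimeout * 86400 ) />

<cfoutput>
    Standard user: #StandardSeconds# seconds<br />
    Spider: #SpiderSeconds# seconds
</cfoutput>
```

This should print 1200 and 2, the same SessionTimeout values that show up in the CFDump output.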
Now, when we re-run the page, putting ?TestShortSession in the URL, we get the following CFDump output:
Notice that this time, the SessionTimeout value is 2 seconds. This is just what we want for spiders and bots. Even if a spider hits your site 10,000 consecutive times and creates a new session each time, every one of those sessions expires after 2 seconds rather than 20 minutes, so the resulting spike in memory usage is very short-lived.
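To put rough numbers on this, here is a back-of-the-envelope estimate (an illustration, not a measurement). If a spider creates new sessions at a steady rate λ and each session lives for the timeout T, the number of sessions held in memory at any moment is roughly λT (Little's law). Assuming a hypothetical crawl rate of 10 new sessions per second:

```latex
\begin{aligned}
N_{\text{live}} &\approx \lambda T \\
T = 1200\,\text{s}: \quad N_{\text{live}} &\approx 10\,\text{s}^{-1} \times 1200\,\text{s} = 12{,}000 \text{ sessions} \\
T = 2\,\text{s}: \quad N_{\text{live}} &\approx 10\,\text{s}^{-1} \times 2\,\text{s} = 20 \text{ sessions}
\end{aligned}
```

Actual numbers depend on crawl patterns and on when the server purges expired sessions, but the ratio between the two cases follows the ratio of the timeouts, here 600 to 1.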