Updated Session Management And Web Spiders & Bots

Posted June 9, 2006 at 7:39 AM

Tags: ColdFusion

As the list of spiders that hit my site grows, I am trying to keep session management under control. I don't want to offer spiders sessions, since I don't want unused session variables taking up RAM on the server. At the same time, I don't want to add a lot of processing overhead to each page just to check whether a visitor is really a user or a spider/bot. After all, the server does have 4 gigs of RAM. Processing speed and the user experience are more important to me than RAM usage.

But, in the interest of optimization, I have combined several of my user agent checks into one Regular Expression (RegEx) search for the string "bot" on a word boundary: "bot\b". As you can see below, this takes care of 18 user agent types. I doubt that this will give me any false positives on standard browsers, but if it does, the only difference is that those visitors will not get sessions.
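To make the word-boundary behavior concrete, here is a minimal sketch that runs the same "bot\b" pattern against a few user agent strings. The strings are made up for illustration (already lowercased, as in the post), and the last one is a hypothetical non-spider agent that would still be caught.

```cfm
<cfscript>
// Hypothetical, lowercased user agent strings (illustrative only).
strSpider = "mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)";
strBrowser = "mozilla/5.0 (windows; u; windows nt 5.1; en-us; rv:1.8.0.4) gecko/20060508 firefox/1.5.0.4";
strNoBoundary = "robotics-browser/1.0";
strFalsePositive = "talbot-reader/1.0";

// "googlebot" is followed by "/", a non-word character, so \b matches.
WriteOutput( YesNoFormat( REFind( "bot\b", strSpider ) ) & "<br />" );

// No "bot" anywhere in this string, so REFind() returns 0.
WriteOutput( YesNoFormat( REFind( "bot\b", strBrowser ) ) & "<br />" );

// "bot" is followed by "i", a word character, so there is no boundary
// and no match.
WriteOutput( YesNoFormat( REFind( "bot\b", strNoBoundary ) ) & "<br />" );

// "talbot" ends on a boundary, so this agent would be treated as a
// spider and simply not get a session.
WriteOutput( YesNoFormat( REFind( "bot\b", strFalsePositive ) ) & "<br />" );
</cfscript>
```

REFind() returns the position of the first match, or 0 when there is none, which is why it can be dropped straight into a boolean OR chain.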

In previous posts I talked about how short-circuit evaluation was faster than large regular expressions. I have not gone back on this. While this is a regular expression, it is not a variable-length regular expression (it has no quantifiers or alternation). It is merely a qualified standard string search (qualified by the word boundary) and is therefore very fast.
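As a rough illustration of the short-circuit point, the sketch below counts how many Find() calls actually run in an OR chain. LoggedFind() and REQUEST.FindCalls are hypothetical helpers added only for this demo, and the user agent string is made up.

```cfm
<cfscript>
// Hypothetical wrapper that counts how many times Find() is invoked.
function LoggedFind( strNeedle, strHaystack ){
	REQUEST.FindCalls = REQUEST.FindCalls + 1;
	return( Find( strNeedle, strHaystack ) );
}

REQUEST.FindCalls = 0;

// Illustrative lowercased user agent that contains "slurp".
strTempUserAgent = "mozilla/5.0 (compatible; yahoo! slurp; http://help.yahoo.com/help/us/ysearch/slurp)";

// With short-circuit evaluation, the chain stops at the first check
// that returns a non-zero position.
blnIsSpider = (
	LoggedFind( "slurp", strTempUserAgent ) OR
	LoggedFind( "zyborg", strTempUserAgent ) OR
	LoggedFind( "jeeves", strTempUserAgent )
);

WriteOutput( "Spider: " & YesNoFormat( blnIsSpider ) & "<br />" );
WriteOutput( "Find() calls made: " & REQUEST.FindCalls );
</cfscript>
```

If the OR chain short-circuits as described in the earlier posts, only the first check should run for this agent, which is the argument for putting the most common spiders near the top of the list.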


  • // Create a lowercase version of the user agent so we can run without
  • // NoCase checks.
  • strTempUserAgent = LCase( CGI.http_user_agent );
  •  
  • // Check user agent.
  • if (
  • (NOT Len(strTempUserAgent)) OR
  •  
  • // We are gonna try to optimize even a little bit more. A good number
  • // of the spider names end in "bot". If we check for names that have
  • // "bot" ending on a word boundary, we can eliminate several of the
  • // other spider checks. The bot\b search here takes care of the
  • // spiders that are now commented out below. As you can see, it takes
  • // the place of 18 different spider Find() calls.
  • REFind( "bot\b", strTempUserAgent ) OR
  •  
  • Find( "slurp", strTempUserAgent ) OR
  • // Find( "googlebot", strTempUserAgent ) OR
  • // Find( "becomebot", strTempUserAgent ) OR
  • // Find( "msnbot", strTempUserAgent ) OR
  • Find( "mediapartners-google", strTempUserAgent ) OR
  • Find( "zyborg", strTempUserAgent ) OR
  • // Find( "rufusbot", strTempUserAgent ) OR
  • Find( "emonitor", strTempUserAgent ) OR
  • // Find( "researchbot", strTempUserAgent ) OR
  • // Find( "ip2mapbot", strTempUserAgent ) OR
  • // Find( "gigabot", strTempUserAgent ) OR
  • Find( "jeeves", strTempUserAgent ) OR
  • // Find( "exabot", strTempUserAgent ) OR
  • Find( "sbider", strTempUserAgent ) OR
  • Find( "findlinks", strTempUserAgent ) OR
  • Find( "yahooseeker", strTempUserAgent ) OR
  • Find( "mmcrawler", strTempUserAgent ) OR
  • // Find( "mj12bot", strTempUserAgent ) OR
  • // Find( "outfoxbot", strTempUserAgent ) OR
  • Find( "jbrowser", strTempUserAgent ) OR
  • // Find( "ziggsbot", strTempUserAgent ) OR
  • Find( "java", strTempUserAgent ) OR
  • Find( "pmafind", strTempUserAgent ) OR
  • Find( "blogbeat", strTempUserAgent ) OR
  • // Find( "turnitinbot", strTempUserAgent ) OR
  • Find( "converacrawler", strTempUserAgent ) OR
  • Find( "ocelli", strTempUserAgent ) OR
  • Find( "labhoo", strTempUserAgent ) OR
  • Find( "validator", strTempUserAgent ) OR
  • Find( "sproose", strTempUserAgent ) OR
  • // Find( "obot", strTempUserAgent ) OR
  • // Find( "myfamilybot", strTempUserAgent ) OR
  • // Find( "girafabot", strTempUserAgent ) OR
  • // Find( "aipbot", strTempUserAgent ) OR
  • Find( "ia_archiver", strTempUserAgent ) OR
  • // Find( "snapbot", strTempUserAgent ) OR
  • Find( "larbin", strTempUserAgent ) OR
  • Find( "psycheclone", strTempUserAgent )
  • // Find( "IRLbot", strTempUserAgent )
  • ){
  •  
  • // This application definition is for robots that do NOT need sessions.
  • THIS.Name = "KinkySolutions v.1 {dev}";
  • THIS.SessionManagement = false;
  • THIS.SetClientCookies = false;
  • THIS.ClientManagement = false;
  • THIS.SetDomainCookies = false;
  •  
  • // Set the flag for session use.
  • REQUEST.HasSessionScope = false;
  •  
  • } else {
  •  
  • // This application is for the standard user.
  • THIS.Name = "KinkySolutions v.1 {dev}";
  • THIS.SessionManagement = true;
  • THIS.SetClientCookies = true;
  • THIS.SessionTimeout = CreateTimeSpan(0, 0, 20, 0);
  • THIS.LoginStorage = "SESSION";
  •  
  • // Set the flag for session use.
  • REQUEST.HasSessionScope = true;
  •  
  • }
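
Since spiders never get a SESSION scope under this setup, page code has to check the flag before touching session variables. The following is only a sketch of how REQUEST.HasSessionScope might be used elsewhere in the application; it assumes the Application.cfc code above has already run for the request, and SESSION.HitCount is a hypothetical variable used for illustration.

```cfm
<cfscript>
// Only touch the SESSION scope when the Application.cfc code above
// turned session management on for this request.
if (REQUEST.HasSessionScope){

	// Hypothetical per-user counter; initialize it on first use.
	if (NOT StructKeyExists( SESSION, "HitCount" )){
		SESSION.HitCount = 0;
	}

	SESSION.HitCount = SESSION.HitCount + 1;

}
</cfscript>
```

Keeping the check on a REQUEST-scoped flag means the user agent is inspected once per request rather than everywhere a session variable is used.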






