Updated Session Management And Web Spiders & Bots

Posted June 9, 2006 at 7:39 AM

Tags: ColdFusion

As the list of spiders that hit my site grows, I am trying to keep session management under control. I don't want to offer spiders sessions, since I don't want unused session variables taking up RAM on the server. At the same time, I don't want to add a lot of processing overhead to each page just to check whether a user is really a user or a spider/bot. After all, the server does have 4 gigs of RAM. Processing speed and the user experience are more important to me than RAM usage.

But, in the interest of optimization, I have combined several of my user agent checks into one regular expression (RegEx) search for the string "bot" on a word boundary: "bot\b". As you can see below, this takes care of 18 user agent types. I doubt that this will give me any false positives on standard browsers, but if it does, the only difference is that those visitors will not get sessions.

In previous posts, I talked about how short-circuit evaluation was faster than large regular expressions. I have not gone back on this. While this is a regular expression, it is not a variable-length regular expression. It is merely a qualified standard string search (qualified by the word boundary) and is therefore very fast.
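To make the word-boundary behavior concrete, here is a minimal sketch of how "bot\b" treats a few user agents. The three strings are sample values I am assuming for illustration (already lowercased); they are not taken from my logs, and "botanybrowser" is a made-up agent.

```cfm
<cfscript>
	// Sample user agents, assumed for illustration only.
	strSpider = "mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)";
	strBrowser = "mozilla/4.0 (compatible; msie 6.0; windows nt 5.1)";
	strMadeUp = "botanybrowser/1.0";

	// "bot" is followed by "/" (a non-word character), so it sits on a
	// word boundary. REFind() returns a non-zero position.
	WriteOutput( YesNoFormat( REFind( "bot\b", strSpider ) ) & "<br />" );

	// No "bot" anywhere in the string, so REFind() returns 0.
	WriteOutput( YesNoFormat( REFind( "bot\b", strBrowser ) ) & "<br />" );

	// "bot" is followed by "a" (a word character), so there is no word
	// boundary and REFind() returns 0.
	WriteOutput( YesNoFormat( REFind( "bot\b", strMadeUp ) ) & "<br />" );
</cfscript>
```

The last case is the reason for the \b qualifier: a plain Find( "bot", ... ) would also flag any agent that merely contains those three letters inside a longer word.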


// Create a lowercase version of the user agent so we can run without
// NoCase checks.
strTempUserAgent = LCase( CGI.http_user_agent );

// Check user agent.
if (
	(NOT Len(strTempUserAgent)) OR

	// We are going to try to optimize even a little bit more. A good number
	// of the spider names end in "bot". If we check for names that have
	// BOT ending on a word boundary, we can eliminate several of the other
	// spider checks. The bot\b search here takes care of the spiders
	// that are now commented out below. As you can see, it takes the
	// place of 18 different spider Find()'s.
	REFind( "bot\b", strTempUserAgent ) OR

	Find( "slurp", strTempUserAgent ) OR
	// Find( "googlebot", strTempUserAgent ) OR
	// Find( "becomebot", strTempUserAgent ) OR
	// Find( "msnbot", strTempUserAgent ) OR
	Find( "mediapartners-google", strTempUserAgent ) OR
	Find( "zyborg", strTempUserAgent ) OR
	// Find( "rufusbot", strTempUserAgent ) OR
	Find( "emonitor", strTempUserAgent ) OR
	// Find( "researchbot", strTempUserAgent ) OR
	// Find( "ip2mapbot", strTempUserAgent ) OR
	// Find( "gigabot", strTempUserAgent ) OR
	Find( "jeeves", strTempUserAgent ) OR
	// Find( "exabot", strTempUserAgent ) OR
	Find( "sbider", strTempUserAgent ) OR
	Find( "findlinks", strTempUserAgent ) OR
	Find( "yahooseeker", strTempUserAgent ) OR
	Find( "mmcrawler", strTempUserAgent ) OR
	// Find( "mj12bot", strTempUserAgent ) OR
	// Find( "outfoxbot", strTempUserAgent ) OR
	Find( "jbrowser", strTempUserAgent ) OR
	// Find( "ziggsbot", strTempUserAgent ) OR
	Find( "java", strTempUserAgent ) OR
	Find( "pmafind", strTempUserAgent ) OR
	Find( "blogbeat", strTempUserAgent ) OR
	// Find( "turnitinbot", strTempUserAgent ) OR
	Find( "converacrawler", strTempUserAgent ) OR
	Find( "ocelli", strTempUserAgent ) OR
	Find( "labhoo", strTempUserAgent ) OR
	Find( "validator", strTempUserAgent ) OR
	Find( "sproose", strTempUserAgent ) OR
	// Find( "obot", strTempUserAgent ) OR
	// Find( "myfamilybot", strTempUserAgent ) OR
	// Find( "girafabot", strTempUserAgent ) OR
	// Find( "aipbot", strTempUserAgent ) OR
	Find( "ia_archiver", strTempUserAgent ) OR
	// Find( "snapbot", strTempUserAgent ) OR
	Find( "larbin", strTempUserAgent ) OR
	Find( "psycheclone", strTempUserAgent )
	// Find( "IRLbot", strTempUserAgent )
	){

	// This application definition is for robots that do NOT need sessions.
	THIS.Name = "KinkySolutions v.1 {dev}";
	THIS.SessionManagement = false;
	THIS.SetClientCookies = false;
	THIS.ClientManagement = false;
	THIS.SetDomainCookies = false;

	// Set the flag for session use.
	REQUEST.HasSessionScope = false;

} else {

	// This application is for the standard user.
	THIS.Name = "KinkySolutions v.1 {dev}";
	THIS.SessionManagement = true;
	THIS.SetClientCookies = true;
	THIS.SessionTimeout = CreateTimeSpan(0, 0, 20, 0);
	THIS.LoginStorage = "SESSION";

	// Set the flag for session use.
	REQUEST.HasSessionScope = true;

}
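The REQUEST.HasSessionScope flag is there so that later code can tell whether the SESSION scope is available on the current request. A rough sketch of how a page might use it is below; "HitCount" is a hypothetical session variable used only for this example, and the snippet assumes the flag has already been set by the logic above.

```cfm
<cfscript>
	// Only touch the SESSION scope when the current request was given one.
	// "HitCount" is a hypothetical variable used for illustration.
	if (REQUEST.HasSessionScope){

		// Initialize the counter the first time this session is seen.
		if (NOT StructKeyExists( SESSION, "HitCount" )){
			SESSION.HitCount = 0;
		}

		SESSION.HitCount = (SESSION.HitCount + 1);

	}
</cfscript>
```

Without a guard like this, a spider request (where session management is turned off) would likely raise an error as soon as the page referenced the SESSION scope.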
