Short-Circuit Evaluation Is Fast

Posted June 2, 2006 at 8:23 AM by Ben Nadel

Tags: ColdFusion, Search Engine Optimization

As I wrote some time ago, taking Michael Dinowitz's advice, I turn off session management for spiders and bots in an effort to cut down on memory usage on the server. See, spiders do not accept client cookies and therefore (on my sites) cannot hold sessions. Consequently, they start a new session for every page they request. Since sessions take a while to time out, this ends up creating large numbers of session variables that go unused (in proportion to the number of pages spidered).

When I first did this I used a Regular Expression (RegEx) to check for commonly known spider user agents (CGI.http_user_agent). It looked something like:

  if (REFindNoCase( "slurp|googlebot|....", CGI.http_user_agent )){

This works great; however, as more spiders started hitting my site I kept adding them to the list, and I started to fear that it wasn't efficient. If you look at how a regular expression works using a program such as The RegEx Coach, you can step through the RegEx path and see that for every character it comes across in the target string, it performs a LOT of logic. And the larger the expression, the more logic it performs.

This got me thinking about short-circuit evaluation. I am not sure which version introduced it, but ColdFusion MX 7 has this optimization. It means that evaluation of a boolean expression in an IF statement stops as soon as the result can be determined. So if an IF statement has several parts and the first one already decides the outcome, the remaining parts are not evaluated.

In the following statement, for example, only the first value is checked:

if (false AND true AND true AND true){ ... }

Since the "false" makes the whole statement false no matter what the other operands are, the remaining "true" values are never evaluated.
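Literal true/false values make the skipping invisible, so one way to watch it happen is to put a function with a side effect on the losing side of the operator. This is a minimal sketch; `noisyTrue()` and `VARIABLES.callCount` are hypothetical names used only for the demonstration, and the script-style UDF assumes a reasonably modern engine (ColdFusion 9 or later, or Lucee).

```cfml
<cfscript>
    // Hypothetical counter, used only to observe which operands get evaluated.
    VARIABLES.callCount = 0;

    // Always returns true, but records each time it actually runs.
    function noisyTrue() {
        VARIABLES.callCount = VARIABLES.callCount + 1;
        return true;
    }

    // The leading false already decides an AND, so noisyTrue() should never run.
    andResult = ( false AND noisyTrue() AND noisyTrue() );

    // The leading true already decides an OR, so noisyTrue() should never run here either.
    orResult = ( true OR noisyTrue() OR noisyTrue() );

    writeOutput( "noisyTrue() was called #VARIABLES.callCount# time(s)." );
</cfscript>
```

If the counter comes back as zero, neither compound expression ever reached its right-hand operands.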

I have applied this idea to the problem of turning off session management for spiders. Instead of using a regular expression, I break each comparison out into its own sub-part of the IF statement:

  // Define the application. To stop unnecessary memory usage, we are going
  // to give web crawlers no session management. This way, they don't have
  // to worry about cookie acceptance and object persistence (except for
  // the APPLICATION scope). Here, we are using short-circuit evaluation on
  // the IF statement, with the most popular search engines at the top of
  // the list. This minimizes the time it takes to evaluate the list.
  if (
      (NOT Len(CGI.http_user_agent)) OR
      FindNoCase( "Slurp", CGI.http_user_agent ) OR
      FindNoCase( "Googlebot", CGI.http_user_agent ) OR
      FindNoCase( "BecomeBot", CGI.http_user_agent ) OR
      FindNoCase( "msnbot", CGI.http_user_agent ) OR
      FindNoCase( "Mediapartners-Google", CGI.http_user_agent ) OR
      FindNoCase( "ZyBorg", CGI.http_user_agent ) OR
      FindNoCase( "RufusBot", CGI.http_user_agent ) OR
      FindNoCase( "EMonitor", CGI.http_user_agent ) OR
      FindNoCase( "researchbot", CGI.http_user_agent ) OR
      FindNoCase( "IP2MapBot", CGI.http_user_agent ) OR
      FindNoCase( "GigaBot", CGI.http_user_agent ) OR
      FindNoCase( "Jeeves", CGI.http_user_agent ) OR
      FindNoCase( "Exabot", CGI.http_user_agent ) OR
      FindNoCase( "SBIder", CGI.http_user_agent ) OR
      FindNoCase( "findlinks", CGI.http_user_agent ) OR
      FindNoCase( "YahooSeeker", CGI.http_user_agent ) OR
      FindNoCase( "MMCrawler", CGI.http_user_agent ) OR
      FindNoCase( "MJ12bot", CGI.http_user_agent ) OR
      FindNoCase( "OutfoxBot", CGI.http_user_agent ) OR
      FindNoCase( "jBrowser", CGI.http_user_agent ) OR
      FindNoCase( "ZiggsBot", CGI.http_user_agent ) OR
      FindNoCase( "Java", CGI.http_user_agent ) OR
      FindNoCase( "PMAFind", CGI.http_user_agent ) OR
      FindNoCase( "Blogbeat", CGI.http_user_agent ) OR
      FindNoCase( "TurnitinBot", CGI.http_user_agent ) OR
      FindNoCase( "ConveraCrawler", CGI.http_user_agent ) OR
      FindNoCase( "Ocelli", CGI.http_user_agent ) OR
      FindNoCase( "Labhoo", CGI.http_user_agent ) OR
      FindNoCase( "Validator", CGI.http_user_agent ) OR
      FindNoCase( "sproose", CGI.http_user_agent ) OR
      FindNoCase( "oBot", CGI.http_user_agent ) OR
      FindNoCase( "MyFamilyBot", CGI.http_user_agent ) OR
      FindNoCase( "Girafabot", CGI.http_user_agent ) OR
      FindNoCase( "aipbot", CGI.http_user_agent ) OR
      FindNoCase( "ia_archiver", CGI.http_user_agent ) OR
      FindNoCase( "Snapbot", CGI.http_user_agent ) OR
      FindNoCase( "Larbin", CGI.http_user_agent ) OR
      FindNoCase( "psycheclone", CGI.http_user_agent ) OR
      FindNoCase( "ColdFusion", CGI.http_user_agent )
      ){

      // This application definition is for robots that do NOT need sessions.
      THIS.Name = "KinkySolutions v.1 {dev}";
      THIS.SessionManagement = false;
      THIS.SetClientCookies = false;
      THIS.ClientManagement = false;
      THIS.SetDomainCookies = false;

      // Set the flag for session use.
      REQUEST.HasSessionScope = false;

  } else {

      // This application definition is for the standard user.
      THIS.Name = "KinkySolutions v.1 {dev}";
      THIS.SessionManagement = true;
      THIS.SetClientCookies = true;
      THIS.SessionTimeout = CreateTimeSpan(0, 0, 20, 0);
      THIS.LoginStorage = "SESSION";

      // Set the flag for session use.
      REQUEST.HasSessionScope = true;

  }
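The same early exit can be kept when the signatures live in an array instead of one long IF statement, which makes the list easier to maintain. This is a minimal sketch, assuming ColdFusion 9 or later (or Lucee) for the `required` argument syntax and the array literal; `isSpiderUserAgent()` and `spiderSignatures` are hypothetical names, and the four signatures are only a subset of the list above.

```cfml
<cfscript>
    // Hypothetical helper: returns true as soon as one signature is found in
    // the user agent. Returning from inside the loop gives the same early exit
    // as the chained ORs: signatures after the first hit are never checked.
    function isSpiderUserAgent( required string userAgent, required array signatures ) {
        var i = 0;

        // A blank user agent counts as a spider, as in the IF statement above.
        if ( NOT len( arguments.userAgent ) ) {
            return true;
        }

        for ( i = 1; i LTE arrayLen( arguments.signatures ); i = i + 1 ) {
            if ( findNoCase( arguments.signatures[ i ], arguments.userAgent ) ) {
                return true;
            }
        }

        return false;
    }

    // Hypothetical subset of the list above, most frequent visitors first.
    spiderSignatures = [ "Slurp", "Googlebot", "msnbot", "ia_archiver" ];

    isSpider = isSpiderUserAgent( CGI.http_user_agent, spiderSignatures );
</cfscript>
```

Returning from inside the loop is what preserves the short-circuit behavior; a version that set a flag and kept looping would check every signature on every request.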

Now, regular expressions do short-circuit evaluation too, so the difference here is subtle. Let's say we get a page request from a non-spider user agent. This is the "worst case" scenario, since we have to check every spider value against the string. With a regular expression, we have to run the matching process for each of the (N) spider values at each of the (C) characters in the user agent. That's N x C iterations. In the compound IF statement, however, we only run the matching process once per spider for each of the (U) user agent strings. That's N x U, and since U is always one, it's just N iterations.

This is somewhat misleading, because with string comparison the substrings still have to be matched against many characters in the target string. Still, I suspect (but do not know for a fact) that literal matching is faster than RegEx matching, since there is no pattern "logic" involved in literal matching.

If we do get a request from a popular spider (higher in the IF statement, earlier in the regular expression), the compound IF statement is still faster. The regular expression still has to be checked in its entirety at EACH character it comes across in the target string, whereas the IF statement only runs the sub-parts up to the first match, each one just once.

Of course, in practice, both approaches run in 0-16ms per page hit. Over a large number of iterations (10,000+), however, the compound IF statement is orders of magnitude faster.
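To repeat the comparison on your own server, a rough timing harness might look like the following. It is a minimal sketch: the Firefox user agent is a hypothetical "worst case" input (nothing matches), the four signatures are a hypothetical subset of the list above, and it assumes ColdFusion 9 or later (or Lucee). The numbers depend on the engine, JVM, and hardware, so treat any result as indicative only.

```cfml
<cfscript>
    // Hypothetical browser user agent: the "worst case", since no signature
    // matches and every check has to run.
    userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0";
    iterations = 10000;

    // Approach 1: one regular expression with alternation.
    startTime = getTickCount();
    for ( i = 1; i LTE iterations; i = i + 1 ) {
        regexHit = reFindNoCase( "slurp|googlebot|msnbot|ia_archiver", userAgent );
    }
    regexTime = getTickCount() - startTime;

    // Approach 2: chained FindNoCase() calls, short-circuited by OR.
    startTime = getTickCount();
    for ( i = 1; i LTE iterations; i = i + 1 ) {
        findHit = (
            findNoCase( "slurp", userAgent ) OR
            findNoCase( "googlebot", userAgent ) OR
            findNoCase( "msnbot", userAgent ) OR
            findNoCase( "ia_archiver", userAgent )
        );
    }
    findTime = getTickCount() - startTime;

    writeOutput( "RegEx: #regexTime#ms, FindNoCase: #findTime#ms" );
</cfscript>
```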

Furthermore, you can make it even faster by storing the LCase() of the user agent in a temporary variable and then using Find() rather than FindNoCase() for each sub-part (not shown above). Since Find() is case-sensitive, the spider names must then be written in lowercase as well.
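The lowercase-once variant described above might be sketched like this. `isSpiderLowerCase()` is a hypothetical helper, only a few of the signatures are included, and the syntax assumes ColdFusion 9 or later (or Lucee).

```cfml
<cfscript>
    // Hypothetical helper: lowercases the user agent once, then uses the
    // case-sensitive Find() for each check instead of FindNoCase().
    function isSpiderLowerCase( required string userAgent ) {
        var agent = lCase( arguments.userAgent );

        // The signatures must now be written in lowercase, or they will never match.
        return (
            (NOT len( agent )) OR
            find( "slurp", agent ) OR
            find( "googlebot", agent ) OR
            find( "msnbot", agent ) OR
            find( "mediapartners-google", agent )
        );
    }

    isSpider = isSpiderLowerCase( CGI.http_user_agent );
</cfscript>
```

The chained ORs still short-circuit, so the only change is that the case folding happens once per request instead of once per signature.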



Reader Comments

Oct 24, 2007 at 3:48 PM

Would a switch statement be even faster in this case, Ben?


Oct 24, 2007 at 7:36 PM

@Andy,

A Switch statement would not quite work in a situation like this since we are not matching on the entire user agent, just parts of it.
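The point about switch can be seen with a concrete value: a switch compares the entire expression against each case value, while a spider's user agent only contains its signature somewhere inside a longer string. This is a minimal sketch assuming ColdFusion 9 or later (or Lucee); the Googlebot string is an example user agent and the variable names are hypothetical.

```cfml
<cfscript>
    // Example of a full Googlebot user agent, as it might arrive in CGI.http_user_agent.
    userAgent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

    // switch compares the WHOLE value against each case, so "Googlebot"
    // would only match a user agent that is exactly "Googlebot".
    switchSaysSpider = false;
    switch ( userAgent ) {
        case "Googlebot":
            switchSaysSpider = true;
            break;
    }

    // A substring search, by contrast, finds the signature inside the string.
    findSaysSpider = ( findNoCase( "Googlebot", userAgent ) GT 0 );
</cfscript>
```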


Mar 14, 2009 at 7:42 PM

For the record, I was not able to get this working with Model Glue because one of the Model Glue CFCs accesses the SESSION scope (ModelGlue.unity.statebuilder.StateBuilder.cfc).

I ended up using this approach as a way to avoid triggering some logging code, but the session variables are still floating around in memory.

For some odd reason, the most common bot that hits my site is ColdFusion.


Jun 28, 2009 at 9:37 PM

I have something similar in mine, but I take a different tack: I set a short session timeout for any agent without a cookie.jsessionid (using J2EE sessions).

You might consider using that as your first IF check, since a bot won't have a cookie.jsessionid (or a CFID/CFTOKEN).


Jun 29, 2009 at 2:11 AM

Hey Ben

Assume that you develop this for a client who has no idea how to edit the Application.cfc file. Could you simply store all the bots in a database and loop over them?

Wouldn't that make it a bit easier to manage? Or perhaps, instead of checking for bots, just check whether the user is using a common browser. I assume this would be an easier step, considering there are more bots than there are browser types. Just saying...


Jun 29, 2009 at 8:38 AM

@Brian,

Do the CFID / CFTOKEN values exist at that point in the first page request? I'd have to double-check on that.

@Jody,

True, there are a number of ways to do this. To be honest, I rarely ever update this logic; I haven't even thought much about it in the last few years. For all I know, there might be more bots hitting my site than I realize :)

Yeah, you could probably just check to see if there is a standard user agent.


Jun 29, 2009 at 10:46 AM

Actually, I may be wrong here: cookies get set early in the request lifecycle even if they don't stick. I am pretty sure I need to use my own cookie rather than one of the built-ins, like:

structKeyExists(cookie, "NEEDCOOKIES")

And set it AFTER that check, meaning your first request gets a short timeout and your second gets a regular one. We already use a cookie like this to verify that people can register/stay signed in, so I will look at using it with your bot check.
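The marker-cookie idea might be sketched as follows. This is a minimal sketch under several assumptions: `chooseSessionTimeout()` is a hypothetical helper, the 20-minute and 30-second timeouts are hypothetical values (the comment only says "short" and "regular"), and the syntax assumes ColdFusion 9 or later (or Lucee). Only the `NEEDCOOKIES` cookie name comes from the comment.

```cfml
<cfscript>
    // Hypothetical helper: choose a session timeout based on whether the
    // client sent back our own marker cookie on this request.
    function chooseSessionTimeout( required struct clientCookies ) {
        if ( structKeyExists( arguments.clientCookies, "NEEDCOOKIES" ) ) {
            // The marker came back, so this client keeps cookies: normal timeout.
            return createTimeSpan( 0, 0, 20, 0 );
        }
        // First request, or a client (such as a bot) that never returns
        // cookies: a deliberately short, hypothetical 30-second timeout.
        return createTimeSpan( 0, 0, 0, 30 );
    }

    // In the Application.cfc pseudo-constructor this might be used as:
    //   THIS.SessionTimeout = chooseSessionTimeout( COOKIE );
    //   COOKIE.NEEDCOOKIES = true;  // set the marker AFTER the check
</cfscript>
```

Because the marker is only set after the check, a bot that never returns cookies stays on the short timeout for every request, while a normal browser moves to the regular timeout from its second request on.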


Jul 3, 2009 at 9:07 AM

@Brian,

Ooooh, I see what you're saying. Oh I really like that idea. Thanks!


May 7, 2012 at 3:28 PM

I am a bit confused as to why you would do this in Application.cfc. So every time a bot visits your site, the application is going to lose sessions for all users? Where in Application.cfc are you putting this code? OnRequestStart? Please explain how this does not corrupt the application scope.

