Testing String Equality Of Any Length Happens Instantly In ColdFusion

Posted May 4, 2007 at 7:25 AM by Ben Nadel

Tags: ColdFusion

I was looking at someone's SQL error the other day when it turned out that the dude was trying to use a TEXT data field in a GROUP BY clause. This is not allowed (at least in SQL Server). I am not exactly sure why (either it's too costly or database structure is not built to grab text blobs on the fly?) but it got me thinking about comparing very large strings (such as the comparison that would be in a GROUP BY).

What happens when you compare a string of any length? Does the cost of the comparisons relate to the lengths of the strings in question? Let's see:

  • <!---
  • Store a large string which will then used to build an
  • even bigger string (FYI: I wrote this as a 300 word count
  • assignment in Creative Writing at Tufts).
  • --->
  • <cfsavecontent variable="strTextA">
  • She looked peaceful; eyes closed, head tilted down, breathing
  • - controlled and soft. Areas of fabric, turned dark with
  • sweat, stuck to her body revealing the grizzly figure below
  • it. Cut-off army fatigues could do little to hide her massive
  • and striated legs. Femininity with a warrior's touch.
  • Exhaling sharply, she opened her eyes and mounted the machine
  • next to her. With pads resting on her shoulders and white-
  • knuckled fists grabbing at the handles, she took one deep
  • breath, held it, then lifted the weight from its stack. Her
  • face, once soft and peaceful, was now plagued with pain. She
  • tried to control her body, which began to quake violently
  • beneath the load of five hundred extra pounds. At best, she
  • was able to stop her knees from buckling. She began to slowly
  • lower the heel of her foot down beyond the level of her toes.
  • Resting at the bottom for no more than a split second, she
  • groaned loudly and exploded upwards, flexing her engorged
  • calves as hard as she could. Back down and up, and again and
  • again. With every rep came the surfacing of a new vein, a
  • new ripple, new growth. Then, one the last rep, she held it
  • at the top, biting down, trying to fight the pain. And when
  • she could not hold it anymore, she collapsed. The iron
  • collided with its cradle as she collided with the floor.
  • There she lay, chest heaving, desperately trying to fill her
  • lungs. This time however, she did not try to control her
  • twitching legs. This time she smiled.
  • </cfsavecontent>
  •  
  • <!---
  • Repeat the above string 40 times. This will
  • generate a string that is 61,161, certainly a
  • hard string to compare???
  • --->
  • <cfset strTextA = RepeatString( strTextA, 40 ) />
  •  
  • <!--- Store a copy of A into B. --->
  • <cfset strTextB = strTextA />
  •  
  •  
  • <!---
  • Compare the two strings. Does this have to compare
  • every character? If so, it would be tens of thousands
  • of characters to compare.
  • --->
  • <cftimer
  • label="EQ Operator"
  • type="outline">
  •  
  • Equals: #(strTextA EQ strTextB)#
  •  
  • </cftimer>

Running the above code, we get:

Equals: YES

... and it runs in 0ms. That's instantaneous! So, what's going on? While I have not been educated in this formally, I believe that this is what the Hash code is for in all Java objects. As the string is constructed, the internal hash code of the string is updated. When one string is compared to the other, it must use this (instantaneous) rather than comparing every character.

To look into this a bit more, let's dump out the hash code before and after:

  • Pre Alteration:<br />
  • #strTextA.HashCode()#<br />
  • #strTextB.HashCode()#<br />
  •  
  • <!---
  • Alter both variables in such a way that the two
  • strings cannot be the same. I am using a RandRange()
  • here to ensure that the speed is NOT due to
  • compilation optimization of a static string.
  • --->
  • <cfset strTextA = (
  • strTextA &
  • ListGetAt( "A,a", RandRange( 1, 2 ) )
  • ) />
  •  
  • <cfset strTextB = (
  • strTextB &
  • ListGetAt( "B,b", RandRange( 1, 2 ) )
  • ) />
  •  
  • Post Alteration:<br />
  • #strTextA.HashCode()#<br />
  • #strTextB.HashCode()#<br />

This gives us the following output:

Pre Alteration:
-917390464
-917390464

Post Alteration:
1625666753
1625666754

As you can see, prior to the final edit, both strings have the same hash code even though they are not the same object. Then, once the strings are slightly altered, the hash codes are slightly different (indicating that the strings do not contain the same value).

But remember, ColdFusion comparison is NOT case sensitive. In ColdFusion "Test" EQ "TEST" EQ "TeSt". In that case, strings that are not technically the same might still have the same ColdFusion equality. Let's see what the hash codes of two different case strings are:

  • #ToString( "ABCDEF" ).HashCode()#<br />
  • #ToString( "AbCdEf" ).HashCode()#<br />

This gives the us the following output:

1923910755
1953494211

Here, these two strings do not have the same hash code and yet:

  • Equals: #("ABCDEF" EQ "AbCdEf")#

... gives us:

Equals: YES

So obviously, there's something more than simple hash code comparisons going on. Also, a hash does not garuntee uniqueness. A hash value only guaruntees that two objects of the same value will have the same hash. So, after all this, I am still not sure how ColdFusion (on top of Java) does string comparison soooo freaking fast. I will just have to settle for the fact that ColdFusion is freakin-sweet!

I am sure this kind of stuff is covered on Day One of Java school, but this kind of stuff is fun for me to explore.




Reader Comments

May 4, 2007 at 12:17 PM // reply »
4 Comments

I don't know if this applies or not, but I've noticed that in the times listed in the debugging information at the bottom of the page, that CF never reports a time between 0 and 10 ms. I assume it's because of the difficulty in measuring a period of time that small, but I don't know for sure. as such, any time I'm doing timed comparisons, I always wrap it in a loop of 1000 or more, depending on the resource intesity of the code. Might be an interesting experiment.


May 4, 2007 at 12:28 PM // reply »
11,238 Comments

Yeah, I get the same thing. I have done what you are saying for other things (putting it in loops):

http://www.bennadel.com/index.cfm?dax=blog:236.view

But this string has sooo many characters that I felt that was not necessary.


NKT
May 12, 2007 at 5:41 PM // reply »
1 Comments

Not to sure about this, but wouldn't comparing two identical long strings be more taxing if the pre-compiler didn't know that they were identical due to the copy immediately before?

I'm no expert on ColdFusion, but I know that many systems look for redundant code like this and simply replace it at runtime. Add a single letter to the end, do the comparison, then remove it, then do the comparion again.

The times should be identical (since the number of characters compared before something was realised), but if they aren't, something is doing some clever look-ahead. For instance, if the string length is held, a fast comparison would be to compare the length register, if there is one, before wasting time doing an actual compare. You'll see that because the time for the second comparison will be far faster than the other two.


Jan 30, 2008 at 5:33 PM // reply »
132 Comments

(stumbled across this on google...)

Java string comparison actually uses a multi-step approach. The algorithm looks something like (in CFScript-ish pseudo-code):

function string_equals( o ) {
// check the type
if( o is not an instance of java.lang.String ) return false;

// check the length
if( o.length() != this.length ) return false;

// intern'ed strings are really the same instance, so do a reference
// equality check.
if( o == this ) return true;

// compare char by char
for( i=0; i < this.length(); i++ ) {
if( this.charAt(i) != o.charAt(i) ) return false;
}

return true;
}

Case-insensitive comparison is only different in the last step where it compares by char, but instead uses java.lang.Character#toUpperCase() or toLowerCase().

The equality check you're probably hitting is the reference check. That is, since both variables are references to the same String object in memory they must be equal, and there's no reason to even look at the chars. So it's O(1).

Comparing a string based on hashCode would actually be at least twice as expensive as the approach above because it wouldn't be able to abort right at the first non-matching char, instead it'd have the walk the entire length of both strings, and then compare the number after. And yeah, like you mentioned you'd still have to check every char because of the collision.

The actual implementation is available in the sun source (which I can't find online) or the GNU version is http://www.docjar.com/html/api/java/lang/String.java.html :)


Jan 30, 2008 at 5:43 PM // reply »
11,238 Comments

@Elliott,

Good explanation. I love when stuff is designed so efficiently :)


Post A Comment

Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.

Please review the following issues:

Author Name:


Author Email:

Author Website:

Comment:

Supported HTML tags for formatting: <strong>bold</strong>   <em>italic</em>   <code>code</code>







  • Help Wanted - Find Your Next ColdFusion Job
Ben Nadel's Company - Epicenter Consulting Recent Blog Comments
May 17, 2013 at 7:42 PM
HashKeyCopier - An AngularJS Utility Class For Merging Cached And Live Data
Ben - thanks so much for posting these Angular articles and findings, they've been a huge help towards learning one of the more 'complex' JavaScript frameworks out there (IMO). I have been using Angu ... read »
May 16, 2013 at 5:01 PM
UPDATE: Parsing CSV Data Files In ColdFusion With csvToArray()
Your code was the closest thing I've found to obtaining some direction for converting ISO fields to values that CF can translate properly. Thank you for posting! ... read »
May 15, 2013 at 10:37 PM
Very Simple Pusher And ColdFusion Powered Chat
hi id making plz easy ... read »
May 15, 2013 at 6:07 PM
Making SOAP Web Service Requests With ColdFusion And CFHTTP
Ben, you once again saved my bacon at work. Thank you, thank you, thank you! ... read »
May 15, 2013 at 4:15 PM
What If All User Interface (UI) Data Came In Reports?
@Josh, Thanks! @Ben, I definitely recommend the David West book "Object Thinking" I've been quoting from. It goes deeply into the philosophy and history of OO programming. His breadth ... read »
May 15, 2013 at 11:36 AM
Ask Ben: Print Part Of A Web Page With jQuery
I found this helpfull when you need to keep (refresh) the original parent page after closing the iframe child print dialog (Hoping you're not using a form at this time so it won't submit again): On ... read »
May 14, 2013 at 7:13 PM
What If All User Interface (UI) Data Came In Reports?
@Jonah, If there's any books you'd recommend on the subject of domain modelling, I'd love to hear it. I just downloaded the free PDF of "Domain Driven Design Quickly". Figured I'd give it ... read »
May 14, 2013 at 6:57 PM
The UX Of Prototyping: Low-Fidelity Is The New High-Fidelity
@Phillip, I'm not sure I follow what you mean? Are you saying that you looked at the list of widgets provided by the jQuery UI and let that be your style guide? ... read »
InVision App - Prototyping Made Beautiful With Prototyping Tools