IMPORTANT UPDATE: XML Parsing Is WAY Faster Than ColdFusion Custom Tags

Posted September 4, 2008 at 3:02 PM by Ben Nadel

Tags: ColdFusion

Earlier today, I posted about how ColdFusion custom tags executed much faster than XML parsing. To make the example more general, I was using XmlSearch() with XPath to get at the XML nodes. I did this because that way, the nature of the XML document could be more variable, just like the nature of ColdFusion custom tags. Tony Petruzzi suggested that removing the XmlSearch() would help a bit. I assumed it would, but at lunch (just now) decided to give it a go.

Here is the updated tag that parses the XML and creates a comma-separated values (CSV) file. Notice that rather than using XmlSearch(), I am using the pseudo-array that ColdFusion makes available in XML documents when you refer to XML nodes by tag name:

    <!--- Check to see which tag mode we are executing. --->
    <cfswitch expression="#THISTAG.ExecutionMode#">

        <cfcase value="Start">

            <!--- Set the path to our output file. --->
            <cfset THISTAG.FilePath = ExpandPath( "xml_data2.csv" ) />

        </cfcase>

        <cfcase value="End">

            <!--- Parse the XML that was generated in this tag. --->
            <cfset THISTAG.XmlData = XmlParse(
                Trim( THISTAG.GeneratedContent )
                ) />

            <!---
                Create a string buffer to hold intermediary data so
                we don't have to write to the file just yet.
            --->
            <cfset THISTAG.Buffer = CreateObject(
                "java",
                "java.lang.StringBuffer"
                ).Init()
                />


            <!---
                Loop over rows using the pseudo-array that ColdFusion
                provides when referencing XML nodes by name.
            --->
            <cfloop
                index="THISTAG.RowIndex"
                from="1"
                to="#ArrayLen( THISTAG.XmlData.data.row )#"
                step="1">

                <!--- Get a reference to the current row. --->
                <cfset THISTAG.XmlRow = THISTAG.XmlData.data.row[ THISTAG.RowIndex ] />

                <!---
                    Loop over values using the pseudo-array that
                    ColdFusion provides when referencing XML nodes
                    by name.
                --->
                <cfloop
                    index="THISTAG.ValueIndex"
                    from="1"
                    to="#ArrayLen( THISTAG.XmlRow.value )#"
                    step="1">

                    <!--- Get a reference to the current value. --->
                    <cfset THISTAG.XmlValue = THISTAG.XmlRow.value[ THISTAG.ValueIndex ] />

                    <!---
                        Add value to string buffer. Add a tab after
                        each value (this will leave a tab at the end
                        of every line, but I am worried about speed,
                        not extra characters).
                    --->
                    <cfset THISTAG.Buffer.Append(
                        JavaCast(
                            "string",
                            (
                                THISTAG.XmlValue.XmlText &
                                Chr( 9 )
                            ))
                        ) />

                </cfloop>


                <!--- Now that we added the values, add new line. --->
                <cfset THISTAG.Buffer.Append(
                    JavaCast( "string", (Chr( 13 ) & Chr( 10 )) )
                    ) />

            </cfloop>


            <!---
                Our string buffer should contain our CSV data. Now,
                let's write that to the output file.
            --->
            <cffile
                action="write"
                file="#THISTAG.FilePath#"
                output="#THISTAG.Buffer.ToString()#"
                />

            <!--- Reset the content. --->
            <cfset THISTAG.GeneratedContent = "" />

        </cfcase>

    </cfswitch>

The previous version of this used to run at just over 13 seconds. This new version that uses pseudo-xml-arrays runs in about 800 milliseconds!

When I first saw this result, I just assumed something was going wrong. I renamed the CSV file (xml_data2.csv) and ran it again. But sure enough, it ran in a little over 700 milliseconds and the new file (xml_data2.csv) contained all 1,000 rows of data.

Holy Cow! As it turns out, XML Parsing blows the pants off of ColdFusion custom tags when it comes to performance. Obviously, there is going to be an eventual tradeoff as the XML parsing has to be done in-memory, but for 1000 rows, this was INSANELY fast. Two things:

  1. I am shocked at how slow XmlSearch() is! This is good information to know. It was the XmlSearch() alone that added 13 seconds to the processing time in the previous example.
  2. I am a little surprised at how slow ColdFusion custom tags seem to be, comparatively. Over 5 seconds to do what XML parsing did in milliseconds? That's kind of whack.
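
To make the difference between the two access styles concrete, here is a minimal sketch that times both against the same document. It is not the benchmark from the earlier post: the sample XML, the XPath expressions, and the variable names (xmlData, xpathTime, arrayTime) are assumptions made for illustration, and a two-row document is far too small to show a meaningful gap. Swap in a larger document (for example, 1,000 rows) to see a difference, and expect the numbers to vary by machine.

```cfm
<!--- Hypothetical sample data: two rows of two values each. --->
<cfxml variable="xmlData">
<data>
	<row><value>a</value><value>b</value></row>
	<row><value>c</value><value>d</value></row>
</data>
</cfxml>

<!--- Style 1: locate rows and values with XPath via XmlSearch(). --->
<cfset startTime = GetTickCount() />
<cfset rowNodes = XmlSearch( xmlData, "/data/row" ) />
<cfloop index="rowIndex" from="1" to="#ArrayLen( rowNodes )#">
	<!--- Assumed XPath: re-query the document for this row's values. --->
	<cfset valueCount = ArrayLen(
		XmlSearch( xmlData, "/data/row[#rowIndex#]/value" )
		) />
</cfloop>
<cfset xpathTime = (GetTickCount() - startTime) />

<!--- Style 2: reference nodes by name and index (pseudo-arrays). --->
<cfset startTime = GetTickCount() />
<cfloop index="rowIndex" from="1" to="#ArrayLen( xmlData.data.row )#">
	<cfset valueCount = ArrayLen( xmlData.data.row[ rowIndex ].value ) />
</cfloop>
<cfset arrayTime = (GetTickCount() - startTime) />

<!--- Report both timings in milliseconds. --->
<cfoutput>
	XmlSearch(): #xpathTime# ms<br />
	Pseudo-array: #arrayTime# ms
</cfoutput>
```

The trade-off is that the pseudo-array style hard-codes the document shape (data, row, value), while XPath can adapt to more variable documents.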

So anyway, sorry for misleading people in my last post. This makes me want to try an experiment where I recode my POI stuff using XML parsing rather than Custom Tags. I wonder if that would make it wicked fast.


 
 
 

 
[Image: XML Parsing Is Much Faster Than The Equivalent ColdFusion Custom Tags]
 
 
 



Reader Comments

Sep 4, 2008 at 3:23 PM
319 Comments

I'm doing some testing on this.

Did you notice that you cleared generatedContent in all 3 layers of your custom tags? That isn't necessary. Only the parent needs to do this. When I removed those lines from your two child tags, the processing time dropped dramatically. Not to < 1 second, but to about 2.2 or 2.5 seconds. About twice as quick.

p.s. An off topic recommendation. To test your code, I had to rename a bunch of "snippet_N.txt" files. This was confusing. In the future, could you provide a zip with the files named right? Also - your code all had headers with <---. No !. This made them show up in the output.


Sep 4, 2008 at 4:11 PM
2 Comments

Good post - that graphic is kind of ... NSFW, though. I hope nobody saw it over my shoulder as I have a big CRT!


Sep 4, 2008 at 4:13 PM
5 Comments

To echo some comments from the previous post, I'd recommend looking into XSLT as an additional option for this exercise. To touch on Ray's earlier point, for more complex parsing, while XSLT might be better suited to the task, some people may be more comfortable working solely in CF, making it preferable if the speeds are comparable.

If you put together an XSL file, really all you'll need to do in CF is use the XmlTransform function to get your CSV file.

I'd recommend w3schools.com for a quick intro to XSLT and XPath ... I used that site to learn enough about XSL to move data from Oracle to text files or Word docs. Unfortunately it was in Java (as was my related POI experience) and at a previous employer, so I have no code readily available to post, but I'm sure there are others who can post some good XSL files if you wanted more examples.


Sep 4, 2008 at 4:56 PM
44 Comments

@Ben,

Awesome man, just awesome. I had a feeling it would be faster seeing how XPATH lookups are extremely slow in any language.

It's going to blow your mind how fast your POI utility is when you rewrite it.


Sep 4, 2008 at 6:45 PM
11,241 Comments

@Ray,

Hmmm, when I remove the generated content clearing lines, I am not seeing any increase in speed. Of course, I wouldn't go so far as to say my DEV server is a powerful box :) If you go back and add IN the lines again, does it slow down?

Also, yeah, the code downloading is a bit hacky on the site, I'll admit it. It actually builds the code downloads based on the code in the actual blog post (I am not uploading any separate download file). Therefore, the Snippet.txt files can't have any meaningful name or ordering - they are in the same order as the code in the post. I'll put my thinking cap on to see if I can come up with anything better.

@Bobbie,

She's actually fully clothed and wearing a tube-top, you just can't see... shame shame, where is your mind ;)

@Dave,

Yeah, XSLT is cool. I have some limited experience with it, but from what I have seen it is cool. I tried to write a tutorial for my former company, if anyone is interested:

http://www.bennadel.com/index.cfm?dax=blog:952.view

@Tony,

This is good news, but not sure how I want to apply it just yet. The POI system I use doesn't use XML yet, so I am not worried about the XPath performance. However, it does heavily use ColdFusion custom tags; if I take those out, I might see some good performance. We'll see what I try.


Sep 4, 2008 at 9:07 PM
319 Comments

When I ran your code as is, it actually took like 12-13 seconds on my machine, which I thought was rather beefy, but I was doing quite a bit at that time. But for me, the change was even more dramatic (down 10 seconds).


Sep 5, 2008 at 8:33 AM
11,241 Comments

@Ray,

That's a pretty big difference in processing time! I wonder what it could be doing? I assume it's just resetting some internal buffer for each tag. What version of CF are you running? 8 I assume (me too).


Sep 5, 2008 at 9:41 AM
319 Comments

Ye, 8.0.1.


Sep 5, 2008 at 10:16 AM
11,241 Comments

Hmmmm. Not sure why it would be so different.


Sep 5, 2008 at 10:58 AM
2 Comments

Thanks for the reply, I am really learning a lot from this site!


Sep 5, 2008 at 12:32 PM
132 Comments

@Ben

You should try using arrayNew(1) and arrayAppend and finally arrayToList() instead of that StringBuffer.

People seem to think that StringBuffer is the "right way" to build up strings, but using an array and arrayToList(buffer,"") is actually faster!

I see about a 30% performance difference for large buffers.


Sep 5, 2008 at 12:59 PM
319 Comments

Elliott, what you say makes sense to me, but I'm not seeing any speed increases with the default 1k row query Ben's data uses. Did you dramatically increase the size?


Sep 5, 2008 at 1:39 PM
11,241 Comments

@Elliott,

You make a good point - I (and maybe others) do have a bit of a love affair with the String Buffer. I guess we have been made so afraid of string concatenation that it's just fear-based decisions.

However, at the end of the day, both examples use string buffer, so the comparison between XML and ColdFusion custom tags is still valid (I believe).


Sep 5, 2008 at 4:32 PM
3 Comments

@Ray and Ben,

In reference to: "Hmmmm. Not sure why it would be so different."

Could it be the environment (Mac vs. Win)?


Sep 5, 2008 at 4:35 PM
11,241 Comments

@Bash,

I am on Windows Server.


Sep 5, 2008 at 5:15 PM
132 Comments

@Ray

Yes, the really noticeable difference is in big sets.

<cfset buffer = arrayNew(1)>
<cfloop from="1" to="2000" index="i">
<cfset arrayAppend(buffer,repeatString("abc",500))>
</cfloop>
<cfset buffer = arrayToList(buffer,"")>

That beats the StringBuilder by 30% on my machine. If I bump it up to 8000 instead I see a difference more like 50-100% faster in some cases.

If you look at smaller, like 1000, iterations, then I see stuff like 15-22ms for the Buffer and 7-10ms for the array.

Even if you don't see noticeable differences on your machine for small cases, why use Java objects when CF provides you with a native solution anyway? :)

You also get the benefit of this code working on BD.NET, if that matters to you.

I think the really important thing here though is that coding hoops into your apps to use StringBuilder/StringBuffer is silly. For instance Fusebox uses a StringBuffer and a FakeStringBuffer.cfc to "work around" the fact that not all systems have it, which is silly, since they could have just used an array! :P


