IMPORTANT UPDATE: XML Parsing Is WAY Faster Than ColdFusion Custom Tags

Posted September 4, 2008 at 3:02 PM

Tags: ColdFusion

Earlier today, I posted about how ColdFusion custom tags executed much faster than XML parsing. To make the example more general, I was using XmlSearch() with XPath to get at the XML nodes. I did this because that way, the nature of the XML document could be more variable, just like the nature of ColdFusion custom tags. Tony Petruzzi suggested that removing the XmlSearch() would help a bit. I assumed it would, but at lunch (just now) decided to give it a go.

Here is the updated tag that parses the XML and creates a comma separated values (CSV) file. Notice that rather than using XmlSearch(), I am using the pseudo-array that ColdFusion makes available in XML documents when you refer to XML nodes by tag name:


<!--- Check to see which tag mode we are executing. --->
<cfswitch expression="#THISTAG.ExecutionMode#">

	<cfcase value="Start">

		<!--- Set the path to our output file. --->
		<cfset THISTAG.FilePath = ExpandPath( "xml_data2.csv" ) />

	</cfcase>

	<cfcase value="End">

		<!--- Parse the XML that was generated in this tag. --->
		<cfset THISTAG.XmlData = XmlParse(
			Trim( THISTAG.GeneratedContent )
			) />

		<!---
			Create a string buffer to hold intermediary data so
			we don't have to write to the file just yet.
		--->
		<cfset THISTAG.Buffer = CreateObject(
			"java",
			"java.lang.StringBuffer"
			).Init()
			/>


		<!---
			Loop over rows using the pseudo-array that ColdFusion
			provides when referencing XML nodes by name.
		--->
		<cfloop
			index="THISTAG.RowIndex"
			from="1"
			to="#ArrayLen( THISTAG.XmlData.data.row )#"
			step="1">

			<!--- Get a reference to the current row. --->
			<cfset THISTAG.XmlRow = THISTAG.XmlData.data.row[ THISTAG.RowIndex ] />

			<!---
				Loop over values using the pseudo-array that
				ColdFusion provides when referencing XML nodes
				by name.
			--->
			<cfloop
				index="THISTAG.ValueIndex"
				from="1"
				to="#ArrayLen( THISTAG.XmlRow.value )#"
				step="1">

				<!--- Get a reference to the current value. --->
				<cfset THISTAG.XmlValue = THISTAG.XmlRow.value[ THISTAG.ValueIndex ] />

				<!---
					Add value to string buffer. Add a tab after
					each value (this will leave a tab at the end
					of every line, but I am worried about speed,
					not extra characters).
				--->
				<cfset THISTAG.Buffer.Append(
					JavaCast(
						"string",
						(
							THISTAG.XmlValue.XmlText &
							Chr( 9 )
						))
					) />

			</cfloop>


			<!--- Now that we added the values, add new line. --->
			<cfset THISTAG.Buffer.Append(
				JavaCast( "string", (Chr( 13 ) & Chr( 10 )) )
				) />

		</cfloop>


		<!---
			Our string buffer should contain our CSV data. Now,
			let's write that to the output file.
		--->
		<cffile
			action="write"
			file="#THISTAG.FilePath#"
			output="#THISTAG.Buffer.ToString()#"
			/>

		<!--- Reset the content. --->
		<cfset THISTAG.GeneratedContent = "" />

	</cfcase>

</cfswitch>

The previous version of this used to run at just over 13 seconds. This new version that uses pseudo-xml-arrays runs in about 800 milliseconds!

When I first saw this result, I just assumed something was going wrong. I renamed the CSV file (xml_data2.csv) and ran it again. But sure enough, it ran in a little over 700 milliseconds and the new file (xml_data2.csv) contained all 1,000 rows of data.

Holy Cow! As it turns out, XML Parsing blows the pants off of ColdFusion custom tags when it comes to performance. Obviously, there is going to be an eventual tradeoff as the XML parsing has to be done in-memory, but for 1000 rows, this was INSANELY fast. Two things:

  1. I am shocked at how slow XmlSearch() is! This is good information to know. It was the XmlSearch() alone that added 13 seconds to the processing time in the previous example.
  2. I am a little surprised at how slow ColdFusion custom tags seem to be, comparatively. Over 5 seconds to do what XML parsing did in milliseconds? That's kind of whack.
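
For anyone who wants to see the two access styles side by side, here is a minimal sketch. It is not the original test harness: the two-row sample document and the variable names (xmlData, rowNodes, xpathTime, arrayTime) are made up for illustration, and on a document this small both timings will be close to zero. Swap in a larger document to see any real difference.

```cfm
<!---
	Hypothetical sample document. The data/row/value element
	names mirror the structure used in the tag above.
--->
<cfxml variable="xmlData">
<data>
	<row><value>A1</value><value>B1</value></row>
	<row><value>A2</value><value>B2</value></row>
</data>
</cfxml>

<!--- XPath approach: XmlSearch() returns an array of matching nodes. --->
<cfset startTime = GetTickCount() />
<cfset rowNodes = XmlSearch( xmlData, "/data/row" ) />
<cfset xpathTime = (GetTickCount() - startTime) />

<!--- Pseudo-array approach: reference the child nodes by tag name. --->
<cfset startTime = GetTickCount() />
<cfset rowCount = ArrayLen( xmlData.data.row ) />
<cfset arrayTime = (GetTickCount() - startTime) />

<cfoutput>
	XmlSearch(): #ArrayLen( rowNodes )# rows in #xpathTime# ms<br />
	Pseudo-array: #rowCount# rows in #arrayTime# ms
</cfoutput>
```

Both approaches find the same row nodes; the only thing that changes is how ColdFusion is asked to locate them.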

So anyway, sorry for misleading people in my last post. This makes me want to try an experiment where I recode my POI stuff using XML parsing rather than Custom Tags. I wonder if that would make it wicked fast.


 
 
 

 
XML Parsing Is Much Faster Than The Equivalent ColdFusion Custom Tags  
 
 
 


Reader Comments

Sep 4, 2008 at 3:23 PM // reply »
207 Comments

I'm doing some testing on this.

Did you notice that you cleared generatedContent in all 3 layers of your custom tags? That isn't necessary. Only the parent needs to do this. When I removed those lines from your two child tags, the processing time dropped dramatically. Not to < 1 second, but to about 2.2 or 2.5 seconds. About twice as quick.

p.s. An off topic recommendation. To test your code, I had to rename a bunch of "snippet_N.txt" files. This was confusing. In the future, could you provide a zip with the files named right? Also - your code all had headers with <---. No !. This made them show up in the output.


Sep 4, 2008 at 4:11 PM // reply »
2 Comments

Good post - that graphic is kind of ... NSFW, though. I hope nobody saw it over my shoulder as I have a big CRT!


Sep 4, 2008 at 4:13 PM // reply »
5 Comments

To echo some comments from the previous post, I'd recommend looking into XSLT as an additional option for this exercise. To touch on Ray's earlier point, for more complex parsing, while XSLT might be better suited to the task, some people may be more comfortable working solely in CF, making it preferable if the speeds are comparable.

If you put together an XSL file, really all you'll need to do in CF is use the XmlTransform function to get your CSV file.
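
As a rough sketch of the XmlTransform() approach described above (this is not the commenter's code; the sample document, the stylesheet, and the variable names xmlData, xslText and csvData are all assumptions for illustration), something along these lines should produce the same tab-delimited rows as the custom tag:

```cfm
<!--- Hypothetical sample document using the post's data/row/value structure. --->
<cfxml variable="xmlData">
<data>
	<row><value>A1</value><value>B1</value></row>
	<row><value>A2</value><value>B2</value></row>
</data>
</cfxml>

<!---
	Illustrative stylesheet: emit each value followed by a tab,
	then a CR/LF at the end of every row.
--->
<cfsavecontent variable="xslText">
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="text" />
	<xsl:template match="/data">
		<xsl:for-each select="row">
			<xsl:for-each select="value">
				<xsl:value-of select="." />
				<xsl:text>&#9;</xsl:text>
			</xsl:for-each>
			<xsl:text>&#13;&#10;</xsl:text>
		</xsl:for-each>
	</xsl:template>
</xsl:stylesheet>
</cfsavecontent>

<!--- Apply the stylesheet; the result comes back as a string. --->
<cfset csvData = XmlTransform( xmlData, Trim( xslText ) ) />

<cfoutput><pre>#HTMLEditFormat( csvData )#</pre></cfoutput>
```

The resulting string could then be handed to CFFile in place of the string buffer output.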

I'd recommend w3schools.com for a quick intro to XSLT and XPath ... I used that site to learn enough about XSL to move data from Oracle to text files or Word docs. Unfortunately it was in Java (as was my related POI experience) and at a previous employer, so I have no code readily available to post, but I'm sure there are others who can post some good XSL files if you wanted more examples.


Sep 4, 2008 at 4:56 PM // reply »
40 Comments

@Ben,

Awesome man, just awesome. I had a feeling it would be faster seeing how XPATH lookups are extremely slow in any language.

It's going to blow your mind how fast your POI utility is when you rewrite it.


Sep 4, 2008 at 6:45 PM // reply »
6,516 Comments

@Ray,

Hmmm, when I remove the generated content clearing lines, I am not seeing any increase in speed. Of course, I wouldn't go so far as to say my DEV server is a powerful box :) If you go back and add IN the lines again, does it slow down?

Also, yeah, the code downloading is a bit hacky on the site, I'll admit it. It actually builds the code downloads based on the code in the actual blog post (I am not uploading any separate download file). Therefore, the Snippet.txt files can't have any meaningful name or ordering - they are in the same order as the code in the post. I'll put my thinking cap on to see if I can come up with anything better.

@Bobbie,

She's actually fully clothed and wearing a tube-top, you just can't see... shame shame, where is your mind ;)

@Dave,

Yeah, XSLT is cool. I have some limited experience with it, but from what I have seen it is cool. I tried to write a tutorial for my former company, if anyone is interested:

http://www.bennadel.com/index.cfm?dax=blog:952.view

@Tony,

This is good news, but not sure how I want to apply it just yet. The POI system I use doesn't use XML yet, so I am not worried about the XPath performance. However, it does heavily use ColdFusion custom tags; if I take those out, I might see some good performance. We'll see what I try.


Sep 4, 2008 at 9:07 PM // reply »
207 Comments

When I ran your code as is, it actually took like 12-13 seconds on my machine, which I thought was rather beefy, but I was doing quite a bit at that time. But for me, the change was even more dramatic (down 10 seconds).


Sep 5, 2008 at 8:33 AM // reply »
6,516 Comments

@Ray,

That's a pretty big difference in processing time! I wonder what it could be doing? I assume it is just resetting some internal buffer for each tag. What version of CF are you running? 8 I assume (me too).


Sep 5, 2008 at 9:41 AM // reply »
207 Comments

Ye, 8.0.1.


Sep 5, 2008 at 10:16 AM // reply »
6,516 Comments

Hmmmm. Not sure why it would be so different.


Sep 5, 2008 at 10:58 AM // reply »
2 Comments

Thanks for the reply, I am really learning a lot from this site!


Sep 5, 2008 at 12:32 PM // reply »
125 Comments

@Ben

You should try using arrayNew(1) and arrayAppend and finally arrayToList() instead of that StringBuffer.

People seem to think that StringBuffer is the "right way" to build up strings, but using an array and arrayToList(buffer,"") is actually faster!

I see about a 30% performance difference for large buffers.


Sep 5, 2008 at 12:59 PM // reply »
207 Comments

Elliott, what you say makes sense to me, but I'm not seeing any speed increases with the default 1k row query Ben's data uses. Did you dramatically increase the size?


Sep 5, 2008 at 1:39 PM // reply »
6,516 Comments

@Elliott,

You make a good point - I (and maybe others) do have a bit of a love affair with the String Buffer. I guess we have been made so afraid of string concatenation that it's just fear-based decision making.

However, at the end of the day, both examples use string buffer, so the comparison between XML and ColdFusion custom tags is still valid (I believe).


Sep 5, 2008 at 4:32 PM // reply »
3 Comments

@Ray and Ben,

In reference to: "Hmmmm. Not sure why it would be so different."

Could it be the environment (Mac vs. Win)?


Sep 5, 2008 at 4:35 PM // reply »
6,516 Comments

@Bash,

I am on Windows Server.


Sep 5, 2008 at 5:15 PM // reply »
125 Comments

@Ray

Yes, the really noticeable difference is in big sets.

<cfset buffer = arrayNew(1)>
<cfloop from="1" to="2000" index="i">
<cfset arrayAppend(buffer,repeatString("abc",500))>
</cfloop>
<cfset buffer = arrayToList(buffer,"")>

That beats the StringBuilder by 30% on my machine. If I bump it up to 8000 instead I see a difference more like 50-100% faster in some cases.

If you look at smaller, like 1000, iterations, then I see stuff like 15-22ms for the Buffer and 7-10ms for the array.

Even if you don't see noticeable differences on your machine for small cases, why use Java objects when CF provides you with a native solution anyway? :)

You also get the benefit of this code working on BD.NET, if that matters to you.

I think the really important thing here though is that coding hoops into your apps to use StringBuilder/StringBuffer is silly. For instance Fusebox uses a StringBuffer and a FakeStringBuffer.cfc to "work around" the fact that not all systems have it, which is silly, since they could have just used an array! :P

