IMPORTANT UPDATE: XML Parsing Is WAY Faster Than ColdFusion Custom Tags
Posted September 4, 2008 at 3:02 PM by Ben Nadel
Earlier today, I posted about how ColdFusion custom tags executed much faster than XML parsing. To make the example more general, I was using XmlSearch() with XPath to get at the XML nodes. I did this because that way, the nature of the XML document could be more variable, just like the nature of ColdFusion custom tags. Tony Petruzzi suggested that removing the XmlSearch() would help a bit. I assumed it would, but at lunch (just now) decided to give it a go.
Here is the updated tag that parses the XML and creates a comma separated values (CSV) file. Notice that rather than using XmlSearch(), I am using the pseudo-array that ColdFusion makes available in XML documents when you refer to XML nodes by tag name:
<!--- Check to see which tag mode we are executing. --->
<cfswitch expression="#THISTAG.ExecutionMode#">

	<cfcase value="Start">
		<!--- Set the path to our output file. --->
		<cfset THISTAG.FilePath = ExpandPath( "xml_data2.csv" ) />
	</cfcase>

	<cfcase value="End">
		<!--- Parse the XML that was generated in this tag. --->
		<cfset THISTAG.XmlData = XmlParse(
			Trim( THISTAG.GeneratedContent )
			) />
		<!---
			Create a string buffer to hold intermediary data so
			we don't have to write to the file just yet.
		--->
		<cfset THISTAG.Buffer = CreateObject(
			"java",
			"java.lang.StringBuffer"
			).Init() />
		<!---
			Loop over rows using the pseudo-array that ColdFusion
			provides when referencing XML nodes by name.
		--->
		<cfloop
			index="THISTAG.RowIndex"
			from="1"
			to="#ArrayLen( THISTAG.XmlData.data.row )#">
			<!--- Get a reference to the current row. --->
			<cfset THISTAG.XmlRow = THISTAG.XmlData.data.row[ THISTAG.RowIndex ] />
			<!---
				Loop over values using the pseudo-array that
				ColdFusion provides when referencing XML nodes
				by name.
			--->
			<cfloop
				index="THISTAG.ValueIndex"
				from="1"
				to="#ArrayLen( THISTAG.XmlRow.value )#">
				<!--- Get a reference to the current value. --->
				<cfset THISTAG.XmlValue = THISTAG.XmlRow.value[ THISTAG.ValueIndex ] />
				<!---
					Add value to string buffer. Add a tab after
					each value (this will leave a tab at the end
					of every line, but I am worried about speed,
					not extra characters).
				--->
				<cfset THISTAG.Buffer.Append(
					THISTAG.XmlValue.XmlText &
					Chr( 9 )
					) />
			</cfloop>
			<!--- Now that we added the values, add new line. --->
			<cfset THISTAG.Buffer.Append(
				JavaCast( "string", (Chr( 13 ) & Chr( 10 )) )
				) />
		</cfloop>
		<!---
			Our string buffer should contain our CSV data. Now,
			let's write that to the output file.
		--->
		<cffile
			action="write"
			file="#THISTAG.FilePath#"
			output="#THISTAG.Buffer.ToString()#"
			/>
		<!--- Reset the content. --->
		<cfset THISTAG.GeneratedContent = "" />
	</cfcase>

</cfswitch>
The previous version ran in just over 13 seconds. This new version, which uses pseudo-xml-arrays, runs in about 800 milliseconds!
When I first saw this result, I just assumed something was going wrong. I renamed the CSV file (xml_data2.csv) and ran it again. But sure enough, it ran in a little over 700 milliseconds and the new file (xml_data2.csv) contained all 1,000 rows of data.
Holy Cow! As it turns out, XML Parsing blows the pants off of ColdFusion custom tags when it comes to performance. Obviously, there is going to be an eventual tradeoff as the XML parsing has to be done in-memory, but for 1000 rows, this was INSANELY fast. Two things:
- I am shocked at how slow XmlSearch() is! This is good information to know. It was the XmlSearch() alone that added 13 seconds to the processing time in the previous example.
- I am a little surprised at how slow ColdFusion custom tags seem to be, comparatively. Over 5 seconds to do what XML parsing did in milliseconds? That's kind of whack.
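To make the two access styles concrete, here is a minimal sketch that reads the same nodes both ways. The sample document and the variable names (xmlData, rowsByXPath) are assumptions for illustration; only the data/row/value layout comes from the tag above. It shows the difference in syntax, not a benchmark.

```cfm
<!--- Hypothetical sample document using the data/row/value layout from the post. --->
<cfxml variable="xmlData">
	<data>
		<row><value>A1</value><value>B1</value></row>
		<row><value>A2</value><value>B2</value></row>
	</data>
</cfxml>

<!--- Approach 1: XPath via XmlSearch(). Returns an array of matching row nodes. --->
<cfset rowsByXPath = XmlSearch( xmlData, "/data/row" ) />

<cfoutput>
	<!--- Row counts from each approach. --->
	XmlSearch() rows: #ArrayLen( rowsByXPath )#<br />
	Pseudo-array rows: #ArrayLen( xmlData.data.row )#<br />

	<!--- First value of the first row, read each way. --->
	First value (XPath): #rowsByXPath[ 1 ].value[ 1 ].XmlText#<br />
	First value (by name): #xmlData.data.row[ 1 ].value[ 1 ].XmlText#<br />
</cfoutput>
```

The by-name form only works when you know the tag names ahead of time, which is the flexibility given up in exchange for the speed.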
So anyway, sorry for misleading people in my last post. This makes me want to try an experiment where I recode my POI stuff using XML parsing rather than Custom Tags. I wonder if that would make it wicked fast.
I'm doing some testing on this.
Did you notice that you cleared generatedContent in all 3 layers of your custom tags? That isn't necessary. Only the parent needs to do this. When I removed those lines from your two child tags, the processing time dropped dramatically. Not to < 1 second, but to about 2.2 or 2.5 seconds. About twice as quick.
p.s. An off topic recommendation. To test your code, I had to rename a bunch of "snippet_N.txt" files. This was confusing. In the future, could you provide a zip with the files named right? Also - your code all had headers with <---. No !. This made them show up in the output.
Good post - that graphic is kind of ... NSFW, though. I hope nobody saw it over my shoulder as I have a big CRT!
To echo some comments from the previous post, I'd recommend looking into XSLT as an additional option for this exercise. To touch on Ray's earlier point, for more complex parsing, while XSLT might be better suited to the task, some people may be more comfortable working solely in CF, making it preferable if the speeds are comparable.
If you put together an XSL file, really all you'll need to do in CF is use the XmlTransform function to get your CSV file.
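As a rough sketch of what that could look like (not code from the post), the following builds tab-delimited output with XmlTransform(). The sample document, the stylesheet, the variable names, and the output file name xslt_data.csv are all assumptions; only the data/row/value layout comes from the original tag.

```cfm
<!--- Hypothetical sample document using the data/row/value layout from the post. --->
<cfxml variable="xmlData">
	<data>
		<row><value>A1</value><value>B1</value></row>
		<row><value>A2</value><value>B2</value></row>
	</data>
</cfxml>

<!---
	Assumed stylesheet: emit each value followed by a tab and each
	row followed by a CRLF, mirroring the output of the custom tag.
--->
<cfsavecontent variable="xslt">
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="text" />
	<xsl:template match="/data">
		<xsl:for-each select="row">
			<xsl:for-each select="value">
				<xsl:value-of select="." />
				<xsl:text>&#9;</xsl:text>
			</xsl:for-each>
			<xsl:text>&#13;&#10;</xsl:text>
		</xsl:for-each>
	</xsl:template>
</xsl:stylesheet>
</cfsavecontent>

<!--- Run the transformation; the result is a plain string. --->
<cfset csvData = XmlTransform( xmlData, Trim( xslt ) ) />

<!--- Write the result to a hypothetical output file. --->
<cffile
	action="write"
	file="#ExpandPath( 'xslt_data.csv' )#"
	output="#csvData#"
	/>
```

Whether this is faster or slower than the pseudo-array loop would need to be measured; nothing here has been timed.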
I'd recommend w3schools.com for a quick intro to XSLT and XPath ... I used that site to learn enough about XSL to move data from Oracle to text files or Word docs. Unfortunately it was in Java (as was my related POI experience) and at a previous employer, so I have no code readily available to post, but I'm sure there are others who can post some good XSL files if you wanted more examples.
Awesome man, just awesome. I had a feeling it would be faster seeing how XPATH lookups are extremely slow in any language.
It's going to blow your mind how fast your POI utility is when you rewrite it.
Hmmm, when I remove the generated content clearing lines, I am not seeing any increase in speed. Of course, I wouldn't go so far as to say my DEV server is a powerful box :) If you go back and add IN the lines again, does it slow down?
Also, yeah, the code downloading is a bit hacky on the site, I'll admit it. It actually builds the code downloads based on the code in the actual blog post (I am not uploading any separate download file). Therefore, the Snippet.txt files can't have any meaningful name or ordering - they are in the same order as the code in the post. I'll put my thinking cap on to see if I can come up with anything better.
She's actually fully clothed and wearing a tube-top, you just can't see... shame shame, where is your mind ;)
Yeah, XSLT is cool. I have some limited experience with it, but from what I have seen it is cool. I tried to write a tutorial for my former company, if anyone is interested:
This is good news, but not sure how I want to apply it just yet. The POI system I use doesn't use XML yet, so I am not worried about the XPath performance. However, it does heavily use ColdFusion custom tags; if I take those out, I might see some good performance. We'll see what I try.
When I ran your code as is, it actually took like 12-13 seconds on my machine, which I thought was rather beefy, but I was doing quite a bit at that time. But for me, the change was even more dramatic (down 10 seconds).
That's a pretty big difference in processing time! I wonder what it could be doing? I assume it is just resetting some internal buffer for each tag. What version of CF are you running? 8 I assume (me too).
Hmmmm. Not sure why it would be so different.
Thanks for the reply, I am really learning a lot from this site!
You should try using arrayNew(1) and arrayAppend and finally arrayToList() instead of that StringBuffer.
People seem to think that StringBuffer is the "right way" to build up strings, but using an array and arrayToList(buffer,"") is actually faster!
I see about a 30% performance difference for large buffers.
Elliott, what you say makes sense to me, but I'm not seeing any speed increases with the default 1k row query Ben's data uses. Did you dramatically increase the size?
You make a good point - I (and maybe others) do have a bit of a love affair with the String Buffer. I guess we have been made so afraid of string concatenation that it's just fear-based decision making.
However, at the end of the day, both examples use string buffer, so the comparison between XML and ColdFusion custom tags is still valid (I believe).
@Ray and Ben,
In reference to: "Hmmmm. Not sure why it would be so different."
Could it be the environment (Mac vs. Win)?
I am on Windows Server.
Yes, the really noticeable difference is in big sets.
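<cfset buffer = arrayNew(1)>
<cfloop from="1" to="2000" index="i">
	<!--- Placeholder value; append whatever string you are building. --->
	<cfset arrayAppend(buffer, "value #i#")>
</cfloop>
<cfset buffer = arrayToList(buffer,"")>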
That beats the StringBuilder by 30% on my machine. If I bump it up to 8000 instead I see a difference more like 50-100% faster in some cases.
If you look at smaller, like 1000, iterations, then I see stuff like 15-22ms for the Buffer and 7-10ms for the array.
Even if you don't see noticeable differences on your machine for small cases, why use Java objects when CF provides you with a native solution anyway? :)
You also get the benefit of this code working on BD.NET, if that matters to you.
I think the really important thing here though is that coding hoops into your apps to use StringBuilder/StringBuffer is silly. For instance Fusebox uses a StringBuffer and a FakeStringBuffer.cfc to "work around" the fact that not all systems have it, which is silly, since they could have just used an array! :P
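For anyone who wants to check the array vs. StringBuffer numbers on their own box, here is a minimal timing sketch. The iteration count, the appended strings, and the variable names are assumptions, and the results will vary by machine and ColdFusion version, so treat it as a starting point rather than a proper benchmark.

```cfm
<!--- Hypothetical iteration count; raise it to test larger buffers. --->
<cfset iterations = 8000 />

<!--- Time the java.lang.StringBuffer approach. --->
<cfset startTick = GetTickCount() />
<cfset sb = CreateObject( "java", "java.lang.StringBuffer" ).Init() />
<cfloop from="1" to="#iterations#" index="i">
	<cfset sb.Append( JavaCast( "string", "value #i#" ) ) />
</cfloop>
<cfset sbResult = sb.ToString() />
<cfset sbTime = GetTickCount() - startTick />

<!--- Time the native array plus ArrayToList() approach. --->
<cfset startTick = GetTickCount() />
<cfset parts = ArrayNew( 1 ) />
<cfloop from="1" to="#iterations#" index="i">
	<cfset ArrayAppend( parts, "value #i#" ) />
</cfloop>
<cfset arrayResult = ArrayToList( parts, "" ) />
<cfset arrayTime = GetTickCount() - startTick />

<!--- Confirm both approaches built the same string before trusting the times. --->
<cfset sameOutput = ( sbResult EQ arrayResult ) />

<cfoutput>
	StringBuffer: #sbTime# ms<br />
	Array: #arrayTime# ms<br />
	Same output: #sameOutput#<br />
</cfoutput>
```

Running each loop several times and swapping their order helps rule out warm-up effects before drawing conclusions.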