Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.
XML Building / Parsing / Traversing Speed In ColdFusion

By Ben Nadel on
Tags: ColdFusion

After my little discovery yesterday about the relative speed of XML versus ColdFusion custom tags for structured data collection, I decided to do a little more investigation into XML performance in ColdFusion 8. In the following demonstration, I am testing three different aspects of ColdFusion:

  1. Building an XML string using CFSaveContent. I decided against testing CFXML because I think it is less flexible: having the XML as a string first lets us do more with it than just create an XML document. Plus, I think CFXML is probably just using a data buffer and then parsing the resulting string afterwards (just a guess).
  2. Parsing an XML string using XmlParse(). Basically, taking the above string and parsing it into a ColdFusion XML document.
  3. Traversing an XML document. Taking the resultant ColdFusion XML document from above and walking over each node, getting the value of the leaf-nodes.
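For readers more at home in Java (ColdFusion 8 runs on the JVM), the same three steps can be sketched with the JDK's built-in DOM API. This is an analogue under stated assumptions, not the ColdFusion code being measured. The class and method names are hypothetical, and the `javax.xml.parsers` parser is not necessarily what XmlParse() uses internally.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: build, parse, and traverse a query-shaped XML document with the
// JDK's DOM parser. Class and method names are hypothetical; this is an
// analogue of the ColdFusion test, not ColdFusion's own XmlParse() code path.
final class XmlQueryTiming {

    // Step 1: build the XML string in a buffer (the CFSaveContent analogue).
    static String buildXml(int rows, int columns) {
        StringBuilder sb = new StringBuilder("<query>");
        for (int r = 1; r <= rows; r++) {
            sb.append("<row index=\"").append(r).append("\">");
            for (int c = 1; c <= columns; c++) {
                sb.append("<column name=\"column").append(c).append("\">v")
                  .append(1000 + c).append("</column>");
            }
            sb.append("</row>");
        }
        return sb.append("</query>").toString();
    }

    // Step 2: parse the whole string into an in-memory DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Step 3: walk each row's element children, read every leaf value, count them.
    static int countLeaves(Document doc) {
        int leaves = 0;
        for (Node row = doc.getDocumentElement().getFirstChild();
                row != null; row = row.getNextSibling()) {
            if (row.getNodeType() != Node.ELEMENT_NODE) continue;
            for (Node col = row.getFirstChild(); col != null; col = col.getNextSibling()) {
                if (col.getNodeType() != Node.ELEMENT_NODE) continue;
                col.getTextContent(); // read the value but never output it
                leaves++;
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        String xml = buildXml(1_000, 50);
        long t1 = System.nanoTime();
        Document doc = parse(xml);
        long t2 = System.nanoTime();
        int leaves = countLeaves(doc);
        long t3 = System.nanoTime();
        System.out.printf("build %d ms, parse %d ms, traverse %d ms (%d leaves)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000,
                (t3 - t2) / 1_000_000, leaves);
    }
}
```

Running `main` prints the three timings. They will not match the ColdFusion 8 numbers reported here, because both the parser and the machine differ.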

The following code does all three of these in a row, each test building on the results of the previous one. I tried with a variety of data set sizes, which I will review afterwards:

    <!--- Increase page running time. --->
    <cfsetting requesttimeout="200" />

    <!--- Create a blank ColdFusion query object. --->
    <cfset qData = QueryNew( "" ) />

    <!---
        Create an array of values. We are going to use this array
        to populate N columns in the query object. Sure, the values
        will all be uniform, but this is the easiest method.
    --->
    <cfset arrColumnData = ListToArray(
        RepeatString( "v#RandRange( 1111, 9999 )#,", 10000 )
        ) />

    <!--- Add columns to this query. --->
    <cfloop
        index="intColumn"
        from="1"
        to="50"
        step="1">

        <!---
            Add the data array, from above, as the default data
            for this new column (this will add a row for each
            array index).
        --->
        <cfset QueryAddColumn(
            qData,
            "column#intColumn#",
            "cf_sql_varchar",
            arrColumnData
            ) />

    </cfloop>

    <!---
        Get the column list as an array. This will allow us to
        loop over it faster, which will cut down on XML creation
        time amortized over all the rows.
    --->
    <cfset arrColumns = ListToArray( qData.ColumnList ) />

    <!--- Keep track of how long it takes to build the XML string. --->
    <cftimer
        label="Building XML String"
        type="outline">

        <!--- Create an XML string for the given query. --->
        <cfsavecontent variable="strQueryAsXML">
            <cfoutput>

                <query>
                    <!--- Create a row for each record. --->
                    <cfloop query="qData">

                        <row index="#qData.CurrentRow#">
                            <!--- Loop over the columns. --->
                            <cfloop
                                index="strColumnName"
                                array="#arrColumns#">
                                <column name="#strColumnName#">#qData[ strColumnName ][ qData.CurrentRow ]#</column>
                            </cfloop>
                        </row>

                    </cfloop>
                </query>

            </cfoutput>
        </cfsavecontent>

        <p>
            Done building XML string.
        </p>

    </cftimer>

    <!--- Keep track of how long it takes to parse the XML. --->
    <cftimer
        label="Parsing XML String"
        type="outline">

        <!--- Parse into a ColdFusion XML document. --->
        <cfset xmlQuery = XmlParse( strQueryAsXML ) />

        <p>
            Done parsing XML string.
        </p>

    </cftimer>

    <!---
        Keep track of how long it takes to traverse the XML
        document (assuming that it is in the above query format).
    --->
    <cftimer
        label="Traversing XML Document"
        type="outline">

        <!---
            Kill any output that is caused by the XML
            document traversal. Killing the white space has a HUGE
            impact on performance because it ignores any buffering
            updates (I assume).
        --->
        <cfsilent>

            <cfloop
                index="intRow"
                from="1"
                to="#ArrayLen( xmlQuery.query.XmlChildren )#"
                step="1">

                <!--- Get a pointer to the current row. --->
                <cfset xmlRow = xmlQuery.query.XmlChildren[ intRow ] />

                <!--- Loop over each column. --->
                <cfloop
                    index="intChild"
                    from="1"
                    to="#ArrayLen( xmlRow.XmlChildren )#"
                    step="1">

                    <!--- Get a pointer to the current child. --->
                    <cfset xmlChild = xmlRow.XmlChildren[ intChild ] />

                    <!--- Get the node value. --->
                    <cfset strValue = xmlChild.XmlText />

                    <!---
                        We are not going to output any value at this
                        point since that will only slow things down
                        unnecessarily.
                    --->

                </cfloop>

            </cfloop>

        </cfsilent>

        <p>
            Done traversing XML document.
        </p>

    </cftimer>

As you can see, the test runs off of a manually created ColdFusion query object. While the code in the demo creates a query with a fixed height (row count) and width (column count), I ran it many times with different dimensions. Here are the results that I saw:

1,000 Rows x 50 Cells (50,000 Values)

Building XML String: 289.25ms on average.
Parsing XML String: 836ms on average.
Traversing XML Document: 519.5ms on average.

5,000 Rows x 50 Cells (250,000 Values)

Building XML String: 1,082.88ms on average.
Parsing XML String: 4,465ms on average.
Traversing XML Document: 2,617.25ms on average.

10,000 Rows x 50 Cells (500,000 Values)

Building XML String: 2,835.75ms on average.
Parsing XML String: 9,000ms on average.
Traversing XML Document: 7,668.25ms on average.

20,000 Rows x 50 Cells (1,000,000 Values)

Error: Java heap space (out of memory)

60,000 Rows x 10 Cells (600,000 Values)

Error: Java heap space (out of memory)

60,000 Rows x 5 Cells (300,000 Values)

Building XML String: 6,375ms on average.
Parsing XML String: 7,179.75ms on average.
Traversing XML Document: 173,242.25ms on average.

There are a couple of things to note here. For starters, ColdFusion 8 can parse XML really fast. A 50,000-leaf-node XML tree parses in under a second and can be fully traversed in just over half a second. That's pretty awesome!

Of course, XML parsing does have its limits. As you can see, once we get over about 500,000 values, ColdFusion simply does not have enough memory to parse the XML (and yes, it is the XmlParse() line that causes the heap space error).

Here's the really interesting thing, though: the shape of the XML tree (how many children sit under each node), and not just its total number of leaf nodes, affects traversal performance. In our third test, with 10,000 rows and 500,000 leaf nodes, the tree can be fully traversed in less than 8 seconds. However, with 60,000 rows and only 300,000 leaf nodes (200,000 fewer than the previous example), it takes 173 seconds to traverse! So it looks like having 60,000 rows under a single parent node has a much greater impact than having 50 columns under each row.
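Dividing each traversal time by the number of values it visited makes that shape effect concrete. The short Java sketch below does only that arithmetic, using the averages from the results above. The class and method names are hypothetical.

```java
import java.util.Locale;

// Per-value traversal cost, computed only from the averages reported above.
// Class and method names are hypothetical.
final class TraversalCost {

    // Microseconds spent per leaf value during traversal.
    static double microsPerValue(double traverseMs, long rows, long columns) {
        return traverseMs * 1000.0 / (rows * columns);
    }

    public static void main(String[] args) {
        long[][] shapes = { {1_000, 50}, {5_000, 50}, {10_000, 50}, {60_000, 5} };
        double[] traverseMs = { 519.5, 2_617.25, 7_668.25, 173_242.25 };
        for (int i = 0; i < shapes.length; i++) {
            System.out.printf(Locale.ROOT, "%,d rows x %d columns: %.1f us per value%n",
                    shapes[i][0], shapes[i][1],
                    microsPerValue(traverseMs[i], shapes[i][0], shapes[i][1]));
        }
    }
}
```

It prints 10.4, 10.5, 15.3, and 577.5 microseconds per value. Each value in the 60,000-row tree therefore cost more than 35 times as much to reach as a value in the 10,000-row tree.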

And finally, while you can't see this in the demo because it was too slow to use, the named pseudo-arrays that ColdFusion allows when dealing with XML documents (for example, xmlQuery.query.row[ intRow ] instead of xmlQuery.query.XmlChildren[ intRow ]) are extremely slow! Using the actual XmlChildren arrays was orders of magnitude faster. Moral of the story: use the XmlChildren array.
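Why would a named lookup be so much slower? One plausible explanation is that each `xmlRow.column[ n ]` lookup rescans the row's children to find the nth match, which turns a single pass into quadratic work. This is only an assumption; it is not confirmed from ColdFusion's internals. The Java sketch below contrasts the two access patterns on a plain DOM. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch contrasting two ways to read every <column> under one <row>.
// ASSUMPTION: the "named" style models a lookup that rescans the children on
// every access; this is not taken from ColdFusion's source code.
final class ChildAccess {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Finds the nth (1-based) child element with the given name by scanning
    // from the first child each time: O(children) per lookup.
    static Node nthNamedChild(Node parent, String name, int n) {
        int seen = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE
                    && c.getNodeName().equals(name) && ++seen == n) {
                return c;
            }
        }
        return null;
    }

    // Pseudo-array style: one rescan per index, O(children^2) overall.
    static List<String> readByName(Node row, String name, int count) {
        List<String> values = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            values.add(nthNamedChild(row, name, i).getTextContent());
        }
        return values;
    }

    // XmlChildren style: a single pass over the children, O(children) overall.
    static List<String> readInOnePass(Node row) {
        List<String> values = new ArrayList<>();
        for (Node c = row.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                values.add(c.getTextContent());
            }
        }
        return values;
    }
}
```

Both methods return the same values. With 60,000 siblings, though, the rescanning version does on the order of 60,000 × 60,000 / 2 node visits instead of 60,000.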

Ok, I'm exhausted so that's all I'm gonna review for the moment. Have a great weekend.




Reader Comments

Interesting post.

One thing I've observed when building large XML documents is that using CF's XML functions like xmlElemNew() and its ilk was much easier on memory than creating the thing using CFXML (which is just CFSAVECONTENT with an implicit xmlParse() added in, as far as I can tell).

I suppose it's obvious (?) that CFXML requires the whole string to be complete before converting it to XML, compared to the function-based approach, which constructs the document ad hoc.
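The function-based approach maps onto plain DOM code. Instead of rendering a string and then parsing it, you create each element directly, so the full text of the document never has to exist alongside the tree. Below is a Java sketch of that idea using the JDK DOM API. It is not ColdFusion's XmlElemNew() itself, and the class name is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch: build the <query> document node by node, with no intermediate
// XML string to parse. The class name is hypothetical.
final class DomBuilder {

    static Document build(int rows, int columns) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element query = doc.createElement("query");
        doc.appendChild(query);
        for (int r = 1; r <= rows; r++) {
            Element row = doc.createElement("row");
            row.setAttribute("index", Integer.toString(r));
            for (int c = 1; c <= columns; c++) {
                Element column = doc.createElement("column");
                column.setAttribute("name", "column" + c);
                column.setTextContent("v" + (1000 + c)); // leaf value
                row.appendChild(column);
            }
            query.appendChild(row);
        }
        return doc;
    }
}
```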

Another thing that could be worth measuring is the performance of accessing an XML doc via xpath, rather than "brute force" CF constructs.

--
Adam
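As a starting point for that measurement, here is what the XPath route looks like in Java with the JDK's `javax.xml.xpath` API. In ColdFusion the counterpart is XmlSearch(). The class and method names are hypothetical, and nothing here says which approach is faster.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: select leaf nodes with one XPath expression instead of nested loops.
// Class and method names are hypothetical.
final class XPathQuery {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Evaluates the expression as a node set and returns each node's text.
    static String[] columnValues(Document doc, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            String[] values = new String[nodes.getLength()];
            for (int i = 0; i < nodes.getLength(); i++) {
                values[i] = nodes.item(i).getTextContent();
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }
}
```

For example, `columnValues(doc, "/query/row/column")` returns every leaf value, and `columnValues(doc, "/query/row/column[@name='column1']")` returns just the first column of each row.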


CF XML processing is slow and memory intensive because it uses a DOM processor (Xerces), which has to represent the entire document in memory to work with it. My preferred alternative is XOM (http://www.xom.nu/), which provides a very nice API atop a SAX processor instead, which is considerably faster and uses less memory to perform the same operations, such as building, parsing, traversing and querying XML documents.
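The following sketch illustrates the event-driven (SAX) style being contrasted with DOM here. A SAX parser reports start tags, text, and end tags as it reads, so only the current element's text is held in memory. This minimal Java sketch uses the JDK's `javax.xml.parsers.SAXParser`. It is plain SAX, not XOM's API, and the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: stream the document and count <column> leaves without building a
// tree. The class name is hypothetical.
final class SaxColumnCounter extends DefaultHandler {
    private final StringBuilder text = new StringBuilder(); // current leaf only
    private boolean inColumn;
    int columns;   // total <column> elements seen
    int nonEmpty;  // <column> elements that contained text

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("column")) {
            inColumn = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inColumn) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("column")) {
            columns++;
            if (text.length() > 0) nonEmpty++;
            inColumn = false;
        }
    }

    static SaxColumnCounter count(String xml) {
        try {
            SaxColumnCounter handler = new SaxColumnCounter();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
    }
}
```

For a large file you would pass an `InputStream` instead of a `String`, since the point of SAX is never holding the whole document at once.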


@Adam,

I'll have to try that out, although I think it would make sense that building the DOM manually would be faster; as you have pointed out, CFXML / XmlParse() needs to have the entire XML string in memory before it does anything.

As far as XPath goes, in my recent experience, it has proved to be extremely slow. In fact, in one of my previous posts, I found that XPath slowed down my testing by something like 13 seconds.

@OrangePips,

I'll have to take a look at these types of solutions one day. I think I tried once to get a SAX parser to work, but was having trouble building my event listener as a CFC.


@Ben

Using XOM, you don't deal with the SAX API. Instead, you use XOM's API, which is similar to how ColdFusion works with XML but has the advantage of being backed by a SAX parser, which is more memory efficient than the DOM parser CF is using.


@OrangePips,

XOM looks interesting. I tried looking through some of their sample code. I found some of it hard to follow, but I didn't really give it that much looking into. When I have some more time, I'll check it out.


So when it breaks due to being too big, is there any way of splitting it up into smaller chunks?
I'm thinking of taking the raw data before xmlParsing it and seeing if I can't break it up. But if somebody has experience with this already it would save me the trouble. :)


@Don,

I've played around with a couple of approaches to dealing with XML documents that are too large to be parsed in one shot. In one approach, I use regular expressions to try and parse one tag at a time:

http://www.bennadel.com/blog/1345-Ask-Ben-Parsing-Very-Large-XML-Documents-In-ColdFusion.htm

In another approach, similar to that one, I tried to implement a ColdFusion XML listener like the SAX parser:

http://www.bennadel.com/blog/1264-Partial-Entry-For-Steve-Levithan-s-Regular-Expression-Contest.htm

That was more of an experiment, though.
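Java's StAX pull parser (`javax.xml.stream`) offers a different technique from the regular-expression approaches linked above. It can walk a document of any size while holding only the current row. The sketch below assumes the document has the `<query><row><column>` shape used in this post, and the class and method names are hypothetical. A real run over a large file would create the reader from an `InputStream` (`XMLInputFactory.createXMLStreamReader(InputStream)`) rather than from a `String`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Sketch: pull one <row> at a time and hand its column values to a callback,
// so only the current row is held in memory. Names are hypothetical.
final class RowStreamer {

    static int streamRows(String xml, Consumer<List<String>> onRow) {
        int rows = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            List<String> current = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("row")) {
                        current = new ArrayList<>();
                    } else if (name.equals("column") && current != null) {
                        // Reads the text and leaves the reader on </column>.
                        current.add(reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("row")) {
                    onRow.accept(current); // row is complete; release it
                    current = null;
                    rows++;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX parse failed", e);
        }
        return rows;
    }
}
```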


Ben, your comments about named pseudo-arrays sped up a parsing process I had by about 95%. If you're ever in Virginia Beach, I owe you a beer.


Have you had problems with this in CF9? I'm trying to build an XML string using cfsavecontent and it's a mare.
Basically, I've lifted the code that works fine on my CF8 box but it's decided it doesn't want to play on the CF9.
My hair's going grey (UK ;o))


The performance difference with XmlChildren and named pseudo-arrays is surprising. I have an xml feed that has grown from 100kb to 8mb over the years. It's become time consuming to run it and I figured XmlParse was the bottleneck. Turns out that arrays were the problem. Switching to XmlChildren improved the script run time by about 20 minutes. Wow.


Great post! We were using cffile action="read" and xmlparse but then our xml file got too big to read. The method you wrote about solved our problems for a while but now we are getting a new java error "Stream closed"

java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)

Any idea how to fix this? Thanks so much for all of your great posts!
