After yesterday's little discovery about the relative speed of XML versus ColdFusion custom tags for structured data collection, I decided to dig a little further into XML performance in ColdFusion 8. In the following demonstration, I test three different aspects of ColdFusion:
1. Building an XML string using CFSaveContent. I decided against testing CFXML because I find it less flexible: having the raw XML string first lets us do more with it than just create an XML document. Plus, I suspect CFXML is just writing to a buffer and then parsing the resulting string anyway (just a guess).
2. Parsing an XML string using XmlParse(). Basically, taking the above string and parsing it into a ColdFusion XML document.
3. Traversing an XML document. Taking the resulting ColdFusion XML document from above, walking over each node, and getting the value of each leaf node.
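For anyone who wants to try the same three steps outside of ColdFusion, here is a rough Java analogue. ColdFusion 8 runs on the JVM, but this is not a claim about how XmlParse() works internally. It uses a StringBuilder in place of CFSaveContent and the standard JAXP DOM parser in place of XmlParse(), then walks the rows and columns by sibling pointers. The class and method names (XmlSteps, buildXml, parse, traverse) and the made-up cell values are my own, for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XmlSteps {

    // Step 1: build the XML string in a buffer (roughly what CFSaveContent does).
    static String buildXml(int rows, int columns) {
        StringBuilder xml = new StringBuilder("<query>");
        for (int r = 1; r <= rows; r++) {
            xml.append("<row index=\"").append(r).append("\">");
            for (int c = 1; c <= columns; c++) {
                xml.append("<column name=\"column").append(c).append("\">")
                   .append("v").append(r * 100 + c) // hypothetical cell value
                   .append("</column>");
            }
            xml.append("</row>");
        }
        return xml.append("</query>").toString();
    }

    // Step 2: parse the string into a DOM document (the analogue of XmlParse()).
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("Could not parse XML", e);
        }
    }

    // Step 3: walk every row and column, reading each leaf's text.
    // Returns the number of leaf values read.
    static int traverse(Document doc) {
        int count = 0;
        Element query = doc.getDocumentElement();
        for (Node row = query.getFirstChild(); row != null; row = row.getNextSibling()) {
            if (row.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text, if any
            for (Node col = row.getFirstChild(); col != null; col = col.getNextSibling()) {
                if (col.getNodeType() != Node.ELEMENT_NODE) continue;
                String value = col.getTextContent(); // like reading XmlText
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = parse(buildXml(1000, 50));
        System.out.println(traverse(doc)); // 50000
    }
}
```

Calling traverse(parse(buildXml(1000, 50))) reads 50,000 leaf values, the same shape as the 1,000 x 50 test. Wrapping each call with System.nanoTime() would give you a rough equivalent of the cftimer blocks.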
The following code does all three of these in a row, with each test building on the results of the previous one. I ran it with a variety of data set sizes, which I will review afterwards:
```cfm
<!--- Increase page running time. --->
<cfsetting requesttimeout="200" />

<!--- Create a blank ColdFusion query object. --->
<cfset qData = QueryNew( "" ) />

<!---
    Create an array of values. We are going to use this array
    to populate N columns in the query object. Sure, the values
    will all be uniform, but this is the easiest method.
--->
<cfset arrColumnData = ListToArray(
    RepeatString( "v#RandRange( 1111, 9999 )#,", 10000 )
    ) />

<!--- Add columns to this query. --->
<cfloop index="intColumn" from="1" to="50" step="1">

    <!---
        Add the data array, from above, as the default data for
        this new column (this will add a row for each array index).
    --->
    <cfset QueryAddColumn(
        qData,
        "column#intColumn#",
        "cf_sql_varchar",
        arrColumnData
        ) />

</cfloop>

<!---
    Get the column list as an array. This lets us loop over it
    faster, which cuts down on XML creation time amortized over
    all the rows.
--->
<cfset arrColumns = ListToArray( qData.ColumnList ) />


<!--- Keep track of how long it takes to build the XML string. --->
<cftimer label="Building XML String" type="outline">

    <!--- Create an XML string for the given query. --->
    <cfsavecontent variable="strQueryAsXML">
        <cfoutput>
            <query>
                <!--- Create a row for each record. --->
                <cfloop query="qData">
                    <row index="#qData.CurrentRow#">
                        <!--- Loop over the columns. --->
                        <cfloop index="strColumnName" array="#arrColumns#">
                            <column name="#strColumnName#">#qData[ strColumnName ][ qData.CurrentRow ]#</column>
                        </cfloop>
                    </row>
                </cfloop>
            </query>
        </cfoutput>
    </cfsavecontent>

    <p>
        Done building XML string.
    </p>

</cftimer>


<!--- Keep track of how long it takes to parse the XML. --->
<cftimer label="Parsing XML String" type="outline">

    <!--- Parse into a ColdFusion XML document. --->
    <cfset xmlQuery = XmlParse( strQueryAsXML ) />

    <p>
        Done parsing XML string.
    </p>

</cftimer>


<!---
    Keep track of how long it takes to traverse the XML document
    (assuming that it is in the above query format).
--->
<cftimer label="Traversing XML Document" type="outline">

    <!---
        Kill any output caused by the XML document traversal.
        Killing the white space has a HUGE impact on performance
        because it ignores any buffering updates (I assume).
    --->
    <cfsilent>

        <cfloop
            index="intRow"
            from="1"
            to="#ArrayLen( xmlQuery.query.XmlChildren )#"
            step="1">

            <!--- Get a pointer to the current row. --->
            <cfset xmlRow = xmlQuery.query.XmlChildren[ intRow ] />

            <!--- Loop over each column. --->
            <cfloop
                index="intChild"
                from="1"
                to="#ArrayLen( xmlRow.XmlChildren )#"
                step="1">

                <!--- Get a pointer to the current child. --->
                <cfset xmlChild = xmlRow.XmlChildren[ intChild ] />

                <!--- Get the node value. --->
                <cfset strValue = xmlChild.XmlText />

                <!---
                    We are not going to output any value at this
                    point since that would only slow things down
                    unnecessarily.
                --->

            </cfloop>

        </cfloop>

    </cfsilent>

    <p>
        Done traversing XML document.
    </p>

</cftimer>
```
As you can see, the test runs off of a manually created ColdFusion query object. While the demo code creates a query with a fixed height (row count) and width (column count), I ran it many times with different dimensions. Here are the results that I saw:
1,000 Rows x 50 Cells (50,000 Values)
Building XML String: 289.25ms on average.
Parsing XML String: 836ms on average.
Traversing XML Document: 519.5ms on average.
5,000 Rows x 50 Cells (250,000 Values)
Building XML String: 1,082.88ms on average.
Parsing XML String: 4,465ms on average.
Traversing XML Document: 2,617.25ms on average.
10,000 Rows x 50 Cells (500,000 Values)
Building XML String: 2,835.75ms on average.
Parsing XML String: 9,000ms on average.
Traversing XML Document: 7,668.25ms on average.
20,000 Rows x 50 Cells (1,000,000 Values)
Failed with a "Java heap space null" error.
60,000 Rows x 10 Cells (600,000 Values)
Failed with a "Java heap space null" error.
60,000 Rows x 5 Cells (300,000 Values)
Building XML String: 6,375ms on average.
Parsing XML String: 7,179.75ms on average.
Traversing XML Document: 173,242.25ms on average.
There are a couple of things to note here. For starters, ColdFusion 8 can parse XML really fast. A 50,000 leaf-node XML tree parses in under a second and can be fully traversed in just over half a second. That's pretty awesome!
Of course, XML parsing does have its limits. As you can see, once we get much beyond 500,000 values, ColdFusion simply does not have enough memory to parse the XML (and yes, it is the XmlParse() line that causes the heap space error).
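"Java heap space" is the message of the JVM's java.lang.OutOfMemoryError. The parsed document simply didn't fit in the heap that ColdFusion's JVM was given. Here is a minimal plain-Java sketch (not ColdFusion code; the class name HeapInfo is just for illustration) that shows how much heap a JVM is working with, using the standard Runtime methods. The ceiling itself comes from the JVM's -Xmx setting, and maxMemory() returns Long.MAX_VALUE if there is no inherent limit.

```java
public class HeapInfo {

    // Convert a byte count to whole megabytes (integer division truncates).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will attempt to use.
        System.out.println("Max heap:   " + toMegabytes(rt.maxMemory()) + " MB");
        // totalMemory(): heap currently reserved; freeMemory(): unused part of it.
        System.out.println("Total heap: " + toMegabytes(rt.totalMemory()) + " MB");
        System.out.println("Free heap:  " + toMegabytes(rt.freeMemory()) + " MB");
    }
}
```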
Here's the really interesting thing, though: how the nodes are distributed across the XML tree (many rows with few columns versus fewer rows with many columns) has a big impact on traversal performance. Look at the 10,000-row test with 500,000 leaf nodes: the tree can be fully traversed in less than 8 seconds. However, with 60,000 rows and only 300,000 leaf nodes (200,000 fewer than in that example), it takes 173 seconds to traverse! So it looks like the 60,000 rows have a much greater impact than the 50 columns amortized over the rows.
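To put numbers on that, here is a quick Java calculation that divides each average traversal time by its number of leaf values. It uses only the averages reported above and is not a new benchmark. The class name is arbitrary.

```java
public class PerValueCost {

    // Average traversal times (ms) and leaf-value counts from the results above.
    static final String[] LABELS = {"1,000 x 50", "5,000 x 50", "10,000 x 50", "60,000 x 5"};
    static final double[] TRAVERSE_MS = {519.5, 2617.25, 7668.25, 173242.25};
    static final int[] VALUES = {50000, 250000, 500000, 300000};

    // Microseconds spent per leaf value: (ms / values) * 1000.
    static double microsPerValue(double ms, int values) {
        return ms / values * 1000.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < LABELS.length; i++) {
            System.out.printf("%-12s %8.2f us per value%n",
                LABELS[i], microsPerValue(TRAVERSE_MS[i], VALUES[i]));
        }
    }
}
```

That works out to roughly 10.4 microseconds per value for the two smaller 50-column runs, about 15.3 at 10,000 rows, and about 577 at 60,000 rows. That last figure is nearly 40 times the per-value cost of the 10,000-row run.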
And finally, while you can't see this in the demo (it was too slow to use), the named pseudo-arrays that ColdFusion allows when dealing with XML documents (e.g. xmlRow.column[ 1 ]) are extremely slow! Using the actual XmlChildren arrays was orders of magnitude faster. Moral of the story: use the XmlChildren array.
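I haven't dug into how ColdFusion implements the named pseudo-arrays, so treat this as a guess. Suppose a lookup like xmlRow.column[ i ] has to scan the parent's children from the start to find the i-th matching node. Then reading every value that way costs 1 + 2 + ... + n node visits, versus n for a single pass over XmlChildren. The following Java sketch just counts visits under that assumption. The Child class and the nthNamed() helper are hypothetical stand-ins, not ColdFusion or DOM APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class NamedLookupCost {

    // Hypothetical stand-in for an XML element: just a name and its text.
    static final class Child {
        final String name;
        final String text;

        Child(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    // Running count of child nodes examined.
    static long visits = 0;

    // ASSUMED model of a named lookup such as xmlRow.column[ i ]:
    // scan from the first child until the i-th (1-based) node with that name.
    static Child nthNamed(List<Child> children, String name, int i) {
        int seen = 0;
        for (Child child : children) {
            visits++;
            if (child.name.equals(name) && ++seen == i) {
                return child;
            }
        }
        throw new IndexOutOfBoundsException(name + "[" + i + "]");
    }

    // Read every value through repeated named lookups (all children are "column").
    static long visitsUsingNamedLookup(List<Child> children) {
        visits = 0;
        for (int i = 1; i <= children.size(); i++) {
            String value = nthNamed(children, "column", i).text;
        }
        return visits;
    }

    // Read every value in one pass over the child array (like XmlChildren).
    static long visitsUsingChildArray(List<Child> children) {
        visits = 0;
        for (Child child : children) {
            visits++;
            String value = child.text;
        }
        return visits;
    }

    // Build n children, all named "column".
    static List<Child> makeChildren(int n) {
        List<Child> children = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            children.add(new Child("column", "v" + i));
        }
        return children;
    }

    public static void main(String[] args) {
        List<Child> children = makeChildren(1000);
        System.out.println(visitsUsingChildArray(children));  // 1000
        System.out.println(visitsUsingNamedLookup(children)); // 500500
    }
}
```

Under this model, 1,000 sibling nodes cost 500,500 visits through named lookups versus 1,000 through the child array, which is n(n + 1) / 2 versus n. If ColdFusion does something like this, it would fit the "orders of magnitude" difference I saw.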
OK, I'm exhausted, so that's all I'm gonna review for the moment. Have a great weekend.