Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.

ColdFusion Custom Tags Are Significantly Faster Than XML Parsing

By Ben Nadel on
Tags: ColdFusion

IMPORTANT Update: This post is NOT accurate. As it turns out, it is the XmlSearch() call that is causing such a slowdown in the XML version. When the XmlSearch() / XPath is removed, the XML parsing method is significantly faster. See this update.

When I started building my POI Utility functionality for creating native, highly formatted Excel documents, I had to decide whether I wanted to use XML or ColdFusion custom tags. I decided to go with ColdFusion custom tags because I just assumed that they would execute faster than XML parsing. This was just a gut feeling; I never actually tested it. However, after some recent report generation took longer than I had hoped when using my ColdFusion custom tag POI library, I decided that it would be worth at least testing the difference between XML parsing and ColdFusion custom tag execution.

To test relative speeds, I decided to take a ColdFusion query object and convert it to a tab-delimited file (saved with a .csv extension). One version uses XML parsing; the other uses ColdFusion custom tags instead of XML nodes. Here is the XML version of the test:

    <!--- Import the tag library. --->
    <cfimport taglib="./" prefix="tag" />

    <!--- Include the query builder. --->
    <cfinclude template="_build_query.cfm" />


    <!--- Test parsing speed of XML. --->
    <cftimer label="Xml Data" type="outline">

        <!--- Parse the data as XML. --->
        <tag:datafromxml>
            <cfoutput>

                <data>
                    <cfloop query="qData">
                        <row>
                            <value>#qData.col1#</value>
                            <value>#qData.col2#</value>
                            <value>#qData.col3#</value>
                            <value>#qData.col4#</value>
                            <value>#qData.col5#</value>
                            <value>#qData.col6#</value>
                            <value>#qData.col7#</value>
                            <value>#qData.col8#</value>
                            <value>#qData.col9#</value>
                        </row>
                    </cfloop>
                </data>

            </cfoutput>
        </tag:datafromxml>

        Done.

    </cftimer>

As you can see, I do use a custom tag (datafromxml) to contain the XML; but, all the data inside that tag is pure XML. Let's take a quick look at the custom tag that is parsing that XML:

    <!--- Check to see which tag mode we are executing. --->
    <cfswitch expression="#THISTAG.ExecutionMode#">

        <cfcase value="Start">

            <!--- Set the path to our output file. --->
            <cfset THISTAG.FilePath = ExpandPath( "xml_data.csv" ) />

        </cfcase>

        <cfcase value="End">

            <!--- Parse the XML that was generated in this tag. --->
            <cfset THISTAG.XmlData = XmlParse(
                Trim( THISTAG.GeneratedContent )
                ) />

            <!---
                Create a string buffer to hold intermediary data so
                we don't have to write to the file just yet.
            --->
            <cfset THISTAG.Buffer = CreateObject(
                "java",
                "java.lang.StringBuffer"
                ).Init()
                />


            <!--- Search for row nodes. --->
            <cfset THISTAG.Rows = XmlSearch(
                THISTAG.XmlData,
                "/data/row"
                ) />


            <!--- Loop over row nodes. --->
            <cfloop
                index="THISTAG.XmlRow"
                array="#THISTAG.Rows#">

                <!--- Search for values in this row. --->
                <cfset THISTAG.Values = XmlSearch(
                    THISTAG.XmlRow,
                    "./value"
                    ) />

                <!--- Loop over value nodes. --->
                <cfloop
                    index="THISTAG.XmlValue"
                    array="#THISTAG.Values#">

                    <!---
                        Add value to string buffer. Add a tab after
                        each value (this will leave a tab at the end
                        of every line, but I am worried about speed,
                        not extra characters).
                    --->
                    <cfset THISTAG.Buffer.Append(
                        JavaCast(
                            "string",
                            (
                                THISTAG.XmlValue.XmlText &
                                Chr( 9 )
                            ))
                        ) />

                </cfloop>


                <!--- Now that we added the values, add new line. --->
                <cfset THISTAG.Buffer.Append(
                    JavaCast( "string", (Chr( 13 ) & Chr( 10 )) )
                    ) />

            </cfloop>


            <!---
                Our string buffer should contain our CSV data. Now,
                let's write that to the output file.
            --->
            <cffile
                action="write"
                file="#THISTAG.FilePath#"
                output="#THISTAG.Buffer.ToString()#"
                />

            <!--- Reset the content. --->
            <cfset THISTAG.GeneratedContent = "" />

        </cfcase>

    </cfswitch>

As you can see, once the tag finishes building the inner content, I parse it into XML, then loop over the row and value nodes, adding them to the running string buffer. At the end of the End tag execution mode, I write the string buffer to a file. Pretty straightforward. The whole file executes in about 13 seconds.
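As the update at the top of this post notes, the XmlSearch() calls turned out to be the slow part. A minimal sketch of walking the same data / row / value structure through XmlRoot and XmlChildren instead of XPath might look like the following. The sample data and variable names are assumptions for illustration, and the sketch assumes every child of the data node is a row that contains only value nodes (there are no existence checks):

```cfml
<!--- Build a small sample document shaped like the one in the test (assumed data). --->
<cfxml variable="xmlData">
    <data>
        <row><value>111</value><value>222</value></row>
        <row><value>333</value><value>444</value></row>
    </data>
</cfxml>

<!--- Create a string buffer to hold the tab-delimited output. --->
<cfset buffer = CreateObject( "java", "java.lang.StringBuffer" ).Init() />

<!--- Loop over the row nodes directly; no XPath search is involved. --->
<cfloop index="xmlRow" array="#xmlData.XmlRoot.XmlChildren#">

    <!--- Loop over the value nodes in this row. --->
    <cfloop index="xmlValue" array="#xmlRow.XmlChildren#">

        <!--- Add the value followed by a tab. --->
        <cfset buffer.Append(
            JavaCast( "string", (xmlValue.XmlText & Chr( 9 )) )
            ) />

    </cfloop>

    <!--- End the line after the last value in the row. --->
    <cfset buffer.Append(
        JavaCast( "string", (Chr( 13 ) & Chr( 10 )) )
        ) />

</cfloop>

<!--- Show the result. --->
<cfoutput><pre>#HtmlEditFormat( buffer.ToString() )#</pre></cfoutput>
```

The trade-off is the one raised in the comments below: direct traversal depends on the document having exactly the expected shape, whereas the XPath version simply returns an empty array when nodes are missing.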

Ok, so now, let's look at the equivalent ColdFusion custom tag test:

    <!--- Import the tag library. --->
    <cfimport taglib="./" prefix="tag" />

    <!--- Include the query builder. --->
    <cfinclude template="_build_query.cfm" />


    <!--- Test speed of custom tags. --->
    <cftimer label="Tag Data" type="outline">

        <!--- Parse the data with custom tags. --->
        <tag:datafromtags>

            <cfloop query="qData">
                <tag:row>
                    <tag:value value="#qData.col1#" />
                    <tag:value value="#qData.col2#" />
                    <tag:value value="#qData.col3#" />
                    <tag:value value="#qData.col4#" />
                    <tag:value value="#qData.col5#" />
                    <tag:value value="#qData.col6#" />
                    <tag:value value="#qData.col7#" />
                    <tag:value value="#qData.col8#" />
                    <tag:value value="#qData.col9#" />
                </tag:row>
            </cfloop>

        </tag:datafromtags>

        Done.

    </cftimer>

This looks very similar to the XML test, except that instead of creating an XML string within the root tag, we are actually using ColdFusion custom tags for our ROW and VALUE nodes. This gives us access to the values "as they happen" rather than having to get at them after the fact.

Here is the root ColdFusion custom tag for this test:

    <!--- Check to see which tag mode we are executing. --->
    <cfswitch expression="#THISTAG.ExecutionMode#">

        <cfcase value="Start">

            <!--- Set the path to our output file. --->
            <cfset VARIABLES.FilePath = ExpandPath( "tag_data.csv" ) />

            <!---
                Create a string buffer to hold intermediary data so
                we don't have to write to the file just yet.
            --->
            <cfset VARIABLES.Buffer = CreateObject(
                "java",
                "java.lang.StringBuffer"
                ).Init()
                />

        </cfcase>

        <cfcase value="End">

            <!---
                Our string buffer should contain our CSV data. Now,
                let's write that to the output file.
            --->
            <cffile
                action="write"
                file="#VARIABLES.FilePath#"
                output="#VARIABLES.Buffer.ToString()#"
                />

            <!--- Reset the content. --->
            <cfset THISTAG.GeneratedContent = "" />

        </cfcase>

    </cfswitch>

Notice that this tag doesn't have to parse any data because it assumes that its child tags update its String Buffer.

Here is the Row tag:

    <!--- Check to see which tag mode we are executing. --->
    <cfswitch expression="#THISTAG.ExecutionMode#">

        <cfcase value="Start">

            <!--- Associate with base tag. --->
            <cfset VARIABLES.BaseTag = GetBaseTagData( "cf_datafromtags" ) />

        </cfcase>

        <cfcase value="End">

            <!--- Now that we added the values, add new line. --->
            <cfset VARIABLES.BaseTag.Buffer.Append(
                JavaCast( "string", (Chr( 13 ) & Chr( 10 )) )
                ) />

            <!--- Reset the content. --->
            <cfset THISTAG.GeneratedContent = "" />

        </cfcase>

    </cfswitch>

... and the Value tag:

    <!--- Check to see which tag mode we are executing. --->
    <cfswitch expression="#THISTAG.ExecutionMode#">

        <cfcase value="Start">

            <!--- Param attributes. --->
            <cfparam name="ATTRIBUTES.Value" type="string" />

            <!--- Associate with base tag. --->
            <cfset VARIABLES.BaseTag = GetBaseTagData( "cf_datafromtags" ) />

        </cfcase>

        <cfcase value="End">

            <!---
                Add value to string buffer. Add a tab after each
                value (this will leave a tab at the end of every
                line, but I am worried about speed, not extra
                characters).
            --->
            <cfset VARIABLES.BaseTag.Buffer.Append(
                JavaCast(
                    "string",
                    (ATTRIBUTES.Value & Chr( 9 ))
                    )
                ) />

            <!--- Reset the content. --->
            <cfset THISTAG.GeneratedContent = "" />

        </cfcase>

    </cfswitch>

The value tag does the bulk of the work by adding its own attribute value to the base tag's String Buffer.

This method clearly has a little more overhead in that it requires three ColdFusion custom tags to execute rather than just a single one; however, the additional overhead seems to be quite worth it. This methodology ran in about 5 seconds.

So, in the end, this test is really just confirming what my gut was telling me all along - building data sets with ColdFusion custom tags is significantly faster than building data with XML. Over several tests, the numbers were very consistent:

Xml Parsing: 13 seconds.

ColdFusion Custom Tags: 5 seconds.

And, if you care about how the query was built (although it really should have no influence on the comparative speeds, as both tests included the same file), here it is:

    <!--- Create a query for testing. --->
    <cfset qData = QueryNew(
        "col1, col2, col3, col4, col5, col6, col7, col8, col9",
        "cf_sql_varchar, cf_sql_varchar, cf_sql_varchar, cf_sql_varchar, cf_sql_varchar, cf_sql_varchar, cf_sql_varchar, cf_sql_varchar, cf_sql_varchar"
        ) />

    <!--- Add rows to query. --->
    <cfset QueryAddRow( qData, 1000 ) />

    <!--- Populate query with random data. --->
    <cfloop
        index="intI"
        from="1"
        to="#qData.RecordCount#"
        step="1">

        <!--- Loop over each column and populate with random data. --->
        <cfloop
            index="intJ"
            from="1"
            to="#ListLen( qData.ColumnList )#"
            step="1">

            <!--- Set random column value. --->
            <cfset qData[ "col#intJ#" ][ intI ] = JavaCast(
                "string",
                RandRange( 111, 999 )
                ) />

        </cfloop>

    </cfloop>



Reader Comments

try removing the xmlsearch functions from the xmlparser code and access the data using dot | bracket notation. i bet you could shave off some seconds with that.

@Tony,

It's funny you mention that cause when I was writing the demo, I did actually think about that. In the end, I decided not to do that because that way, if there were no Value nodes or what not, the code wouldn't break. Of course, I could check to see if the key exists first... maybe I will give that try to see if it makes any difference.

I think most important is what API is easier for your users to work with. I'd place higher importance on that than speed. Obviously speed is important. I'm not saying ignore that. But if the speed was somewhat close, or even not drastically different, I'd definitely give preference to the API that is easiest for folks to use.

Ben,

Speed kills. I've been using your POI utility to generate a multi-worksheet document (weekly tabs from 2006 to the present) with about 30 rows of 15 data items per sheet. I can generate the whole sheet within 30 seconds, which rocks compared to other solutions.

What bogs me down, is having to twiddle through the .NET Directory services to retrieve over 800 records from Active Directory... I've had to resort to using cfthread, which kinda sorta works...but it's still taking 5 minutes plus just to loop the entire recordset. You'd think that using a native call to the .NET directory services object would be faster than CFLDAP, but apparently it's just not hinged that way.

I don't suppose you know of a quicker way to pull back a query of data from AD, ala CFLDAP, vice an object that requires a call per record?

@Ray,

I completely agree. That's why I like ColdFusion custom tags so much - they look and feel very much like XML except for the tag prefix. I find XML to be awesome and very user-friendly. ColdFusion custom tags, I think, take that usability and encapsulate a lot of functionality.

The nice thing about XML, on the other hand is that you can have it in a totally separate file. You could have a ".xml" that someone creates and then another file includes (or uploads or something). You simply can't get that goodness from custom-tag-based interfaces.

@Brian,

That's awesome to hear that such an enormous data set is generated so relatively quickly :) Rock on. ... can't help you with the LDAP - never used it, sorry.

I bet you would see a big improvement with the xml parsing method if you used xslt instead of doing all that looping in the custom tag.

@Mike,

I am not sure that would make sense. XSLT is used to transform one XML document into another XML document. My goal here isn't to create an XML document - it's to get data from the user. In my demo, I am creating a CSV, granted - but in my POI project, I am generating an Excel document - this cannot be done with XSLT (or at least not in any feasible way).

@Ben, Mike's comment about xslt though is fair in pointing to a solution to the problem of creating the csv. And with your leadership and standing, people very possibly will come across this while trying to find a solution for creating csv, or just for generating ideas on transforming data.

my simple version of an xslt transformation completes the process in ~200 ms with the same 1000 rows and creating the file.

It absolutely doesn't fix your problem with trying to work with poi, but it is helpful either way.

I'm gonna shoot you an email with my sample code, since I suspect it would get butchered in a post.

@Ben, you're right that you need to start with an XML document to use xslt, but you don't necessarily need to end up with an XML document.

For anyone that's a little curious, here's a very short stylesheet that can be applied to your example xml document above. While this has an extra tab at the end of each line and an extra carriage return, those could be stripped off relatively easily but it would make the sample here a bit longer.

Here goes (I have no idea how code is going to post here, but I'll take a shot)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xml:space="default">
<xsl:strip-space elements="datafromxml data row"/>
<xsl:output omit-xml-declaration="yes"/>

<xsl:template match="row">
<xsl:apply-templates />
<xsl:text> </xsl:text>
</xsl:template>

<xsl:template match="value">
<xsl:value-of select="node()"/>
<xsl:text> </xsl:text>
</xsl:template>
</xsl:stylesheet>

I originally thought that you might be working with the .xlsx file format, in which case, you could do some pretty cool stuff with xslt.

oops. I was afraid something would go wrong there.

The first xsl:text element should contain: &#x0A; (line feed)
and the second should contain: &#x09; (tab)

No doubt XSLT is a powerful tool, and yes, it can be used to create non-XML documents. Matt, I got your email, thanks for that.

So, yes, good point, if anyone comes to this post looking for a way to convert XML to CSV, definitely take a look at XSLT :)
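For anyone who wants to try that route from ColdFusion, here is a minimal sketch that applies a stylesheet like the one above using XmlTransform(). The sample data, the variable names, and the output file name xslt_data.csv are assumptions for illustration; this is not the code Matt emailed, and no timing is claimed for it:

```cfml
<!--- Build a small sample document shaped like the one in the test (assumed data). --->
<cfxml variable="xmlData">
    <data>
        <row><value>111</value><value>222</value></row>
        <row><value>333</value><value>444</value></row>
    </data>
</cfxml>

<!---
    Define the stylesheet. Each value is followed by a tab and each
    row by a line feed. The text output method keeps XML markup out
    of the result.
--->
<cfsavecontent variable="xsltData">
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:strip-space elements="data row" />
    <xsl:output method="text" />

    <xsl:template match="row">
        <xsl:apply-templates />
        <xsl:text>&#x0A;</xsl:text>
    </xsl:template>

    <xsl:template match="value">
        <xsl:value-of select="." />
        <xsl:text>&#x09;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
</cfsavecontent>

<!--- Transform the XML document into tab-delimited text. --->
<cfset csvData = XmlTransform( xmlData, Trim( xsltData ) ) />

<!--- Write the result to a file (hypothetical file name). --->
<cffile
    action="write"
    file="#ExpandPath( 'xslt_data.csv' )#"
    output="#csvData#"
    />
```

As with the other versions in this post, every line ends with a trailing tab.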

@Tony,

HOLY COW! I'm eating lunch right now and I tried to run it without XmlSearch()... and it's WICKED fast! New post on its way :)