Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.

Text Nodes Do Not Always Exist In A ColdFusion XML Document

By Ben Nadel on
Tags: ColdFusion

Yesterday, I was working on an audit tracker that stored updates in an XML document when I came across an XML behavior that I didn't know about: not every element node in an XML document has to have a nested text node. When you think about it, this makes some sense; however, when you look at a ColdFusion XML document, it can certainly be confusing. To examine this, let's create a simple ColdFusion XML document:

  • <!--- Create a ColdFusion XML document. --->
  • <cfxml variable="xmlGirl">
  •  
  • <girl>
  • <name>Hayden Panettiere</name>
  • <age>18</age>
  • <height></height>
  • <weight></weight>
  • <description>
  • Hayden played Claire, the Cheerleader, on the hit
  • Fox television show, Heroes.
  • </description>
  • </girl>
  •  
  • </cfxml>
  •  
  • <!--- Dump out XML document. --->
  • <cfdump
  • var="#xmlGirl#"
  • label="Girl: Hayden Panettiere"
  • />

Here, our Hayden Panettiere girl XML object has a number of nested fields. Of these fields, some have nested text values and some don't. However, even though that is the case, here is what the CFDump of the XML looks like:


 
 
 

 
[Figure: ColdFusion XML Document That Has Some Text Nodes And Some Non-Text Nodes]

Notice that even though some element nodes have text and others don't, from a ColdFusion XML Document Object Model (DOM) standpoint, they all have XmlText values; some of them just happen to be empty strings. This kind of structure might lead you to believe that all element nodes have a text node inside of them, and, in fact, this is what I used to think. As it turns out, though, this is not true - the XmlText attribute in the ColdFusion XML DOM has nothing to do with whether or not a text node actually exists.

To prove this, let's use XmlSearch() and XPath to select all nodes in the ColdFusion XML document that have a nested text node. We can do this by using the predicate [ text() ]. This predicate merely checks for existence and is not concerned with actual value:

  • <!---
  • Select all nodes from anywhere in the ColdFusion XML
  • document that have a nested text node.
  • --->
  • <cfset arrNodes = XmlSearch(
  • xmlGirl,
  • "//*[ text() ]"
  • ) />
  •  
  • <!--- Output names of nodes. --->
  • <cfloop
  • index="xmlNode"
  • array="#arrNodes#">
  •  
  • #xmlNode.XmlName#<br />
  •  
  • </cfloop>

After selecting all nodes that have a nested text node and outputting those node names, we get the following list:

  • girl
  • name
  • age
  • description

Notice that the height and weight nodes are not getting selected. This is because they have no text node. If you look back at my original XML, you will see that the height and weight elements open and close with no text data in between. Because there is no text data, there is no text node; so, while the ColdFusion XML DOM has XmlText values for these nodes, realize that they are not actually parents of any text node.
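To see the mismatch side by side, here is a minimal sketch that selects the elements with no text node and prints the length of each one's XmlText value. It assumes ColdFusion 8 or later (for the array attribute of CFLoop) and rebuilds a trimmed-down copy of the document so that it can run on its own; the variable names xmlDemo and arrEmptyNodes are mine, for illustration only:

```cfml
<!--- Rebuild a trimmed-down copy of the girl document (illustrative). --->
<cfxml variable="xmlDemo">
<girl>
	<name>Hayden Panettiere</name>
	<age>18</age>
	<height></height>
	<weight></weight>
</girl>
</cfxml>

<!---
	Select all element nodes that do NOT have a nested text
	node. The not() function is part of core XPath 1.0.
--->
<cfset arrEmptyNodes = XmlSearch(
	xmlDemo,
	"//*[ not( text() ) ]"
	) />

<!---
	Output each node name along with the length of its
	XmlText value.
--->
<cfoutput>
	<cfloop
		index="xmlNode"
		array="#arrEmptyNodes#">

		#xmlNode.XmlName# - XmlText length: #Len( xmlNode.XmlText )#<br />

	</cfloop>
</cfoutput>
```

Given the results above, this should list only height and weight, each with an XmlText length of zero: the property is there, but no text node sits behind it.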

While this might not seem like such a problem, it can make things a little tricky when you actually need to query an XML document based on text values. A nonexistent text node is very much like a NULL value in SQL; it's an "unknown" value. And, because it's an unknown value, you can't compare data to it. This goes for testing both equality and inequality. To demonstrate this, let's get all nodes whose text value is either equal to or not equal to "Blam":

  • <!---
  • Select all nodes from anywhere in the ColdFusion XML
  • document if their text value equals "Blam" or does
  • NOT equal "Blam".
  • --->
  • <cfset arrNodes = XmlSearch(
  • xmlGirl,
  • "//*[ (text() = 'Blam') or (text() != 'Blam') ]"
  • ) />
  •  
  • <!--- Output names of nodes. --->
  • <cfloop
  • index="xmlNode"
  • array="#arrNodes#">
  •  
  • #xmlNode.XmlName#<br />
  •  
  • </cfloop>

Instinctively, you might think that this will return all nodes of the XML, right? I mean, anyone who's taken math knows that something OR NOT something should return the "universe". However, just as with SQL and NULL values, because some of our text nodes don't exist, they cannot produce a known comparison, whether that be a test of equality or inequality. And, in fact, when we output the selected node names:

  • girl
  • name
  • age
  • description

... we see that, again, neither the height nor the weight element node was selected. Imagine trying to select all nodes whose text value was NOT something. If you didn't realize how this worked, you might spend a heck of a lot of time banging your head against a wall trying to figure out why only some of the expected nodes were being selected.
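If what you actually want is "every node whose text is not Blam, including the ones with no text at all", one option is to wrap the equality test in not() rather than using the != operator. This is only a sketch based on standard XPath 1.0 semantics (an equality test against an empty node set is false, so negating it gives true); xmlDemo and arrNodes are illustrative names, and it assumes ColdFusion 8 or later:

```cfml
<!--- Rebuild a trimmed-down copy of the girl document (illustrative). --->
<cfxml variable="xmlDemo">
<girl>
	<name>Hayden Panettiere</name>
	<age>18</age>
	<height></height>
	<weight></weight>
</girl>
</cfxml>

<!---
	Select all element nodes that do not have a text node
	equal to "Blam". Because the equality test is simply
	false when no text node exists, not() turns it into
	true and the empty elements are included as well.
--->
<cfset arrNodes = XmlSearch(
	xmlDemo,
	"//*[ not( text() = 'Blam' ) ]"
	) />

<!--- Output names of nodes. --->
<cfoutput>
	<cfloop
		index="xmlNode"
		array="#arrNodes#">

		#xmlNode.XmlName#<br />

	</cfloop>
</cfoutput>
```

Under those semantics, every element should come back, height and weight included. The difference from the != version is that != still needs an actual text node to compare against, while not() only negates the outcome of the equality test.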

If you work with a lot of XML, you probably already know this; but, if you have only worked with the ColdFusion XML Document Object Model (DOM), it may not be immediately obvious that the existence of an XmlText attribute does not mean that there is a corresponding text node in the DOM. I know that I didn't realize this, and it probably took me a good 30 minutes to figure out what the heck was going on.




Reader Comments

Ben - Thanks for these posts on working with XML. I have just started doing more work with XML files and I know that I will be glad to have this type of information further down the road when I am troubleshooting issues with datasets that have tens of thousands of records or more.


@Jason,

No problem my man. I can tell you right now, though, that XML documents that have enormous amounts of data are no fun to deal with :) Parsing is not that fast and people tell me that ColdFusion can crash if the XML parsing eats up all the memory.

That is what I hear from other people - I have never actually had to deal with such large files.


Thanks Ben.
I'm refactoring an application and the Table of Contents is coming from an XML file. I've seen the code and I'm dreading to start working on it.

It's good to know this beforehand so I don't pull my hair out when working on the project.



@Fernando,

No problem my man. I do love me some XML and XPath, even some XSLT. If you run into any problems, drop me a line.


@Ben,
If you're trying to check for "empty" "Text Nodes", you could alter your XPath statement to something like this:

<cfset arrNodes = XmlSearch(xmlGirl, "//*[ boolean(text()) = false ]") />


@Steve,

Thanks man. I have to do a more thorough exploration of the XPath functions that are actually supported in ColdFusion 8. I have tested a few of them and it seems to be really hit or miss.


Ah ha. It is me that seems to be mistaken! CF is providing an xmlText, but there is no empty string as far as XPath is concerned. I should read the spec better next time before I comment.

text() actually returns a node set with all the text nodes of the element, and if there are no text nodes it returns an empty node set, which when compared to other things returns false.

So it looks like "//*[text() or not(node())]" would do what you want.

