Text Nodes Do Not Always Exist In A ColdFusion XML Document

Posted May 22, 2008 at 8:13 AM

Tags: ColdFusion

Yesterday, I was working on an audit tracker that stored updates in an XML document when I came across an XML behavior that I didn't know about. Apparently, not all element nodes in an XML document have to have a nested text node. When you think about it, this makes sense; however, when you look at a ColdFusion XML document, it can certainly be confusing. To examine this, let's create a simple ColdFusion XML document:


  • <!--- Create a ColdFusion XML document. --->
  • <cfxml variable="xmlGirl">
  •  
  • <girl>
  • <name>Hayden Panettiere</name>
  • <age>18</age>
  • <height></height>
  • <weight></weight>
  • <description>
  • Hayden played Claire, the Cheerleader, on the hit
  • Fox television show, Heroes.
  • </description>
  • </girl>
  •  
  • </cfxml>
  •  
  • <!--- Dump out XML document. --->
  • <cfdump
  • var="#xmlGirl#"
  • label="Girl: Hayden Panettiere"
  • />

Here, our Hayden Panettiere girl XML object has a number of nested fields. Some of these fields have nested text values and some don't. Even so, here is what the CFDump of the XML looks like:

[Figure: ColdFusion XML Document That Has Some Text Nodes And Some Non-Text Nodes]

Notice that even though some element nodes have text and others don't, from a ColdFusion XML Document Object Model (DOM) standpoint, they all have XmlText values; some of them just happen to be empty strings. This kind of structure might lead you to believe that all element nodes have a text node inside of them, and, in fact, this is what I used to think. As it turns out, though, this is not true - the XmlText property in the ColdFusion XML DOM has nothing to do with whether or not a text node actually exists.

To prove this, let's use XmlSearch() and XPath to select all nodes in the ColdFusion XML document that have a nested text node. We can do this by using the predicate [ text() ]. This predicate merely checks for existence and is not concerned with actual value:


  • <!---
  • Select all nodes from anywhere in the ColdFusion XML
  • document that have a nested text node.
  • --->
  • <cfset arrNodes = XmlSearch(
  • xmlGirl,
  • "//*[ text() ]"
  • ) />
  •  
  • <!--- Output names of nodes. --->
  • <cfloop
  • index="xmlNode"
  • array="#arrNodes#">
  •  
  • <cfoutput>#xmlNode.XmlName#</cfoutput><br />
  •  
  • </cfloop>

After selecting all nodes that have a nested text node and outputting those node names, we get the following list:

  • girl
  • name
  • age
  • description

Notice that the height and weight nodes are not getting selected. This is because they have no text node. And, if you look back at my original XML, you will see that the height and weight nodes open and close with no text data in between. (The girl node, on the other hand, is selected because the white space between its child elements counts as text.) Because there is no text data, there is no text node; so, while the ColdFusion XML DOM has XmlText values for these nodes, realize that these elements are not actually parents of anything.
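
To see both views of the same nodes side by side, here is a minimal sketch that reuses the xmlGirl document from above. It assumes that XmlSearch() accepts the standard XPath 1.0 not() function, and the variable name arrEmptyNodes is just one I made up for this illustration:

```cfm
<!---
	Select all element nodes from anywhere in the document
	that do NOT have a nested text node. not() is a standard
	XPath 1.0 function (assumed to be supported by XmlSearch).
--->
<cfset arrEmptyNodes = XmlSearch(
	xmlGirl,
	"//*[ not( text() ) ]"
	) />

<!--- Output each node name with the length of its XmlText. --->
<cfoutput>
	<cfloop
		index="xmlNode"
		array="#arrEmptyNodes#">

		#xmlNode.XmlName# : #Len( xmlNode.XmlText )#<br />

	</cfloop>
</cfoutput>
```

By XPath rules, this should select only the height and weight nodes. For each of them, XmlText is still there as a zero-length string, even though there is no text node behind it.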

While this might not seem like such a problem, it can make things a little tricky when you actually need to query an XML document based on text values. A non-existent text node is very much like a NULL value in SQL; it's an "unknown" value. And, because it's an unknown value, you can't compare data to it. This goes for testing both equality and inequality. To demonstrate this, let's get all nodes whose text value is either equal to or not equal to "Blam":


  • <!---
  • Select all nodes from anywhere in the ColdFusion XML
  • document if their text value equals "Blam" or does
  • NOT equal "Blam".
  • --->
  • <cfset arrNodes = XmlSearch(
  • xmlGirl,
  • "//*[ (text() = 'Blam') or (text() != 'Blam') ]"
  • ) />
  •  
  • <!--- Output names of nodes. --->
  • <cfloop
  • index="xmlNode"
  • array="#arrNodes#">
  •  
  • <cfoutput>#xmlNode.XmlName#</cfoutput><br />
  •  
  • </cfloop>

Instinctively, you might think that this will return all nodes of the XML, right? I mean, anyone who's taken math knows that something OR NOT something should return the "universe". However, just as with SQL and NULL values, because some of our text nodes don't exist, they cannot result in a known comparison, whether that be a test of equality or inequality. And, in fact, when we output the selected node names:

  • girl
  • name
  • age
  • description

... we see that, again, neither the height nor the weight element node was selected. Imagine trying to select all nodes whose text value was NOT something. If you didn't realize how this worked, you might spend a heck of a lot of time banging your head against a wall trying to figure out why only some of the expected nodes were being selected.
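
If what you actually want is "every node that does not have the text Blam", one possible way around this is to negate the equality test rather than use the != operator. This is only a sketch; it assumes that XmlSearch() supports the standard XPath 1.0 not() function:

```cfm
<!---
	Select all nodes that do not have a text node equal to
	"Blam". Because not() wraps the whole comparison, nodes
	that have no text node at all are included as well.
--->
<cfset arrNodes = XmlSearch(
	xmlGirl,
	"//*[ not( text() = 'Blam' ) ]"
	) />

<!--- Output names of nodes. --->
<cfoutput>
	<cfloop
		index="xmlNode"
		array="#arrNodes#">

		#xmlNode.XmlName#<br />

	</cfloop>
</cfoutput>
```

Under XPath 1.0 rules, comparing an empty node set to a string is always false, so not() flips that to true. That means the height and weight nodes should now show up alongside the other four.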

If you work with a lot of XML, you probably already know this; but, if you have only worked with the ColdFusion XML Document Object Model (DOM), it may not be immediately obvious that the existence of an XmlText property does not mean that there is a corresponding text node in the DOM. I know that I didn't realize this, and it probably took me a good 30 minutes to figure out what the heck was going on.

Reader Comments

May 22, 2008 at 9:49 AM
53 Comments

Ben - Thanks for these posts on working with XML. I have just started doing more work with XML files and I know that I will be glad to have this type of information further down the road when I am troubleshooting issues with datasets that have tens of thousands of records or more.


May 22, 2008 at 9:53 AM
6,371 Comments

@Jason,

No problem my man. I can tell you right now, though, that XML documents that have enormous amounts of data are no fun to deal with :) Parsing is not that fast and people tell me that ColdFusion can crash if the XML parsing eats up all the memory.

That is what I hear from other people - I have never actually had to deal with such large files.


May 22, 2008 at 10:07 AM
9 Comments

Thanks Ben.
I'm refactoring an application and the Table of Contents is coming from an XML file. I've seen the code and I'm dreading to start working on it.

It's good to know this beforehand so I don't pull my hair out when working on the project.



May 22, 2008 at 10:10 AM
6,371 Comments

@Fernando,

No problem my man. I do love me some XML and XPath, even some XSLT. If you run into any problems, drop me a line.


May 22, 2008 at 10:58 AM
49 Comments

@Ben,
If you're trying to check for "empty" "Text Nodes", you could alter your XPath statement to something like this:

<cfset arrNodes = XmlSearch(xmlGirl, "//*[ boolean(text()) = false ]") />


May 22, 2008 at 11:01 AM
49 Comments

@Ben,
I forgot to mention my point ... my point is the XmlText Node *does* exist, but as you mentioned before, it's just empty.


May 22, 2008 at 11:12 AM
6,371 Comments

@Steve,

Thanks man. I have to do a more thorough exploration of the XPath functions that are actually supported in ColdFusion 8. I have tested a few of them and it seems to be really hit or miss.


May 22, 2008 at 12:34 PM
49 Comments

@Ben,
No problem. I blogged about this at:
http://www.stephenwithington.com/blog/index.cfm/2008/5/22/Text-Nodes-DO-Always-Exist-in-a-ColdFusion-XML-Document

And actually, I updated the code to check for empty text() to:

<cfset arrNodes = XmlSearch(xmlGirl, '//*[ (boolean(text()) = 0) ]') />

It seems this is the proper syntax for false.


May 24, 2008 at 4:23 AM
123 Comments

You seem to be confused. An empty string evaluates to false, so it doesn't match those nodes.

The XPath spec explains what is true and what is false in more detail:

http://www.w3.org/TR/xpath#function-boolean


May 24, 2008 at 4:36 AM
123 Comments

Ah ha. It is me that seems to be mistaken! CF is providing an xmlText, but there is no empty string as far as XPath is concerned. I should read the spec better next time before I comment.

text() actually returns a node set with all the text nodes of the element, and if there are no text nodes it returns an empty node set, which when compared to other things returns false.

So it looks like "//*[text() or not(node())]" would do what you want.

