Text Nodes Do Not Always Exist In A ColdFusion XML Document

Posted May 22, 2008 at 8:13 AM by Ben Nadel

Tags: ColdFusion

Yesterday, I was working on an audit tracker that stored updates in an XML document when I came across an XML behavior that I didn't know about: not every element node in an XML document has a nested text node. When you think about it, this makes sense; however, when you look at a ColdFusion XML document, it can certainly be confusing. To examine this, let's create a simple ColdFusion XML document:

  • <!--- Create a ColdFusion XML document. --->
  • <cfxml variable="xmlGirl">
  •  
  • <girl>
  • <name>Hayden Panettiere</name>
  • <age>18</age>
  • <height></height>
  • <weight></weight>
  • <description>
  • Hayden played Claire, the Cheerleader, on the hit
  • NBC television show, Heroes.
  • </description>
  • </girl>
  •  
  • </cfxml>
  •  
  • <!--- Dump out XML document. --->
  • <cfdump
  • var="#xmlGirl#"
  • label="Girl: Hayden Panettiere"
  • />

Here, our Hayden Panettiere girl XML object has a number of nested fields. Some of these fields have nested text values and some don't. Even so, here is what the CFDump of the XML looks like:


 
 
 

 
[Figure: ColdFusion XML Document That Has Some Text Nodes And Some Non-Text Nodes]

Notice that even though some element nodes have text and others don't, from a ColdFusion XML Document Object Model (DOM) standpoint, they all have XmlText values; some of them just happen to be empty strings. This kind of structure might lead you to believe that all element nodes have a text node inside of them, and, in fact, this is what I used to think. As it turns out, though, this is not true: the XmlText property in the ColdFusion XML DOM has nothing to do with whether or not a text node actually exists.
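ColdFusion runs on the JVM, so one way to see the same distinction outside of ColdFusion is with the JDK's own DOM parser. What follows is a minimal Java sketch, not code from this post: the class and method names (EmptyElementCheck, firstElement, childCount, textOf) are made up for illustration, and it assumes the JDK's default parser settings. An empty element has no child nodes at all, yet asking for its text still yields an empty string, much like XmlText does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class EmptyElementCheck {

    // Parse the XML and return the first element with the given tag name.
    static Node firstElement(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tagName).item(0);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // How many child nodes (text nodes included) sit directly under the element.
    static int childCount(String xml, String tagName) {
        return firstElement(xml, tagName).getChildNodes().getLength();
    }

    // The element's text content; the DOM reports "" when it has no children.
    static String textOf(String xml, String tagName) {
        return firstElement(xml, tagName).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<girl><age>18</age><height></height></girl>";
        System.out.println("age children: " + childCount(xml, "age"));
        System.out.println("height children: " + childCount(xml, "height"));
        System.out.println("height text empty: " + textOf(xml, "height").isEmpty());
    }
}
```

The empty string that comes back for height is produced by the API; it is not evidence of a text node sitting inside the element.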

To prove this, let's use XmlSearch() and XPath to select all nodes in the ColdFusion XML document that have a nested text node. We can do this by using the predicate [ text() ]. This predicate merely checks for existence and is not concerned with actual value:

  • <!---
  • Select all nodes from anywhere in the ColdFusion XML
  • document that have a nested text node.
  • --->
  • <cfset arrNodes = XmlSearch(
  • xmlGirl,
  • "//*[ text() ]"
  • ) />
  •  
  • <!--- Output names of nodes. --->
  • <cfoutput>
  • <cfloop
  • index="xmlNode"
  • array="#arrNodes#">
  •  
  • #xmlNode.XmlName#<br />
  •  
  • </cfloop>
  • </cfoutput>

After selecting all nodes that have a nested text node and outputting those node names, we get the following list:

  • girl
  • name
  • age
  • description

Notice that the height and weight nodes are not selected. This is because they have no text node. If you look back at my original XML, you will see that the height and weight elements open and close with no text data in between. Because there is no text data, there is no text node; so, while the ColdFusion XML DOM has XmlText values for these nodes, realize that they are not actually parents of any text node. (The girl node is selected because the white space between its child elements is itself stored in text nodes.)
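To make the "no text data, no text node" point concrete, here is a small Java sketch that counts the text nodes directly under each element using the XPath count(text()) function from the JDK's javax.xml.xpath package. This is an illustration under assumptions rather than the code used in this post: TextNodeCounter, textNodeCounts and GIRL_XML are hypothetical names, the description text is shortened, and the result depends on the parser keeping white space (the JDK default).

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class TextNodeCounter {

    // Same shape as the girl document in this post (description shortened).
    static final String GIRL_XML =
        "<girl>\n"
        + "  <name>Hayden Panettiere</name>\n"
        + "  <age>18</age>\n"
        + "  <height></height>\n"
        + "  <weight></weight>\n"
        + "  <description>Hayden played Claire on Heroes.</description>\n"
        + "</girl>";

    // Map each element name to the number of text nodes directly beneath it.
    static Map<String, Integer> textNodeCounts(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList elements = (NodeList) xpath.evaluate(
                "//*", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (int i = 0; i < elements.getLength(); i++) {
                // Evaluate count(text()) relative to the current element.
                Double count = (Double) xpath.evaluate(
                    "count(text())", elements.item(i), XPathConstants.NUMBER);
                counts.put(elements.item(i).getNodeName(), count.intValue());
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        textNodeCounts(GIRL_XML).forEach(
            (name, count) -> System.out.println(name + ": " + count));
    }
}
```

In this sketch, height and weight report zero text nodes, name, age and description report one each, and girl reports several because of the white space around its children.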

While this might not seem like such a problem, it can make things a little tricky when you actually need to query an XML document based on text values. A non-existent text node is very much like a NULL value in SQL; it's an "unknown" value. And, because it's an unknown value, you can't compare data to it. This goes for testing both equality and inequality. To demonstrate this, let's get all nodes whose text value is either equal to or not equal to "Blam":

  • <!---
  • Select all nodes from anywhere in the ColdFusion XML
  • document if their text value equals "Blam" or does
  • NOT equal "Blam".
  • --->
  • <cfset arrNodes = XmlSearch(
  • xmlGirl,
  • "//*[ (text() = 'Blam') or (text() != 'Blam') ]"
  • ) />
  •  
  • <!--- Output names of nodes. --->
  • <cfoutput>
  • <cfloop
  • index="xmlNode"
  • array="#arrNodes#">
  •  
  • #xmlNode.XmlName#<br />
  •  
  • </cfloop>
  • </cfoutput>

Instinctively, you might think that this will return all nodes of the XML, right? I mean, anyone who's taken math knows that something OR NOT something should return the "universe". However, just as with SQL and NULL values, because some of our text nodes don't exist, they cannot produce a true comparison, whether that be a test of equality or inequality. And, in fact, when we output the selected node names:

  • girl
  • name
  • age
  • description

... we see that, again, neither the height nor the weight element node was selected. Imagine trying to select all nodes whose text value was NOT something. If you didn't realize how this worked, you might spend a heck of a lot of time banging your head against a wall trying to figure out why only some of the expected nodes were being selected.
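In XPath 1.0, comparing a node-set to a string is true only when at least one node in the set satisfies the comparison, so an empty node-set fails both = and !=. Wrapping the comparison in not() negates the whole test instead, which is one way to get the "everything that is NOT Blam" result. The Java sketch below shows the difference using the JDK's javax.xml.xpath package. The names BlamComparison, selectNames and GIRL_XML are hypothetical, and whether XmlSearch() in a given ColdFusion version behaves identically is an assumption worth testing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class BlamComparison {

    // Same shape as the girl document in this post (description shortened).
    static final String GIRL_XML =
        "<girl>\n"
        + "  <name>Hayden Panettiere</name>\n"
        + "  <age>18</age>\n"
        + "  <height></height>\n"
        + "  <weight></weight>\n"
        + "  <description>Hayden played Claire on Heroes.</description>\n"
        + "</girl>";

    // Evaluate an XPath 1.0 expression and return the names of the selected nodes.
    static List<String> selectNames(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Equal OR not-equal still skips elements that have no text node.
        System.out.println(selectNames(GIRL_XML,
            "//*[ (text() = 'Blam') or (text() != 'Blam') ]"));
        // not(...) negates the whole comparison, so empty elements are included.
        System.out.println(selectNames(GIRL_XML, "//*[ not(text() = 'Blam') ]"));
        // Only the elements that have no text node at all.
        System.out.println(selectNames(GIRL_XML, "//*[ not(text()) ]"));
    }
}
```

The first query leaves out height and weight, the second includes all six elements, and the third selects only height and weight.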

If you work with a lot of XML, you probably already know this; but, if you have only worked with the ColdFusion XML Document Object Model (DOM), it may not be immediately obvious that the existence of an XmlText property does not mean that there is a corresponding text node in the DOM. I know that I didn't realize this, and it probably took me a good 30 minutes to figure out what the heck was going on.




Reader Comments

May 22, 2008 at 9:49 AM
76 Comments

Ben - Thanks for these posts on working with XML. I have just started doing more work with XML files and I know that I will be glad to have this type of information further down the road when I am troubleshooting issues with datasets that have tens of thousands of records or more.


May 22, 2008 at 9:53 AM
10,640 Comments

@Jason,

No problem my man. I can tell you right now, though, that XML documents that have enormous amounts of data are no fun to deal with :) Parsing is not that fast and people tell me that ColdFusion can crash if the XML parsing eats up all the memory.

That is what I hear from other people - I have never actually had to deal with such large files.


May 22, 2008 at 10:07 AM
9 Comments

Thanks Ben.
I'm refactoring an application and the Table of Contents is coming from an XML file. I've seen the code and I'm dreading starting work on it.

It's good to know this beforehand so I don't pull my hair out when working on the project.



May 22, 2008 at 10:10 AM
10,640 Comments

@Fernando,

No problem my man. I do love me some XML and XPath, even some XSLT. If you run into any problems, drop me a line.


May 22, 2008 at 10:58 AM
65 Comments

@Ben,
If you're trying to check for "empty" "Text Nodes", you could alter your XPath statement to something like this:

<cfset arrNodes = XmlSearch(xmlGirl, "//*[ boolean(text()) = false ]") />


May 22, 2008 at 11:01 AM
65 Comments

@Ben,
I forgot to mention my point ... my point is the XmlText Node *does* exist, but as you mentioned before, it's just empty.


May 22, 2008 at 11:12 AM
10,640 Comments

@Steve,

Thanks man. I have to do a more thorough exploration of the XPath functions that are actually supported in ColdFusion 8. I have tested a few of them and it seems to be really hit or miss.


May 22, 2008 at 12:34 PM
65 Comments

@Ben,
No problem. I blogged about this at:
http://www.stephenwithington.com/blog/index.cfm/2008/5/22/Text-Nodes-DO-Always-Exist-in-a-ColdFusion-XML-Document

And actually, I updated the code to check for empty text() to:

<cfset arrNodes = XmlSearch(xmlGirl, '//*[ (boolean(text()) = 0) ]') />

It seems this is the proper syntax for false.


May 24, 2008 at 4:23 AM
132 Comments

You seem to be confused. An empty string evaluates to false, so it doesn't match those nodes.

The XPath spec explains what is true and what is false in more detail:

http://www.w3.org/TR/xpath#function-boolean


May 24, 2008 at 4:36 AM
132 Comments

Ah ha. It is me that seems to be mistaken! CF is providing an xmlText, but there is no empty string as far as XPath is concerned. I should read the spec better next time before I comment.

text() actually returns a node set with all the text nodes of the element, and if there are no text nodes it returns an empty node set, which when compared to other things returns false.

So it looks like "//*[text() or not(node())]" would do what you want.

