XmlSearch() Ignores CDATA Sections In ColdFusion XPath

Posted May 21, 2008 at 9:02 AM by Ben Nadel

Tags: ColdFusion

I ran into a really weird problem today. I was working on an event-based XML parser (along the lines of the SAX parser, but way dumbed down) and couldn't seem to get CDATA sections to be returned in my XPath results. CDATA is an escape notation that ensures that a block of text is not parsed as if it were XML, but is instead treated as plain text. It is denoted using the following syntax:

<![CDATA[ ...your character data here... ]]>

CDATA text and standard node text values are supposed to be one and the same; they both result in node text and should not be distinguishable. And, if you try to CFDump out a ColdFusion XML document, you will see that that appears to be true. Let's create an XML document that has some CDATA text:

<!---
    Create an XML document in which some of the text is
    created with inline text and some is created with the
    use of CDATA sections.
--->
<cfxml variable="xmlData">

    <girl>
        <name>
            Sarah Vivenzio
        </name>
        <age>
            27
        </age>
        <description>
            <![CDATA[
                She is totally hot! I mean way hot! Probably
                one of the more attractive people that I have
                ever had the pleasure of meeting.
            ]]>
        </description>
    </girl>

</cfxml>


<!--- Dump out the ColdFusion XML document. --->
<cfdump
    var="#xmlData#"
    label="XmlData With CDATA Section"
    />

As you can see, the Name and Age nodes have standard, inline text while the Description node has CDATA text. When we CFDump out this ColdFusion XML document, we get the following:

[Figure: ColdFusion XML Document Containing Both Inline Text And CDATA Text]
When you CFDump out a ColdFusion XML document that has CDATA text, there is no distinction - all node text appears in the node XmlText attributes. This is how it should be as CDATA is just a notation, not a distinct type of node.

Now, let's try to grab all the text nodes that are grandchildren of the root Girl node:

<!---
    Now, search for all text nodes in the document that
    are nested within children of the Girl node.
--->
<cfset arrTextNodes = XmlSearch(
    xmlData,
    "/girl/*/text()"
    ) />

<!--- Dump out text node array. --->
<cfdump
    var="#arrTextNodes#"
    label="Text Nodes via XPath"
    />

This should grab the text nodes for Name, Age, and Description; however, when we CFDump out our array of nodes, we get the following:

[Figure: Text Nodes Returned By XPath And XmlSearch() In ColdFusion When CDATA Is Used]
Notice that the text values for Name and Age came through fine, but the CDATA text for the Description node was totally ignored. I am pretty sure this is a bug - everything that I have read has said that inline node text and CDATA node text should not be distinguished in any way.

As an experiment, I created a ColdFusion XML document that had a mixture of inline text and CDATA text under the same parent node:

<!---
    This time, let's create an XML document that mixes
    inline node text with CDATA node text.
--->
<cfxml variable="xmlData">

    <girl>
        <name>
            Sarah Vivenzio
        </name>
        <age>
            27
        </age>
        <description>
            She is insanely hot. I swear, you'd have to see it
            <![CDATA[ to believe it, but you should just ]]>
            take my word for it. Pretty pretty pretty good.
        </description>
    </girl>

</cfxml>

<!--- Output description text. --->
<cfset WriteOutput( xmlData.girl.description.XmlText ) />

Here, the girl Description node has intermingled text types. And, as documented, this creates a single text value when accessed directly via xmlData.girl.description.XmlText:

She is insanely hot. I swear, you'd have to see it to believe it, but you should just take my word for it. Pretty pretty pretty good.

However, when we use XPath to get the text node:

<!---
    Now, search for all text nodes in the document that
    are nested within children of the Girl node.
--->
<cfset arrTextNodes = XmlSearch(
    xmlData,
    "/girl/*/text()"
    ) />

<!--- Dump out text node array. --->
<cfdump
    var="#arrTextNodes#"
    label="Text Nodes via XPath"
    />

... we get the following CFDump output of our text nodes:

[Figure: Text Nodes Returned By XPath And XmlSearch() When Inline Text And CDATA Is Used In Same Node]
Here, something really weird happens - we only get the text node data up to the opening of the CDATA section. The rest of the text value, including the text that comes after the closing of the CDATA section, is completely ignored.

I could be wrong, but this seems like a very serious bug to me. This can create all sorts of complications for building data import solutions in which you get XML from any sort of third party. Has anyone else experienced this problem and is there a way to get around it?
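One possible way around it, sketched here under the assumption that you only need the combined text of each element rather than the individual text nodes, is to search for the element nodes and read XmlText from each one. As shown above, XmlText returns inline text and CDATA text as a single value. The variable names arrElements and arrText are just illustrative:

```cfml
<!---
    Workaround sketch: select the element nodes under Girl
    rather than their text() nodes.
--->
<cfset arrElements = XmlSearch( xmlData, "/girl/*" ) />

<!---
    Collect the full text of each element. XmlText combines
    inline text and CDATA text into one string.
--->
<cfset arrText = ArrayNew( 1 ) />

<cfloop index="i" from="1" to="#ArrayLen( arrElements )#">
    <cfset ArrayAppend( arrText, Trim( arrElements[ i ].XmlText ) ) />
</cfloop>

<!--- Dump out the collected text values. --->
<cfdump
    var="#arrText#"
    label="Element Text via XmlText"
    />
```

This trades the text() step in the XPath for a loop in ColdFusion, so it only fits cases where the element nodes themselves can be targeted by the search.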




Reader Comments

May 21, 2008 at 9:13 AM
1 Comments

Just a small typo in the following statement:

It is denoted using the following syntax: <!CDATA[ ...your character data here... ]]>

Should be: <![CDATA[ ...your character data here... ]]>

Notice the first [.


May 21, 2008 at 9:16 AM
11,246 Comments

@Gatzby,

Ooops, good catch. It has been fixed.


May 21, 2008 at 10:05 AM
14 Comments

Yes - it looks like there was a problem reported for Xalan 2.5.1, which is used by CF8. Some people say this was fixed in Xalan 2.7.1. I tried upgrading Xalan to 2.7.1, but at the moment I get a 500 NullPointerException error everywhere.


May 21, 2008 at 10:07 AM
131 Comments

@Ben,

Is it possible that the CDATA block is being picked up but the <![CDATA[ ... ]]> brackets are not being stripped off by the text() function? In other words, in the XmlSearch() version, if you view the page source, is the content there, but just not rendered to the screen by the browser?


May 21, 2008 at 10:09 AM
12 Comments

If you do this:
<cfset arrTextNodes = XmlSearch(xmlData,"/girl/*" ) />

you will get the description as well. I think that when you add text() at the end, it picks up only the text values, and CDATA is not really treated as text. You can get it via .XmlText or .XmlCdata, which essentially return the same thing, but I am not really sure why the search doesn't return the description.


May 21, 2008 at 10:11 AM
11,246 Comments

@jfish,

Good thinking. I tried that, but it is not in the source either.


May 21, 2008 at 10:21 AM
14 Comments

Hmmm... seems like it is not fixed. If you replace xalan.jar and xml-apis.jar, and add serializer.jar from Xalan 2.7.1, CF will work fine but your issue is still there.


May 21, 2008 at 10:21 AM
11,246 Comments

@Anuj,

That gives me the element nodes under girl, but not quite the CDATA text, unless I access it directly as an XmlText attribute of one of the nodes.

I was working on some more dynamic stuff where I was specifically looking for a text node that was in an XML document created on the fly. I am "working around" the issue by doing something like this:

<cfif Find( "CDATA", NodeString )>

    <!--- Fix for CDATA. --->
    <cfset arrChildren[ 1 ].XmlText = xmlDoc.root.XmlText />

</cfif>

That seems to take care of the problem; I am lucky that I am in such a controlled environment.


May 21, 2008 at 10:22 AM
11,246 Comments

@Radekg,

Thanks for looking into it.


Feb 12, 2012 at 10:18 PM
1 Comments

Hi Ben,
I just came across this same problem in the current version of CF.

Did you ever log a bug for it? I can't find it in the public tracker.

Also, I wrote some sample code, but while it picks up both lots of text outside of the CDATA portion, it omits the contents of the CDATA section. Any ideas?

<!--- create XML --->
<cfxml variable="myXML">
    <!--- construct the XML document --->
    <sampleXML>
        <sampleXMLNode>
            This is some plain text 1.
            <![CDATA[ This is Plain Text 2 ]]>
            This is more plain text in the sampleXML node.
        </sampleXMLNode>
    </sampleXML>
</cfxml>

<cfset xmlNodes = XmlSearch(myXml, '/sampleXML/*/text()') >

<!--- Dump out all the nodes of type text. --->
<cfdump var="#xmlNodes#" label="1">

<cfif Find( "CDATA", myXML )>
    <!--- Fix for CDATA. --->
    <cfset xmlNodes[ 1 ].XmlValue = myXML.sampleXML.sampleXMLNode />
</cfif>

<!--- Dump out all the nodes of type text after treatment. --->
<cfdump var="#xmlNodes#" label="2">

