Ben Nadel at CF Summit West 2024 (Las Vegas) with: Austin Shelton

XmlSearch() Ignores CDATA Sections In ColdFusion XPath

By
Published in Comments (10)

I ran into a really weird problem today. I was working on an event-based XML parser (along the lines of a SAX parser, but way dumbed down) and couldn't seem to get CDATA sections to be returned in my XPath results. CDATA is an escape notation that ensures that a block of text is not parsed as if it were XML, but is instead treated as plain text. It is denoted using the following syntax:

<![CDATA[ ...your character data here... ]]>
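
To make the escaping concrete, here is a minimal sketch. The xmlSnippet variable and the note element are made up purely for illustration; the point is only that characters like < and & can sit inside a CDATA section without being treated as markup:

<!---
	Hypothetical example: the CDATA section holds characters
	that would otherwise have to be escaped as &lt; and &amp;.
--->
<cfxml variable="xmlSnippet">

	<note>
		<![CDATA[ 5 < 6 & <b>this is not a tag</b> ]]>
	</note>

</cfxml>

<!---
	Output the node text. HtmlEditFormat() is used only so that
	the browser shows the raw characters instead of rendering them.
--->
<cfoutput>#HtmlEditFormat( xmlSnippet.note.XmlText )#</cfoutput>

If the CDATA wrapper were removed, the parser would be expected to reject this document, since the bare < and & are not legal in inline node text.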

CDATA text and standard node text values are supposed to be one and the same; they both result in node text and should not be distinguishable. And, if you CFDump out a ColdFusion XML document, you will see that this appears to be true. Let's create an XML document that has some CDATA text:

<!---
	Create an XML document in which some of the text is
	created with inline text and some is created with the
	use of CDATA sections.
--->
<cfxml variable="xmlData">

	<girl>
		<name>
			Sarah Vivenzio
		</name>
		<age>
			27
		</age>
		<description>
			<![CDATA[
				She is totally hot! I mean way hot! Probably
				one of the more attractive people that I have
				ever had the pleasure of meeting.
			]]>
		</description>
	</girl>

</cfxml>


<!--- Dump out the ColdFusion XML document. --->
<cfdump
	var="#xmlData#"
	label="XmlData With CDATA Section"
	/>

As you can see, the Name and Age nodes have standard, inline text while the Description node has CDATA text. When we CFDump out this ColdFusion XML document, we get the following:

ColdFusion XML Document Containing Both Inline Text And CDATA Text

When you CFDump out a ColdFusion XML document that has CDATA text, there is no distinction - all node text appears in the node XmlText attributes. This is how it should be as CDATA is just a notation, not a distinct type of node.

Now, let's try to grab all the text nodes that are grandchildren of the root Girl node:

<!---
	Now, search for all text nodes in the document that
	are nested within children of the Girl node.
--->
<cfset arrTextNodes = XmlSearch(
	xmlData,
	"/girl/*/text()"
	) />

<!--- Dump out text node array. --->
<cfdump
	var="#arrTextNodes#"
	label="Text Nodes via XPath"
	/>

This should grab the text nodes for Name, Age, and Description; however, when we CFDump out our array of nodes, we get the following:

Text Nodes Returned By XPath And XmlSearch() In ColdFusion When CDATA Is Used

Notice that the text values for Name and Age came through fine, but the CDATA text for the Description node was totally ignored. I am pretty sure this is a bug - everything that I have read has said that inline node text and CDATA node text should not be distinguished in any way.

As an experiment, I created a ColdFusion XML document that had a mixture of inline text and CDATA text under the same parent node:

<!---
	This time, let's create an XML document that mixes
	inline node text with CDATA node text.
--->
<cfxml variable="xmlData">

	<girl>
		<name>
			Sarah Vivenzio
		</name>
		<age>
			27
		</age>
		<description>
			She is insanely hot. I swear, you'd have to see it
			<![CDATA[ to believe it, but you should just ]]>
			take my word for it. Pretty pretty pretty good.
		</description>
	</girl>

</cfxml>

<!--- Output description text. --->
<cfset WriteOutput( xmlData.girl.description.XmlText ) />

Here, the girl Description node has intermingled text types. And, as documented, this creates a single text value when accessed directly via xmlData.girl.description.XmlText:

She is insanely hot. I swear, you'd have to see it to believe it, but you should just take my word for it. Pretty pretty pretty good.

However, when we use XPath to get the text node:

<!---
	Now, search for all text nodes in the document that
	are nested within children of the Girl node.
--->
<cfset arrTextNodes = XmlSearch(
	xmlData,
	"/girl/*/text()"
	) />

<!--- Dump out text node array. --->
<cfdump
	var="#arrTextNodes#"
	label="Text Nodes via XPath"
	/>

... we get the following CFDump output of our text nodes:

Text Nodes Returned By XPath And XmlSearch() When Inline Text And CDATA Is Used In Same Node

Here, something really weird happens - we only get the text node data up to the opening of the CDATA section. The rest of the text value, including the text that comes after the closing of the CDATA section, is completely ignored.
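
One way to dig into what is going on, as a rough sketch: ColdFusion XML elements expose an XmlNodes array of the underlying DOM nodes, each with XmlType and XmlValue keys. I am assuming here that listing those nodes for the Description element shows how the parser split the inline text and the CDATA text; I have not confirmed what types it reports, so treat this as a diagnostic and not an explanation:

<!---
	Diagnostic sketch: list the raw DOM nodes under the
	Description element, along with their reported types.
--->
<cfset arrDomNodes = xmlData.girl.description.XmlNodes />

<cfloop index="intNode" from="1" to="#ArrayLen( arrDomNodes )#">

	<cfoutput>
		#intNode#: #arrDomNodes[ intNode ].XmlType# -
		#HtmlEditFormat( arrDomNodes[ intNode ].XmlValue )#<br />
	</cfoutput>

</cfloop>

If the CDATA section shows up as its own node type between two text nodes, that would at least be consistent with text() stopping at the opening of the CDATA section.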

I could be wrong, but this seems like a very serious bug to me. This can create all sorts of complications for building data import solutions in which you get XML from any sort of third party. Has anyone else experienced this problem and is there a way to get around it?
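
One possible workaround, sketched under the assumption that only the text() step is affected: since XmlText returned the full, combined value in the example above, we can select the element nodes and read XmlText from each one. The arrElements and arrText variable names are just placeholders:

<!---
	Select the child ELEMENT nodes of Girl rather than
	their text() nodes.
--->
<cfset arrElements = XmlSearch(
	xmlData,
	"/girl/*"
	) />

<!--- Collect the node text (inline and CDATA) via XmlText. --->
<cfset arrText = ArrayNew( 1 ) />

<cfloop index="intElement" from="1" to="#ArrayLen( arrElements )#">

	<cfset ArrayAppend(
		arrText,
		Trim( arrElements[ intElement ].XmlText )
		) />

</cfloop>

<!--- Dump out the collected text values. --->
<cfdump
	var="#arrText#"
	label="Text Values via XmlText"
	/>

This gives up true text nodes in exchange for plain strings, so it only helps when the text value is all you need and you do not have to address the text nodes themselves.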


Reader Comments

1 Comments

Just a small typo in the following statement:

It is denoted using the following syntax: <!CDATA[ ...your character data here... ]]>

Should be: <![CDATA[ ...your character data here... ]]>

Notice the first [.

14 Comments

Yes - it looks like there was a problem reported for Xalan 2.5.1 which is used by CF8. Some people say this was fixed in Xalan 2.7.1. I tried upgrading Xalan to 2.7.1 but at the moment I have error 500 NullPointerException everywhere.

140 Comments

@Ben,

Is it possible that the CDATA block is being picked up but the <[ ... ]> brackets are not being stripped off by the text() function? In other words, in the XmlSearch() version, if you view page source, is the content there, but just not rendered to screen by the browser?

12 Comments

If you do this
<cfset arrTextNodes = XmlSearch(xmlData,"/girl/*" ) />

you will get the description as well. I think when you add text() at the end it picks up only the text values, where CDATA is not really text... You can get it via .XmlText or .XmlCdata, which essentially return the same thing, but I'm not really sure why it doesn't return the description.

14 Comments

Hmmm... seems like it is not fixed. If you replace xalan.jar and xml-apis.jar, and add serializer.jar from Xalan 2.7.1, CF will work fine, but your issue is still there.

15,811 Comments

@Anuj,

That gives me the element nodes under girl, but not quite the CDATA text, unless I access it directly as an XmlText attribute of one of the nodes.

I was working on some more dynamic stuff where I was specifically looking for a text node that was in an XML document created on the fly. I am "working around" the issue by doing something like this:

<cfif Find( "CDATA", NodeString )>

<!--- Fix for CDATA. --->
<cfset arrChildren[ 1 ].XmlText = xmlDoc.root.XmlText />

</cfif>

That seems to take care of the problem; I am lucky that I am in such a controlled environment.

1 Comments

Hi Ben,
I just came across this same problem in the current version of CF.

Did you ever log a bug for it? I can't find it in the public tracker?

Also, I did some sample code - but while it picks up both lots of text outside of the CDATA portion - it omits the internals of CDATA, any ideas?

<!--- create XML --->
<cfxml variable="myXML">
	<!--- construct the XML document --->
	<sampleXML>
		<sampleXMLNode>
			This is some plain text 1.
			<![CDATA[ This is Plain Text 2 ]]>
			This is more plain text in the sampleXML node.
		</sampleXMLNode>
	</sampleXML>
</cfxml>
 
<cfset xmlNodes = XmlSearch(myXml, '/sampleXML/*/text()') >
 
<!--- Dump out all the nodes of type text. --->
<cfdump var="#xmlNodes#" label="1">
 
<cfif Find( "CDATA", myXML )>
	<!--- Fix for CDATA. --->
	<cfset xmlNodes[ 1 ].XmlValue =myXML.sampleXML.sampleXMLNode />
</cfif>
 
<!--- Dump out all the nodes of type text after treatment.. --->
<cfdump var="#xmlNodes#" label="2">
Ben Nadel