Using The XPath String() Function In XmlSearch() To Aggregate Node Text In ColdFusion

By Ben Nadel on May 9, 2011

Last week, I talked about aggregating XML node text in a mixed-node ColdFusion XML document. In the comments of that post, Kirill pointed out that you could accomplish pretty much the same thing directly in the original xmlSearch() invocation using the XPath string() function. After trying this out for myself, I have to say that it is brilliant! It completely simplifies the entire process down to a single statement; and, if you throw in the normalize-space() function, it gets even better.

To see this in action, take a look at the following demo. In the code, we are going to take an XML branch and get the string representation of it and its child nodes.

<!--- Define our XML document. --->
<cfxml variable="data">

	<data>
		<message>
			You are <em>wicked</em> sexy!
		</message>
	</data>

</cfxml>


<!---
	Get the aggregate text value of the nodes. The string() function
	returns the string value of the node; the normalize-space()
	function strips leading/trailing whitespace and condenses runs of
	whitespace into a single space.

	NOTE: Unlike most xmlSearch() calls, this one does not return
	a "Node" array - it returns a string value.
--->
<cfset textValue = xmlSearch(
	data,
	"normalize-space( string( //message ) )"
	) />


<!--- Output the text. --->
<cfoutput>

	Message: #htmlEditFormat( textValue )#

</cfoutput>

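As you can see, we are calling string() on the node-set, "//message". In XPath 1.0, string() applied to a node-set returns the string-value of the first node in document order; for an element, that is the concatenation of all of its descendant text nodes. We then trim and normalize that value with normalize-space(). Doing this gives us the following page output: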

Message: You are wicked sexy!

This is totally awesome! Notice that while most calls to xmlSearch() result in an array of nodes, this call results in a single string value. So, not only does this approach simplify the gathering of text, it simplifies the use of the resultant value.
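
To make the difference in return types concrete, here is a minimal sketch that runs both kinds of search against the same demo document and checks what comes back. The variable names "nodes" and "textValue" are just for illustration, and I'm assuming the same xmlSearch() behavior described above:

```cfm
<!--- Same demo document as above. --->
<cfxml variable="data">
	<data>
		<message>
			You are <em>wicked</em> sexy!
		</message>
	</data>
</cfxml>

<!--- A plain path expression returns an array of XML nodes. --->
<cfset nodes = xmlSearch( data, "//message" ) />

<!--- Wrapping the path in string() returns a simple string value. --->
<cfset textValue = xmlSearch(
	data,
	"normalize-space( string( //message ) )"
	) />

<cfoutput>

	Nodes is an array: #isArray( nodes )#<br />
	First node name: #nodes[ 1 ].xmlName#<br />
	Text is a simple value: #isSimpleValue( textValue )#<br />
	Text: #htmlEditFormat( textValue )#

</cfoutput>
```

With the node array, you still have to index into the result and walk the node to get at its text; with the string result, the value is ready to output as-is.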

A huge thanks to Kirill for pointing this out! I don't think that this task can get any easier at this point; it's about as simple and awesome as we can make it.
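
One caveat worth keeping in mind: in XPath 1.0, string() only looks at the first node of a node-set, so a path that matches several nodes will not concatenate them all. The following is a rough sketch, assuming a hypothetical document with two message nodes (the second message is made up for this example), that picks each node explicitly by position instead of relying on that default:

```cfm
<!--- Hypothetical document with more than one matching node. --->
<cfxml variable="data">
	<data>
		<message>You are <em>wicked</em> sexy!</message>
		<message>You are <em>really</em> kind!</message>
	</data>
</cfxml>

<!---
	Select each message by its position in the overall node-set
	before converting it to a string.
--->
<cfset firstText = xmlSearch(
	data,
	"normalize-space( string( ( //message )[ 1 ] ) )"
	) />
<cfset secondText = xmlSearch(
	data,
	"normalize-space( string( ( //message )[ 2 ] ) )"
	) />

<cfoutput>

	First: #htmlEditFormat( firstText )#<br />
	Second: #htmlEditFormat( secondText )#

</cfoutput>
```

The parentheses around "//message" matter here: they apply the position predicate to the full node-set rather than to each parent's children.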

Short link: https://bennadel.com/2190

Reader Comments

Kirill May 9, 2011 at 2:58 PM

Again, ColdFusion proves to be a very versatile and flexible tool: xmlSearch() can return several types of output depending on the XPath expression we use. This is described in the documentation; basically, xmlSearch() returns the same type of data as the function used in the XPath string, if any. I use this technique very often to get data when I need to parse HTML pages (after converting them into valid XML documents, of course). Even if there are no inner elements (child nodes), this function is very useful for getting just the inner text of an HTML element. This is probably not so much a ColdFusion feature as an XPath feature, but ColdFusion's power is that, first, it lets us use all XPath functionality natively and, second, it returns the relevant type of data.

Ben Nadel May 9, 2011 at 3:30 PM

@Kirill,

Very good to know. I just always assumed that xmlSearch() returned a node set, no matter what. It's good to know that it passes through the same "type" of data that the underlying XPath function will return.

And, HTML is where I first started using this - well, HTML presented as XML.

@SeanNHenderson Jan 11, 2012 at 1:29 PM

I had thought maybe this would help my CDATA issue, which I still cannot solve. (Using CF8)

<cfset myResultsArr=XmlSearch(myXmlDoc,"normalize-space(string(#form.xpath#))")>

The //someNode/text() portion I am trying to reach is within a CDATA thing-a-ma-bobber (technical term) and was hoping this might be the solution.

Ross Jul 25, 2012 at 2:48 PM

Hey.

Sorry to drag up an old subject but it's the first time I've had to muck about with this stuff.

I'm dealing with XML containing a 'body' node containing HTML links where the tags and containing text are descendant element children of that body node.

I used this technique to reveal the missing text but it doesn't include the HTML tags which surrounded it:

xmlSearch(Variables.ThunderMatchXML,'normalize-space(string(//ContextResult/Body))')

Can I alter the function to include the HTML tags?
