Ask Ben: Selecting All Text Nodes Of An XML Document

Posted August 13, 2007 at 7:50 PM by Ben Nadel

Tags: ColdFusion, Ask Ben

I like all your xpath stuff you have been doing in ColdFusion. Can you help get the text values of my xml document?

In my presentation on introductory XPath in ColdFusion, I neglected to cover selecting text nodes, but rest assured, this is a fairly easy process. Rather than selecting by node name or attribute name, all you need to do is end your path with the "text()" node test. To explore this, let's first create an XML document:

  • <!--- Build ColdFusion XML object. --->
  • <cfxml variable="xmlImages">
  • <cfoutput>
  •  
  • <images>
  • <image>
  • <url>http://celebslam.buzznet.com/wp-content/uploads/2007/08/carmen-electra-tiny-bikini-14.jpg</url>
  • <desc>Carmen Electra taking off her shirt, revealing a skimpy bikini.</desc>
  • </image>
  • <image>
  • <url>http://celebslam.buzznet.com/wp-content/uploads/2007/08/carmen-electra-tiny-bikini-21.jpg</url>
  • <desc>Carmen Electra walking around in bikini on the beach.</desc>
  • </image>
  • <image>
  • <url>http://celebslam.buzznet.com/wp-content/uploads/2007/08/carmen-electra-tiny-bikini-17.jpg</url>
  • <desc>Carmen Electra playing chicken in a bikini with some other girls in bikinis.</desc>
  • </image>
  • </images>
  •  
  • </cfoutput>
  • </cfxml>

Now, we have a valid XML document object in our ColdFusion variable, xmlImages. We can use XmlSearch() to select all of our text nodes:

  • <!---
  • Select all the text nodes using the text()
  • path construct.
  • --->
  • <cfset arrNodes = XmlSearch(
  • xmlImages,
  • "//text()"
  • ) />
  •  
  • <!--- Dump out matching text nodes. --->
  • <cfdump
  • var="#arrNodes#"
  • label="Text Nodes: //text()"
  • />

In the code above, the "//" XPath construct means "anywhere in the XML document." Therefore, "//text()" selects every text node, giving us the following CFDump output:

[Image: ColdFusion XPath To Select All Text Nodes In XML Document]

As you can see, we are getting a whole lot of text nodes, not just the ones we expected (those inside the URL and DESC nodes). The white space between tags counts as text, so any element that wraps other elements ends up with white-space-only text nodes of its own. Therefore, we are getting text nodes for the IMAGES element node as well as for each of the IMAGE nodes.

Now, maybe this is what you want to do, but I am guessing it is not. For the benefit of learning, I am going to assume that we only want to get the text nodes for the URL and DESC element nodes. For that, all we need to do is modify the XPath to narrow down the possible nodes. At first, I was tempted to try and just get any text node that has a length:

  • <!--- Select all the text nodes that have a length. --->
  • <cfset arrNodes = XmlSearch(
  • xmlImages,
  • "//*[ text() != '' ]/text()"
  • ) />

This might look intuitive, but it actually ends up returning all the text nodes in the document. The problem is that our original XML document has a lot of white space. This white space gives the TEXT value of every element node some length. Therefore, we cannot select text nodes based purely on length (had we had no intertag white space, I believe that this would have worked).
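
One possible way around the white space problem is to filter the text nodes themselves rather than their parent elements. This is only a sketch, and it assumes that your ColdFusion version's XPath engine honors the standard XPath 1.0 normalize-space() function. The xmlDemo document, its example.com URLs, and the variable names arrAll and arrFilled are placeholders made up for this illustration:

```cfml
<!--- Small stand-in document; the example.com URLs are placeholders. --->
<cfxml variable="xmlDemo">
<images>
	<image>
		<url>http://example.com/a.jpg</url>
		<desc>First image.</desc>
	</image>
	<image>
		<url>http://example.com/b.jpg</url>
		<desc>Second image.</desc>
	</image>
</images>
</cfxml>

<!--- Every text node, including the white space between tags. --->
<cfset arrAll = XmlSearch( xmlDemo, "//text()" ) />

<!---
	Only the text nodes that still have content once their
	white space has been normalized (assumes normalize-space()
	is available in this XPath engine).
--->
<cfset arrFilled = XmlSearch(
	xmlDemo,
	"//text()[ normalize-space( . ) != '' ]"
	) />

<!--- Compare the two counts. --->
<cfoutput>
	All text nodes: #ArrayLen( arrAll )#<br />
	Non-empty text nodes: #ArrayLen( arrFilled )#
</cfoutput>
```

If the function is supported, the second search should leave only the text inside the url and desc elements, without our having to name those elements.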

To get more accurate results, we need to start narrowing down our path a little more effectively. One method for this would be to select the text of any element node based purely on the node name:

  • <!---
  • Select all the text nodes that are direct
  • children of element nodes that have the
  • name url or desc.
  • --->
  • <cfset arrNodes = XmlSearch(
  • xmlImages,
  • "//*[ name() = 'url' or name() = 'desc' ]/text()"
  • ) />
  •  
  • <!--- Dump out matching text nodes. --->
  • <cfdump
  • var="#arrNodes#"
  • label="Get Text Of Nodes URL or DESC"
  • />

Here, we are using an XPath predicate that requires the parent node's name to be either "url" or "desc". Then, once we have those nodes, we are selecting their text() nodes. This gives us the following CFDump output:

[Image: ColdFusion XPath To Select All Text Nodes Of Specific Nodes (By Name)]
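
As an alternative to the name() predicate, the same selection can probably be written with the XPath union operator ("|"), which combines the results of two separate paths. This is a hedged sketch rather than a tested recommendation: the xmlDemo document and its example.com URLs are invented for the illustration, and it assumes your XmlSearch() accepts union expressions, which are part of standard XPath 1.0:

```cfml
<!--- Small stand-in document; the example.com URLs are placeholders. --->
<cfxml variable="xmlDemo">
<images>
	<image>
		<url>http://example.com/a.jpg</url>
		<desc>First image.</desc>
	</image>
	<image>
		<url>http://example.com/b.jpg</url>
		<desc>Second image.</desc>
	</image>
</images>
</cfxml>

<!---
	Select the text nodes of url elements and the text nodes
	of desc elements, then merge both sets with the union
	operator.
--->
<cfset arrNodes = XmlSearch(
	xmlDemo,
	"//url/text() | //desc/text()"
	) />

<!--- Dump out matching text nodes. --->
<cfdump
	var="#arrNodes#"
	label="Text Of URL or DESC Via Union"
	/>
```

Which form reads better is mostly a matter of taste. The union version keeps each path simple, while the predicate version keeps everything in one path.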

If we don't necessarily know the names of the nodes we want (although I am not sure why that would happen), we can also match on position in the document rather than on name. In this next example, the "*" wildcard matches every child element of each image element node, whatever it is called, and we then select the text nodes of those children:

  • <!---
  • Select the text nodes of every child element
  • of an image node.
  • --->
  • <cfset arrNodes = XmlSearch(
  • xmlImages,
  • "/images/image/*/text()"
  • ) />
  •  
  •  
  • <!--- Dump out matching text nodes. --->
  • <cfdump
  • var="#arrNodes#"
  • label="Get Text Of Nodes IMAGE descendants"
  • />

This gives us the following CFDump output:

[Image: ColdFusion XPath To Select Text Nodes As Descendants Of Images]

I wish that ColdFusion's XmlSearch() supported a bit more of the XPath functions, but I hope that this helps point you in the right direction.
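
If what you ultimately need is the string values rather than the text nodes themselves, one alternative is to select the element nodes and read each element's XmlText property. The sketch below assumes the same document shape as above; the xmlDemo document, its example.com URLs, and the arrElements and arrValues names are made up for the illustration:

```cfml
<!--- Small stand-in document; the example.com URLs are placeholders. --->
<cfxml variable="xmlDemo">
<images>
	<image>
		<url>http://example.com/a.jpg</url>
		<desc>First image.</desc>
	</image>
	<image>
		<url>http://example.com/b.jpg</url>
		<desc>Second image.</desc>
	</image>
</images>
</cfxml>

<!--- Select the child elements of each image (not their text nodes). --->
<cfset arrElements = XmlSearch( xmlDemo, "/images/image/*" ) />

<!--- Copy each element's text into a plain array of strings. --->
<cfset arrValues = ArrayNew( 1 ) />

<cfloop index="i" from="1" to="#ArrayLen( arrElements )#">
	<cfset ArrayAppend( arrValues, Trim( arrElements[ i ].XmlText ) ) />
</cfloop>

<!--- Dump out the string values. --->
<cfdump
	var="#arrValues#"
	label="Plain String Values"
	/>
```

This trades the pure XPath approach for a short loop, but it leaves you with ordinary strings that are easier to pass to the rest of your code.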




Reader Comments

Tim
Aug 24, 2007 at 5:28 PM

What would be your take on this:

If you have an XML Doc
<employees>
<employee>
<empid>123</empid>
</employee>
<employee>
<empid>456</empid>
</employee>
</employees>

I want a list of empid values. I can use XmlSearch() to get an array of empids and loop over it to ListAppend() the values, but I don't want to have to loop over the nodes. I really wanted to use StructFindKey(myxml,"empid","all"), but that function can't be run against XML objects.

Know of a low-intensive way to get a list of empids?


Aug 24, 2007 at 5:56 PM

@Tim,

Not sure if there is a ColdFusion-supported XPath way to do this. My guess is that you are gonna need a UDF to do this. I will write one for you :)


Aug 25, 2007 at 2:10 PM

@Tim,

I got a little CFC together that I will post on Monday AM that you might find very useful.


Aug 27, 2007 at 7:12 AM

@Tim,

Take a look at this: http://www.bennadel.com/index.cfm?dax=blog:925.view

