Ask Ben: Finding XML Nodes That Have Children With The Given Case-Insensitive Phrase

By Ben Nadel

Okay, so how about this one. (BTW I love that I found this site and can ask all my stupid questions). I am bringing in an RSS feed in XML and parse it. Now I want to pull only the articles that pertain to some keyword. Like oh say COLDFUSION. .... What I want to do is pull only the articles with the search term in the title. I can do this of course by looping over the xml but is it possible with XPATH? I'm betting it is but I have just started into XPATH, XSLT, XSQL for Oracle. Does "IN" or "CONTAINS" work?

Yes, XPath does have a contains() function, and it is, in fact, the way we are going to find your RSS feed items (at least initially). First, though, let's build a test XML feed structure:

<!--- Define the XML feed. --->
<cfxml variable="xmlFeed">

    <items>
        <item>
            <title>I Love ColdFusion</title>
            <description>ColdFusion is amazing!</description>
            <link>http://www.bennadel.com</link>
        </item>
        <item>
            <title>I Want To Swim In A Pudding Bath</title>
            <description>Author talks about why it would be awesome to swim around in a bathtub full of pudding.</description>
            <link>http://www.bennadel.com</link>
        </item>
        <item>
            <title>I Think ColdFusion Knocked Up My Daughter</title>
            <description>Author described a conspiracy theory in which he thinks his ColdFusion application server impregnated his daughter in an attempt to spawn a race of super humans with amazing back-end processing!</description>
            <link>http://www.bennadel.com</link>
        </item>
        <item>
            <title>Christina Cox Is A Hottie</title>
            <description>Author talks about actress Christina Cox and what makes her such a hottie.</description>
            <link>http://www.bennadel.com</link>
        </item>
        <item>
            <title>COLDFusion Is So Hot!</title>
            <description>Author describes what makes ColdFusion such a hot technology.</description>
            <link>http://www.bennadel.com</link>
        </item>
    </items>

</cfxml>

As you can see here, some of the Title tags contain "ColdFusion", some of them do not. Now, we don't want to find the Title tag, right? What we want to do is find the Item node that has the child node, Title, whose text value contains the phrase ColdFusion. To do this, we can leverage the power of XPath predicates (statements that must evaluate to true for a node to be returned in an XmlSearch() result set):

//item[ contains( title/text() , 'ColdFusion' ) ]

Here, the "//item" is telling us to get all the item nodes anywhere within the document. Then our conditional search predicate:

[ contains( title/text() , 'ColdFusion' ) ]

... requires that the given node being examined (item) must have a title child tag whose text() value contains the phrase "ColdFusion". Fairly straightforward, right? Let's put this into action:

<!---
    Get all ITEM nodes that have a Title child whose text
    value (text()) contains the text "ColdFusion".
--->
<cfset arrItemNodes = XmlSearch(
    xmlFeed,
    "//item[ contains( title/text() , 'ColdFusion' ) ]"
    ) />

<!--- Output the node titles. --->
<cfloop
    index="xmlItemNode"
    array="#arrItemNodes#">

    <cfoutput>#xmlItemNode.Title.XmlText#<br /></cfoutput>

</cfloop>

When we run this code, we get the following output:

I Love ColdFusion
I Think ColdFusion Knocked Up My Daughter

It sort of worked - it did find two correct items, but it missed this one:

COLDFusion Is So Hot!

The problem here is that XML and XPath, unlike ColdFusion itself, are very much case-sensitive. Whereas in ColdFusion, "ColdFusion" is equal to "COLDFusion", XPath and XmlSearch() see these as two distinct values.

So, what can we do about this? Well, if you look at the library of XPath functions, you will see that it does have methods for converting values to upper or lower case:

  • lower-case()
  • upper-case()

This would be great, but the problem you will quickly find if you try to use them is that they do not work in ColdFusion 8's XPath / XmlSearch() engine: lower-case() and upper-case() are XPath 2.0 functions, and the engine only supports XPath 1.0. So, what can we do if we want to start performing case-insensitive searches? I don't think there's any one correct answer for this, so I'll just share the first thing that popped into my mind.

What we can do is create a lowercase version of the title text and store it back into the XML document in a way that 1) doesn't ruin the content for further use and 2) can be searched on using XPath and XmlSearch(). To do this, what I'm going to do is loop over the title tags and store the lowercase title as an attribute back into the title tag itself. Then, once that is done, I am going to perform the XPath search again using the title tag's "lcase" attribute rather than the XML Text value:

<!--- Gather all of the title nodes. --->
<cfset arrTitleNodes = XmlSearch(
    xmlFeed,
    "//item/title"
    ) />

<!---
    Loop over each title and store a lowercase attribute of
    its value that can be searched on in a case-insensitive
    manner.
--->
<cfloop
    index="xmlTitleNode"
    array="#arrTitleNodes#">

    <!--- Store lowercase text into attribute. --->
    <cfset xmlTitleNode.XmlAttributes[ "lcase" ] = LCase(
        XmlFormat( xmlTitleNode.XmlText )
        ) />

</cfloop>


<!---
    Get all ITEM nodes that have a Title child whose LCASE
    attribute contains the lowercase "coldfusion" value.
--->
<cfset arrItemNodes = XmlSearch(
    xmlFeed,
    "//item[ contains( title/@lcase, 'coldfusion' ) ]"
    ) />

<!--- Output the node titles. --->
<cfloop
    index="xmlItemNode"
    array="#arrItemNodes#">

    <cfoutput>#xmlItemNode.Title.XmlText#<br /></cfoutput>

</cfloop>

Notice that this time, we are searching for "coldfusion," not "ColdFusion." There's a little bit more overhead here, but now, when we run this code, we get the following output:

I Love ColdFusion
I Think ColdFusion Knocked Up My Daughter
COLDFusion Is So Hot!

With the aid of this lowercase attribute, we are successfully finding all case-versions of ColdFusion.

Of course, if we are going to loop over the Title tags, we might as well just perform the text search using ColdFusion and grab the appropriate nodes in the first pass. In the following code, as we loop over the Title tags, we are going to perform a case-insensitive ColdFusion text search. If the title has the right text, we are going to grab its parent node, the target Item node, and add it to our array of matching nodes:

<!--- Gather all of the title nodes. --->
<cfset arrTitleNodes = XmlSearch(
    xmlFeed,
    "//item/title"
    ) />

<!--- Create an array of item nodes. --->
<cfset arrItemNodes = [] />


<!---
    Loop over each title and check to see if the text contains
    the phrase ColdFusion - since we are checking in ColdFusion,
    we don't have to worry about case.
--->
<cfloop
    index="xmlTitleNode"
    array="#arrTitleNodes#">

    <!--- Check for phrase. --->
    <cfif FindNoCase( "ColdFusion", xmlTitleNode.XmlText )>

        <!--- Add parent node (Item) to array. --->
        <cfset ArrayAppend(
            arrItemNodes,
            xmlTitleNode.XmlParent
            ) />

    </cfif>

</cfloop>


<!--- Output the node titles. --->
<cfloop
    index="xmlItemNode"
    array="#arrItemNodes#">

    <cfoutput>#xmlItemNode.Title.XmlText#<br /></cfoutput>

</cfloop>

When we run the code this time, we get the following output:

I Love ColdFusion
I Think ColdFusion Knocked Up My Daughter
COLDFusion Is So Hot!

Again, we gather all of the appropriate matches for "ColdFusion" without having to do any additional XPath / XmlSearch() calls.
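
If this kind of search is needed in more than one place, the loop could be wrapped in a small user defined function. The following is only a sketch: the function name getItemsByTitlePhrase(), its arguments, and the tiny xmlSample feed are all made up for illustration and are not part of the code above.

```cfml
<!---
    Hypothetical helper (name and arguments are assumptions):
    returns the ITEM nodes whose TITLE text contains the given
    phrase, ignoring case.
--->
<cffunction name="getItemsByTitlePhrase" access="public" returntype="array" output="false">
    <cfargument name="xmlDoc" type="any" required="true" />
    <cfargument name="phrase" type="string" required="true" />

    <cfset var arrMatches = ArrayNew( 1 ) />
    <cfset var arrTitleNodes = XmlSearch( arguments.xmlDoc, "//item/title" ) />
    <cfset var xmlTitleNode = "" />

    <!--- FindNoCase() does the case-insensitive work that XPath 1.0 cannot. --->
    <cfloop index="xmlTitleNode" array="#arrTitleNodes#">
        <cfif FindNoCase( arguments.phrase, xmlTitleNode.XmlText )>
            <cfset ArrayAppend( arrMatches, xmlTitleNode.XmlParent ) />
        </cfif>
    </cfloop>

    <cfreturn arrMatches />
</cffunction>

<!--- Small sample feed so that the sketch runs on its own. --->
<cfxml variable="xmlSample">
    <items>
        <item><title>I Love ColdFusion</title></item>
        <item><title>I Want To Swim In A Pudding Bath</title></item>
        <item><title>COLDFusion Is So Hot!</title></item>
    </items>
</cfxml>

<cfset arrSampleItems = getItemsByTitlePhrase( xmlSample, "coldfusion" ) />
```

With the sample feed above, arrSampleItems should hold the two ColdFusion items and skip the pudding one.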

This would all be made so much easier if ColdFusion would simply support case-conversion methods in XPath, but for now, I hope that something here may have helped.
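
One more option, which a reader suggests in the comments below, is the XPath 1.0 translate() function. It maps each character in one list to the character at the same position in another list, so it can stand in for a lowercase conversion. The sketch below assumes a plain A-Z alphabet (accented characters would pass through unchanged) and a search phrase that contains no single quotes; the variable names and the xmlSample feed are made up for illustration.

```cfml
<!--- Small sample feed so that the sketch runs on its own. --->
<cfset xmlSample = XmlParse(
    "<items>" &
    "<item><title>I Love ColdFusion</title></item>" &
    "<item><title>I Want To Swim In A Pudding Bath</title></item>" &
    "<item><title>COLDFusion Is So Hot!</title></item>" &
    "</items>"
    ) />

<!--- Character lists that translate() uses to map A-Z onto a-z. --->
<cfset strUpper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" />
<cfset strLower = "abcdefghijklmnopqrstuvwxyz" />

<!--- Lowercase the phrase on the ColdFusion side (assumes no single quotes). --->
<cfset strPhrase = LCase( "ColdFusion" ) />

<!---
    Lowercase the title inside the XPath expression and then
    compare it to the lowercase phrase.
--->
<cfset arrSampleItems = XmlSearch(
    xmlSample,
    "//item[ contains( translate( title, '#strUpper#', '#strLower#' ), '#strPhrase#' ) ]"
    ) />
```

This keeps everything in a single XmlSearch() call and leaves the XML document untouched, at the cost of a rather long-winded expression.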




Reader Comments

There's another couple of options here Ben:

<cfset aNoCase1 = xmlSearch(xmlFeed, "//item[contains(translate(title/text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'coldfusion')]")>

It's a bit long-winded, but it works.

This next one conditionally works... it's OK for looking up the count of results, but as it transforms the XML, one has to be cautious with what one does with the results:

<cfset aNoCase2 = xmlSearch(lcase(xmlFeed), lcase("//item[contains(title/text(), 'COLDFusion')]"))>

Another note here is that the nodes in the resultant array are not references to the original nodes, they're references to a separate XML doc which is created by the lcase(xmlFeed) operation. So one cannot update the nodes in the array and expect to see the updates in the original doc (like one usually would). So this one comes with some caveats, but if those are not a concern: it's an adequate approach.

--
Adam

@Adam,

Very nice tip on translate(). I have never used that before. Yes, tedious, but it works. As far as the LCase() of the entire XML document, I actually considered going down that path. But, then, my concern was getting back to the original reference in the first document.

They just need to go ahead and support lower-case() :)

Great post. I've also found that if the xml has a schema listed but is not valid the xml search fails even if the elements exist. If I deleted the schema ref (in the string xml prior to xmlParse) the search worked fine. Sure, you would think I should be using valid xml (against the schema) but the thing is I did not control the xml being returned from this web service and it wasn't. I did not see why xmlSearch should care. If the search works then return data dang you.

@RyanTJ,

I believe validation is an optional part of the XML parsing. But, to be honest, I have never used any schema validation explicitly. I cannot offer any better advice on that matter.

Great post, Ben! I think XPath and XSL are often underused, and I always dig your posts on how to get more mileage out of them.

Your examples ("I Think ColdFusion Knocked Up My Daughter"??) are as twisted and borderline-inappropriate as always. Rock on, Mr. Nadel!

RyanTJ, could you pls clarify what you're saying here about xmlSearch() failing? Maybe paste some sample code?

Ben: could you please drop me an email offline (it's just about this lower-case / upper-case stuff, and CF's support for it).

Cheers.

--
Adam

It's not about them implementing anything. XPath 1 just doesn't have those functions, the XPath engine they use (Xalan) is XPath 1 compliant.

They'd need to use an XPath 2 compatible library instead, and that means switching to Saxon because that's the only implementation in Java unfortunately.

People seem to think that Macrodobe actually implement this stuff. They don't. The Regex engine is Apache ORO, the XML stuff is Apache Xerces and Xalan.

>It's not about them implementing anything.

Well, Elliott, it would be about them implementing Saxon instead of Xalan, wouldn't it? So it's everything about them implementing something, isn't it?

>People seem to think that Macrodobe actually implement this stuff.

Yes. They seem to think Adobe implements third-party libraries to get the work done. They also seem to think that perhaps other capabilities might present themselves if CF's chosen XML solution was a different one, possibly one in keeping with the times.

All of which is spot on.

You're the only one confused around here, mate.

--
Adam

@Adam, @Elliott,

I don't want to start attacking ColdFusion or Adobe here. When I say stuff about wishing they would implement it, I'm just generically saying, "That would be a cool feature to have." I don't mean much more than that.

Hi Ben
I don't think there's any way anything you said could've been construed as an attack against anything or anyone. Everything you said is spot on, valid, and I'm sure is something Adobe are giving at least some consideration to.

--
Adam

"How I Became An XSLT Junkie" :)
I'm finding XSLT/XPATH etc. so much easier to use than parsing, looping, and handling errors in the XML with straight ColdFusion.
I told my DBA to have Oracle return XML results to me. But now we are looking at XSQL. Meanwhile the die-hard Java, C#, VB programmers are going nutso wacko. (Or were they always that way?)

Seriously, I have scrapped my RSS integrator for websites and replaced it with a much simpler but more powerful XSLT version.

Have you read the book "ColdFusion Brain Freeze"?

@Don,

XSLT is definitely a powerful thing. While there is certainly a learning curve to XSLT, when you get it in your head, it can be a great way to transform XML.

I don't know that book, but I will look it up.

Ben,

You have been an amazing resource for me as I grow my skills and this specific article is pretty close to what I'm looking for, but my question is what if you need a case insensitive search of a node?

Specifically you are expecting people to send xml to you a certain way but you can't trust they won't do contactINFO or contactinfo instead of contactInfo.
The attribute trick you showed here won't work in this case because it's the NODE itself that we can't find properly.

Any thoughts?

Erick

@Erick,

You'd have to create a UDF or something that traverses the XML tree doing case-insensitive searching. Right now, there's really no way with XPath that I can see to do this.
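
That said, the translate() trick from Adam's comment above might stretch to element names as well, since XPath 1.0 exposes each node's name through local-name(). The following is only a sketch under that assumption: the xmlRequest document and its element names are invented for illustration, and the mapping only covers the plain A-Z alphabet.

```cfml
<!--- Hypothetical incoming XML with inconsistent element casing. --->
<cfset xmlRequest = XmlParse(
    "<request>" &
    "<contactINFO>first</contactINFO>" &
    "<contactinfo>second</contactinfo>" &
    "<phone>third</phone>" &
    "</request>"
    ) />

<!---
    Select any element whose lowercased name is "contactinfo",
    regardless of how the sender cased it.
--->
<cfset arrContactNodes = XmlSearch(
    xmlRequest,
    "//*[ translate( local-name(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz' ) = 'contactinfo' ]"
    ) />
```

Here, arrContactNodes should contain the contactINFO and contactinfo elements but not the phone element.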

Hi Ben,

Thanks for that tip. After much testing I have found that using MX7 there are a couple of "bugs" in xmlSearch. Perhaps these have been fixed in later versions of CF.

I wanted to search for the <td> that had text containing the string "properties that match your search criteria". I wanted to find the actual number of properties (ie 6 in the example case below).

Given the following example:
<tr>
<td align="left">
There are <b>6</b> properties that match your search criteria.
</td>
</tr>
<tr>
<td align="left">
There are 6 properties that match your search criteria.
</td>
</tr>

XmlSearch(xmlObj,"//td[ contains( text(), 'properties that match' ) ]")

only returns the td that DOESN'T have the <b> tag embedded. I tried all sorts of combinations but couldn't get it to work on the first <td>.

I then tried "ends-with", and CF doesn't know about the "ends-with" function, but does know about the "starts-with" function.

But, "//td[ starts-with( text(), 'There') ]" doesn't return either node.

In the end I just did this to get the tds that are potential targets:

"//td/b/parent::*[1]"

i.e. get the td nodes that contain a <b> tag

then looped over the resulting array doing:

if (findNoCase("properties that match your search criteria",local.tds[i]['xmlText'])){
variables.totalProperties = local.tds[i]['xmlChildren'][1]['xmlText'];
}

Thanks again for your helpful blog.

Murray

@Murray,

Sorry, I had to muck with your comment a bit. For some reason, my editor was totally not able to parse whatever you wrote (I was trying to fix the bolding). As such, I think the bold tag got stripped out.

I hope that future versions of ColdFusion can update the xmlSearch() functionality a bit; I've run into unsupported xpath errors a good number of times. When that happens, the only approach that I have found is what you did - a combination of XPath and good old fashion ColdFusion looping.

Actually, for the benefit of anyone reading this who might want to make sense of the question post, the first <td> had a bold tag surrounding the numeral 6. So, the problem was that the xmlSearch wouldn't return that <td>, but would return the second one because the second one only had plain text, no embedded tags.

Cheers,
Murray
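
A possible explanation for the behavior described in this thread: in XPath 1.0, when a node-set such as text() is passed to a string function like contains(), only the first node in the set is converted to a string. For the TD with the bold tag, the first text node is just "There are ", so the phrase is never found. Passing "." instead uses the full string value of the TD, including text inside child tags. The starts-with() miss is likely down to the line break and whitespace before "There", which normalize-space() trims. The sketch below assumes that the markup is well-formed XML; the xmlPage variable is made up for illustration.

```cfml
<!--- Sample markup: the first TD wraps the number in a bold tag. --->
<cfset xmlPage = XmlParse(
    "<table>" &
    "<tr><td align='left'> There are <b>6</b> properties that match your search criteria. </td></tr>" &
    "<tr><td align='left'> There are 6 properties that match your search criteria. </td></tr>" &
    "</table>"
    ) />

<!--- text() only hands contains() the FIRST text node of each TD. --->
<cfset arrFirstTextOnly = XmlSearch(
    xmlPage,
    "//td[ contains( text(), 'properties that match' ) ]"
    ) />

<!--- "." hands contains() the whole string value of each TD. --->
<cfset arrFullValue = XmlSearch(
    xmlPage,
    "//td[ contains( ., 'properties that match' ) ]"
    ) />

<!--- normalize-space() trims the leading whitespace before the test. --->
<cfset arrStartsWith = XmlSearch(
    xmlPage,
    "//td[ starts-with( normalize-space( . ), 'There' ) ]"
    ) />
```

With this sample, the text() version should only find the plain-text TD, while the other two searches should find both.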

Thanks Ben for the above code. I do have a question. I am very new to ColdFusion and XML. I'm working on an employee phone search. When the input is the employee's entire last name it works great, but if someone inputs an "a", the search brings back every name that has an "a" in it. How can I narrow down the results to only bring back names starting with "a"? Any suggestions?

Terry
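
One way this could be approached is to swap contains() for the XPath 1.0 starts-with() function, which is mentioned earlier in these comments as being supported by XmlSearch(). The sketch below is only an illustration: the employee and lastname element names and the sample data are guesses, since the actual XML is not shown, and the input is assumed to contain no single quotes.

```cfml
<!--- Hypothetical employee data; the real element names may differ. --->
<cfset xmlEmployees = XmlParse(
    "<employees>" &
    "<employee><lastname>Adams</lastname></employee>" &
    "<employee><lastname>Baker</lastname></employee>" &
    "<employee><lastname>anderson</lastname></employee>" &
    "</employees>"
    ) />

<!--- The user's input, lowercased (assumes no single quotes). --->
<cfset strInput = LCase( "a" ) />

<!---
    Lowercase each last name with translate() and keep only the
    employees whose name STARTS with the input.
--->
<cfset arrEmployeeNodes = XmlSearch(
    xmlEmployees,
    "//employee[ starts-with( translate( lastname, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz' ), '#strInput#' ) ]"
    ) />
```

With the sample data, this should return Adams and anderson, but not Baker, even though Baker contains an "a".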

Hi,
If I get a result from a web service in XML format and convert it to an XML document object, what ways are there to display it in a web browser? It can be done using XMLTransform().

Is there any way to put the XML data in JSON format and display it in the browser?
