Ask Ben: Finding XML Nodes That Have Children With The Given Case-Insensitive Phrase

Posted February 11, 2009 at 10:01 AM by Ben Nadel

Tags: ColdFusion, Ask Ben

Okay, so how about this one. (BTW I love that I found this site and can ask all my stupid questions.) I am bringing in an RSS feed in XML and parsing it. Now I want to pull only the articles that pertain to some keyword. Like, oh, say COLDFUSION. ... What I want to do is pull only the articles with the search term in the title. I can do this, of course, by looping over the XML, but is it possible with XPath? I'm betting it is, but I have just started into XPath, XSLT, and XSQL for Oracle. Does "IN" or "CONTAINS" work?

Yes, XPath does have a contains() function, and it is, in fact, how we are going to find your RSS feed items (at least initially). First, though, let's build a test XML feed:

    <!--- Define the XML feed. --->
    <cfxml variable="xmlFeed">

        <items>
            <item>
                <title>I Love ColdFusion</title>
                <description>ColdFusion is amazing!</description>
                <link>http://www.bennadel.com</link>
            </item>
            <item>
                <title>I Want To Swim In A Pudding Bath</title>
                <description>Author talks about why it would be awesome to swim around in a bathtub full of pudding.</description>
                <link>http://www.bennadel.com</link>
            </item>
            <item>
                <title>I Think ColdFusion Knocked Up My Daughter</title>
                <description>Author describes a conspiracy theory in which he thinks his ColdFusion application server impregnated his daughter in an attempt to spawn a race of super humans with amazing back-end processing!</description>
                <link>http://www.bennadel.com</link>
            </item>
            <item>
                <title>Christina Cox Is A Hottie</title>
                <description>Author talks about actress Christina Cox and what makes her such a hottie.</description>
                <link>http://www.bennadel.com</link>
            </item>
            <item>
                <title>COLDFusion Is So Hot!</title>
                <description>Author describes what makes ColdFusion such a hot technology.</description>
                <link>http://www.bennadel.com</link>
            </item>
        </items>

    </cfxml>

As you can see here, some of the Title tags contain "ColdFusion", some of them do not. Now, we don't want to find the Title tag, right? What we want to do is find the Item node that has the child node, Title, whose text value contains the phrase ColdFusion. To do this, we can leverage the power of XPath predicates (statements that must evaluate to true for a node to be returned in an XmlSearch() result set):

//item[ contains( title/text() , 'ColdFusion' ) ]

Here, the "//item" is telling us to get all the item nodes anywhere within the document. Then our conditional search predicate:

[ contains( title/text() , 'ColdFusion' ) ]

... requires that the item node being examined must have a title child tag whose text() value contains the phrase "ColdFusion". Fairly straightforward, right? Let's put this into action:

    <!---
        Get all ITEM nodes that have a Title child whose text
        value (text()) contains the text "ColdFusion".
    --->
    <cfset arrItemNodes = XmlSearch(
        xmlFeed,
        "//item[ contains( title/text() , 'ColdFusion' ) ]"
        ) />

    <!--- Output the node titles. --->
    <cfoutput>
        <cfloop
            index="xmlItemNode"
            array="#arrItemNodes#">

            #xmlItemNode.Title.XmlText#<br />

        </cfloop>
    </cfoutput>

When we run this code, we get the following output:

I Love ColdFusion
I Think ColdFusion Knocked Up My Daughter

It sort of worked: it found two correct items, but it missed this one:

COLDFusion Is So Hot!

The problem here is that XML and XPath, unlike ColdFusion itself, are very much case-sensitive. Whereas ColdFusion considers "ColdFusion" equal to "COLDFusion", XPath and XmlSearch() see them as two distinct values.
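You can watch the same case sensitivity outside of ColdFusion. XmlSearch() sits on top of a Java XPath 1.0 engine (Elliott names Xalan in the comments), and the JDK ships its own XPath 1.0 implementation in javax.xml.xpath. The following is a minimal Java sketch, not ColdFusion code: the class name CaseSensitiveSearch, the helper matchingTitles(), and the trimmed three-item feed are all illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CaseSensitiveSearch {

    // Three of the five test items (an assumption: enough to show the case problem).
    static final String FEED = "<items>"
        + "<item><title>I Love ColdFusion</title></item>"
        + "<item><title>Christina Cox Is A Hottie</title></item>"
        + "<item><title>COLDFusion Is So Hot!</title></item>"
        + "</items>";

    // Runs an XPath expression that selects <item> nodes and returns each item's title text.
    static List<String> matchingTitles(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                // A relative "title" evaluated as a string gives the text of the item's <title>.
                titles.add(xpath.evaluate("title", items.item(i)));
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("XPath search failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // contains() compares characters exactly, so "COLDFusion" is not a match.
        System.out.println(
            matchingTitles(FEED, "//item[ contains( title/text() , 'ColdFusion' ) ]"));
        // Prints: [I Love ColdFusion]
    }
}
```

Swapping the search literal to 'COLDFusion' flips the result to the other item, which is exactly the behavior described above.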

So, what can we do about this? Well, if you look at the library of XPath functions, you will see that it does have functions for converting values to upper or lower case (they were introduced in XPath 2.0):

  • lower-case()
  • upper-case()

This would be great, but the problem you will quickly find if you try to use them is that ColdFusion 8's XPath / XmlSearch() engine implements XPath 1.0 and does not support them. So, what can we do if we want to perform case-insensitive searches? I don't think there's any one correct answer for this, so I'll just share the first thing that popped into my mind.

What we can do is create a lowercase version of the title text and store it back into the XML document in a way that 1) doesn't ruin the content for further use and 2) can be searched using XPath and XmlSearch(). To do this, I'm going to loop over the title tags and store the lowercase title as an attribute of the title tag itself. Then, once that is done, I'll perform the XPath search again against the title tag's "lcase" attribute rather than its XML text value:

    <!--- Gather all of the title nodes. --->
    <cfset arrTitleNodes = XmlSearch(
        xmlFeed,
        "//item/title"
        ) />

    <!---
        Loop over each title and store a lowercase attribute of
        its value that can be searched on in a case-insensitive
        manner.
    --->
    <cfloop
        index="xmlTitleNode"
        array="#arrTitleNodes#">

        <!--- Store lowercase text into attribute. --->
        <cfset xmlTitleNode.XmlAttributes[ "lcase" ] = LCase(
            XmlFormat( xmlTitleNode.XmlText )
            ) />

    </cfloop>


    <!---
        Get all ITEM nodes that have a Title child whose LCASE
        attribute contains the lowercase "coldfusion" value.
    --->
    <cfset arrItemNodes = XmlSearch(
        xmlFeed,
        "//item[ contains( title/@lcase, 'coldfusion' ) ]"
        ) />

    <!--- Output the node titles. --->
    <cfoutput>
        <cfloop
            index="xmlItemNode"
            array="#arrItemNodes#">

            #xmlItemNode.Title.XmlText#<br />

        </cfloop>
    </cfoutput>

Notice that this time, we are searching for "coldfusion," not "ColdFusion." There's a little bit more overhead here, but now, when we run this code, we get the following output:

I Love ColdFusion
I Think ColdFusion Knocked Up My Daughter
COLDFusion Is So Hot!

With the aid of this lowercase attribute, we successfully find all case variations of ColdFusion.
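The same two-pass idea can be sketched in Java against the JDK's XPath 1.0 engine: pass one copies each title's lowercased text into an "lcase" attribute, pass two runs contains() against that attribute. The class name, the searchTitles() helper, and the trimmed feed are assumptions for illustration; Locale.ROOT is used so the lowercasing does not depend on the server's default locale.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LowercaseAttributeSearch {

    // Trimmed copy of the test feed (illustrative assumption).
    static final String FEED = "<items>"
        + "<item><title>I Love ColdFusion</title></item>"
        + "<item><title>Christina Cox Is A Hottie</title></item>"
        + "<item><title>COLDFusion Is So Hot!</title></item>"
        + "</items>";

    // lowercasePhrase is pasted into a quoted XPath literal, so it must be
    // lowercase already and must not contain an apostrophe.
    static List<String> searchTitles(String xml, String lowercasePhrase) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Pass 1: annotate every title with a lowercase copy of its text.
            NodeList titleNodes =
                (NodeList) xpath.evaluate("//item/title", doc, XPathConstants.NODESET);
            for (int i = 0; i < titleNodes.getLength(); i++) {
                Element title = (Element) titleNodes.item(i);
                title.setAttribute("lcase", title.getTextContent().toLowerCase(Locale.ROOT));
            }

            // Pass 2: the same predicate, but against @lcase instead of text().
            NodeList items = (NodeList) xpath.evaluate(
                "//item[ contains( title/@lcase, '" + lowercasePhrase + "' ) ]",
                doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                // The title's text is untouched; the attribute is not part of its string-value.
                titles.add(xpath.evaluate("title", items.item(i)));
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("XPath search failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints: [I Love ColdFusion, COLDFusion Is So Hot!]
        System.out.println(searchTitles(FEED, "coldfusion"));
    }
}
```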

Of course, if we are going to loop over the Title tags, we might as well just perform the text search using ColdFusion and grab the appropriate nodes in the first pass. In the following code, as we loop over the Title tags, we are going to perform a case-insensitive ColdFusion text search. If the title has the right text, we are going to grab its parent node, the target Item node, and add it to our array of matching nodes:

    <!--- Gather all of the title nodes. --->
    <cfset arrTitleNodes = XmlSearch(
        xmlFeed,
        "//item/title"
        ) />

    <!--- Create an array of item nodes. --->
    <cfset arrItemNodes = [] />


    <!---
        Loop over each title and check to see if the text contains
        the phrase ColdFusion - since we are checking in ColdFusion,
        we don't have to worry about case.
    --->
    <cfloop
        index="xmlTitleNode"
        array="#arrTitleNodes#">

        <!--- Check for phrase. --->
        <cfif FindNoCase( "ColdFusion", xmlTitleNode.XmlText )>

            <!--- Add parent node (Item) to array. --->
            <cfset ArrayAppend(
                arrItemNodes,
                xmlTitleNode.XmlParent
                ) />

        </cfif>

    </cfloop>


    <!--- Output the node titles. --->
    <cfoutput>
        <cfloop
            index="xmlItemNode"
            array="#arrItemNodes#">

            #xmlItemNode.Title.XmlText#<br />

        </cfloop>
    </cfoutput>

When we run the code this time, we get the following output:

I Love ColdFusion
I Think ColdFusion Knocked Up My Daughter
COLDFusion Is So Hot!

Again, we gather all of the appropriate matches for "ColdFusion" without having to do any additional XPath / XmlSearch() calls.
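For comparison, here is that host-language filter sketched in Java: XPath only selects the title nodes, and an ordinary lowercase comparison (standing in for FindNoCase()) decides which parent items to keep. The class and method names and the trimmed feed are illustrative assumptions, not part of any ColdFusion API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HostLanguageFilter {

    // Trimmed copy of the test feed (illustrative assumption).
    static final String FEED = "<items>"
        + "<item><title>I Love ColdFusion</title></item>"
        + "<item><title>Christina Cox Is A Hottie</title></item>"
        + "<item><title>COLDFusion Is So Hot!</title></item>"
        + "</items>";

    static List<String> searchTitles(String xml, String phrase) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList titleNodes =
                (NodeList) xpath.evaluate("//item/title", doc, XPathConstants.NODESET);

            String needle = phrase.toLowerCase(Locale.ROOT);
            List<Node> matchingItems = new ArrayList<>();
            for (int i = 0; i < titleNodes.getLength(); i++) {
                Node title = titleNodes.item(i);
                // Case-insensitive "contains", done in Java rather than in XPath.
                if (title.getTextContent().toLowerCase(Locale.ROOT).contains(needle)) {
                    // Keep the parent <item>, mirroring xmlTitleNode.XmlParent.
                    matchingItems.add(title.getParentNode());
                }
            }

            List<String> titles = new ArrayList<>();
            for (Node item : matchingItems) {
                titles.add(xpath.evaluate("title", item));
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("XPath search failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints: [I Love ColdFusion, COLDFusion Is So Hot!]
        System.out.println(searchTitles(FEED, "ColdFusion"));
    }
}
```

Because the comparison happens in the host language, the search phrase can be passed in any case and needs no quoting inside an XPath string.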

This would all be made so much easier if ColdFusion would simply support case-conversion methods in XPath, but for now, I hope that something here may have helped.




Reader Comments

Feb 11, 2009 at 12:50 PM
67 Comments

There's another couple of options here Ben:

<cfset aNoCase1 = xmlSearch(xmlFeed, "//item[contains(translate(title/text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'coldfusion')]")>

It's a bit long-winded, but it works.

This next one conditionally works... it's OK for looking up the count of results, but as it transforms the XML, one has to be cautious with what one does with the results:

<cfset aNoCase2 = xmlSearch(lcase(xmlFeed), lcase("//item[contains(title/text(), 'COLDFusion')]"))>

Another note here is that the nodes in the resultant array are not references to the original nodes; they're references to a separate XML doc which is created by the lcase(xmlFeed) operation. So one cannot update the nodes in the array and expect to see the updates in the original doc (like one usually would). So this one comes with some caveats, but if those are not a concern, it's an adequate approach.

--
Adam
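Adam's translate() trick maps each uppercase ASCII letter to its lowercase counterpart before contains() runs, so it works within XPath 1.0. A minimal Java sketch of the same expression on the JDK's XPath 1.0 engine follows; the class name, helpers, and trimmed feed are illustrative assumptions. Note that translate() only folds the characters listed in its two strings, so accented letters outside A-Z are not handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TranslateSearch {

    static final String UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static final String LOWER = "abcdefghijklmnopqrstuvwxyz";

    // Trimmed copy of the test feed (illustrative assumption).
    static final String FEED = "<items>"
        + "<item><title>I Love ColdFusion</title></item>"
        + "<item><title>Christina Cox Is A Hottie</title></item>"
        + "<item><title>COLDFusion Is So Hot!</title></item>"
        + "</items>";

    // Lowercase the title with translate(), then test it with contains().
    // The phrase goes into a quoted literal: lowercase only, no apostrophes.
    static String expressionFor(String lowercasePhrase) {
        return "//item[ contains( translate( title/text(), '" + UPPER + "', '" + LOWER
            + "' ), '" + lowercasePhrase + "' ) ]";
    }

    static List<String> searchTitles(String xml, String lowercasePhrase) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate(
                expressionFor(lowercasePhrase), doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                titles.add(xpath.evaluate("title", items.item(i)));
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("XPath search failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints: [I Love ColdFusion, COLDFusion Is So Hot!]
        System.out.println(searchTitles(FEED, "coldfusion"));
    }
}
```

Unlike the attribute approach, nothing is written back into the document, so the original nodes stay untouched.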


Feb 11, 2009 at 1:48 PM
10,640 Comments

@Adam,

Very nice tip on translate(). I have never used that before. Yes, tedious, but it works. As far as the LCase() of the entire XML document, I actually considered going down that path. But, then, my concern was getting back to the original reference in the first document.

They just need to go ahead and support lower-case() :)


Feb 11, 2009 at 5:49 PM
4 Comments

Great post. I've also found that if the xml has a schema listed but is not valid the xml search fails even if the elements exist. If I deleted the schema ref (in the string xml prior to xmlParse) the search worked fine. Sure, you would think I should be using valid xml (against the schema) but the thing is I did not control the xml being returned from this web service and it wasn't. I did not see why xmlSearch should care. If the search works then return data dang you.


Feb 11, 2009 at 5:51 PM
10,640 Comments

@RyanTJ,

I believe validation is an optional part of the XML parsing. But, to be honest, I have never used any schema validation explicitly. I cannot offer any better advice on that matter.


Feb 11, 2009 at 5:51 PM
10,640 Comments

@RyanTJ,

.... all to say, yeah, if it can parse the XML, why does it care :(


Feb 12, 2009 at 9:58 AM
29 Comments

Great post, Ben! I think XPath and XSL are often underused, and I always dig your posts on how to get more mileage out of them.

Your examples ("I Think ColdFusion Knocked Up My Daughter"??) are as twisted and borderline-inappropriate as always. Rock on, Mr. Nadel!


Feb 12, 2009 at 1:10 PM
67 Comments

RyanTJ, could you pls clarify what you're saying here about xmlSearch() failing? Maybe paste some sample code?

Ben: could you please drop me an email offline (it's just about this lower-case / upper-case stuff, and CF's support for it).

Cheers.

--
Adam


Feb 12, 2009 at 1:18 PM
132 Comments

It's not about them implementing anything. XPath 1 just doesn't have those functions; the XPath engine they use (Xalan) is XPath 1 compliant.

They'd need to use an XPath 2 compatible library instead, and that means switching to Saxon because that's the only implementation in Java unfortunately.

People seem to think that Macrodobe actually implement this stuff. They don't. The Regex engine is Apache ORO, the XML stuff is Apache Xerces and Xalan.


Feb 12, 2009 at 5:25 PM
67 Comments

>It's not about them implementing anything.

Well, Elliott, it would be about them implementing Saxon instead of Xalan, wouldn't it? So it's everything about them implementing something, isn't it?

>People seem to think that Macrodobe actually implement this stuff.

Yes. They seem to think Adobe implements third-party libraries to get the work done. They also seem to think that perhaps other capabilities might present themselves if CF's chosen XML solution was a different one, possibly one in keeping with the times.

All of which is spot on.

You're the only one confused around here, mate.

--
Adam


Feb 12, 2009 at 5:50 PM
10,640 Comments

@Adam, @Elliott,

I don't want to start attacking ColdFusion or Adobe here. When I say stuff about wishing they would implement it, I'm just generically saying, "That would be a cool feature to have." I don't mean much more than that.


Feb 13, 2009 at 3:49 AM
67 Comments

Hi Ben
I don't think there's any way anything you said could've been construed as an attack against anything or one. Everything you said is spot on, valid, and I'm sure is something Adobe are giving at least some consideration to.

--
Adam


Don
Feb 24, 2009 at 12:39 PM
57 Comments

"How I Became An XSLT Junkie" :)
I'm finding XSLT/XPath etc. so much easier to use than straight ColdFusion for parsing, looping over, and handling errors in the XML.
I told my DBA to have Oracle return XML results to me. But now we are looking at XSQL. Meanwhile the die-hard Java, C#, and VB programmers are going nutso wacko. (Or were they always that way?)

Seriously, I have scrapped my RSS integrator for websites and replaced it with a much simpler but more powerful XSLT version.

Have you read the book "ColdFusion Brain Freeze"?


Feb 24, 2009 at 3:28 PM
10,640 Comments

@Don,

XSLT is definitely a powerful thing. While there is certainly a learning curve to XSLT, when you get it in your head, it can be a great way to transform XML.

I don't know that book, but I will look it up.


Aug 13, 2009 at 12:07 PM
1 Comments

Ben,

You have been an amazing resource for me as I grow my skills, and this specific article is pretty close to what I'm looking for, but my question is: what if you need a case-insensitive search on a node's name?

Specifically you are expecting people to send xml to you a certain way but you can't trust they won't do contactINFO or contactinfo instead of contactInfo.
The attribute trick you showed here won't work in this case because it's the NODE itself that we can't find properly.

Any thoughts?

Erick


Aug 18, 2009 at 6:26 PM
10,640 Comments

@Erick,

You'd have to create a UDF or something that traverses the XML tree doing case-insensitive searching. Right now, there's really no way with XPath that I can see to do this.
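For what it's worth, XPath 1.0 can get part of the way for element names: local-name() returns an element's name as a string, and Adam's translate() trick can lowercase it before comparing. A minimal Java sketch against the JDK's XPath 1.0 engine; the sample markup, the class name, and findByName() are hypothetical, and only ASCII letters are folded.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementNameSearch {

    // Hypothetical incoming XML with inconsistent element-name casing.
    static final String XML = "<orders>"
        + "<order><contactInfo>Ann</contactInfo></order>"
        + "<order><contactINFO>Bob</contactINFO></order>"
        + "<order><contactinfo>Cy</contactinfo></order>"
        + "<order><shipping>Dee</shipping></order>"
        + "</orders>";

    // Matches any element whose lowercased local name equals lowercaseName
    // (pasted into a quoted literal: lowercase only, no apostrophes).
    static List<String> findByName(String xml, String lowercaseName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so local-name() has a real local name to work with
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            String expression = "//*[ translate( local-name(), "
                + "'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz' ) = '"
                + lowercaseName + "' ]";
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath search failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints: [Ann, Bob, Cy]
        System.out.println(findByName(XML, "contactinfo"));
    }
}
```

The catch is that every path step written this way becomes a //*[...] scan with a predicate, so long paths get verbose quickly; a traversal UDF may still be easier to maintain.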


Aug 29, 2010 at 12:28 AM
31 Comments

Hi Ben,

Thanks for that tip. After much testing I have found that using MX7 there are a couple of "bugs" in xmlSearch. Perhaps these have been fixed in later versions of CF.

I wanted to search for the <td> that had text containing the string "properties that match your search criteria". I wanted to find the actual number of properties (ie 6 in the example case below).

Given the following example:
<tr>
<td align="left">
There are <b>6</b> properties that match your search criteria.
</td>
</tr>
<tr>
<td align="left">
There are 6 properties that match your search criteria.
</td>
</tr>

XmlSearch(xmlObj,"//td[ contains( text(), 'properties that match' ) ]")

only returns the td that DOESN'T have the tag embedded. I tried all sorts of combinations but couldn't get it to work on the first <td>.

I then tried "ends-with", and CF doesn't know about the "ends-with" function (it is an XPath 2.0 function), but it does know about the "starts-with" function.

But, "//td[ starts-with( text(), 'There') ]" doesn't return either node.

In the end I just did this to get the tds that are potential targets:

"//td/b/parent::*[1]"

i.e., get the td nodes that contain a <b> tag

then looped over the resulting array doing:

if (findNoCase("properties that match your search criteria",local.tds[i]['xmlText'])){
variables.totalProperties = local.tds[i]['xmlChildren'][1]['xmlText'];
}

Thanks again for your helpful blog.

Murray


Aug 29, 2010 at 12:29 AM
31 Comments

Woops. Sorry that my tags embedded in the source above cause the bolding nightmare!


Sep 5, 2010 at 1:45 PM
10,640 Comments

@Murray,

Sorry, I had to muck with your comment a bit. For some reason, my editor was totally not able to parse whatever you wrote (I was trying to fix the bolding). As such, I think the bold tag got stripped out.

I hope that future versions of ColdFusion can update the xmlSearch() functionality a bit; I've run into unsupported XPath errors a good number of times. When that happens, the only approach that I have found is what you did: a combination of XPath and good old-fashioned ColdFusion looping.


Sep 5, 2010 at 4:35 PM
31 Comments

Thanks Ben. Much appreciated.


Sep 5, 2010 at 4:40 PM
31 Comments

Actually, for the benefit of anyone reading this who might want to make sense of the question post, the first <td> had a bold tag surrounding the numeral 6. So, the problem was that the xmlSearch wouldn't return that <td>, but would return the second one because the second one only had plain text, no embedded tags.

Cheers,
Murray
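Both symptoms follow from how XPath 1.0 turns a node-set into a string: contains() and starts-with() use only the first node, so text() means the first text child. In the bold-tag cell that is the run before the <b>, and in both cells it begins with a newline. Using "." (the element's full string-value) and normalize-space() avoids both problems. A minimal Java sketch with the JDK's XPath 1.0 engine; the markup is modeled on Murray's example, and the class and count() helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MixedContentSearch {

    // Modeled on Murray's example: the first cell wraps the number in <b>.
    static final String TABLE = "<table>"
        + "<tr><td align=\"left\">\n  There are <b>6</b> properties that match your search criteria.\n</td></tr>"
        + "<tr><td align=\"left\">\n  There are 6 properties that match your search criteria.\n</td></tr>"
        + "</table>";

    // Returns how many nodes the expression selects.
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return ((NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET)).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath search failed", e);
        }
    }

    public static void main(String[] args) {
        // First text child only: "\n  There are " in the bold cell, so 1 match.
        System.out.println(count(TABLE, "//td[ contains( text(), 'properties that match' ) ]"));
        // "." is the whole string-value, including the text inside <b>: 2 matches.
        System.out.println(count(TABLE, "//td[ contains( ., 'properties that match' ) ]"));
        // Both cells start with a newline, so this finds 0...
        System.out.println(count(TABLE, "//td[ starts-with( text(), 'There' ) ]"));
        // ...until leading whitespace is stripped: 2.
        System.out.println(count(TABLE, "//td[ starts-with( normalize-space( . ), 'There' ) ]"));
    }
}
```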


Sep 5, 2010 at 5:05 PM
10,640 Comments

@Murray,

Good point on the clarification.


Jan 27, 2012 at 1:26 PM
1 Comments

Thanks, Ben, for the above code. I do have a question. I am very new to ColdFusion and XML. I'm working on an employee phone search. When the input is the employee's entire last name, it works great, but if someone inputs an "a", the search brings back every name that has an "a" in it. How can I narrow down the results to only bring back names starting with "a"? Any suggestions?

Terry
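XPath 1.0's starts-with() is the prefix counterpart of contains(), and Adam's translate() trick keeps it case-insensitive. Because the prefix comes from a user, this Java sketch (JDK XPath 1.0 engine) binds it through an XPath variable via setXPathVariableResolver() instead of pasting it into the expression, so an apostrophe in the input cannot break the query. The directory markup, the element names employee and lastname, and the class and method names are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LastNamePrefixSearch {

    // Hypothetical employee directory.
    static final String DIRECTORY = "<employees>"
        + "<employee><lastname>Adams</lastname><phone>x100</phone></employee>"
        + "<employee><lastname>Nadel</lastname><phone>x200</phone></employee>"
        + "<employee><lastname>abbott</lastname><phone>x300</phone></employee>"
        + "</employees>";

    static List<String> lastNamesStartingWith(String xml, String prefix) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // $prefix resolves to the lowercased user input.
            String lowercasePrefix = prefix.toLowerCase(Locale.ROOT);
            xpath.setXPathVariableResolver(
                name -> "prefix".equals(name.getLocalPart()) ? lowercasePrefix : null);

            // Lowercase each last name, then keep only those that begin with $prefix.
            String expression = "//employee[ starts-with( translate( lastname, "
                + "'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz' ), $prefix ) ]";
            NodeList employees =
                (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < employees.getLength(); i++) {
                names.add(xpath.evaluate("lastname", employees.item(i)));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("XPath search failed", e);
        }
    }

    public static void main(String[] args) {
        // "Nadel" contains an "a" but does not start with one. Prints: [Adams, abbott]
        System.out.println(lastNamesStartingWith(DIRECTORY, "a"));
    }
}
```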


