Ben Nadel

Finding The XPath Of A Given XML Node In A ColdFusion XML Document

By Ben Nadel on
Tags: ColdFusion

The other day, Brian left a comment on my blog that intrigued me. He wanted to take the results of an xmlSearch() request and find the path (XPath) of each node in the returned node array. I am not sure what kind of use this reverse-engineering would have; but, it seemed like a fun little problem to try and solve.

The idea behind my approach was simply to start at the given node and step up the XML document tree, building a reverse-path, until I reached the root of the document. At each level, I would find the position of the target node within the collection of its named siblings; that is, I am using pseudo-collections rather than the raw child-node collections. Using named-collections is typically slower, in my experience, but it's much more interesting to look at in the output. In any case, converting from named collections to "child::*" type axes would not be much of a shift.

Before we look at how to calculate the XPath, let's take a look at an example:

    <!--- Build the girls XML data tree. --->
    <cfxml variable="girlData">

        <girls>
            <!---
                Some meta data to show that named-nodes will work with
                mixed-node collections.
            --->
            <moderator>Ben Nadel</moderator>
            <desc>This is a collection of girls with qualities.</desc>

            <!--- The girls aggregation. --->
            <girl>
                <name>Kate</name>
                <qualities>
                    <sweet qualify="always" />
                    <sexy qualify="always" />
                </qualities>
            </girl>
            <girl>
                <name>Joanna</name>
                <qualities>
                    <sweet qualify="occasionally" />
                    <sultry qualify="onPhone" />
                    <stern qualify="inPerson" />
                </qualities>
            </girl>
        </girls>

    </cfxml>


    <!--- Get all of the girls that are sultry. --->
    <cfset sultryNodes = xmlSearch(
        girlData,
        "//sultry[ @qualify != 'never' ]"
        ) />

    <!---
        Find the path to the girl that contains the given sultry quality.
        For this demo, we will assume we know the parent-child
        relationship of the girl node to the qualities node (ie. we have
        to go up TWO levels to get the GIRL node).
    --->
    <cfset girlPath = xmlGetNodePath(
        sultryNodes[ 1 ].xmlParent.xmlParent
        ) />


    <!--- Output the path and target node. --->
    <cfoutput>

        Path: #girlPath#<br />
        <br />

        <!---
            Now, let's requery the XML tree looking for the girl defined
            by the calculated path.
        --->
        <cfdump
            var="#xmlSearch( girlData, girlPath )#"
            label="Sultry Girl"
            />

    </cfoutput>

Here, we have an XML document that contains some girl nodes which contain some quality nodes. To test the reverse engineering, we are finding a particular quality (sultry) and then getting the XPath of the girl node that contains said quality (sultry). Once we have the girl path, we output it and then use it to perform a subsequent XPath search. Running the above code gives us the following, reverse-engineered XPath:

Path: /girls[1]/girl[2]

As you can see, the resultant XPath uses named collections with positional predicates. Using this XPath to then re-query the XML document gives us the following CFDump output:

 
 
 
 
 
 
[Image: Reverse Engineering An XPath Query Based On A Given XML Node In ColdFusion.]
 
 
 

As you can see, from the original sultry node, we were able to reverse engineer an XPath value that gave us the fully qualified location of the appropriate girl ancestor.

Now that you've seen the use-case, let's take a look at the ColdFusion user-defined function (UDF) that actually performs the reverse engineering:

    <cffunction
        name="xmlGetNodePath"
        access="public"
        returntype="string"
        output="false"
        hint="I take a given XML node and return its full XML path.">

        <!--- Define arguments. --->
        <cfargument
            name="node"
            type="any"
            required="true"
            hint="I am the XML node whose location is being reverse engineered."
            />

        <!--- Define the local scope. --->
        <cfset var local = {} />

        <!--- Start out with an empty path. --->
        <cfset local.fullPath = "" />

        <!---
            Create a marker so that we can easily identify the node as we
            examine the sibling collections to find its position (this
            will be added and removed as necessary).
        --->
        <cfset local.marker = "udf:xmlGetNodePath" />

        <!--- Get our starting node. --->
        <cfset local.node = arguments.node />

        <!---
            Keep looping until we need to break (there is no parent or
            the parent does not have a name).
        --->
        <cfloop condition="true">

            <!---
                Check for the special case - the current node reference
                IS the root node. In that case, we cannot traverse any
                higher up the document tree.
            --->
            <cfif (
                !structKeyExists( local.node, "xmlParent" ) ||
                !structKeyExists( local.node.xmlParent, "xmlName" )
                )>

                <!---
                    Break out of the loop - we have found the full path
                    to the original node. While the "document" node may
                    technically have a name, it cannot be used in an
                    XPath query.
                --->
                <cfbreak />

            </cfif>

            <!---
                Add the marker to the current node. We need to find its
                position within the sibling node-set and will need a way
                to distinguish it.
            --->
            <cfset local.node.xmlAttributes[ local.marker ] = true />

            <!--- Gather all the sibling nodes. --->
            <cfset local.siblings = xmlSearch(
                local.node.xmlParent,
                ("./" & local.node.xmlName)
                ) />

            <!--- Loop over the siblings to find the node. --->
            <cfloop
                index="local.siblingIndex"
                from="1"
                to="#arrayLen( local.siblings )#"
                step="1">

                <!--- Check to see if this is the given node. --->
                <cfif structKeyExists(
                    local.siblings[ local.siblingIndex ].xmlAttributes,
                    local.marker
                    )>

                    <!---
                        This is our node - let's get its path including
                        its child index.
                    --->
                    <cfset local.fullPath = (
                        "/" &
                        local.node.xmlName &
                        "[" & local.siblingIndex & "]" &
                        local.fullPath
                        ) />

                    <!---
                        Break out of this loop to re-enter the upward
                        traversal loop - we still need to move toward
                        the root node.
                    --->
                    <cfbreak />

                </cfif>

            </cfloop>

            <!--- Remove the node marker. --->
            <cfset structDelete( local.node.xmlAttributes, local.marker ) />

            <!--- Move up the tree. --->
            <cfset local.node = local.node.xmlParent />

        </cfloop>

        <!--- Return the full path. --->
        <cfreturn local.fullPath />
    </cffunction>
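
Earlier I mentioned that switching from named collections to "child::*" type axes would not be much of a shift. To make that concrete, here is a rough sketch of the same upward walk that can emit either style of step. Note that this is Python using the standard library's xml.dom.minidom, not ColdFusion; I chose it only because its DOM nodes expose parentNode and previousSibling directly, so no marker attribute is needed. The function name node_path and the use_wildcard flag are hypothetical and are not part of the UDF above.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def node_path(node, use_wildcard=False):
    """Build an absolute XPath for an element by walking up the tree.

    use_wildcard=False: each step is name[n], where n is the position
    among the same-named element siblings (named collections).
    use_wildcard=True: each step is *[n], where n is the position among
    all element siblings (the "child::*" style).
    """
    steps = []
    # Stop once we step above the root element (onto the document node).
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        position = 1
        sibling = node.previousSibling
        # Count the preceding siblings that belong to the same collection.
        while sibling is not None:
            if sibling.nodeType == Node.ELEMENT_NODE and (
                use_wildcard or sibling.tagName == node.tagName
            ):
                position += 1
            sibling = sibling.previousSibling
        name = "*" if use_wildcard else node.tagName
        steps.append(f"{name}[{position}]")
        node = node.parentNode
    return "/" + "/".join(reversed(steps))


# A cut-down version of the demo document (assumed data for this sketch).
doc = minidom.parseString(
    "<girls><moderator>Ben Nadel</moderator><desc>Demo</desc>"
    "<girl><name>Kate</name></girl><girl><name>Joanna</name></girl></girls>"
)
joanna = doc.getElementsByTagName("girl")[1]

print(node_path(joanna))                     # /girls[1]/girl[2]
print(node_path(joanna, use_wildcard=True))  # /*[1]/*[4]
```

The wildcard form is position-exact but opaque, since "/*[1]/*[4]" says nothing about what the node is; that is the trade-off behind my preference for the named form in the output.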

As a final note, I should mention that this only works with element nodes. If you had passed in a Text node, the traversal algorithm would not know what to do. I am sure you could alter the DOM traversal to take node-type into account; but for this exploration, it was not a caveat that I had even considered until after I was actually done coding.
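
For what it's worth, here is a rough sketch of what taking node-type into account might look like. Again, this is Python with xml.dom.minidom rather than ColdFusion, and it is only an illustration under a few assumptions: the function name node_path is hypothetical, text nodes are addressed with the XPath text() node test, the parser has not split adjacent text into multiple nodes, and anything other than an element or text node (attributes, comments, CDATA sections) is simply rejected.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def node_path(node):
    """Build an absolute XPath for an element OR a text node."""
    steps = []
    while node is not None and node.nodeType != Node.DOCUMENT_NODE:
        if node.nodeType == Node.ELEMENT_NODE:
            test = node.tagName
        elif node.nodeType == Node.TEXT_NODE:
            test = "text()"
        else:
            raise ValueError(f"unsupported node type: {node.nodeType}")
        # Position among preceding siblings of the same kind: same tag
        # name for elements, any text node for text.
        position = 1
        sibling = node.previousSibling
        while sibling is not None:
            if sibling.nodeType == node.nodeType and (
                node.nodeType == Node.TEXT_NODE
                or sibling.tagName == node.tagName
            ):
                position += 1
            sibling = sibling.previousSibling
        steps.append(f"{test}[{position}]")
        node = node.parentNode
    return "/" + "/".join(reversed(steps))


doc = minidom.parseString(
    "<girls><girl><name>Kate</name></girl>"
    "<girl><name>Joanna</name></girl></girls>"
)
# The text node inside the second girl's <name> element.
joanna_text = doc.getElementsByTagName("name")[1].firstChild
print(node_path(joanna_text))  # /girls[1]/girl[2]/name[1]/text()[1]

# Mixed content: "three" is the second text node child of <p>.
mixed = minidom.parseString("<p>one<b>two</b>three</p>")
print(node_path(mixed.documentElement.lastChild))  # /p[1]/text()[2]
```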




Reader Comments

Not quite the same thing, but if you have the XML file locally, or have it on the same domain (due to same domain JavaScript restrictions), you can use the Spry Data Set Explorer to load an XML document and then navigate within the structure of the document to see what the XPath would be for the selected node. An online example:
http://labs.adobe.com/technologies/spry/samples/data_region/DataSetExplorer.html

The selector is the second parameter of the Spry.Data.XMLDataSet constructor that you see when you've selected an entry in the schema item list in the middle of the page. It doesn't give you the selector to the individual nodes as you're doing in this post, but it can get you pretty close if you're trying to come up with the appropriate xpath.

The Spry Data Set Explorer is within the Spry Prerelease package:
http://www.adobe.com/cfusion/entitlement/index.cfm?e=labs_spry

located at: /samples/data_region/DataSetExplorer.html

FWIW: Since most XML documents that I work with are dynamically generated, getting the exact xpath for a particular node isn't all that useful as its location within the document may change request to request, but this is an interesting way to go about it. Thanks for sharing.


@Danilo,

Spry seems very cool; I never got into it, but I know Ray Camden used to swear by it before jQuery hit the scene.

I am not sure what the best use-case for this would be. Perhaps, I can imagine a situation in which you are posting data to a 3rd party service and you also need to present it with a way to gather the information in that data? Not sure. I'd be curious to hear what people might use this for.


Brian here. So, imagine if you will, database mapping. If you have multiple databases, multiple tables, and multiple fields - which sorta does happen in reality, and you're looking to pull particular database/table/field addresses for those fields that have particular values back to the calling app, and your datasource is a generated XML datastream, XMLSearch's results just aren't very helpful.

Case in point:
Using extended fields within SQL to define field-specific attributes. Say, for example, "dataSource". Several of the tables use the same fieldnames (perhaps, "updatedBy", "i", or even "dynamicField1").

Being able to identify the entire stream down to the fieldName and its dataSource is important to track validity of some external data sources that feed into this stream of data sources. And then we can apply the smackdown on who's wrong (or out of date) :)

Thanks for the response, by the way, Ben.


@Brian,

I think your database acrobatics are a bit more advanced than mine. Sounds like some very interesting stuff. If nothing else, thank you for a fun blog idea :)


Hi Ben
I have a large XML file, stock.xml, which contains stock data for more than 10,000 products. But its structure is simple: for each <product> element, there are only 2 sub-elements: productID and Quantity On Hand.

What I want is, given a productID, the most efficient way to get the quantity of that product. (We need to process the stock.xml file every 10 mins)

What I currently do is use XMLSearch() and XPath, the XPath is: //product[stockCode='stockCode']/onHandQty

I read your article about how to load large XML and I think it's really good. So can we improve this XPath?

Thanks in advance


Hi Ben,

I found an interesting scenario where XmlSearch does not respect the xpath you provide it. What I mean is that I believe my xpath is correct, based solely on some documentation that a vendor has provided, but the array the function returns ends up being empty though I expect it to be populated.

In my case, I'm making requests to a web service via .NET and handing the XmlNode which is returned in the response back to a ColdFusion module which consumes my C# class. When I simply cfdump the XmlNode, it appears that there isn't a root XML node. If, however, I convert the node to a string in ColdFusion by calling .toString(), XmlParse that, then cfdump that, a root node magically appears and my xpath starts working. The interesting bit is that my xpath remains constant. Before converting the node to a string, the resulting array is empty, and after converting it to a string, the resulting array is populated.

Have you ever run into this scenario before? If so, do you recall what causes this issue? I realize that I'd get a better answer by posting a code snippet, or giving you a URL to page where you can reproduce the issue, but I'm refraining from doing so for a couple reasons: 1) this was an issue I discovered for work and I'm not sure how my employer would feel with me sharing the code and 2) well, I've already resolved the issue. I'm simply trying to justify the behavior for my sanity's sake.

FWIW, I wasn't able to get the xpath in either ColdFusion or .NET working until I converted the doc. I know that an XmlDocument is an XmlNode but I'm unsure if the converse is true, or what that means for how ColdFusion interprets the object.

Thanks!
