I just had a quick regex question. I need to replace the first and last XML nodes in an XML string. Basically, I was doing a quick and dirty way to change the root node of an XML document instead of creating a new root node and recursively copying all the data from one XML object to the new XML object. I can do a simple find "xxx" and replace it with "yyy", but xxx may be part of a child node somewhere, like xxxBen or something. So I really want to pinpoint the start and end tags of the string, and also deal with the declaration, and leave all attributes in the root node (if they exist) intact. When you get a minute, do you mind helping?
I know you want to see the string-parsing method rather than the new root node creation, but I will show you both approaches as I think they are both nice to know about. The major difference between the two is that the latter (new root node creation) requires you to parse the XML string into an actual ColdFusion XML document, whereas the former only requires that you process the string with regular expressions.
So first, let's start out by creating our XML data string and storing it in a ColdFusion content buffer:
<!---
    Create an XML string that has a root node that gets repeated
    within the body of the XML as well.
--->
<cfsavecontent variable="strXmlData">
    <list id="my-to-do-list">
        <item> List Item A </item>
        <item> List Item B </item>
        <item>
            <list>
                <item> Sub Item A </item>
                <item> Sub Item B </item>
            </list>
        </item>
        <item> List Item D </item>
    </list>
</cfsavecontent>
You'll notice that the element node, "list," appears twice in the document: once as the root node and once as a nested node. I have done this to make sure that neither of these techniques replaces the nested node incorrectly.
OK, so now let's replace the root node, "list," with the new root node, "masterlist." With our first approach, we are going to use Regular Expressions to replace the first open tag and last close tag of the xml document string:
<!---
    Replace the first and last nodes of the document with a new
    node name. We are going to do this in two steps. Start with
    the first tag.
--->
<cfset strXmlData = REReplace(
    strXmlData,
    "<\w+",
    "<masterlist",
    "one"
    ) />

<!---
    Now, we want to replace the LAST close node in the document.
    Because we want to replace the last close node, we want the
    expression to end in the $ so that it is anchored to the end
    of the document.
--->
<cfset strXmlData = REReplace(
    strXmlData,
    "(</)\w+([^>]*>\s*)$",
    "\1masterlist\2",
    "one"
    ) />

<!--- Parse and output new XML. --->
<cfdump
    var="#XmlParse( Trim( strXmlData ) )#"
    label="New XML Document"
    />
To keep the node attributes intact in our first REReplace(), we are only replacing the open bracket and node name of the first element (i.e., <list becomes <masterlist); in doing so, we are only changing the node name and nothing else. This pattern also steps over an XML declaration, if there is one, since the ? in <?xml is not a word character and so <\w+ cannot match there. Then, in our second replace, we replace the last close node of the document. Here, we actually have to match the entire node as we need to use the $ to signify the end of the string data (in the regular expression). However, by using captured groups in our regular expression, we can put back everything other than the node name without having to know much about it.
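If you need this in more than one place, the two replaces above can be wrapped up in a small UDF. The following is only a sketch: the function name RenameXmlRoot() and its argument names are made up for this example, and the character class [\w:.-] is my assumption for handling node names that contain hyphens, dots, or a namespace prefix (which a plain \w+ would only partially match).

```cfml
<cffunction
    name="RenameXmlRoot"
    access="public"
    returntype="string"
    output="false"
    hint="Hypothetical helper: renames the root node of an XML string using two regex replaces.">

    <cfargument name="xml" type="string" required="true" />
    <cfargument name="newName" type="string" required="true" />

    <cfset var result = arguments.xml />

    <!---
        Replace the name of the first open tag. Since the pattern
        requires a word character right after the "<", it will not
        match "<?xml" (declaration) or "<!--" (comment).
    --->
    <cfset result = REReplace(
        result,
        "<\w[\w:.-]*",
        "<#arguments.newName#",
        "one"
        ) />

    <!---
        Replace the name of the last close tag. The $ anchors the
        match to the end of the string; the captured group keeps
        the closing bracket and any trailing white space as-is.
    --->
    <cfset result = REReplace(
        result,
        "</\w[\w:.-]*(\s*>\s*)$",
        "</#arguments.newName#\1",
        "one"
        ) />

    <cfreturn result />
</cffunction>

<!--- Usage. --->
<cfset strXmlData = RenameXmlRoot( strXmlData, "masterlist" ) />
```

Keep in mind that this is still string parsing, not XML parsing. It assumes that nothing but white space follows the root close tag, and that no comment before the root node contains something that looks like an open tag. A self-closing root node (no close tag) would only have its open tag renamed.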
When we replace the node and CFDump out the resulting XML document, we get the following:
As you can see, the root node, "list," has been replaced with, "masterlist," and the XML attributes have been kept intact.
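If you would rather confirm the result in code than read it off the CFDump, a quick check along these lines might help. This is a sketch that assumes strXmlData holds the string produced by the regular expression replaces above; the variable name xmlCheck is made up for the example.

```cfml
<!--- Parse the updated string so we can inspect the document. --->
<cfset xmlCheck = XmlParse( Trim( strXmlData ) ) />

<cfoutput>
    <!--- The name of the root node after the replace. --->
    Root node: #xmlCheck.XmlRoot.XmlName#<br />

    <!--- The id attribute that was on the original root node. --->
    Root id: #xmlCheck.XmlRoot.XmlAttributes.id#<br />

    <!--- The number of "list" nodes left anywhere in the document. --->
    List nodes: #ArrayLen( XmlSearch( xmlCheck, "//list" ) )#
</cfoutput>
```

With the sample data from this post, I would expect the root node to come back as "masterlist" with its original id, and the nested "list" node to be the only "list" left in the document.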
Ok, now that you see how to do this with regular expressions, let's take a look at actually changing the structure of an existing XML document. Well, sort of - before we have an actual XML document, we are going to wrap the existing XML string in our new root node, "masterlist." Then, we're going to parse it into an XML document and transfer the original child nodes and XML attribute data into the new root node. Once this is done, we're simply going to delete the old root node.
<!---
    Add the new XML root node around the document. The CFOutput
    tag makes sure that #strXmlData# gets evaluated rather than
    captured as literal text.
--->
<cfsavecontent variable="strXmlData">
    <cfoutput>
        <masterlist>
            #strXmlData#
        </masterlist>
    </cfoutput>
</cfsavecontent>

<!---
    Now, parse the xml string into an XML document and transfer
    the child nodes to the master list root node.
--->
<cfset xmlData = XmlParse( Trim( strXmlData ) ) />

<!---
    Add all of the original children to the new root of the XML
    document. This creates a *copy* of the original child nodes,
    NOT a copy-by-reference!! You will lose any references you
    had to the original nodes.

    NOTE: This uses the undocumented AddAll() method. You might
    want to wrap this up in a UDF, ArrayAppendAll().
--->
<cfset xmlData.XmlRoot.XmlChildren.AddAll(
    xmlData.masterlist.list.XmlChildren
    ) />

<!--- Copy any XML attributes. --->
<cfset StructAppend(
    xmlData.XmlRoot.XmlAttributes,
    xmlData.XmlRoot.XmlChildren[ 1 ].XmlAttributes
    ) />

<!--- Delete first child to get rid of old root node. --->
<cfset ArrayDeleteAt( xmlData.XmlRoot.XmlChildren, 1 ) />
There are a few points to take away from the above code. For starters, we are using the undocumented AddAll() method to append an entire array to another array. If you are uncomfortable doing this, I would recommend just making a UDF called ArrayAppendAll() and wrapping the .AddAll() method call in that (in case the feature ever becomes unavailable in future versions of ColdFusion). Second, when we transfer the XML children to the new root node, these nodes get transferred by value, not by reference. This means that if you have any existing variable references to the original child nodes, those references will not point to the transported XML children.
That said, this method results in the exact same XML document as the regular expression replace method. The first is going to be more efficient as it's only string parsing. The latter has string parsing (into XML) and XML document manipulation - a lot more going on. But, I thought it could be beneficial to see both techniques in order to make the most informed decision.