I am in the middle of helping someone use ColdFusion's CFHttp tag to submit form data. To do this, we needed to grab the form fields from a target URL, parse them out, and then submit our own form values. As a result, it became important to parse individual HTML tags into ColdFusion structures so that we could read in the attributes and then use them in our ColdFusion CFHttpParam tags.
When it comes to parsing an HTML tag in ColdFusion, there are two things we are looking for:
- The tag name
- The tag attributes
Both of these things follow textual patterns that can be parsed using regular expressions. The tag name is quite easy: it's simply the first "word" that comes after the open bracket of the tag. The attributes, on the other hand, are a little more challenging. While XHTML requires quoted attribute values, a look at people's source code shows that not everyone uses quotes. To complicate things even more, people will mix quoted and non-quoted attributes in the same tag. Therefore, we need to be able to handle both situations.
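To make the attribute forms concrete, here is a minimal sketch that runs a single Java regular expression over a sample tag and outputs each raw match. The sample markup and the variable names (strSample, objPattern, objMatcher) are placeholders for illustration only. The alternation tries a quoted value first and falls back to an unquoted run of non-space characters, and the whole "= value" portion is optional:

```cfml
<!--- Hypothetical sample tag: quoted, non-quoted, and valueless attributes. --->
<cfset strSample = '<input type="text" name=email disabled>' />

<!--- One pattern for all three forms. Group 1 is the name, group 2 the value. --->
<cfset objPattern = CreateObject( "java", "java.util.regex.Pattern" ).Compile( "\s+(\w+)(?:\s*=\s*(""[^""]*""|[^\s>]*))?" ) />

<!--- Get a matcher over the sample tag. --->
<cfset objMatcher = objPattern.Matcher( strSample ) />

<!--- Output each raw attribute match (escaped so quotes display safely). --->
<cfloop condition="objMatcher.Find()">
    <cfoutput>#HTMLEditFormat( objMatcher.Group() )#<br /></cfoutput>
</cfloop>
```

Each match should be one attribute: the quoted pair, the non-quoted pair, and the bare name with no value at all. Note that this sketch only covers double-quoted values and attribute names made of word characters.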
The following ColdFusion user defined function (UDF), ParseHTMLTag(), uses some nifty regular expressions (Steve, I did my best :)) to allow for both attribute types. It parses the given HTML tag and returns a structure with the keys:
- HTML - The raw HTML that was passed in (echoed back).
- Name - The tag name, stored in upper case (ex. INPUT, H1, TEXTAREA).
- Attributes - A structure containing the name-value attribute pairs. Each name-value pair is stored in the structure by its attribute name.
Here is the ColdFusion UDF, ParseHTMLTag():
<cffunction
    name="ParseHTMLTag"
    access="public"
    returntype="struct"
    output="false"
    hint="Parses the given HTML tag into a ColdFusion struct.">

    <!--- Define arguments. --->
    <cfargument
        name="HTML"
        type="string"
        required="true"
        hint="The raw HTML for the tag."
        />

    <!--- Define the local scope. --->
    <cfset var LOCAL = StructNew() />

    <!--- Create a structure for the target tag data. --->
    <cfset LOCAL.Tag = StructNew() />

    <!--- Store the raw HTML into the tag. --->
    <cfset LOCAL.Tag.HTML = ARGUMENTS.HTML />

    <!--- Set a default name. --->
    <cfset LOCAL.Tag.Name = "" />

    <!---
        Create a structure for the attributes. Each attribute
        will be stored by its name.
    --->
    <cfset LOCAL.Tag.Attributes = StructNew() />

    <!---
        Create a pattern to find the tag name. While it might
        seem overkill to create a pattern just to find the name,
        I find it easier than dealing with token / list delimiters.
    --->
    <cfset LOCAL.NamePattern = CreateObject( "java", "java.util.regex.Pattern" ).Compile( "^<(\w+)" ) />

    <!--- Get the matcher for this pattern. --->
    <cfset LOCAL.NameMatcher = LOCAL.NamePattern.Matcher( ARGUMENTS.HTML ) />

    <!---
        Check to see if we found the tag. We know there can only
        be ONE tag name, so using an IF statement rather than a
        conditional loop will help save us processing time.
    --->
    <cfif LOCAL.NameMatcher.Find()>

        <!--- Store the tag name in all upper case. --->
        <cfset LOCAL.Tag.Name = UCase( LOCAL.NameMatcher.Group( 1 ) ) />

    </cfif>

    <!---
        Now that we have a tag name, let's find the attributes
        of the tag. Remember, attributes may or may not have
        quotes around their values. Also, some attributes (while
        not XHTML compliant) might not even have a value
        associated with them (ex. disabled, readonly).
    --->
    <cfset LOCAL.AttributePattern = CreateObject( "java", "java.util.regex.Pattern" ).Compile( "\s+(\w+)(?:\s*=\s*(""[^""]*""|[^\s>]*))?" ) />

    <!--- Get the matcher for the attribute pattern. --->
    <cfset LOCAL.AttributeMatcher = LOCAL.AttributePattern.Matcher( ARGUMENTS.HTML ) />

    <!---
        Keep looping over the attributes while we have more
        to match.
    --->
    <cfloop condition="LOCAL.AttributeMatcher.Find()">

        <!--- Grab the attribute name. --->
        <cfset LOCAL.Name = LOCAL.AttributeMatcher.Group( 1 ) />

        <!---
            Create an entry for the attribute in our attributes
            structure. By default, just set it to the empty string.
            For attributes that do not have a value, we are just
            going to have to store this empty string.
        --->
        <cfset LOCAL.Tag.Attributes[ LOCAL.Name ] = "" />

        <!---
            Get the attribute value. Save this into a scoped
            variable because this might return a NULL value
            (if the group in our name-value pattern failed
            to match).
        --->
        <cfset LOCAL.Value = LOCAL.AttributeMatcher.Group( 2 ) />

        <!---
            Check to see if we still have the value. If the
            group failed to match then the above would have
            returned NULL and destroyed our variable.
        --->
        <cfif StructKeyExists( LOCAL, "Value" )>

            <!---
                We found the attribute value. Now, just remove any
                leading or trailing quotes. This way, our values
                will be consistent whether the tag used quoted or
                non-quoted attributes.
            --->
            <cfset LOCAL.Value = LOCAL.Value.ReplaceAll( "^""|""$", "" ) />

            <!---
                Store the value back into our attributes
                structure (overwriting the default empty string).
            --->
            <cfset LOCAL.Tag.Attributes[ LOCAL.Name ] = LOCAL.Value />

        </cfif>

    </cfloop>

    <!--- Return the tag. --->
    <cfreturn LOCAL.Tag />

</cffunction>
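Coming back to the original goal, here is a rough sketch of how the parsed structure might feed a CFHttp post. It assumes the ParseHTMLTag() UDF above. The sample markup, the example.com URL, and the variable names objTag and objResponse are placeholders for illustration, not part of any real form:

```cfml
<!--- Parse a single form field (hypothetical markup). --->
<cfset objTag = ParseHTMLTag( '<input type="hidden" name="token" value="abc123" />' ) />

<!--- Default the value in case the field did not define one. --->
<cfparam name="objTag.Attributes.value" type="string" default="" />

<!--- The URL below is a placeholder, not a real endpoint. --->
<cfhttp
    url="http://www.example.com/submit.cfm"
    method="post"
    result="objResponse">

    <!--- Only fields that have a name can be submitted. --->
    <cfif StructKeyExists( objTag.Attributes, "name" )>

        <cfhttpparam
            type="formfield"
            name="#objTag.Attributes.name#"
            value="#objTag.Attributes.value#"
            />

    </cfif>

</cfhttp>
```

In a real form you would loop over every parsed field and swap in your own values before posting. This sketch passes a single field straight through to keep the idea visible.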
To test it, I am going to build an HTML Input tag that has line returns, quoted attributes, non-quoted attributes, and attributes that have no value:
<!--- Store our HTML tag. --->
<cfsavecontent variable="strHTML">
    <input type="input"
        name=name
        value=Hello
        disabled
        readonly = true
        maxlength="35"
        class="inputfield"
        />
</cfsavecontent>

<!---
    Parse the INPUT tag and dump out the resultant structure.
    This should demonstrate that the parsing can handle white
    space as well as a mix of quoted and non-quoted attributes.
--->
<cfdump
    var="#ParseHTMLTag( Trim( strHTML ) )#"
    label="ParseHTMLTag() For Input"
    />
Running the above, we get the following CFDump output:
As you can see, the attribute that had no value (disabled) was stored with an empty string. Since every struct key needs a value, this is the best default we can give it. Additionally, since ColdFusion treats NULL values as empty strings, this keeps in line with that idea (a non-existent attribute value can be thought of as null).
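One consequence is worth a quick sketch: an empty string alone cannot tell you whether an attribute was written without a value or with an explicitly empty one, but the existence of the key still tells you whether the attribute was present at all. The snippet below assumes the ParseHTMLTag() UDF above and uses a made-up sample tag:

```cfml
<!--- Hypothetical sample: an explicitly empty value plus a valueless attribute. --->
<cfset objTag = ParseHTMLTag( '<input name="email" value="" disabled />' ) />

<!--- Presence is determined by the key, not by the (empty) value. --->
<cfif StructKeyExists( objTag.Attributes, "disabled" )>
    <cfoutput>disabled is present; stored value has length #Len( objTag.Attributes.disabled )#<br /></cfoutput>
</cfif>

<!--- An attribute that was never in the tag has no key at all. --->
<cfif NOT StructKeyExists( objTag.Attributes, "readonly" )>
    <cfoutput>readonly was not in the tag<br /></cfoutput>
</cfif>
```

So when checking for flags like disabled or readonly, test for the key with StructKeyExists() rather than testing the value.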
This ColdFusion UDF is just a small part of a larger form parsing algorithm that I am working on. Hopefully, I will post that up when it is done.