REStructFindValue() - Adding Regular Expression Searching To StructFindValue()
Posted July 9, 2009 at 10:08 AM by Ben Nadel
In my previous blog post on creating a unified interface for iterating over structs and arrays in ColdFusion, I mentioned that I had been talking to Marc Esher on Twitter. We had been talking about adding regular expression (RegEx) search capabilities to ColdFusion's StructFindValue() function. I had needed the unified iteration tag because the StructFindValue() method can recursively search over both structs and arrays. In order to not duplicate the logic for each type of object (both of which are essentially key-based collections), I built the following method, REStructFindValue(), using the Each.cfm ColdFusion custom tag.
I'm not completely happy with the way the recursion was built in the following solution; specifically, the method takes an argument that is not meant to be user-provided. As the method recurses through the nested collections, it passes a fourth "hidden" argument with each subsequent call in order to keep track of the growing target path. This could have been solved by creating a sister function, but I didn't like that solution either.
Before we get into the code, however, let's take a look at the context. In the following test, I'm creating a structure with nested data collections. Then, I am going to search for values contained within the total collection based on a regular expression pattern rather than an exact value match.
- <!--- Create a test data structure. --->
- <cfset myData = {
- hotGirls = [
- {
- name = "Tricia",
- hair = "Brunette"
- },
- {
- name = "Kim",
- hair = "Blonde"
- }
- ],
- athleticGirls = [
- {
- name = "Tricia",
- hair = "Brunette"
- },
- {
- name = "Jen",
- hair = "Black"
- }
- ]
- } />
-
-
- <!--- Get all the values that are either brunette OR black. --->
- <cfset results = reStructFindValue(
- myData,
- "brunette|brown|black",
- "all"
- ) />
-
- <!--- Dump out the search results. --->
- <cfdump
- var="#results#"
- label="reStructFindValue( 'brunette|brown|black' )"
- />
As you can see, I am taking my nested structure and searching for values that match the following regular expression:
brunette|brown|black
Because the UDF uses a case-insensitive match, this will find values that contain the phrases "brunette", "brown", or "black" in any casing. And, just like ColdFusion's native StructFindValue() method, this returns an array of matches:
[Image: CFDump output of the reStructFindValue( 'brunette|brown|black' ) results array.]
As you can see, it found the three values in the nested structure that matched the above regular expression. I tried to keep the format of the results as close as possible to that of the StructFindValue() result collection; however, I simplified my "Path" key to always use array notation and never dot notation. Since both arrays and structs can be accessed with array notation, I felt that the uniformity of the generated path was a good idea.
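To make the result format a bit more concrete, here is a minimal sketch that walks the results array from the test above and pairs each path with the value that matched. It assumes that each result's "key" is the struct key or array index of the match within its "owner" collection, so that owner[ key ] reads the matched value back; the "summary" and "result" variable names are made up for this illustration.

```cfm
<!--- Assumes "results" is the array returned by the reStructFindValue() test above. --->
<cfset summary = [] />

<cfloop index="result" array="#results#">

	<!--- owner[ key ] re-reads the matched value from its parent collection. --->
	<cfset arrayAppend(
		summary,
		result.path & " = " & result.owner[ result.key ]
		) />

</cfloop>

<!--- Dump out one "path = value" line per match. --->
<cfdump var="#summary#" label="Matched paths and values" />
```

Since struct key ordering is not guaranteed, the order of the lines may differ from run to run.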
Now that we see what the new method is doing, let's take a look at the code. Keep in mind that the collection iteration used by this method is performed by my Each.cfm ColdFusion custom tag. This is not required, but it does simplify the method greatly.
REStructFindValue( Target, Pattern, Scope )
- <cffunction
- name="reStructFindValue"
- access="public"
- returntype="array"
- output="false"
- hint="I search for patterns within a given ">
-
- <!--- Define arguments. --->
- <cfargument
- name="target"
- type="any"
- required="true"
- hint="I am the target struct being searched."
- />
-
- <cfargument
- name="pattern"
- type="string"
- required="true"
- hint="I am the pattern being searched."
- />
-
- <cfargument
- name="scope"
- type="string"
- required="false"
- default="one"
- hint="I am the scope of the search: one or all."
- />
-
- <cfargument
- name="path"
- type="string"
- required="false"
- default=""
- hint="The path to the current target (for recursive calling). ** NOTE: This is used internally for recursion - this is NOT an expected argument to be passed in by the user."
- />
-
- <!--- Define the local scope. --->
- <cfset var local = {} />
-
- <!--- Create an array to hold the matching results. --->
- <cfset local.results = [] />
-
- <!---
- Loop over target.
- NOTE: This uses a ColdFusion custom tag that unifies
- the interface for looping over both structures and
- arrays.
- http://www.bennadel.com/go/each-iteration
- --->
- <cf_each
- item="local.item"
- collection="#arguments.target#">
-
- <!--- Create a variable to store the base path. --->
- <cfset local.path = arguments.path />
-
- <!--- Add the current key to the path. --->
- <cfset local.path &= "[ ""#local.item.key#"" ]" />
-
- <!--- Get a handle on the new target. --->
- <cfset local.target = local.item.value />
-
- <!---
- Check to see if this new target is a string (or
- if it is another complex object that we need to
- iterate over).
- --->
- <cfif isSimpleValue( local.target )>
-
- <!---
- Check the target value for a pattern match. For now,
- we are going to be using ColdFusion's REMatchNoCase()
- method, which means a subset of regular expression
- usage. Furthermore, we are going to use NoCase for
- ease of coding.
- --->
- <cfif arrayLen( reMatchNoCase( arguments.pattern, local.target ) )>
-
- <!---
- The regular expression pattern was found at
- least once in the target value. This is a
- valid match. Add it to the results.
- --->
- <cfset local.result = {
- key = local.item.key,
- owner = arguments.target,
- path = local.path
- } />
-
- <!--- Add this result to the current results. --->
- <cfset arrayAppend( local.results, local.result ) />
-
- </cfif>
-
- <!---
- Make sure this complex nested target is one that
- we can actually iterate over (all others will be
- skipped).
- --->
- <cfelseif (
- isStruct( local.target ) ||
- isArray( local.target )
- )>
-
- <!---
- The nested target is not a simple value. Therefore,
- we need to perform a depth-first, recursive search
- of it for our matching pattern.
- --->
- <cfset local.childResults = reStructFindValue(
- local.target,
- arguments.pattern,
- arguments.scope,
- local.path
- ) />
-
- <!---
- Add the results from our nested search to the
- current results collection.
- --->
- <cfloop
- index="local.childResult"
- array="#local.childResults#">
-
- <!--- Add this result to the current results. --->
- <cfset arrayAppend( local.results, local.childResult ) />
-
- </cfloop>
-
- </cfif>
-
-
- <!---
- At the end of a single iteration, let's check to see
- if we were only searching for one target. If we are,
- AND we found it, we can simply return the single
- element rather than continuing on with our recursion.
- --->
- <cfif (
- (arguments.scope eq "one") &&
- arrayLen( local.results )
- )>
-
- <!---
- We found at least one item - trim the results
- set in case the last iteration found more than
- one.
- --->
- <cfset local.trimmedResults = [ local.results[ 1 ] ] />
-
- <!--- Return the trimmed result set. --->
- <cfreturn local.trimmedResults />
-
- </cfif>
-
- </cf_each>
-
-
- <!--- Return the found results. --->
- <cfreturn local.results />
- </cffunction>
Notice that the UDF above takes four arguments. As I mentioned earlier, only the first three are meant to be provided by the user. The fourth argument, "Path," is provided by the method itself to keep track of nesting during recursive calls. The regular expression matching is performed by ColdFusion's REMatchNoCase() function. This means that the regular expressions used in this UDF are subject to the limitations of REMatchNoCase() and cannot make use of some advanced pattern constructs.
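For comparison, here is a rough sketch of what the "sister function" approach might look like: a public method that accepts only the three user-facing arguments and hands off to a helper that carries the path. This is not the version I ended up with, just an illustration under a few assumptions. The helper name, reStructFindValueAtPath(), is hypothetical, and to keep the sketch self-contained it loops over the keys directly rather than using the Each.cfm custom tag (the access="private" attribute would only really matter inside a component).

```cfm
<!--- Public entry point: exposes only the three user-facing arguments. --->
<cffunction name="reStructFindValue" access="public" returntype="array" output="false">
	<cfargument name="target" type="any" required="true" />
	<cfargument name="pattern" type="string" required="true" />
	<cfargument name="scope" type="string" required="false" default="one" />

	<!--- Start the recursion at the root, with an empty path. --->
	<cfreturn reStructFindValueAtPath(
		arguments.target,
		arguments.pattern,
		arguments.scope,
		""
		) />
</cffunction>

<!--- Hypothetical sister function: the only place the path gets passed around. --->
<cffunction name="reStructFindValueAtPath" access="private" returntype="array" output="false">
	<cfargument name="target" type="any" required="true" />
	<cfargument name="pattern" type="string" required="true" />
	<cfargument name="scope" type="string" required="true" />
	<cfargument name="path" type="string" required="true" />

	<cfset var local = {} />
	<cfset local.results = [] />

	<!--- Collect the keys to visit: indexes for arrays, key names for structs. --->
	<cfset local.keys = [] />
	<cfif isArray( arguments.target )>
		<cfloop index="local.index" from="1" to="#arrayLen( arguments.target )#">
			<cfset arrayAppend( local.keys, local.index ) />
		</cfloop>
	<cfelseif isStruct( arguments.target )>
		<cfset local.keys = structKeyArray( arguments.target ) />
	</cfif>

	<cfloop index="local.key" array="#local.keys#">

		<cfset local.value = arguments.target[ local.key ] />
		<cfset local.path = arguments.path & "[ ""#local.key#"" ]" />

		<cfif isSimpleValue( local.value )>

			<!--- Same match test as the original UDF. --->
			<cfif arrayLen( reMatchNoCase( arguments.pattern, local.value ) )>
				<cfset local.result = {
					key = local.key,
					owner = arguments.target,
					path = local.path
					} />
				<cfset arrayAppend( local.results, local.result ) />
			</cfif>

		<cfelseif ( isStruct( local.value ) || isArray( local.value ) )>

			<!--- Depth-first search of the nested collection. --->
			<cfset local.childResults = reStructFindValueAtPath(
				local.value,
				arguments.pattern,
				arguments.scope,
				local.path
				) />
			<cfloop index="local.childResult" array="#local.childResults#">
				<cfset arrayAppend( local.results, local.childResult ) />
			</cfloop>

		</cfif>

		<!--- When only one match is wanted, stop as soon as we have it. --->
		<cfif ( ( arguments.scope eq "one" ) && arrayLen( local.results ) )>
			<cfset local.firstResult = [ local.results[ 1 ] ] />
			<cfreturn local.firstResult />
		</cfif>

	</cfloop>

	<cfreturn local.results />
</cffunction>
```

The trade-off is the one I described earlier: the public signature gets cleaner, but there are now two functions to maintain instead of one.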
To be honest, I've never actually used the StructFindValue() method, so I am not really sure what the best use cases are; that said, I hope that this UDF might come in handy to those that do use it often.
Reader Comments
Too funny. I just submitted two new functions to cflib.org: REStructFindValue() and REStructFindValueNoCase().
One thing to note, your REStructFindValue() implementation searches both arrays and structures. StructFindValue() will iterate through arrays, but will only return results from structures. Not a big deal, just something to be aware of if you are looking to use this in place of StructFindValue().
nice! Your mad coding over the past two days has inspired me to take a crack at that potentially useful "StructVisitor" implementation I was talking about, as well.
I love this kind of practice coding. Thanks Ben!
@Nathan,
Good times :) I hadn't used StructFindValue() before and to be honest, the explanation of the various struct "find" methods confused me a bit. I just tried to deduce what it was doing by running some tests and dumping out the results. So, more than likely (as you are saying) my functionality is not going to be as parallel with the native one.
@Nathan, @Marc,
Out of curiosity, what are the use cases for these kinds of methods?
@Marc,
Yeah, this stuff is fun. I can't wait to see what you come up with.
@Ben - I have no idea. After seeing Marc's tweet the other day I was just thinking it would be fun to code :)
@Nathan,
Marc is just a source of inspiration :)
when you can't code for s**t, you gotta inspire other people to do it for you :-)
@Marc,
Ha ha ha :) Oh marc.
On a somewhat related note, a while back I blogged about structFind()--a UDF I wrote for filtering a structure to just specific keys matching a RegEx:
@Dan,
Oh man, you used REFind() in your struct search! Sometimes I feel so retarded :) I used REMatch(), which served no purpose (as the matches weren't being gathered), to see if the target value matched the given regular expression. I should have totally used REFind().
Thanks for the wake up call :)
Adam Presley took this problem and went the Java route. Very interesting, but a bit complicated as it dives into Java vectors:
http://blog.adampresley.com/2009/coldfusion-searching-structure-values-with-regular-expressions/
Yeah Ben, it came out fast, especially with LARGE structures, but delving into the CF Java types was certainly a challenge. Also can't pass an array to mine, as it simply only takes a Vector (Struct). Overall a fun exercise.
Thanks for the inspirational idea!
@Adam,
Struct or not, it was definitely fascinating.
Hi Ben,
A very learnsome post, as usual :)
I have just written the function using my iterator component, which takes an arbitrary collection, and added a maxdepth variable to constrain the search
http://code.google.com/p/dbseries/source/browse/trunk/cfc/RECollectionFindValues.cfm
And I have some nice thoughts on how to go on with my iterator project as well now, thanks!
@DeepDown,
I really like the idea of having an Iterator interface that can be used to iterate over just about anything. That's sort of where I was going with my "Each.cfm" custom tag, but your solution allows for much more extension. Very cool!
Well this has just gone a major way to solving a JSON problem with leading zeros on strings that look like numbers.
Using REStructFindValue() to find all the key/value pairs in a structure that have leading zeros and then loop through the resulting data using a function I found on the Adobe Forums site here: http://forums.adobe.com/message/2101252, I can force all leading zeros data in the structure to remain as leading zero data in the JSON produced by SerializeJSON. :-)
@Paul - You may also want to check out JSONUtil on RIAForge, http://jsonutil.riaforge.org/. It has a strict encoding option that will keep leading zeros on string data types.
@Nathan - Thanks that looks pretty cool and is a neater solution than what I came up with :-)




