REStructFindValue() - Adding Regular Expression Searching To StructFindValue()

Posted July 9, 2009 at 10:08 AM

Tags: ColdFusion

In my previous blog post on creating a unified interface for iterating over structs and arrays in ColdFusion, I mentioned that I had been talking to Marc Esher on Twitter about adding regular expression (RegEx) search capabilities to ColdFusion's StructFindValue() function. I needed the unified iteration tag because StructFindValue() recursively searches both structs and arrays. To avoid duplicating the logic for each type of object (both of which are essentially key-based collections), I built the following method, REStructFindValue(), using the Each.cfm ColdFusion custom tag.

I'm not completely happy with the way that the recursion was built in the following solution; specifically, the method takes an argument that is not meant to be user-provided. As the method recurses through the nested collections, it passes a fourth "hidden argument" with each subsequent method call in order to keep track of the growing target path. This could have been solved by creating a sister function, but I didn't like that solution either.
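For comparison, here is a rough sketch of what the "sister function" alternative could look like. The public method exposes only the three user-facing arguments and delegates to a private helper that carries the path. This is an illustration under assumptions, not the approach used in this post: the helper name reStructFindValueAtPath is hypothetical, the sketch assumes both functions live in the same CFC, and it builds a plain list of keys instead of using the Each.cfm custom tag.

```cfml
<!--- Public entry point: only the three user-facing arguments. --->
<cffunction name="reStructFindValue" access="public" returntype="array" output="false">
    <cfargument name="target" type="any" required="true" />
    <cfargument name="pattern" type="string" required="true" />
    <cfargument name="scope" type="string" required="false" default="one" />

    <!--- Start the recursion with an empty path. --->
    <cfreturn reStructFindValueAtPath(
        arguments.target,
        arguments.pattern,
        arguments.scope,
        ""
        ) />
</cffunction>

<!--- Private helper (hypothetical name): carries the growing path. --->
<cffunction name="reStructFindValueAtPath" access="private" returntype="array" output="false">
    <cfargument name="target" type="any" required="true" />
    <cfargument name="pattern" type="string" required="true" />
    <cfargument name="scope" type="string" required="true" />
    <cfargument name="path" type="string" required="true" />

    <cfset var local = {} />
    <cfset local.results = [] />

    <!--- Build the list of keys: 1..N for an array, the key names for a struct. --->
    <cfif isArray( arguments.target )>
        <cfset local.keys = [] />
        <cfloop index="local.i" from="1" to="#arrayLen( arguments.target )#">
            <cfset arrayAppend( local.keys, local.i ) />
        </cfloop>
    <cfelse>
        <cfset local.keys = structKeyArray( arguments.target ) />
    </cfif>

    <cfloop index="local.key" array="#local.keys#">
        <cfset local.value = arguments.target[ local.key ] />
        <cfset local.path = arguments.path & "[ ""#local.key#"" ]" />

        <cfif isSimpleValue( local.value )>
            <!--- Same case-insensitive test as the UDF in this post. --->
            <cfif arrayLen( reMatchNoCase( arguments.pattern, local.value ) )>
                <cfset local.result = {
                    key = local.key,
                    owner = arguments.target,
                    path = local.path
                    } />
                <cfset arrayAppend( local.results, local.result ) />
            </cfif>
        <cfelseif ( isStruct( local.value ) || isArray( local.value ) )>
            <!--- Recurse through the helper, never through the public method. --->
            <cfset local.childResults = reStructFindValueAtPath(
                local.value,
                arguments.pattern,
                arguments.scope,
                local.path
                ) />
            <cfloop index="local.childResult" array="#local.childResults#">
                <cfset arrayAppend( local.results, local.childResult ) />
            </cfloop>
        </cfif>

        <!--- For a "one" search, stop as soon as anything has been found. --->
        <cfif ( ( arguments.scope eq "one" ) && arrayLen( local.results ) )>
            <cfset local.trimmedResults = [ local.results[ 1 ] ] />
            <cfreturn local.trimmedResults />
        </cfif>
    </cfloop>

    <cfreturn local.results />
</cffunction>
```

The trade-off is the one described above: the public signature stays clean, at the cost of a second function that exists only to support the first.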

Before we get into the code, however, let's take a look at the context. In the following test, I'm creating a structure with nested data collections. Then, I am going to search for values contained within the total collection based on a regular expression pattern rather than an exact value match.

    <!--- Create a test data structure. --->
    <cfset myData = {
        hotGirls = [
            {
                name = "Tricia",
                hair = "Brunette"
            },
            {
                name = "Kim",
                hair = "Blonde"
            }
            ],
        athleticGirls = [
            {
                name = "Tricia",
                hair = "Brunette"
            },
            {
                name = "Jen",
                hair = "Black"
            }
            ]
        } />


    <!--- Get all the values that are either brunette OR black. --->
    <cfset results = reStructFindValue(
        myData,
        "brunette|brown|black",
        "all"
        ) />

    <!--- Dump out the search results. --->
    <cfdump
        var="#results#"
        label="reStructFindValue( 'brunette|brown|black' )"
        />

As you can see, I am taking my nested structure and searching for values that match the following regular expression:

brunette|brown|black

This will find values that contain "brunette", "brown", or "black" (the match is case-insensitive). And, just like ColdFusion's native StructFindValue() method, this returns an array of matches:

 
 
 
 
 
 
[Image: REStructFindValue() - Regular Expression Searching With StructFindValue().]
 
 
 

As you can see, it found the three values in the nested structure that matched the above regular expression. I tried to keep the format of the results as close as possible to that of the StructFindValue() result collection; however, I simplified my "Path" key to always use array notation and never dot notation. Since array notation works for both arrays and structs, I felt that a uniform path format was a good idea.
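To make the result format a bit more concrete, here is a small sketch of how the returned array might be consumed. So that it can run on its own, it uses a hand-built stand-in for a single result rather than calling the UDF; the sampleOwner and sampleMatch variables are hypothetical, and the key casing shown in the path is only illustrative (ColdFusion may report struct keys in upper case).

```cfml
<!--- Hand-built stand-in for one search result (assumed shape: key, owner, path). --->
<cfset sampleOwner = { name = "Jen", hair = "Black" } />
<cfset sampleMatch = {
    key = "hair",
    owner = sampleOwner,
    path = "[ ""athleticGirls"" ][ ""2"" ][ ""hair"" ]"
    } />
<cfset results = [ sampleMatch ] />

<cfoutput>
    <cfloop index="match" array="#results#">
        <!--- The owner is the collection that directly holds the matching value. --->
        #htmlEditFormat( match.path )# = #htmlEditFormat( match.owner[ match.key ] )#<br />
    </cfloop>
</cfoutput>
```

Because each result carries both the owner and the key, the matching value can be read (or replaced) through match.owner[ match.key ] without having to parse the path string.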

Now that we see what the new method is doing, let's take a look at the code. Keep in mind that the collection iteration used by this method is performed by my Each.cfm ColdFusion custom tag. The tag is not required, but it does simplify the method greatly.

REStructFindValue( Target, Pattern, Scope )

    <cffunction
        name="reStructFindValue"
        access="public"
        returntype="array"
        output="false"
        hint="I search for patterns within a given ">

        <!--- Define arguments. --->
        <cfargument
            name="target"
            type="any"
            required="true"
            hint="I am the target struct being searched."
            />

        <cfargument
            name="pattern"
            type="string"
            required="true"
            hint="I am the pattern being searched."
            />

        <cfargument
            name="scope"
            type="string"
            required="false"
            default="one"
            hint="I am the scope of the search: one or all."
            />

        <cfargument
            name="path"
            type="string"
            required="false"
            default=""
            hint="The path to the current target (for recursive calling). ** NOTE: This is used internally for recursion - this is NOT an expected argument to be passed in by the user."
            />

        <!--- Define the local scope. --->
        <cfset var local = {} />

        <!--- Create an array to hold the search results. --->
        <cfset local.results = [] />

        <!---
            Loop over target.
            NOTE: This uses a ColdFusion custom tag that unifies
            the interface for looping over both structures and
            arrays.
            http://www.bennadel.com/go/each-iteration
        --->
        <cf_each
            item="local.item"
            collection="#arguments.target#">

            <!--- Create a variable to store the base path. --->
            <cfset local.path = arguments.path />

            <!--- Add the current key to the path. --->
            <cfset local.path &= "[ ""#local.item.key#"" ]" />

            <!--- Get a handle on the new target. --->
            <cfset local.target = local.item.value />

            <!---
                Check to see if this new target is a string (or
                if it is another complex object that we need to
                iterate over).
            --->
            <cfif isSimpleValue( local.target )>

                <!---
                    Check the target value for the pattern match.
                    For now, we are going to be using ColdFusion's
                    REMatchNoCase() method, which means a subset of
                    regular expression usage. Furthermore, we are
                    going to use NoCase for ease of coding.
                --->
                <cfif arrayLen( reMatchNoCase( arguments.pattern, local.target ) )>

                    <!---
                        The regular expression pattern was found at
                        least once in the target value. This is a
                        valid match. Add it to the results.
                    --->
                    <cfset local.result = {
                        key = local.item.key,
                        owner = arguments.target,
                        path = local.path
                        } />

                    <!--- Add this result to the current results. --->
                    <cfset arrayAppend( local.results, local.result ) />

                </cfif>

            <!---
                Make sure this complex nested target is one that
                we can actually iterate over (all others will be
                skipped).
            --->
            <cfelseif (
                isStruct( local.target ) ||
                isArray( local.target )
                )>

                <!---
                    The nested target is not a simple value. Therefore,
                    we need to perform a depth-first, recursive search
                    of it for our matching pattern.
                --->
                <cfset local.childResults = reStructFindValue(
                    local.target,
                    arguments.pattern,
                    arguments.scope,
                    local.path
                    ) />

                <!---
                    Add the results from our nested search to the
                    current results collection.
                --->
                <cfloop
                    index="local.childResult"
                    array="#local.childResults#">

                    <!--- Add this result to the current results. --->
                    <cfset arrayAppend( local.results, local.childResult ) />

                </cfloop>

            </cfif>


            <!---
                At the end of a single iteration, let's check to see
                if we were only searching for one target. If we are,
                AND we found it, we can simply return the single
                element rather than continuing on with our recursion.
            --->
            <cfif (
                (arguments.scope eq "one") &&
                arrayLen( local.results )
                )>

                <!---
                    We found at least one item - trim the result
                    set in case the last iteration found more than
                    one.
                --->
                <cfset local.trimmedResults = [ local.results[ 1 ] ] />

                <!--- Return the trimmed result set. --->
                <cfreturn local.trimmedResults />

            </cfif>

        </cf_each>


        <!--- Return the found results. --->
        <cfreturn local.results />
    </cffunction>

Notice that the UDF above takes four arguments. As I mentioned above, only the first three are meant to be provided by the user. The fourth argument, "Path," is provided by the method itself to keep track of nesting during recursive calls. The regular expression matching is performed by ColdFusion's REMatchNoCase() function. This means that the regular expressions used in this UDF are case-insensitive, are subject to the limitations of REMatchNoCase(), and cannot make use of some advanced pattern constructs.
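As a sketch of how the match test could be isolated and swapped out: the UDF only needs a yes/no answer, so reFindNoCase() (which returns the position of the first match, or 0 if there is none) would serve without gathering the matches. For patterns that the built-in engine cannot handle, one option is to drop down to Java's java.util.regex.Pattern. The function name valueMatchesPattern and its useJava flag are hypothetical and are not part of the UDF above; treat this as an illustration rather than a drop-in replacement.

```cfml
<cffunction name="valueMatchesPattern" access="public" returntype="boolean" output="false">
    <cfargument name="value" type="string" required="true" />
    <cfargument name="pattern" type="string" required="true" />
    <cfargument name="useJava" type="boolean" required="false" default="false" />

    <cfset var local = {} />

    <cfif arguments.useJava>

        <!--- CASE_INSENSITIVE mirrors the NoCase behavior of the UDF. --->
        <cfset local.patternClass = createObject( "java", "java.util.regex.Pattern" ) />
        <cfset local.compiled = local.patternClass.compile(
            javaCast( "string", arguments.pattern ),
            javaCast( "int", local.patternClass.CASE_INSENSITIVE )
            ) />

        <!--- find() reports whether the pattern occurs anywhere in the value. --->
        <cfreturn local.compiled.matcher( javaCast( "string", arguments.value ) ).find() />

    </cfif>

    <!--- reFindNoCase() returns 0 when the pattern is not found. --->
    <cfreturn ( reFindNoCase( arguments.pattern, arguments.value ) gt 0 ) />
</cffunction>
```

With a helper like this, the condition inside the loop would become a call such as valueMatchesPattern( local.target, arguments.pattern ), and the choice of regular expression engine would live in one place.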

To be honest, I've never actually used the StructFindValue() method, so I am not really sure what the best use cases are; that said, I hope that this UDF might come in handy to those that do use it often.




Reader Comments

Jul 9, 2009 at 12:17 PM
30 Comments

Too funny. I just submitted two new functions to cflib.org: REStructFindValue() and REStructFindValueNoCase().

One thing to note, your REStructFindValue() implementation searches both arrays and structures. StructFindValue() will iterate through arrays, but will only return results from structures. Not a big deal, just something to be aware of if you are looking to use this in place of StructFindValue().


Jul 9, 2009 at 12:18 PM
32 Comments

nice! Your mad coding over the past two days has inspired me to take a crack at that potentially useful "StructVisitor" implementation I was talking about, as well.

I love this kind of practice coding. Thanks Ben!


Jul 9, 2009 at 1:31 PM
8,824 Comments

@Nathan,

Good times :) I hadn't used StructFindValue() before and to be honest, the explanation of the various struct "find" methods confused me a bit. I just tried to deduce what it was doing by running some tests and dumping out the results. So, more than likely (as you are saying) my functionality is not going to be as parallel with the native one.

@Nathan, @Marc,

Out of curiosity, what are the use cases for these kinds of methods?

@Marc,

Yeah, this stuff is fun. I can't wait to see what you come up with.


Jul 9, 2009 at 1:41 PM
30 Comments

@Ben - I have no idea. After seeing Marc's tweet the other day I was just thinking it would be fun to code :)


Jul 9, 2009 at 1:45 PM
8,824 Comments

@Nathan,

Marc is just a source of inspiration :)


Jul 9, 2009 at 1:59 PM
32 Comments

when you can't code for s**t, you gotta inspire other people to do it for you :-)


Jul 9, 2009 at 2:10 PM
8,824 Comments

@Marc,

Ha ha ha :) Oh marc.


Jul 9, 2009 at 3:07 PM
127 Comments

On a somewhat related note, a while back I blogged about structFilter()--a UDF I wrote for filtering a structure down to just the keys that match a RegEx:

http://blog.pengoworks.com/index.cfm/2009/6/17/structFilter-UDF--Filtering-a-structure-based-upon-a-regular-expression


Jul 9, 2009 at 4:30 PM
8,824 Comments

@Dan,

Oh man, you used REFind() in your struct search! Sometimes I feel so retarded :) I used REMatch(), which served no purpose (as the matches weren't being gathered), to see if the target value matched the given regular expression. I should have totally used REFind().

Thanks for the wake up call :)


Jul 12, 2009 at 5:15 PM
8,824 Comments

Adam Presley took this problem and went the Java route. Very interesting, but a bit complicated as it dives into Java vectors:

http://blog.adampresley.com/2009/coldfusion-searching-structure-values-with-regular-expressions/


Jul 12, 2009 at 10:45 PM
25 Comments

Yeah Ben, it came out fast, especially with LARGE structures, but delving into the CF Java types was certainly a challenge. Also, you can't pass an array to mine, as it only takes a Vector (Struct). Overall a fun exercise.

Thanks for the inspirational idea!


Jul 13, 2009 at 6:31 PM
8,824 Comments

@Adam,

Struct or not, it was definitely fascinating.


Jul 14, 2009 at 7:29 PM
5 Comments

Hi Ben,

A very learnsome post, as usual :)

I have just written the function using my iterator component, which takes an arbitrary collection, and added a maxdepth variable to constrain the search

http://code.google.com/p/dbseries/source/browse/trunk/cfc/RECollectionFindValues.cfm

And I have some nice thoughts on how to go on with my iterator project as well now, thanks!


Jul 18, 2009 at 2:00 PM
8,824 Comments

@DeepDown,

I really like the idea of having an Iterator interface that can be used to iterate over just about anything. That's sort of where I was going with my "Each.cfm" custom tag, but your solution allows for much more extension. Very cool!


Aug 19, 2009 at 7:12 PM
2 Comments

Well this has just gone a major way to solving a JSON problem with leading zeros on strings that look like numbers.

By using REStructFindValue() to find all the key/value pairs in a structure that have leading zeros, and then looping through the resulting data using a function I found on the Adobe Forums site here: http://forums.adobe.com/message/2101252, I can force all leading-zero data in the structure to remain leading-zero data in the JSON produced by SerializeJSON. :-)


Aug 19, 2009 at 8:52 PM
30 Comments

@Paul - You may also want to check out JSONUtil on RIAForge, http://jsonutil.riaforge.org/. It has a strict encoding option that will keep leading zeros on string data types.


Aug 19, 2009 at 9:30 PM
2 Comments

@Nathan - Thanks that looks pretty cool and is a neater solution than what I came up with :-)


