Searching Directories And File Content Using ColdFusion

Posted March 1, 2007 at 8:06 AM by Ben Nadel

Tags: ColdFusion

Earlier this week, Nick G over on the CF-Talk list asked about searching through the content of the files in a given directory. I would say this is a task best performed by something other than ColdFusion... but of course, I am not one to turn down the chance to write some sweet ass ColdFusion code. And so, last night, I wrote this ColdFusion user defined function (UDF). It takes either a directory path or an array of file paths, along with a phrase to search for, and returns an array of the file paths whose content contains that phrase. The search can be done either as a literal text search or as a regular expression search.

Here is the SearchFiles() ColdFusion UDF:

    <cffunction
        name="SearchFiles"
        access="public"
        returntype="array"
        output="false"
        hint="Searches files for the given values. Returns an array of file paths.">

        <!--- Define arguments. --->
        <cfargument
            name="Path"
            type="any"
            required="true"
            hint="This is either a directory path or an array of file paths which we will be searching."
            />

        <cfargument
            name="Criteria"
            type="string"
            required="true"
            hint="The values for which we are searching the file contents."
            />

        <cfargument
            name="Filter"
            type="string"
            required="false"
            default="cfm,css,htm,html,js,txt,xml"
            hint="List of file extensions that we are going to allow."
            />

        <cfargument
            name="IsRegex"
            type="boolean"
            required="false"
            default="false"
            hint="Flags whether or not the search criteria is a regular expression."
            />


        <!--- Define the local scope. --->
        <cfset var LOCAL = StructNew() />


        <!---
            Check to see if we are dealing with a directory path.
            If we are, we are going to want to get those paths
            and convert it to an array of file paths.
        --->
        <cfif IsSimpleValue( ARGUMENTS.Path )>

            <!---
                Get all the entries in the given directory. The
                resultant query contains sub-directories as well
                as files, so we check the TYPE column below and
                only keep the files.
            --->
            <cfdirectory
                action="LIST"
                directory="#ARGUMENTS.Path#"
                name="LOCAL.FileQuery"
                />

            <!---
                Get the file system's path separator ("\" on
                Windows, "/" elsewhere) rather than hard-coding it.
            --->
            <cfset LOCAL.Separator = CreateObject( "java", "java.io.File" ).separator />

            <!---
                Now that we have the query, we want to create an
                array of the file paths.
            --->
            <cfset LOCAL.Paths = ArrayNew( 1 ) />

            <!--- Loop over the query and set up the values. --->
            <cfloop query="LOCAL.FileQuery">

                <!--- Skip sub-directories; only files get searched. --->
                <cfif (LOCAL.FileQuery.type EQ "File")>

                    <cfset ArrayAppend(
                        LOCAL.Paths,
                        (LOCAL.FileQuery.directory & LOCAL.Separator & LOCAL.FileQuery.name)
                        ) />

                </cfif>

            </cfloop>

        <cfelse>

            <!---
                For consistency's sake, just store the path argument
                into our local paths value so that we can refer to
                this and the query-route the same way (see above).
            --->
            <cfset LOCAL.Paths = ARGUMENTS.Path />

        </cfif>


        <!---
            ASSERT: At this point, whether we were passed in a
            directory path or an array of file paths, we now have
            an array of file paths that we are going to search
            in the variable LOCAL.Paths.
        --->


        <!---
            Create an array in which we will store the file paths
            that had matching criteria.
        --->
        <cfset LOCAL.MatchingPaths = ArrayNew( 1 ) />


        <!---
            Clean up the filter to be used in a regular expression.
            We strip anything that is not a word character or a
            comma, then turn the list into an OR regex
            (ex. "cfm,htm" becomes "cfm|htm").
        --->
        <cfset ARGUMENTS.Filter = REReplace(
            REReplace( ARGUMENTS.Filter, "[^\w,]+", "", "all" ),
            ",",
            "|",
            "all"
            ) />


        <!--- Loop over the file paths in our paths array. --->
        <cfloop
            index="LOCAL.PathIndex"
            from="1"
            to="#ArrayLen( LOCAL.Paths )#"
            step="1">


            <!---
                Get a shorthand to the current path. This is
                not necessary but just makes referencing the
                path easier.
            --->
            <cfset LOCAL.Path = LOCAL.Paths[ LOCAL.PathIndex ] />


            <!---
                Check to see if this file path is allowed. Either
                we have no file filters or we do and this file
                has one of them. The leading "\." makes sure we
                match a real extension, so "html" does not also
                accept a file ending in "xhtml".
            --->
            <cfif (
                (NOT Len( ARGUMENTS.Filter )) OR
                (
                    REFindNoCase(
                        "\.(#ARGUMENTS.Filter#)$",
                        LOCAL.Path
                        )
                ))>


                <!---
                    This is a file that we can use. Read in the
                    contents of the file.
                --->
                <cffile
                    action="READ"
                    file="#LOCAL.Path#"
                    variable="LOCAL.FileData"
                    />


                <!---
                    Check to see what kind of search we are doing.
                    Is it a straight-up value search or is it a
                    regular expression search?
                --->
                <cfif (
                    (
                        ARGUMENTS.IsRegex AND
                        REFindNoCase(
                            ARGUMENTS.Criteria,
                            LOCAL.FileData
                            )
                    ) OR
                    (
                        (NOT ARGUMENTS.IsRegex) AND
                        FindNoCase(
                            ARGUMENTS.Criteria,
                            LOCAL.FileData
                            )
                    )
                    )>

                    <!---
                        This is a good file path. Add it to the
                        list of successful file paths.
                    --->
                    <cfset ArrayAppend(
                        LOCAL.MatchingPaths,
                        LOCAL.Path
                        ) />

                </cfif>

            </cfif>

        </cfloop>


        <!--- Return the array of matching file paths. --->
        <cfreturn LOCAL.MatchingPaths />

    </cffunction>

As you can see, there is no real magic going on here. The algorithm just loops over the file paths, checks each one against the file extension filter, reads in the content, searches it for the phrase, and then returns all of the matching file paths. The only difference between a standard search and a regular expression search is that the standard search uses FindNoCase() whereas the regular expression search uses REFindNoCase().
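To make those two checks concrete, here is a small standalone sketch (the variable names are only for illustration). It shows how a comma-delimited extension list can be turned into a regular expression alternation, anchored on a leading dot so that "html" does not also accept an ".xhtml" file, and how the same criteria behaves under FindNoCase() versus REFindNoCase().

```cfml
<!--- 1. Extension filter: turn the comma list into a regex alternation. --->
<cfset strFilter = "cfm,css,htm,html,js,txt,xml" />
<cfset strPattern = "\.(" & ListChangeDelims( strFilter, "|", "," ) & ")$" />
<!--- strPattern is now: \.(cfm|css|htm|html|js|txt|xml)$ --->

<!--- ".HTM" is allowed (the check is case-insensitive)... --->
<cfset blnHtm = (REFindNoCase( strPattern, "/site/about.HTM" ) GT 0) />

<!--- ...but ".xhtml" is not, because the dot must come right before the extension. --->
<cfset blnXhtml = (REFindNoCase( strPattern, "/site/page.xhtml" ) GT 0) />

<!--- 2. Literal vs. regular expression search on the same text. --->
<cfset strContent = "Total: 1x5 units" />

<!--- Literal: the dot is just a dot, so "1.5" is not found (returns 0). --->
<cfset intLiteral = FindNoCase( "1.5", strContent ) />

<!--- Regex: the dot matches any character, so "1x5" matches at position 8. --->
<cfset intRegex = REFindNoCase( "1.5", strContent ) />
```

The practical upshot: when IsRegex is true, any literal dots, parentheses, or pipes in the criteria need to be escaped with a backslash.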

Here is an example of how to search the current directory:

    <!--- Search entire directory. --->
    <cfset arrMatchingPaths = SearchFiles(
        Path = ExpandPath( "./" ),
        Criteria = "she pondered"
        ) />

... and here is how you might call it using an array of file paths:

    <!--- Create an array of file paths to search. --->
    <cfset arrPaths = ArrayNew( 1 ) />

    <!--- Add paths to the array. --->
    <cfset ArrayAppend(
        arrPaths,
        ExpandPath( "./file_search_data.htm" )
        ) />

    <cfset ArrayAppend(
        arrPaths,
        ExpandPath( "./file_search_data.html" )
        ) />

    <cfset ArrayAppend(
        arrPaths,
        ExpandPath( "./file_search_data.txt" )
        ) />


    <!--- Search given files for the regular expression match. --->
    <cfset arrMatchingPaths = SearchFiles(
        Path = arrPaths,
        Criteria = "she (pondered|licked|kissed)",
        Filter = "txt",
        IsRegex = true
        ) />

So that's that. I am sure there are many ways of doing this in ColdFusion that have already been done, but you know me - I love to reinvent the wheel (no matter what Sean might say - I love getting the machinery firing full blast). One modification that could be neat would be to search the file name itself. This would be an easy modification (perhaps for the next attempt).
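Here is a rough sketch of that file name idea. FileNameMatches() is a hypothetical helper, not part of the UDF above; it runs the same literal-or-regex check, but only against the file name, which GetFileFromPath() pulls off the end of the path.

```cfml
<cffunction
    name="FileNameMatches"
    access="public"
    returntype="boolean"
    output="false"
    hint="Hypothetical helper: checks the file name (not the content) against the criteria.">

    <cfargument name="Path" type="string" required="true" />
    <cfargument name="Criteria" type="string" required="true" />
    <cfargument name="IsRegex" type="boolean" required="false" default="false" />

    <!--- Only the last path segment is searched, never the directories. --->
    <cfset var strFileName = GetFileFromPath( ARGUMENTS.Path ) />

    <cfif ARGUMENTS.IsRegex>
        <cfreturn (REFindNoCase( ARGUMENTS.Criteria, strFileName ) GT 0) />
    </cfif>

    <cfreturn (FindNoCase( ARGUMENTS.Criteria, strFileName ) GT 0) />
</cffunction>
```

Inside SearchFiles(), a call like this could be OR'ed into the existing content check, probably behind a new (again hypothetical) boolean argument so that content-only searching stays the default.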



Reader Comments

Mar 1, 2007 at 4:44 PM

Why not just use Verity (or Lucene for the BlueDragon crowd)?


Mar 1, 2007 at 5:03 PM

Overhead... and that sort of a demo would be beyond my area of expertise. Plus, Verity requires duplicating data (for the index). This can take random directories / file paths on the fly; it doesn't require any planning.


Mar 18, 2007 at 9:10 AM

Hi all,

Thanks for the code!!
I tried searching file content using your code and it works well! :)

However, it only works well when searching for English text content; Asian languages (e.g. Chinese, Japanese, etc.) do not work.

Just wondering if it is possible to search for Asian language content in a file? Would that need any change to the code itself?

Regards,
Ronald


Mar 19, 2007 at 7:36 AM

@Ronald,

I am not sure how you would go about this. My thoughts, and this is probably NOT the way to go, would be to run regular expression searches and just replace all the foreign extended characters with something like .{1}, which matches any one character.

So, for something like "Espanol" where it has the "n" with the tilde, you would maybe search for "Espa.{1}ol". Of course, this does not guarantee a good match. There has got to be a much better way to do this.
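A different angle from the regex workaround above, and one that may be worth checking first, is the character encoding. The <cffile action="read"> tag accepts a charset attribute; if the files on disk are UTF-8 but get decoded with some other default encoding, the Chinese or Japanese text comes back garbled and FindNoCase() has nothing to match. The sketch below assumes UTF-8 files and writes, then reads, a temporary file to show the round trip.

```cfml
<!--- Lets this template itself contain the non-ASCII literals below. --->
<cfprocessingdirective pageencoding="utf-8" />

<!--- Assumption: the files being searched are saved as UTF-8. --->
<cfset strPath = GetTempDirectory() & "charset_demo.txt" />

<!--- Write a small sample file as UTF-8 (demo setup only). --->
<cffile
    action="write"
    file="#strPath#"
    output="検索テスト: 日本語のテキスト"
    charset="utf-8"
    />

<!--- Read it back, telling ColdFusion which encoding to decode with. --->
<cffile
    action="read"
    file="#strPath#"
    variable="strFileData"
    charset="utf-8"
    />

<!--- With matching encodings, a plain FindNoCase() can find the Japanese phrase. --->
<cfset intHit = FindNoCase( "日本語", strFileData ) />
```

In SearchFiles() that would mean adding charset="utf-8" to its <cffile> tag, or passing the encoding in through a new, hypothetical Charset argument.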


Mar 23, 2010 at 9:26 AM

Works like a charm Ben!

One question, though. Is there a way to limit the area of the file that is searched? I am using this to search through some pages on our website. The only problem is, when the search term happens to be in the meta tags of the pages, those pages come up whether or not the search term is in the readable content of the page.

Thanks!


Mar 24, 2010 at 11:00 PM

@Jason,

Hmmm, that's a tough question. When I have to match something that is NOT within something else, I'll typically break the problem down into two separate matches; or rather, I'll match what I DO want and what I DON'T want and then I'll make the judgement call per-match.

While this is not what you are asking, I used a similar approach when replacing a string that was NOT within another string:

http://www.bennadel.com/blog/1861-Ask-Ben-Replacing-A-String-That-Is-Not-Inside-Of-Another-String.htm

In that example, I'm replacing; but, you could rework something like that to work with find, not replace.

If your pages are XML-compliant (strict XHTML), then you could also parse the XHTML into an XML document and search based on node text. This would give you a bit more control; but, it will only work if your markup is rather strict.
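Here is a rough sketch of a simpler, swapped-in technique (not the match-both-and-decide approach described above): strip the <head> block, which is where the meta tags live, then strip any remaining tags before searching. It assumes reasonably ordinary markup with a single <head> element; it is a quick text filter, not a real HTML parser.

```cfml
<!--- Sample page content (stands in for LOCAL.FileData inside SearchFiles()). --->
<cfset strFileData = "<html><head><title>Home</title><meta name=""keywords"" content=""widgets"" /></head><body><p>Hello world</p></body></html>" />

<!---
    Remove the whole <head> block (title, meta tags, scripts).
    [\s\S]*? is a lazy "anything, including line breaks" match, and
    (\s[^>]*)? keeps the pattern from matching a <header> tag.
--->
<cfset strVisible = REReplaceNoCase(
    strFileData,
    "<head(\s[^>]*)?>[\s\S]*?</head>",
    "",
    "one"
    ) />

<!--- Replace any remaining tags with spaces so attribute values are not searched. --->
<cfset strVisible = REReplace( strVisible, "<[^>]*>", " ", "all" ) />

<!--- "widgets" only lived in the meta tag, so it is no longer found. --->
<cfset intMetaOnly = FindNoCase( "widgets", strVisible ) />

<!--- Body text is still searchable. --->
<cfset intBodyText = FindNoCase( "hello world", strVisible ) />
```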


May 12, 2010 at 11:12 AM

Hi,

I'm a CF newbie... I inherited this site at our company, which is built in CF.

I am trying to find the pages that contain certain strings, so this function will be very useful...

If only I knew how/where to put these code snippets.

Thanks.

RD


May 13, 2010 at 10:33 PM

@RDev,

There are a lot of strategies for defining user defined functions in a ColdFusion application. Typically, people put them in a file and then CFInclude them into each page. Or, you can create some sort of UDF object and then instantiate it and cache it within your Application scope (for example).

Good luck!
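For the include strategy, a minimal setup might look like the following. The file name udf_search.cfm is hypothetical; it would contain nothing but the SearchFiles() <cffunction> block from the top of this post.

```cfml
<!---
    In any page that needs the function. The template path is
    relative to the current page; udf_search.cfm is a hypothetical
    file holding only the SearchFiles() definition.
--->
<cfinclude template="udf_search.cfm" />

<!--- The function is now defined for the rest of this request. --->
<cfset arrMatchingPaths = SearchFiles(
    Path = ExpandPath( "./" ),
    Criteria = "she pondered"
    ) />

<cfoutput>
    Found #ArrayLen( arrMatchingPaths )# matching file(s).
</cfoutput>
```

One thing to watch: include the file only once per request, since ColdFusion raises an error if the same function is declared twice on a page.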


Nov 2, 2010 at 9:05 AM

Great code example! Thanks for this.

I think this works if you only want to search one directory but doesn't work if you're looking for a recursive file search. How would you go about creating a search that works similarly but recursively? A file index?

Thanks,
Paul


Nov 3, 2010 at 10:44 AM

@Paul,

Good question - I think what you could do is turn the function into a recursive one based on the type of path being passed in. So, for example, if you pass in a path that is a simple value (i.e. a string), you can then run:

directoryExists( arguments.path )

... to see if it's a directory. If it *is* a directory, you can query for all entries in that directory and then recursively call the searchFiles() function for each nested path.

Does that make sense? That could be a fun little blog post.
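As an alternative to hand-rolling the recursion described above, ColdFusion 7 and later can do the directory walking for you: <cfdirectory> accepts a recurse="true" attribute, and the resulting array of file paths can then be handed to SearchFiles() through its array mode. ListFilesRecursive() is a hypothetical helper name, and this assumes a ColdFusion version that supports recurse.

```cfml
<cffunction
    name="ListFilesRecursive"
    access="public"
    returntype="array"
    output="false"
    hint="Hypothetical helper: returns the full path of every file under the given directory.">

    <cfargument name="Directory" type="string" required="true" />

    <cfset var LOCAL = StructNew() />
    <cfset LOCAL.Separator = CreateObject( "java", "java.io.File" ).separator />
    <cfset LOCAL.Paths = ArrayNew( 1 ) />

    <!--- recurse="true" lists entries from every nested directory. --->
    <cfdirectory
        action="list"
        directory="#ARGUMENTS.Directory#"
        recurse="true"
        name="LOCAL.Entries"
        />

    <cfloop query="LOCAL.Entries">
        <!--- The DIRECTORY column holds each entry's own parent folder; skip folders. --->
        <cfif (LOCAL.Entries.type EQ "File")>
            <cfset ArrayAppend(
                LOCAL.Paths,
                LOCAL.Entries.directory & LOCAL.Separator & LOCAL.Entries.name
                ) />
        </cfif>
    </cfloop>

    <cfreturn LOCAL.Paths />
</cffunction>
```

Usage would then be something like SearchFiles( Path = ListFilesRecursive( ExpandPath( "./" ) ), Criteria = "she pondered" ). The trade-off versus true recursion is that the whole file list is built up front, which is simple but could get large on a big tree.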

