Javascript Exec() Method For Regular Expression Matching

Posted May 10, 2007 at 1:21 PM

Tags: Javascript / DHTML

Earlier today, Steve of Flagrant Badassery introduced me to the exec() method of the Javascript regular expression object (RegExp::exec()). I had never heard of it before, and since I love working with regular expressions, I naturally had to dive right in and experiment a little. From my initial play time, it looks like exec() lets me use the regular expression object somewhat like the Java Pattern and Matcher classes that I love using so much in ColdFusion.

First, I had to look up the details of the method and how it works. The Mozilla Developer Center has some very clean, straightforward documentation. Once I had that, I set up this test page.

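This test page takes some song lyrics and looks for modifier words followed by the words they modify (I had to think of something to test with). Once we have our target text and our pattern, we loop until RegExp::exec() stops returning a match array; when no further match exists, it returns null. Because the pattern uses the global ("g") flag, each call to exec() resumes searching where the previous match ended (the regular expression's lastIndex property), so the loop walks through the text one match at a time. Indexing into the returned array works just like calling Matcher::group() in Java (index 0 is the entire match; 1 and up are the captured groups), so this felt very natural: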

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Javascript Regular Expression Exec()</title>

<script type="text/javascript">

// Define the text that we are going to search.
// This text was taken from Sir Mix-a-Lot's hit
// single, "I Like Big Butts". Each lyric line keeps
// three leading whitespace characters, which the
// start indices in the output below include.
var strText = "\
   I like big butts and I can not lie \
   You other brothers can't deny \
   That when a girl walks in with an itty bitty waist \
   And a round thing in your face \
   You get sprung \
   Wanna pull up tough \
   Cuz you notice that butt was stuffed \
   Deep in the jeans she's wearing \
   I'm hooked and I can't stop staring \
   Oh, baby I wanna get with ya \
   And take your picture \
";


// Get a pattern that we want to search on. This
// defines certain modifiers and the words that
// they modify.
var rePattern = new RegExp(
    "(big|round|bitty)(?:\\s+)([^\\s]+)",
    "gi"
    );

</script>
</head>
<body>

<script type="text/javascript">

// Define our match array. This will be populated on
// each iteration of the exec() method.
var arrMatch = null;


// Keep looping over the target text while we can
// find matches. When no more matches can be found,
// exec() returns null, which ends the while loop.
// The "g" flag makes each call resume the search at
// rePattern.lastIndex.
while (arrMatch = rePattern.exec( strText )){

    document.write(
        arrMatch[ 1 ].toUpperCase() +
        " modifies " +
        arrMatch[ 2 ].toUpperCase() +
        "<br />"
        );


    // Explore the modified properties of both the
    // returned array and the regular expression
    // object. The returned array, unlike traditional
    // Javascript arrays, is given pattern-matching
    // related properties.


    document.write(
        "........ " +
        "Phrase: " +
        arrMatch[ 0 ] +
        "<br />"
        );

    document.write(
        "........ " +
        "Start Index: " +
        arrMatch.index +
        "<br />"
        );

    document.write(
        "........ " +
        "End Index: " +
        rePattern.lastIndex +
        "<br />"
        );

    document.write(
        "........ " +
        "Index Substring: " +
        arrMatch.input.substring(
            arrMatch.index,
            rePattern.lastIndex
            ) +
        "<br /><br />"
        );
}

</script>

</body>
</html>

Running the above page, we get the following output:

BIG modifies BUTTS
........ Phrase: big butts
........ Start Index: 10
........ End Index: 19
........ Index Substring: big butts

BITTY modifies WAIST
........ Phrase: bitty waist
........ Start Index: 113
........ End Index: 124
........ Index Substring: bitty waist

ROUND modifies THING
........ Phrase: round thing
........ Start Index: 134
........ End Index: 145
........ Index Substring: round thing
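
If you want to try the same loop outside a browser (in Node.js, say), here is a minimal sketch that collects each match into an array instead of calling document.write(). The helper name findModifiers is made up for illustration. One assumption worth calling out: the start indices reported above (10, 113, 134) only line up if each lyric line carries three leading whitespace characters, presumably the tab indentation of the original source, so the sketch builds the text that way.

```javascript
// Hypothetical helper: the same exec() loop as the page above, but
// collecting each match into an array instead of calling document.write().
function findModifiers(text) {
  // Same pattern and flags as rePattern in the page above.
  var re = new RegExp("(big|round|bitty)(?:\\s+)([^\\s]+)", "gi");
  var results = [];
  var match;

  // With the "g" flag, each exec() call resumes at re.lastIndex;
  // it returns null once no further match exists.
  while ((match = re.exec(text)) !== null) {
    results.push({
      modifier: match[1].toUpperCase(),
      word: match[2].toUpperCase(),
      phrase: match[0],
      start: match.index,
      end: re.lastIndex
    });
  }
  return results;
}

// Assumption: each lyric line starts with three whitespace characters
// (tabs here) and ends with the space that precedes the "\" continuation.
var lyrics = [
  "I like big butts and I can not lie",
  "You other brothers can't deny",
  "That when a girl walks in with an itty bitty waist",
  "And a round thing in your face",
  "You get sprung",
  "Wanna pull up tough",
  "Cuz you notice that butt was stuffed",
  "Deep in the jeans she's wearing",
  "I'm hooked and I can't stop staring",
  "Oh, baby I wanna get with ya",
  "And take your picture"
].map(function (line) {
  return "\t\t\t" + line + " ";
}).join("");

findModifiers(lyrics).forEach(function (m) {
  console.log(m.modifier + " modifies " + m.word +
    " (" + m.start + " to " + m.end + ")");
});
// BIG modifies BUTTS (10 to 19)
// BITTY modifies WAIST (113 to 124)
// ROUND modifies THING (134 to 145)
```

Collecting plain objects instead of writing HTML keeps the matching logic separate from the display, so the same helper can feed document.write(), a DOM update, or a test.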

What really surprised me was that the returned match array had additional properties related to the pattern matching itself (index and input). I am not used to seeing this; usually, when it comes to arrays, I am just accessing indexes and checking lengths. At first it concerned me a little that the entire target text is stored in a property of the array (input), as if every match made yet another copy of the text. However, Javascript strings are immutable, so engines can (and in practice do) point input at the same string rather than duplicating its characters. And come on, we are talking about Javascript :) I don't think variable size/count or performance is really much of an issue here.
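
A quick way to see those extra properties is to inspect a single exec() result. This is a small sketch; the sample string and variable names are made up for illustration. Note that === on strings compares contents, so the input check confirms that input holds the searched text, not whether the engine shares its storage.

```javascript
var text = "an itty bitty waist";
var re = /(big|round|bitty)(?:\s+)([^\s]+)/gi;

var match = re.exec(text);

// The result is a real array: index 0 holds the whole match and
// indexes 1..n hold the capture groups.
console.log(Array.isArray(match)); // true
console.log(match.length);         // 3
console.log(match[0]);             // bitty waist

// Non-index properties that exec() adds to the array.
console.log(match.index);          // 8 (offset where the match starts)
console.log(match.input === text); // true (same string contents)

// Because of the "g" flag, the RegExp object itself records where the
// next search will resume: match.index + match[0].length.
console.log(re.lastIndex);         // 19
```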

Anyway, this is very cool stuff. I wish I had known about it earlier. It feels like a very elegant solution, even more so than harnessing the power of the String::replace() method to accomplish the same ends.
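
For comparison, here is roughly what the String::replace() approach looks like: pass a function as the replacement and use it only to visit each match, collecting data as a side effect. This is a sketch of that general technique, not code from this post, and collectWithReplace is a hypothetical name. The callback receives the full match, then each capture group, then the match offset; returning the match unchanged leaves the text as it was.

```javascript
// Hypothetical helper: use String.prototype.replace() with a callback
// purely to visit each match, collecting data as a side effect.
function collectWithReplace(text) {
  var results = [];
  text.replace(
    /(big|round|bitty)(?:\s+)([^\s]+)/gi,
    // Arguments: full match, capture group 1, capture group 2,
    // then the offset of the match within the string.
    function (phrase, modifier, word, offset) {
      results.push({ modifier: modifier, word: word, start: offset });
      return phrase; // return the match unchanged
    }
  );
  return results;
}

var found = collectWithReplace("a big butt and a round thing");
found.forEach(function (m) {
  console.log(m.modifier + " modifies " + m.word + " at " + m.start);
});
// big modifies butt at 2
// round modifies thing at 17
```

It works, but replace() still builds a new result string that nobody uses, and the loop is hidden inside the callback; the exec() loop states the iteration directly and also exposes lastIndex.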

Thanks Steve!


Reader Comments

May 10, 2007 at 2:21 PM
164 Comments

"The Mozilla Developer Center has some very clean, straightforward documentation."

Kind of. See my related comments on your previous blog post:

http://www.bennadel.com/blog/695-Ask-Ben-Getting-Query-String-Values-In-JavaScript.htm#comments_3377


May 10, 2007 at 3:01 PM
6,516 Comments

Hey, what the heck... my hash didn't link properly! I will have to look into that.

OK, so by clean and straightforward, I guess I meant the actual formatting of the page. It just looked nice. But yeah, the last example has no place or purpose being there.


May 10, 2007 at 4:33 PM
164 Comments

"What really surprised me was that the returned match array had additional properties that were related to the pattern matching itself."

If you think about it, it's not really any weirder than other types (e.g., functions) having properties or methods. At its core, pretty much every type in JavaScript is really an object.
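
To make that concrete, here is a small sketch (the names are invented for illustration): any array or function can carry named properties, and an array's length only counts its numeric indexes, which is how the exec() result can hold index and input alongside the matched groups.

```javascript
// Arrays are objects, so they accept named properties. exec() attaches
// index and input to its result array in the same way.
var parts = ["big butts", "big", "butts"];
parts.index = 7;
parts.input = "I like big butts";

console.log(parts.length); // 3 (named properties are not counted)
console.log(parts.index);  // 7

// Functions are objects too, and can keep state in a property.
function countCalls() {
  countCalls.calls = (countCalls.calls || 0) + 1;
  return countCalls.calls;
}
countCalls();
countCalls();
console.log(countCalls.calls); // 2
```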


May 10, 2007 at 4:58 PM
164 Comments

For some fun with JavaScript types, run these in Firebug for possibly a few surprises, depending on one's understanding of JavaScript types and constructors:

console.log(typeof null); // object
console.log(typeof [1,2,3]); // object
console.log(typeof new Boolean()); // object
console.log(new String() instanceof Object); // true

console.log(typeof NaN); // number
console.log(NaN == NaN); // false
console.log(typeof /regex/); // object in IE; function in Firefox (at the time; modern browsers return object)

console.log(new Boolean(false)); // false (probably expected)
console.log(new Boolean(false) == false); // true (probably expected)
console.log(false === false); // true (probably expected)
console.log(new Boolean(false) === false); // false (probably unexpected)


May 10, 2007 at 9:11 PM
164 Comments

I don't mean to spam your comments, but here's a simple example of exec() in action that I used recently...

I needed to create an array containing the indices of each tab character within a string, so I used something like the following:

------------------------
var tabIndices = [];
var tabMatchInfo;
var tabRegex = /\t/g;

while (tabMatchInfo = tabRegex.exec(string)) {
    tabIndices.push(tabMatchInfo.index);
}
------------------------

Nice and easy. To do this without exec, it would require something like the following:

------------------------
var tabIndices = [];
var thisTabIndex;

for (var i = 0; i < string.length; i++) {
    thisTabIndex = string.indexOf(String.fromCharCode(9), i);
    if (thisTabIndex === -1) {
        break;
    } else {
        tabIndices.push(thisTabIndex);
        i = thisTabIndex;
    }
}
------------------------

Of course, there are cases where it would be much more difficult than this (or maybe even impossible) to reproduce functionality achieved using exec() with other methods.

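One word of caution: although "while (arr = /\t/g.exec(string)) {}" works fine in Firefox, it creates an infinite loop in IE (and possibly other browsers) if there is even one tab character in the string, because IE re-creates the regex on each iteration of the loop, effectively resetting lastIndex so the search never advances. (ECMAScript 5 later standardized IE's behavior: a regex literal now produces a new RegExp object every time it is evaluated, so this version loops forever in modern browsers too.)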


May 10, 2007 at 9:13 PM
164 Comments

In case that last paragraph wasn't clear, the problem results from creating the regex within the loop. The fix is to simply create the regex before entering the loop.
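
Here is a sketch of both versions side by side; the function names are hypothetical. The broken version is capped at five iterations so it can be run safely: in current engines the literal /\t/g yields a fresh RegExp object each time the loop condition is evaluated, so lastIndex always starts at 0 and exec() keeps reporting the first tab.

```javascript
var sample = "a\tb\tc";

// Broken pattern, capped at 5 iterations so it cannot hang: the literal
// /\t/g is re-created on every evaluation, so every call finds the first tab.
function brokenIndices(str) {
  var found = [];
  var m;
  while ((m = /\t/g.exec(str)) && found.length < 5) {
    found.push(m.index);
  }
  return found;
}

// Fix: create the regex once, before the loop, so lastIndex persists
// between calls and the search advances past each match.
function tabIndices(str) {
  var found = [];
  var tabRegex = /\t/g;
  var m;
  while ((m = tabRegex.exec(str)) !== null) {
    found.push(m.index);
  }
  return found;
}

console.log(brokenIndices(sample)); // [1, 1, 1, 1, 1]
console.log(tabIndices(sample));    // [1, 3]
```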


May 11, 2007 at 10:46 AM
6,516 Comments

@Steve,

I like the Tab index example. I think that's a great demonstration of how exec() can be used. Thanks!

