Javascript Exec() Method For Regular Expression Matching
Posted May 10, 2007 at 1:21 PM
Earlier today, Steve of Flagrant Badassery introduced me to the Exec() method of the Javascript Regular Expression object (RegExp::exec()). I had never heard of this before and I love working with regular expressions, so naturally I had to dive right in and do a little experimentation. From my initial play time, it looks like the exec() method allows me to use the regular expression object somewhat like the Java Pattern Matcher that I love using so much in ColdFusion.
First, I had to look up the details of the method and how it works. The Mozilla Developer Center has some very clean, straightforward documentation. Once I had that, I set up this test page.
This test page takes some song lyrics and finds the pattern defined as words that modify other words (I had to think of something to test with). Once we have our target text and our pattern, we loop until the RegExp::exec() method returns null instead of a match array. The array that gets returned acts just like the Matcher::group() method in Java, so this felt very natural:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
	<title>Javascript Regular Expression Exec()</title>

	<script type="text/javascript">

		// Define the text that we are going to search.
		// This text was taken from Sir Mix-a-Lot's hit
		// single, "Baby Got Back". The whitespace that
		// indents each continued line is part of the
		// string, so it counts toward the match indexes.
		var strText = "\
			I like big butts and I can not lie \
			You other brothers can't deny \
			That when a girl walks in with an itty bitty waist \
			And a round thing in your face \
			You get sprung \
			Wanna pull up tough \
			Cuz you notice that butt was stuffed \
			Deep in the jeans she's wearing \
			I'm hooked and I can't stop staring \
			Oh, baby I wanna get with ya \
			And take your picture \
		";


		// Get a pattern that we want to search on. This
		// defines certain modifiers and the words that
		// they modify.
		var rePattern = new RegExp(
			"(big|round|bitty)(?:\\s+)([^\\s]+)",
			"gi"
		);

	</script>
</head>
<body>

	<script type="text/javascript">

		// Define our match array. This will be populated for
		// each iteration of the exec() method.
		var arrMatch = null;


		// Keep looping over the target text while we can
		// find matches. If no matches can be found,
		// arrMatch is null and will end the while loop.
		while (arrMatch = rePattern.exec( strText )){

			document.write(
				arrMatch[ 1 ].toUpperCase() +
				" modifies " +
				arrMatch[ 2 ].toUpperCase() +
				"<br />"
			);


			// Explore the pattern-matching properties of
			// both the returned array and the regular
			// expression object. The returned array,
			// unlike traditional Javascript arrays, is
			// given pattern-matching related properties.


			document.write(
				"........ " +
				"Phrase: " +
				arrMatch[ 0 ] +
				"<br />"
			);

			document.write(
				"........ " +
				"Start Index: " +
				arrMatch.index +
				"<br />"
			);

			document.write(
				"........ " +
				"End Index: " +
				rePattern.lastIndex +
				"<br />"
			);

			document.write(
				"........ " +
				"Index Substring: " +
				arrMatch.input.substring(
					arrMatch.index,
					rePattern.lastIndex
				) +
				"<br /><br />"
			);
		}

	</script>

</body>
</html>
Running the above page, we get the following output:
BIG modifies BUTTS
........ Phrase: big butts
........ Start Index: 10
........ End Index: 19
........ Index Substring: big butts
BITTY modifies WAIST
........ Phrase: bitty waist
........ Start Index: 113
........ End Index: 124
........ Index Substring: bitty waist
ROUND modifies THING
........ Phrase: round thing
........ Start Index: 134
........ End Index: 145
........ Index Substring: round thing
What really surprised me was that the returned match array had additional properties that were related to the pattern matching itself. I am not used to seeing this because, when it comes to arrays, I am usually just accessing indexes and checking lengths. It does concern me a little bit that the entire target text is exposed as a property of the array (arrMatch.input). If strings are copied by value, that would mean lots of copies of this text running around, although JavaScript strings are immutable, so an engine is free to share one underlying copy rather than duplicate it. Of course, come on, we are talking about Javascript :) I don't think variable size/count or performance is really much of an issue.
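As a minimal sketch of those extra properties, the snippet below runs exec() once and records what the returned array and the RegExp object expose. The sample text and the `summary` object are made up for illustration; only `index`, `input`, and `lastIndex` are taken from the test page.

```javascript
// Hypothetical sample text (not the lyrics used in the test page).
var text = "a big dog and a round ball";
var pattern = /(big|round)\s+(\S+)/g;

// A single call to exec() returns the first match, or null if there is none.
var match = pattern.exec(text);

// Collect the match-related properties in one place (illustrative only).
var summary = {
    isRealArray: Array.isArray(match),   // the result is a genuine Array
    groupCount: match.length,            // full match plus two captured groups
    phrase: match[0],                    // the full matched text
    startIndex: match.index,             // where the match begins in the input
    endIndex: pattern.lastIndex,         // where the next search will begin
    sameInput: match.input === text      // input holds the searched string
};

console.log(summary);
```

Note that `index` and `input` live on the returned array, while `lastIndex` lives on the regular expression object itself, which is why the global flag is needed for the loop to advance.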
Anyway, this is very cool stuff. I wish I had known about this earlier. This feels like a very elegant solution, even more so than harnessing the power of the String::replace() method to accomplish the same ends.
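For comparison, here is a rough sketch of the String::replace() approach mentioned above, assuming a callback is used purely to observe each match. The sample text and the `found` array are hypothetical; the callback arguments (full match, captured groups, then offset) follow the standard replace() behavior.

```javascript
// Hypothetical sample text (not the lyrics used in the test page).
var text = "a big dog and a round ball";
var found = [];

// replace() calls the function once per match. We only want the side
// effect of recording each match, so we return the phrase unchanged
// and ignore the string that replace() builds.
text.replace(/(big|round)\s+(\S+)/g, function (phrase, modifier, word, offset) {
    found.push({ modifier: modifier, word: word, index: offset });
    return phrase;
});

console.log(found);
```

This works, but it builds a replacement string that is then thrown away, which is part of why the exec() loop reads as the more direct tool for the job.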
Thanks Steve!
Reader Comments
"The Mozilla Developer Center has some very clean, straightforward documentation."
Kind of. See my related comments on your previous blog post:
http://www.bennadel.com/blog/695-Ask-Ben-Getting-Query-String-Values-In-JavaScript.htm#comments_3377
Hey, what the heck... my hash didn't link properly! I will have to look into that.
Ok, so by clean and straightforward, I guess maybe I meant the actual formatting of the page. It just looked nice. But yeah, the last example has no place or purpose being there.
"What really surprised me was that the returned match array had additional properties that were related to the pattern matching itself."
If you think about it, it's not really any weirder than other types (e.g., functions) having properties or methods. At its core, pretty much every type in JavaScript is really an object.
For some fun with JavaScript types, run these in Firebug for possibly a few surprises, depending on one's understanding of JavaScript types and constructors:
console.log(typeof null); // object
console.log(typeof [1,2,3]); // object
console.log(typeof new Boolean()); // object
console.log(new String() instanceof Object); // true
console.log(typeof NaN); // number
console.log(NaN == NaN); // false
console.log(typeof /regex/); // object in IE; function in Firefox
console.log(new Boolean(false)); // false (probably expected)
console.log(new Boolean(false) == false); // true (probably expected)
console.log(false === false); // true (probably expected)
console.log(new Boolean(false) === false); // false (probably unexpected)
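To tie this back to the match array, here is a small sketch showing that any array accepts extra named properties without its length or indexed contents changing. The `list` and `report` names and the property values are invented for illustration.

```javascript
// An ordinary array...
var list = ["big", "butts"];

// ...can be given named properties, just like any other object.
list.index = 10;
list.input = "some searched text";

var report = {
    type: typeof list,          // arrays report "object"
    length: list.length,        // named properties do not affect length
    joined: list.join(" "),     // nor the indexed contents
    index: list.index
};

console.log(report);
```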
I don't mean to spam your comments, but here is a simple example of exec() in action that I used recently...
I needed to create an array containing the indices of each tab character within a string, so I used something like the following:
------------------------
var tabIndices = [];
var tabMatchInfo;
var tabRegex = /\t/g;
while (tabMatchInfo = tabRegex.exec(string)) {
	tabIndices.push(tabMatchInfo.index);
}
------------------------
Nice and easy. Doing this without exec() would require something like the following:
------------------------
var tabIndices = [];
var thisTabIndex;
for (var i = 0; i < string.length; i++) {
	thisTabIndex = string.indexOf(String.fromCharCode(9), i);
	if (thisTabIndex === -1) {
		break;
	} else {
		tabIndices.push(thisTabIndex);
		i = thisTabIndex;
	}
}
------------------------
Of course, there are cases where it would be much more difficult than this (or maybe even impossible) to reproduce functionality achieved using exec() with other methods.
One word of caution... although "while (arr = /\t/g.exec(string)) {}" works fine in Firefox, it would create an infinite loop in IE (and possibly other browsers) if there is even one tab character in the string, since IE recompiles the regex for each iteration of the loop, effectively resetting lastIndex and never advancing.
In case that last paragraph wasn't clear, the problem results from creating the regex within the loop. The fix is to simply create the regex before entering the loop.
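Here is a hedged sketch of that pitfall and its fix. The function names and the `maxPasses` cap are hypothetical; the cap exists only so the broken version cannot loop forever. The first function uses `new RegExp` inside the loop to guarantee a fresh object on every pass, which reproduces the behavior described for IE. Engines that follow ECMAScript 5 or later also create a new object each time a regex literal is evaluated, so the literal form would behave the same way there.

```javascript
// Broken pattern: the regex is rebuilt on every pass, so lastIndex
// starts at 0 each time and exec() keeps returning the first tab.
function findTabsWithFreshRegex(text, maxPasses) {
    var indices = [];
    for (var pass = 0; pass < maxPasses; pass++) {
        var match = new RegExp("\\t", "g").exec(text);
        if (!match) {
            break;
        }
        indices.push(match.index);
    }
    return indices;
}

// Fixed pattern: one regex object is created before the loop, so its
// lastIndex advances after every successful match.
function findTabsWithSharedRegex(text) {
    var tabRegex = /\t/g;
    var indices = [];
    var match;
    while ((match = tabRegex.exec(text)) !== null) {
        indices.push(match.index);
    }
    return indices;
}

var sample = "a\tb\tc";
var stuck = findTabsWithFreshRegex(sample, 3);
var fixed = findTabsWithSharedRegex(sample);

console.log(stuck, fixed);
```

The broken version reports the first tab over and over until the cap stops it, while the fixed version visits each tab once and then ends when exec() returns null.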
@Steve,
I like the Tab index example. I think that's a great demonstration of how exec() can be used. Thanks!




