Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
Ben Nadel at CFinNC 2009 (Raleigh, North Carolina) with: John Mason
Ben Nadel at CFinNC 2009 (Raleigh, North Carolina) with: John Mason@john_mason_ )

Using Regular Expressions (RLIKE) Pattern Matching In MySQL Queries

By Ben Nadel on
Tags: SQL

If you've known me for a while, you know that I'm absolutely in love with Regular Expressions. I use them in ColdFusion and JavaScript all the time. And, the other day, for the very first time, I used them in MySQL. As a celebration of this milestone, I wanted to put together a quick post. This is not intended to be a tutorial, but rather a beacon of hope for anyone who didn't realize that using Regular Expressions in MySQL was even a possibility.

The Regular Expression (REGEXP / RLIKE) support in MySQL is not as robust as it is in other programming languages. And, the Regular Expression engine can't leverage table indices; so, you're basically doing a full set scan for pattern matching. But, they are still very awesome and can make running data reports much easier.

To start with, I wanted to see what kind of performance hit the REGEXP / RLIKE operator had when compared to the less robust, less flexible LIKE operator. While not necessarily a scientific test, I tried to query for email addresses, in a 30-million record table, that ended with "@bennadel.com":

  • -- When using the LIKE operator, three attempts ran in: 34s, 22s, 22s.
  • SELECT
  • u.id,
  • u.email
  • FROM
  • user u
  • WHERE
  • u.email LIKE '%@bennadel.com'
  • ;
  •  
  • -- When using the RLIKE operator, three attempts ran in: 144s, 141s, 140s.
  • SELECT
  • u.id,
  • u.email
  • FROM
  • user u
  • WHERE
  • u.email RLIKE '@bennadel\.com$'
  • ;

While neither of these operations (suffix matching) could leverage table indices, the LIKE operator ran 5-6 times faster than the RLIKE / REGEXP operator to find the same data. I don't point this out to say that RLIKE is bad; merely, that it has some overhead and should be used when its the "right" tool for the job - not as a replacement for basic LIKE operations.

That said, here's what some of the possible pattern matching looks like:

  • SELECT
  • -- Using basic character classes and not-operator.
  • -- Result: 1
  • ( 'ben@bennadel.com' RLIKE '[^@]+@[^.](\.[^.]+)+' ) isEmail,
  •  
  • -- Using optional character matching.
  • -- Result: 1
  • ( 'data.xls' RLIKE '\.xlsx?' ) AS isXLS,
  •  
  • -- Using digit-matching and variable-length patterns. Unfortunately, RLIKE
  • -- doesn't support notations like \d and \s. Instead, you have to use special
  • -- character class names.
  •  
  • -- Result: 0 -- doesn't support \d
  • ( '212-555-3932' RLIKE '\d{3}-\d{3}-\d{4}' ) AS isTele,
  •  
  • -- Result: 1
  • ( '212-555-3932' RLIKE '[[:digit:]]{3}-[[:digit:]]{3}-[[:digit:]]{4}' ) AS isTele2,
  •  
  • -- Result: 1 -- using support for character ranges.
  • ( '212-555-3932' RLIKE '[0-9]{3}-[0-9]{3}-[0-9]{4}' ) AS isTele3,
  •  
  • -- Testing word-boundaries. And, again, since RLIKE doesn't support notations
  • -- like \b, you have to use special character class names.
  •  
  • -- Result: 0 -- doesn't support \b
  • ( 'You have a warrant?' RLIKE '[[:<:]]war[[:>:]]' ) AS isWar,
  •  
  • -- Result: 1
  • ( 'Yo, war is lame.' RLIKE '[[:<:]]war[[:>:]]' ) AS isWar2,
  •  
  • -- RLIKE is not case sensitive, except when used with binary strings.
  • -- Result: 1 (for both tests)
  • ( 'BEN' RLIKE 'ben' ) AS isBen,
  • ( 'BEN' RLIKE 'BEN' ) AS isBen2
  • ;

As you can see, there is an unfortunate lack of support for core character class notation in the form of \w, \b, \s, etc. But, you can used named character classes like [:digit:] to make up for that. It's a bit more cumbersome, but it works just the same.

So there you are - Regular Expression power meets relation database power for much sexiness! And while this might not affect much of my day-to-day SQL authoring, I am quite sure that I'll be making use of this in my SQL reporting efforts.




Reader Comments

Post A Comment

You — Get Out Of My Dreams, Get Into My Comments
Live in the Now
Oops!
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.