Skip to main content
Ben Nadel at the New York ColdFusion User Group (Jan. 2008) with: Clark Valberg and Adam Lehman and Javier Julio
Ben Nadel at the New York ColdFusion User Group (Jan. 2008) with: Clark Valberg@clarkvalberg ) , Adam Lehman@adrocknaphobia ) , and Javier Julio

Using Regular Expressions (RLIKE) Pattern Matching In MySQL Queries

By on

If you've known me for a while, you know that I'm absolutely in love with Regular Expressions. I use them in ColdFusion and JavaScript all the time. And, the other day, for the very first time, I used them in MySQL. As a celebration of this milestone, I wanted to put together a quick post. This is not intended to be a tutorial, but rather a beacon of hope for anyone who didn't realize that using Regular Expressions in MySQL was even a possibility.

The Regular Expression (REGEXP / RLIKE) support in MySQL is not as robust as it is in other programming languages. And, the Regular Expression engine can't leverage table indices; so, you're basically doing a full set scan for pattern matching. But, they are still very awesome and can make running data reports much easier.

To start with, I wanted to see what kind of performance hit the REGEXP / RLIKE operator had when compared to the less robust, less flexible LIKE operator. While not necessarily a scientific test, I tried to query for email addresses, in a 30-million record table, that ended with "":

-- When using the LIKE operator, three attempts ran in: 34s, 22s, 22s.
	user u

-- When using the RLIKE operator, three attempts ran in: 144s, 141s, 140s.
	user u
WHERE RLIKE '@bennadel\.com$'

While neither of these operations (suffix matching) could leverage table indices, the LIKE operator ran 5-6 times faster than the RLIKE / REGEXP operator to find the same data. I don't point this out to say that RLIKE is bad; merely, that it has some overhead and should be used when its the "right" tool for the job - not as a replacement for basic LIKE operations.

That said, here's what some of the possible pattern matching looks like:

	-- Using basic character classes and not-operator.
	-- Result: 1
	( '' RLIKE '[^@]+@[^.](\.[^.]+)+' ) isEmail,

	-- Using optional character matching.
	-- Result: 1
	( 'data.xls' RLIKE '\.xlsx?' ) AS isXLS,

	-- Using digit-matching and variable-length patterns. Unfortunately, RLIKE
	-- doesn't support notations like \d and \s. Instead, you have to use special
	-- character class names.

	-- Result: 0 -- doesn't support \d
	( '212-555-3932' RLIKE '\d{3}-\d{3}-\d{4}' ) AS isTele,

	-- Result: 1
	( '212-555-3932' RLIKE '[[:digit:]]{3}-[[:digit:]]{3}-[[:digit:]]{4}' ) AS isTele2,

	-- Result: 1 -- using support for character ranges.
	( '212-555-3932' RLIKE '[0-9]{3}-[0-9]{3}-[0-9]{4}' ) AS isTele3,

	-- Testing word-boundaries. And, again, since RLIKE doesn't support notations
	-- like \b, you have to use special character class names.

	-- Result: 0 -- doesn't support \b
	( 'You have a warrant?' RLIKE '[[:<:]]war[[:>:]]' ) AS isWar,

	-- Result: 1
	( 'Yo, war is lame.' RLIKE '[[:<:]]war[[:>:]]' ) AS isWar2,

	-- RLIKE is not case sensitive, except when used with binary strings.
	-- Result: 1 (for both tests)
	( 'BEN' RLIKE 'ben' ) AS isBen,
	( 'BEN' RLIKE 'BEN' ) AS isBen2

As you can see, there is an unfortunate lack of support for core character class notation in the form of \w, \b, \s, etc. But, you can used named character classes like [:digit:] to make up for that. It's a bit more cumbersome, but it works just the same.

So there you are - Regular Expression power meets relation database power for much sexiness! And while this might not affect much of my day-to-day SQL authoring, I am quite sure that I'll be making use of this in my SQL reporting efforts.

Want to use code from this post? Check out the license.

Reader Comments