Skip to main content
Ben Nadel at the Stammari Suberbowl XLIV Party (Feb. 2010) with: David Stamm
Ben Nadel at the Stammari Suberbowl XLIV Party (Feb. 2010) with: David Stamm ( @davidstamm )

Performing A Case-Sensitive Search On A Case-Insensitive Column Using COLLATE In MySQL 5.6.49

By on
Tags:

Applications evolve over time. And, when you've been working on an application for close to a decade, you can bet dollars-to-donuts that the database schema isn't optimal. Usually, old MySQL databases aren't using utf8mb4 character-sets; and, the column collations might not be appropriate for their use-case. Recently, I've been thinking about changing the way one column in particular is used, moving from a HEX-encoded value to a Base64-encoded value. The issue there being that HEX is not case-sensitive whereas Base64 is case-sensitive. And, I wanted to see if I could use the COLLATE clause to apply a case-sensitive search to a case-insensitive column in MySQL 5.6.49.

To set the context here, imagine that I have a secret_message table in which secret messages can be looked up using a unique slug value. Given the age of this table, it wasn't defined with any deep thought and might have a schema like this:

CREATE TABLE `secret_message` (
	`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
	`slug` varchar(50) NOT NULL,
	`message` varchar(300) NOT NULL,
	PRIMARY KEY (`id`),
	KEY `ix_bySlug` (`slug`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

While not shown in this schema, the columns are using a utf8_general_ci collation (I'm assuming it's not showing here because that's the default collation in the database?).

When the slug column is populated with HEX values, the case-sensitivity doesn't really matter because, with HEX, aaa and AAA are the same value. However, when it comes to Base64, case means something. Which means that, with Base64, aaa and AAA are distinct values.

So, when I start storing Base64 values in this table, I need my WHERE clause to be able to distinguish between aaa and AAA. With MySQL, we can do this by using the COLLATE clause and the utf8_bin collation (in my particular case):

WHERE slug COLLATE utf8_bin = 'aaa'

But, we have to be careful about our database indexes. Just as when using a Function on a column, attempting to apply a COLLATE will bypass an index. To see what I mean, let's populate the secret_message table with some similar looking slugs:

INSERT INTO secret_message
	( slug, message )
VALUES
	( 'aaaa', 'You are a rock star!' ),
	( 'AAAA', 'You are a doofus!' ),
	( 'aAaA', 'You are a magical unicorn!' )
;

Now, let's attempt to query for one of the slugs using just the COLLATE clause:

EXPLAIN SELECT
	m.id,
	m.slug,
	m.message
FROM
	secret_message m
WHERE
	m.slug COLLATE utf8_bin = 'aaaa'
;

When we run this EXPLAIN statement, we get the following result:

MySQL EXPLAIN showing a full-table scan when using COLLATE.

As you can see, despite the fact that the slug column is indexed, attempting to search the column using COLLATE leaves us performing a full-table scan. To fix this, we need to perform both a case-insensitive search and a case-sensitive search:

EXPLAIN SELECT
	m.id,
	m.slug,
	m.message
FROM
	secret_message m
WHERE
	-- NOTE: We have to perform the CASE INSENSITIVE search first so that we
	-- can leverage the INDEX on the table column, `slug`. If we try to go
	-- directly to the COLLATE directive, the query won't use the index and
	-- we do a full-table scan.
	m.slug = 'aaaa'
AND
	m.slug COLLATE utf8_bin = 'aaaa'
;

And, when we run this EXPLAIN statement, we get the following result:

MySQL EXPLAIN showing an index when using a case-insensitive search and COLLATE.

As you can see, now that we are using both comparisons, the database is only searching three rows for the three similar looking slugs:

  • aaaa
  • AAAA
  • aAaA

This is because the first WHERE condition performs a case-insensitive search on the slug column and then the second WHERE condition further filters the results based on a case-sensitive collation.

And, to demonstrate that this is actually locating the one correct row, here's the same query without the EXPLAIN directive:

MySQL locates the correct row using a combined case-insensitive and case-sensitive search approach.

As you can see, we located the aaaa row and ignored the similar AAAA and aAaA rows.

Obviously, in a perfect world, the schema of the table would just be updated to use a different collation so that we don't have to do anything special in our SQL statement. But, welcome to the real world where data is messy and deadlines are tight and you don't always get to change things to suit your needs. In such cases, it's nice to see that we can overlay a case-sensitive search on top of a case-insensitive column definition in MySQL.

Want to use code from this post? Check out the license.

Reader Comments

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel