Skip to main content
Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Performing A Case-Sensitive Search On A Case-Insensitive Column Using COLLATE In MySQL 5.6.49

By Ben Nadel on
Tags: SQL

Applications evolve over time. And, when you've been working on an application for close to a decade, you can bet dollars-to-donuts that the database schema isn't optimal. Usually, old MySQL databases aren't using utf8mb4 character-sets; and, the column collations might not be appropriate for their use-case. Recently, I've been thinking about changing the way one column in particular is used, moving from a HEX-encoded value to a Base64-encoded value. The issue there being that HEX is not case-sensitive whereas Base64 is case-sensitive. And, I wanted to see if I could use the COLLATE clause to apply a case-sensitive search to a case-insensitive column in MySQL 5.6.49.

To set the context here, imagine that I have a secret_message table in which secret messages can be looked up using a unique slug value. Given the age of this table, it wasn't defined with any deep thought and might have a schema like this:

CREATE TABLE `secret_message` (
	`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
	`slug` varchar(50) NOT NULL,
	`message` varchar(300) NOT NULL,
	PRIMARY KEY (`id`),
	KEY `ix_bySlug` (`slug`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

While not shown in this schema, the columns are using a utf8_general_ci collation (I'm assuming it's not showing here because that's the default collation in the database?).

When the slug column is populated with HEX values, the case-sensitivity doesn't really matter because, with HEX, aaa and AAA are the same value. However, when it comes to Base64, case means something. Which means that, with Base64, aaa and AAA are distinct values.

So, when I start storing Base64 values in this table, I need my WHERE clause to be able to distinguish between aaa and AAA. With MySQL, we can do this by using the COLLATE clause and the utf8_bin collation (in my particular case):

WHERE slug COLLATE utf8_bin = 'aaa'

But, we have to be careful about our database indexes. Just as when using a Function on a column, attempting to apply a COLLATE will bypass an index. To see what I mean, let's populate the secret_message table with some similar looking slugs:

INSERT INTO secret_message
	( slug, message )
VALUES
	( 'aaaa', 'You are a rock star!' ),
	( 'AAAA', 'You are a doofus!' ),
	( 'aAaA', 'You are a magical unicorn!' )
;

Now, let's attempt to query for one of the slugs using just the COLLATE clause:

EXPLAIN SELECT
	m.id,
	m.slug,
	m.message
FROM
	secret_message m
WHERE
	m.slug COLLATE utf8_bin = 'aaaa'
;

When we run this EXPLAIN statement, we get the following result:

MySQL EXPLAIN showing a full-table scan when using COLLATE.

As you can see, despite the fact that the slug column is indexed, attempting to search the column using COLLATE leaves us performing a full-table scan. To fix this, we need to perform both a case-insensitive search and a case-sensitive search:

EXPLAIN SELECT
	m.id,
	m.slug,
	m.message
FROM
	secret_message m
WHERE
	-- NOTE: We have to perform the CASE INSENSITIVE search first so that we
	-- can leverage the INDEX on the table column, `slug`. If we try to go
	-- directly to the COLLATE directive, the query won't use the index and
	-- we do a full-table scan.
	m.slug = 'aaaa'
AND
	m.slug COLLATE utf8_bin = 'aaaa'
;

And, when we run this EXPLAIN statement, we get the following result:

MySQL EXPLAIN showing an index when using a case-insensitive search and COLLATE.

As you can see, now that we are using both comparisons, the database is only searching three rows for the three similar looking slugs:

  • aaaa
  • AAAA
  • aAaA

This is because the first WHERE condition performs a case-insensitive search on the slug column and then the second WHERE condition further filters the results based on a case-sensitive collation.

And, to demonstrate that this is actually locating the one correct row, here's the same query without the EXPLAIN directive:

MySQL locates the correct row using a combined case-insensitive and case-sensitive search approach.

As you can see, we located the aaaa row and ignored the similar AAAA and aAaA rows.

Obviously, in a perfect world, the schema of the table would just be updated to use a different collation so that we don't have to do anything special in our SQL statement. But, welcome to the real world where data is messy and deadlines are tight and you don't always get to change things to suit your needs. In such cases, it's nice to see that we can overlay a case-sensitive search on top of a case-insensitive column definition in MySQL.



Reader Comments

What has two thumbs and hopes you leave a comment? This Guy! (Ben Nadel).

Post A Comment

You — Get Out Of My Dreams, Get Into My Blog
Live in the Now
Oops!
NEW: Some basic markdown formatting is now supported: bold, italic, blockquotes, lists, fenced code-blocks. Read more about markdown syntax »
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.