Yesterday at InVision, I was going through the MySQL Slow Query logs, as I often do, and came across a query that was consistently doing full table scans. If you know anything about relational database systems, the phrase, "full table scan," probably makes you squirm a little bit - they're generally bad for performance. So, I started investigating the problem only to be completely stumped. The query looked like it should have been using an index. Luckily, Jesse Dearing on our data services team was able to step in and point out that it was an issue of type coercion during MySQL query planning.
The slow query that I witnessed doing full table scans looked like this:
SELECT e.id, e.userID, e.appID, e.createdAt FROM external_app e WHERE e.appID = 48702918 ;
Super basic, right? And, when I looked at the data in the external_app table, it looked like this:
The data and the incoming query parameter looked copacetic. And, when the query ran, it did indeed return a single row. But, when I ran an EXPLAIN on the SQL query, it was clear that no index was being used:
As Jesse Dearing later pointed out, the issue was that the query parameter - 48702918 - was being passed-in as an Int when it needed to be passed-in as a String. See, while the data in the external_app table looked like it contained numbers, the SHOW CREATE TABLE told a different story:
CREATE TABLE `external_app` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `userID` int(10) unsigned NOT NULL, `appID` varchar(50) NOT NULL, `createdAt` datetime NOT NULL, PRIMARY KEY (`id`), KEY `ix_byApp` (`appID`) USING BTREE ) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8
It turns out, the "appID" column was actually a VARCHAR(50) column, not an INT column. Apparently, some of the appID values contain non-numeric characters. But, since the vast majority of the appID values were numbers, the VARCHAR nature of the column was not obvious to me when I looked at the data directly. So, when we were passing an Int in to match on a VARCHAR field, the query's WHERE clause would do the type coercion to locate the proper record; but, the data type mismatch was bypassing the intended Index selection during query planning.
To fix this, all we have to do is pass the query parameter in as a string and the query planner will start using the proper index:
In retrospect, I feel silly having missed this; but, I think it was the first time that I've dealt with a non-numeric column whose name contained "ID". Excuses aside, however, it's good to know that type coercion / type mismatches have an impact on index selection during MySQL's query planning.