Skip to main content
Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Exploring Type-Coercion And Value Comparisons In A JSON Column In MySQL 5.7.32

By Ben Nadel on
Tags: SQL

Generally speaking, SQL is pretty lenient when it comes to simple types and value comparisons. Meaning, from a SQL execution standpoint, 1 and "1" are equal because the SQL engine will coerce the values as needed (much like ColdFusion). However, document databases like MongoDB are much less lenient and will not cast values on-the-fly. This got me thinking about the new JSON support in MySQL 5.7: will simple values get coerced when comparing an input to a JSON path?

CAUTION: While MySQL will happily coerce values on-the-fly during a comparison, note that this does have implications. The most critical of which is that type coercion will bypass index selection during query planning which may result in a full-table scan.

When it comes to the JSON data structures in MySQL 5.7, you can "extract" a value using JSON_EXTRACT() (or ->); and then, you can "unquote" an extracted value using JSON_UNQUOTE() (or ->>). The latter of these two methods will remove the quotes ("") from an extracted String, which may or may not change the actual data-type of the resolved value. Before digging into value comparisons, it wasn't clear to me how all of this would come together.

To test, we can create an in-memory JSON value using JSON_OBJECT(). Then, we can just run a number of comparisons against String and Number values:

SET @json = JSON_OBJECT( 'string', '1', 'number', 1 );

SELECT
	-- Control comparisons across types.
	( 1 = 1 AND 1 = '1' AND '1' = '1' ),

	-- Test JSON String against various inputs.
	JSON_EXTRACT( @json, '$.string' ) = '1',
	JSON_EXTRACT( @json, '$.string' ) = 1,
	JSON_UNQUOTE( JSON_EXTRACT( @json, '$.string' ) ) = '1',
	JSON_UNQUOTE( JSON_EXTRACT( @json, '$.string' ) ) = 1,

	-- Test JSON Number against various inputs.
	JSON_EXTRACT( @json, '$.number' ) = '1',
	JSON_EXTRACT( @json, '$.number' ) = 1,
	JSON_UNQUOTE( JSON_EXTRACT( @json, '$.number' ) ) = '1',
	JSON_UNQUOTE( JSON_EXTRACT( @json, '$.number' ) ) = 1
;

As you can see, we have a JSON payload that contains two keys: one that points to a "numeric" 1 and one that points to a "string" "1". Then, we proceed to compare those two values to both string and numeric inputs. And, when we run this SQL query in MySQL 5.7.32, we get the following output:

JSON values are type-sensitive in MySQL 5.7.32

As you can see, everything came back as TRUE except these two comparisons:

  • JSON_EXTRACT( @json, '$.string' ) = 1
  • JSON_EXTRACT( @json, '$.number' ) = '1'

When comparing an input to a JSON value, MySQL does not coerce the operands. Meaning, in the context of a JSON object, 1 and "1" are not the equal. However, what we can see is that when we "unwrap" the JSON value using JSON_UNQUOTE(), MySQL will fall-back to coercing strings and numbers on-the-fly.

JSON Columns: Garbage In, Garbage Out

One of the splendiferous features of a relational database table is that is enforces a schema at the storage level. Which means, if you have an INT column, your input is going to be stored as a number regardless of how you provide said value (1 vs "1"). A JSON data structure - in MySQL 5.7.32 - does not make any such guarantees. Beyond ensuring that said payload is, in fact, valid JSON (JavaScript Object Notation), the database makes no judgments about what is actually in said JSON payload.

Which means, garbage in is garbage out. In other words, it's up to you and your application layer to make sure all inputs to a JSON structure are consistent. Any mess-up and it means you're not going to get the expected query results. To see what I mean, let's look at this SQL which accidentally commingles a String value amongst a bunch of Numbers:

SELECT
	COUNT( * )
FROM
	(
		SELECT JSON_OBJECT( 'value', 1 ) AS payload UNION ALL
		SELECT JSON_OBJECT( 'value', 1 ) AS payload UNION ALL
		SELECT JSON_OBJECT( 'value', '1' ) AS payload UNION ALL -- OOPS!!!!
		SELECT JSON_OBJECT( 'value', 1 ) AS payload		
	) AS derived
WHERE
	derived.payload->'$.value' = 1
;

As you can see, we have 4 rows that contain a .value property. However, when we go to query for the rows in which the .value property is 1, we get the following COUNT(*):

3

We're missing a row - the one in which someone accidentally stored the .value property as a String.

ASIDE: As we saw in the previous section, MySQL will fall-back to coercing values on-the-fly if we "unwrap" the JSON values. Which means, we'll get back all 4 rows if we change payload->'$.value' to payload->>'$.value', which is a short-hand syntax for the JSON_UNQUOTE(JSON_EXTRACT()) combination.

With great flexibility comes great responsibility. When you opt-out of the strict schema provided by a relational database, it means that the onus of enforcement falls upon the developer. You now become the one responsible for data integrity. Proceed with caution!



Reader Comments

What has two thumbs and hopes you leave a comment? This Guy! (Ben Nadel).

Post A Comment

You — Get Out Of My Dreams, Get Into My Blog
Live in the Now
Oops!
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.