Continuing with the database-theme of the week, I wanted to share one piece of wisdom that I learned from our former director of Data, Brad Brewer. He taught me that you can use the
ORDER BY and
LIMIT clauses inside both
DELETE SQL statements within MySQL 5.6. While I wouldn't use this technique in my day-to-day CRUD (Create, Read, Update, Delete) code, I do use these options religiously when I am writing data clean-up and migration scripts (ie, those SQL scripts that you run outside of the main application).
The premise of the behavior is simple: you can use the
ORDER BY and
LIMIT clauses as a means to lock-down the scope of the resultant data mutation. When you are hand-crafting SQL statements that are about to alter production data, this added safety-guard can work wonders for your mental health and sense of safety.
To see this in action, let's look at an
UPDATE statement that will update the two most recently-added rows in a MySQL database able:
-- First, let's reset our test table for the demo. TRUNCATE TABLE limit_test ; INSERT INTO limit_test VALUES ( 1, 'One' ), ( 2, 'Two' ), ( 3, 'Three' ), ( 4, 'Four' ), ( 5, 'Five' ) ; -- Now that we have our demo-data in place, let's UPDATE the LAST TWO records in the -- table based on an ID-based sort. UPDATE limit_test SET name = CONCAT( name, ' ( Updated )' ) ORDER BY id DESC LIMIT 2 ;
As you can see, we are updating the
name column of the test table; but, we are using the
ORDER BY and
LIMIT clauses to ensure that we only touch the last two rows. And, when we look at the table afterwards, we get the following output:
mysql> SELECT * FROM limit_test; +----+------------------+ | id | name | +----+------------------+ | 1 | One | | 2 | Two | | 3 | Three | | 4 | Four ( Updated ) | | 5 | Five ( Updated ) | +----+------------------+ 5 rows in set (0.00 sec)
As you can see, only the last two records were mutated.
The same things works with
DELETE statements. In the following query, we're going to delete the last two rows:
-- First, let's reset our test table for the demo. TRUNCATE TABLE limit_test ; INSERT INTO limit_test VALUES ( 1, 'One' ), ( 2, 'Two' ), ( 3, 'Three' ), ( 4, 'Four' ), ( 5, 'Five' ) ; -- Now that we have our demo-data in place, let's DELETE the LAST TWO records in the -- table based on an ID-based sort. DELETE FROM limit_test ORDER BY id DESC LIMIT 2 ;
As you can see, it's the exact same concept - we're using the
ORDER BY and
LIMIT clauses to limit the scope of the delete. And, when we look at the table afterwards, we get the following output:
mysql> SELECT * FROM limit_test; +----+-------+ | id | name | +----+-------+ | 1 | One | | 2 | Two | | 3 | Three | +----+-------+ 3 rows in set (0.00 sec)
As you can see, the last two rows in the table have been deleted.
Sometimes, if I am feeling extra nervous about deleting production data by hand, I'll even take this a step further and include several redundant columns in the
WHERE clause in addition to the
DELETE FROM critical_data WHERE id = 4 -- Primary key (this should be sufficient on its own). AND name = 'Mission statement' -- Redundant. AND startedAt = '2020-01-18' -- Redundant. AND endedAt IS NULL -- Redundant. LIMIT 1 ;
Here, I'm deleting a single row based on its primary key (
4). In reality, this should be a sufficient condition on its own. However, when it comes to production data and hand-crafted SQL statements, it just feels too easy to make a mistake. As such, matching several other known column values in the same row offers the peace-of-mind that prevent me from vomiting on my keyboard the moment after I hit the "Run Query" button.
UPDATE Not Just For Peace-of-Mind
LIMIT clause within a SQL
UPDATE statement isn't just for the peace-of-mind; there are some types of workflows that can be powered by
LIMIT within an application. Imagine, for example, that you are iterating over a database table using multiple threads. To do this, you can use an
LIMIT combination to "claim" a chunk of rows to be processed.
In the follow SQL statement, a given worker thread (with a unique ID) is going to claim the next 100 unclaimed rows:
-- Claim the next 100 rows for this worker. UPDATE table_to_be_processed SET claimedBy = @workerID WHERE claimedBy IS NULL ORDER BY id ASC LIMIT 100 ; -- Return the CLAIMED rows to be processed by this worker thread within the application. SELECT * FROM table_to_be_processed WHERE claimedBy = @workerID ORDER BY id ASC ;
This multi-statement SQL query first claims at-most 100 unclaimed rows from the head of the table; and then, returns whichever rows are currently claimed by the worker thread. Such a technique allows you to divide and conquer the processing of a table while also ensuring that two different worker threads aren't accidentally processing the same records.
In traditional CRUD-based application code, I wouldn't normally use
ORDER BY and
LIMIT clauses in my SQL queries. However, when hand-crafting SQL, I do find that such additional query constraints bring a much needed sense of safety and well-being.
In this post, I mentioned that you could use the
UPDATE combination to
drive a multi-thread migration. Since that's a fun little topic on its
own, I wanted to follow-up with a more robust look at that concept:
In this follow-up post, I use concurrent worker threads to safely traverse a single database table, claiming rows to be processed by a given worker.