I just love SQL. I've been writing web-applications on top of relational databases (RDBMS) for a while; and, SQL always feels like a luxurious combination of structure and flexibility. Yesterday, when working with some data scientists, I had to merge a few columns from a database table into a CSV (Comma-Separated Values) file that the data scientists had created. The CSV file was fairly small; so, I ended up using the awesome power of multiple-cursors in SublimeText 3 in order to write a SQL query that converted the CSV data into a derived table that I then consumed in an
INNER JOIN in MySQL.
To be clear, this is not the only way to consume CSV data in a relational database. But, I was working with a database in which I had read only permissions. As such, I couldn't load the CSV file into a "real table". Given the relatively small size of the CSV payload, what follows just felt like the appropriate level of effort.
That said, let's paint a picture of what was going on. Imagine that I had a database table called
friend with the following structure:
SHOW CREATE TABLE friend; -- Generates the following SQL statement: CREATE TABLE `friend` ( `id` int(11) NOT NULL, `name` varchar(50) NOT NULL, `age` int(11) NOT NULL, `catchPhrase` varchar(250) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8
Then, the data scientists, using an external data warehouse, generated a CSV file that contained the
friend.id value as the first column and a favorite meal as the second column:
id favorite_meal 1 "Pizza" 4 "Bella's Chicken Burritos" 5 "Vegan, Multi-Grain Pancakes"
Given this CSV file, they wanted to merge it with the
friend table so that they could see the
name value alongside the existing
favorite_meal values. To do this, I took their CSV data and - using SublimeText 3 - translated it into series of
UNION ALL statements:
( SELECT 1 AS id, "Pizza" AS favoriteMeal ) UNION ALL ( SELECT 4 AS id, "Bella's Chicken Burrito" AS favoriteMeal ) UNION ALL ( SELECT 5 AS id, "Vegan, Multi-Grain Pancakes" AS favoriteMeal )
This SQL generates a derived result-set in which each
SELECT statement represents a unique row. One of the hottest features of SQL is that I can then use a derived result-set as input to another query. As such, I was able to take this series of
SELECT statements and consume it as part of an
INNER JOIN on the
This left me with the following SQL:
SELECT f.id, f.name, tmp.favoriteMeal FROM -- Here, we are using the SELECT / UNION ALL technique to crate a derived table -- in which each CSV value becomes a column within the derived table. We can then -- JOIN to this derived table in order to augment it. ( ( SELECT 1 AS id, "Pizza" AS favoriteMeal ) UNION ALL ( SELECT 4 AS id, "Bella's Chicken Burrito" AS favoriteMeal ) UNION ALL ( SELECT 5 AS id, "Vegan, Multi-Grain Pancakes" AS favoriteMeal ) ) AS tmp INNER JOIN friend f ON f.id = tmp.id -- Joining `friend` to the derived table. ORDER BY f.id ASC ;
As you can see, I'm using my CSV-data-cum-temporary-table as the first part of my
INNER JOIN. This allows me to map the
friend.id column to the
tmp.id column, which subsequently gives me access to the
friend.name column in the results. And, when we run the above SQL in the MySQL database, we get the following output:
mysql> SELECT -> f.id, -> f.name, -> tmp.favoriteMeal -> FROM -> ( -> -> ( SELECT 1 AS id, "Pizza" AS favoriteMeal ) UNION ALL -> ( SELECT 4 AS id, "Bella's Chicken Burrito" AS favoriteMeal ) UNION ALL -> ( SELECT 5 AS id, "Vegan, Multi-Grain Pancakes" AS favoriteMeal ) -> -> ) AS tmp -> INNER JOIN -> friend f -> ON -> f.id = tmp.id -> ORDER BY -> f.id ASC -> ; +----+-------+-----------------------------+ | id | name | favoriteMeal | +----+-------+-----------------------------+ | 1 | Tina | Pizza | | 4 | Ralph | Bella's Chicken Burrito | | 5 | Sandi | Vegan, Multi-Grain Pancakes | +----+-------+-----------------------------+ 3 rows in set (0.00 sec)
Woot woot! As you can see, by merging the CSV data and the
friend table in a single SQL query, I was able to pull in the
friend.name column in order to augment the existing CSV column data.
Isn't SQL just thrilling?! I am sure there are good number of ways to accomplish what I demonstrated. But, in the "fog of war" on a Zoom call with several other people watching me live-type SQL, this felt like the right approach. But dang it, if SQL isn't just the cat's pajamas!
Want to use code from this post? Check out the license.