Using SELECT And UNION ALL To JOIN SQL Tables To CSV Data In MySQL

By Ben Nadel

Published 2020-05-09 in SQL — Comments (1)

I just love SQL. I've been writing web-applications on top of relational databases (RDBMS) for a while; and, SQL always feels like a luxurious combination of structure and flexibility. Yesterday, when working with some data scientists, I had to merge a few columns from a database table into a CSV (Comma-Separated Values) file that the data scientists had created. The CSV file was fairly small; so, I ended up using the awesome power of multiple-cursors in SublimeText 3 in order to write a SQL query that converted the CSV data into a derived table that I then consumed in an INNER JOIN in MySQL.

To be clear, this is not the only way to consume CSV data in a relational database. But, I was working with a database in which I had read only permissions. As such, I couldn't load the CSV file into a "real table". Given the relatively small size of the CSV payload, what follows just felt like the appropriate level of effort.

That said, let's paint a picture of what was going on. Imagine that I had a database table called friend with the following structure:

SHOW CREATE TABLE friend;
-- Generates the following SQL statement:

CREATE TABLE `friend` (
	`id` int(11) NOT NULL,
	`name` varchar(50) NOT NULL,
	`age` int(11) NOT NULL,
	`catchPhrase` varchar(250) NOT NULL,
	PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Then, the data scientists, using an external data warehouse, generated a CSV file that contained the friend.id value as the first column and a favorite meal as the second column:

id	favorite_meal
1	"Pizza"
4	"Bella's Chicken Burritos"
5	"Vegan, Multi-Grain Pancakes"

Given this CSV file, they wanted to merge it with the friend table so that they could see the name value alongside the existing id and favorite_meal values. To do this, I took their CSV data and - using SublimeText 3 - translated it into series of SELECT and UNION ALL statements:

( SELECT 1 AS id, "Pizza" AS favoriteMeal ) UNION ALL
( SELECT 4 AS id, "Bella's Chicken Burrito" AS favoriteMeal ) UNION ALL
( SELECT 5 AS id, "Vegan, Multi-Grain Pancakes" AS favoriteMeal )

This SQL generates a derived result-set in which each SELECT statement represents a unique row. One of the hottest features of SQL is that I can then use a derived result-set as input to another query. As such, I was able to take this series of SELECT statements and consume it as part of an INNER JOIN on the friend table:

Ben Nadel writing SQL that merges a CSV dataset with a database table in MySQL.

This left me with the following SQL:

SELECT
	f.id,
	f.name,
	tmp.favoriteMeal
FROM
	-- Here, we are using the SELECT / UNION ALL technique to crate a derived table
	-- in which each CSV value becomes a column within the derived table. We can then
	-- JOIN to this derived table in order to augment it.
	(

		( SELECT 1 AS id, "Pizza" AS favoriteMeal ) UNION ALL
		( SELECT 4 AS id, "Bella's Chicken Burrito" AS favoriteMeal ) UNION ALL
		( SELECT 5 AS id, "Vegan, Multi-Grain Pancakes" AS favoriteMeal )

	) AS tmp
INNER JOIN
	friend f
ON
	f.id = tmp.id -- Joining `friend` to the derived table.
ORDER BY
	f.id ASC
;

As you can see, I'm using my CSV-data-cum-temporary-table as the first part of my INNER JOIN. This allows me to map the friend.id column to the tmp.id column, which subsequently gives me access to the friend.name column in the results. And, when we run the above SQL in the MySQL database, we get the following output:

mysql> SELECT
    ->     f.id,
    ->     f.name,
    ->     tmp.favoriteMeal
    -> FROM
    ->     (
    -> 
    ->         ( SELECT 1 AS id, "Pizza" AS favoriteMeal ) UNION ALL
    ->         ( SELECT 4 AS id, "Bella's Chicken Burrito" AS favoriteMeal ) UNION ALL
    ->         ( SELECT 5 AS id, "Vegan, Multi-Grain Pancakes" AS favoriteMeal )
    -> 
    ->     ) AS tmp
    -> INNER JOIN
    ->     friend f
    -> ON
    ->     f.id = tmp.id
    -> ORDER BY
    ->     f.id ASC
    -> ;
+----+-------+-----------------------------+
| id | name  | favoriteMeal                |
+----+-------+-----------------------------+
|  1 | Tina  | Pizza                       |
|  4 | Ralph | Bella's Chicken Burrito     |
|  5 | Sandi | Vegan, Multi-Grain Pancakes |
+----+-------+-----------------------------+
3 rows in set (0.00 sec)

Woot woot! As you can see, by merging the CSV data and the friend table in a single SQL query, I was able to pull in the friend.name column in order to augment the existing CSV column data.

Isn't SQL just thrilling?! I am sure there are good number of ways to accomplish what I demonstrated. But, in the "fog of war" on a Zoom call with several other people watching me live-type SQL, this felt like the right approach. But dang it, if SQL isn't just the cat's pajamas!

Mugatu saying, That relational database, so hot right now!

Want to use code from this post? Check out the license.

Short link: https://bennadel.com/3823

Reader Comments

Ben Nadel Mar 31, 2024 at 11:18 AM

16,256 Comments

As of MySQL 8.0.19, this whole concept gets even easier. There is now a VALUES / ROW construct that is purpose built for providing static data as a derived table:

www.bennadel.com/blog/4626-using-values-row-to-create-derived-table-from-static-values-in-mysql-8-0-19.htm

It removes all of the unnecessary noise and cruft.

Oh my chickens, this post is old!

Hit me up on LinkedIn if you want to discuss it further.