Ask Ben: Selecting From The Union Of Two SQL Tables
Posted October 23, 2007 at 7:57 AM by Ben Nadel
Hi Ben, it's me again... I have a MYSQL 5.X question for you: I have two tables - Table A: pru_properties and Table B: mls_properties. They have the exact same columns names, but different content inside of them since they get feed by different companies. The only matching column is MLSID. how can I do a CFQUERY for both tables using a GROUP BY MLSID?
I don't know much about MySQL, but I assume that it can do most the basic functions that Microsoft SQL Server can do, which is where I have the majority of my experience. As such, I can show you how to do this (testing it in MS SQL Server) and I am just going to guess that it is mostly transferable; afterall, isn't that why people choose MySQL? Because it is a free yet powerful SQL Server?
If you have two tables that have the same structure that you know is unique to both tables, the easiest method is to use the UNION ALL directive. UNION, on its own, gets the SQL Server to combine two queries into one result set. This can be a very slow process because the server will go through both result sets during the unioning action and try to eliminate duplicate records. By adding the ALL aspect, it signals to the server that duplicate checking is not required. This can greatly increase the speed of the unioning function.
That being said, if you need to run a GROUP BY on the resulting unioned table the easiest thing is probably to get the UNION ALL of two queries (one for each table) as the FROM table in the master query (the one containing the GROUP BY). Since, I don't have two tables in my system that line up like yours, I need to create two. Take a look at the example below - the first half of the code is just creation and population of two tables that simulate yours.
- Query for girls whose combined pool uses
- two different source tables.
- <cfquery name="qGirl" datasource="#REQUEST.DSN.Source#">
- @girlA TABLE (
- id INT,
- name VARCHAR( 30 ),
- hair VARCHAR( 10 )
- @girlB TABLE (
- id INT,
- name VARCHAR( 30 ),
- hair VARCHAR( 10 )
- <!--- Populate the [A] table with three records. --->
- INSERT INTO @girlA
- SELECT 1, 'Sarah Vivenzio', 'Brunette' UNION ALL
- SELECT 2, 'Julia Niles', 'Blonde' UNION ALL
- SELECT 3, 'Kim Doria', 'Brunette'
- <!--- Populate the [B] table with three records. --->
- INSERT INTO @girlB
- SELECT 7, 'Ashley Thomas', 'Brunette' UNION ALL
- SELECT 8, 'Maria Bello', 'Blonde' UNION ALL
- SELECT 9, 'Ellen DeGeneres', 'Blonde'
- ASSERT: At this point, we have two tables that have
- the same structure but unique data (especially the
- ID column). This is how we can simulate two different
- database tables.
- ( COUNT( * ) ) AS girl_count
- For our FROM clause, we are going to UNION ALL the
- records from each of our two tables defined above.
- We are going to use the UNION ALL (as opposed to
- just UNION) to make sure that the database doesn't
- waste time trying to eliminate duplicates.
- @girlA ga
- UNION ALL
- @girlB gb
- ) AS t <!--- Alias the inline temp table. --->
- GROUP BY
- ORDER BY
- t.hair ASC
- <!--- Dump out query. --->
- label="Two Table Select"
Notice that our FROM clause in the master query contains the UNION ALL'd SELECT of our two tables. This temp table is then aliased as "t" and the rest of the parent query can refer to "t." to gain access to that table. In my experience, the aliasing of the temp FROM table is required for this type of code to work.
Once we have this unioned table, I am then using a GROUP BY on the hair color and getting the aggregate count of the girls with that given hair color. Running the above code, we get the following CFDump output:
| || || |
| || |
| || || |
In my experience, both UNION and the UNION ALL actions are slow (with UNION ALL being somewhat faster). I am not sure I would recommend this type of action with any tables that have a ton of data in them. But then again, I hope that other can speak more to this sort of recommendation.
What Other People Are Searching For
[ local search ]
select from two tables
[ local search ] union all two tables
[ local search ] how to union all
[ local search ] how to select from two tables
[ local search ] how to join two tables into one recordset
It may be worth your time to try the query both ways: SELECT of a UNION, and UNION of SELECTs.
SELECT a, b FROM (
SELECT a, b FROM t1
SELECT a, b FROM t2
) AS a
WHERE (c = 1) AND (d = 2)
-- versus --
SELECT a, b FROM t1 WHERE (c = 1) AND (d = 2)
SELECT a, b FROM t2 WHERE (c = 1) AND (d = 2)
It looks like you are just duplicating the WHERE clause to each table in the UNION, but you are also modifying the order of operations. Do you want to mash everything into one big pile and then pick through it, or do you want to pick through smaller piles and mash the results together at the end? I find the latter to be more efficient in most cases, especially for multi-processor machines, but YMMV.
Many RDBMSes will optimize the query to transform one into the other, but for large queries it's worth the extra few minutes of debugging to find out. (Especially since you aren't doing much more than cut-n-paste.)
Thanks. I was hoping you might swoop in and drop some good advice. I see exactly what you are saying as to where the filtering is done and where the unioning is done. Good stuff.
Thank you guys... nice and elegant way to do it.
I have a question about this JOIN:
I have "Table A" and "Table B" which now is a JOIN-Query with all data from both tables. Now, is there a way in which to tell the Query to overwrite fields of "Table B" with matching fields in "Table A" and no the other way around?.
I need to do this since the data from Table A it's more valuable and complete.
I hope my question makes any sence.
Thank in advance guys.
I just recently learned how to do just that - using a JOIN in a SQL UPDATE statement. I blogged about it not so long ago. Take a look here:
Hope that helps.
Ben's on the right track if you want to permanently overwrite the data, but what if you want to leave the source data alone and do it all within the query? That's a bit trickier.
Given tables A and B, where A is the more complete and useful table, with primary key PK and useful data column DD, your JOIN looks something like this:
COALESCE(a.pk, b.pk) AS pk,
COALESCE(a.dd, b.dd) AS dd
FROM a FULL OUTER JOIN b ON (a.pk = b.pk)
The COALESCE function translates to "take the first value in this list that isn't null". Since you put table A first, it'll use that data if it is available, and only use B if A is not there.
If your system doesn't have FULL OUTER JOIN, you could do it like this:
SELECT pk, COALESCE(MIN(dda),MIN(ddb)) AS dd
SELECT pk, dd AS dda, NULL as ddb FROM a
SELECT pk, NULL AS dda, dd AS ddb FROM b
) AS c
GROUP BY pk
That's doing the same basic thing, but you're letting the GROUP BY handle the key-matching instead of the JOIN. Of course, this only works if you have a unique PK and aren't looking for multiple rows with the same PK.
If you don't like the GROUP BY solution and want to stick with a JOIN, there's a another option if you can't do a FULL OUTER JOIN:
SELECT a.pk, COALESCE(a.dd, b.dd) AS dd
FROM a LEFT OUTER JOIN b ON (a.pk = b.pk)
SELECT b.pk b.dd
FROM b LEFT OUTER JOIN a ON (b.pk = a.pk)
WHERE (a.pk IS NULL)
That's a bit trickier -- the first query gets everything from A, including all of the matches in B, while the second gets everything in B that doesn't have a match in A.
Nice!... next question will be "...is there anything, that you don't know how to do it...?" ja!
Thanks again Ben.