Ask Ben: Selecting From The Union Of Two SQL Tables

Posted October 23, 2007 at 7:57 AM

Tags: SQL, Ask Ben

Hi Ben, it's me again... I have a MYSQL 5.X question for you: I have two tables - Table A: pru_properties and Table B: mls_properties. They have the exact same columns names, but different content inside of them since they get feed by different companies. The only matching column is MLSID. how can I do a CFQUERY for both tables using a GROUP BY MLSID?

I don't know much about MySQL, but I assume that it can do most the basic functions that Microsoft SQL Server can do, which is where I have the majority of my experience. As such, I can show you how to do this (testing it in MS SQL Server) and I am just going to guess that it is mostly transferable; afterall, isn't that why people choose MySQL? Because it is a free yet powerful SQL Server?

If you have two tables that have the same structure that you know is unique to both tables, the easiest method is to use the UNION ALL directive. UNION, on its own, gets the SQL Server to combine two queries into one result set. This can be a very slow process because the server will go through both result sets during the unioning action and try to eliminate duplicate records. By adding the ALL aspect, it signals to the server that duplicate checking is not required. This can greatly increase the speed of the unioning function.

That being said, if you need to run a GROUP BY on the resulting unioned table the easiest thing is probably to get the UNION ALL of two queries (one for each table) as the FROM table in the master query (the one containing the GROUP BY). Since, I don't have two tables in my system that line up like yours, I need to create two. Take a look at the example below - the first half of the code is just creation and population of two tables that simulate yours.

 Launch code in new window » Download code as text file »

  • <!---
  • Query for girls whose combined pool uses
  • two different source tables.
  • --->
  • <cfquery name="qGirl" datasource="#REQUEST.DSN.Source#">
  • DECLARE
  • @girlA TABLE (
  • id INT,
  • name VARCHAR( 30 ),
  • hair VARCHAR( 10 )
  • )
  • ;
  •  
  • DECLARE
  • @girlB TABLE (
  • id INT,
  • name VARCHAR( 30 ),
  • hair VARCHAR( 10 )
  • )
  • ;
  •  
  •  
  • <!--- Populate the [A] table with three records. --->
  • INSERT INTO @girlA
  • (
  • id,
  • name,
  • hair
  • )(
  • SELECT 1, 'Sarah Vivenzio', 'Brunette' UNION ALL
  • SELECT 2, 'Julia Niles', 'Blonde' UNION ALL
  • SELECT 3, 'Kim Doria', 'Brunette'
  • );
  •  
  •  
  • <!--- Populate the [B] table with three records. --->
  • INSERT INTO @girlB
  • (
  • id,
  • name,
  • hair
  • )(
  • SELECT 7, 'Ashley Thomas', 'Brunette' UNION ALL
  • SELECT 8, 'Maria Bello', 'Blonde' UNION ALL
  • SELECT 9, 'Ellen DeGeneres', 'Blonde'
  • );
  •  
  •  
  • <!---
  • ASSERT: At this point, we have two tables that have
  • the same structure but unique data (especially the
  • ID column). This is how we can simulate two different
  • database tables.
  • --->
  •  
  •  
  • SELECT
  • t.hair,
  • ( COUNT( * ) ) AS girl_count
  • FROM
  • <!---
  • For our FROM clause, we are going to UNION ALL the
  • records from each of our two tables defined above.
  • We are going to use the UNION ALL (as opposed to
  • just UNION) to make sure that the database doesn't
  • waste time trying to eliminate duplicates.
  • --->
  • (
  •  
  • (
  • SELECT
  • ga.id,
  • ga.name,
  • ga.hair
  • FROM
  • @girlA ga
  • )
  •  
  • UNION ALL
  •  
  • (
  • SELECT
  • gb.id,
  • gb.name,
  • gb.hair
  • FROM
  • @girlB gb
  • )
  •  
  • ) AS t <!--- Alias the inline temp table. --->
  • GROUP BY
  • t.hair
  • ORDER BY
  • t.hair ASC
  • </cfquery>
  •  
  •  
  • <!--- Dump out query. --->
  • <cfdump
  • var="#qGirl#"
  • label="Two Table Select"
  • />

Notice that our FROM clause in the master query contains the UNION ALL'd SELECT of our two tables. This temp table is then aliased as "t" and the rest of the parent query can refer to "t." to gain access to that table. In my experience, the aliasing of the temp FROM table is required for this type of code to work.

Once we have this unioned table, I am then using a GROUP BY on the hair color and getting the aggregate count of the girls with that given hair color. Running the above code, we get the following CFDump output:


 
 
 

 
SELECT From Two Tables Being Joined By UNION ALL SQL Directive  
 
 
 

In my experience, both UNION and the UNION ALL actions are slow (with UNION ALL being somewhat faster). I am not sure I would recommend this type of action with any tables that have a ton of data in them. But then again, I hope that other can speak more to this sort of recommendation.

Download Code Snippet ZIP File

Post Comment  |  Ask Ben  |  Other Searches  |  Print Page



Learning ColdFusion 9 - ColdFusion 9 tutorials, samples, examples, demos

Reader Comments

Oct 23, 2007 at 9:16 AM // reply »
153 Comments

It may be worth your time to try the query both ways: SELECT of a UNION, and UNION of SELECTs.

SELECT a, b FROM (
SELECT a, b FROM t1
UNION ALL
SELECT a, b FROM t2
) AS a
WHERE (c = 1) AND (d = 2)

-- versus --

SELECT a, b FROM t1 WHERE (c = 1) AND (d = 2)
UNION ALL
SELECT a, b FROM t2 WHERE (c = 1) AND (d = 2)

It looks like you are just duplicating the WHERE clause to each table in the UNION, but you are also modifying the order of operations. Do you want to mash everything into one big pile and then pick through it, or do you want to pick through smaller piles and mash the results together at the end? I find the latter to be more efficient in most cases, especially for multi-processor machines, but YMMV.

Many RDBMSes will optimize the query to transform one into the other, but for large queries it's worth the extra few minutes of debugging to find out. (Especially since you aren't doing much more than cut-n-paste.)


Oct 23, 2007 at 9:30 AM // reply »
7,572 Comments

@Rick,

Thanks. I was hoping you might swoop in and drop some good advice. I see exactly what you are saying as to where the filtering is done and where the unioning is done. Good stuff.


Oct 24, 2007 at 12:38 AM // reply »
3 Comments

Thank you guys... nice and elegant way to do it.

I have a question about this JOIN:

I have "Table A" and "Table B" which now is a JOIN-Query with all data from both tables. Now, is there a way in which to tell the Query to overwrite fields of "Table B" with matching fields in "Table A" and no the other way around?.

I need to do this since the data from Table A it's more valuable and complete.

I hope my question makes any sence.

Thank in advance guys.


Oct 24, 2007 at 7:31 AM // reply »
7,572 Comments

@Felipe,

I just recently learned how to do just that - using a JOIN in a SQL UPDATE statement. I blogged about it not so long ago. Take a look here:

http://www.bennadel.com/index.cfm?dax=blog:938.view

Hope that helps.


Oct 24, 2007 at 9:56 AM // reply »
153 Comments

Felipe-

Ben's on the right track if you want to permanently overwrite the data, but what if you want to leave the source data alone and do it all within the query? That's a bit trickier.

Given tables A and B, where A is the more complete and useful table, with primary key PK and useful data column DD, your JOIN looks something like this:

SELECT
COALESCE(a.pk, b.pk) AS pk,
COALESCE(a.dd, b.dd) AS dd
FROM a FULL OUTER JOIN b ON (a.pk = b.pk)

The COALESCE function translates to "take the first value in this list that isn't null". Since you put table A first, it'll use that data if it is available, and only use B if A is not there.

If your system doesn't have FULL OUTER JOIN, you could do it like this:

SELECT pk, COALESCE(MIN(dda),MIN(ddb)) AS dd
FROM (
SELECT pk, dd AS dda, NULL as ddb FROM a
UNION ALL
SELECT pk, NULL AS dda, dd AS ddb FROM b
) AS c
GROUP BY pk

That's doing the same basic thing, but you're letting the GROUP BY handle the key-matching instead of the JOIN. Of course, this only works if you have a unique PK and aren't looking for multiple rows with the same PK.

If you don't like the GROUP BY solution and want to stick with a JOIN, there's a another option if you can't do a FULL OUTER JOIN:

SELECT a.pk, COALESCE(a.dd, b.dd) AS dd
FROM a LEFT OUTER JOIN b ON (a.pk = b.pk)
UNION ALL
SELECT b.pk b.dd
FROM b LEFT OUTER JOIN a ON (b.pk = a.pk)
WHERE (a.pk IS NULL)

That's a bit trickier -- the first query gets everything from A, including all of the matches in B, while the second gets everything in B that doesn't have a match in A.

HTH,
-R


Oct 26, 2007 at 7:46 PM // reply »
3 Comments

Nice!... next question will be "...is there anything, that you don't know how to do it...?" ja!

Thanks again Ben.

Felipe Serrano


Post Comment  |  Ask Ben

Recent Blog Comments
Mar 21, 2010 at 6:32 AM
ColdFusion CFPOP - My First Look
Apologies... The field name in the db for C. is "BounceCode" It stores the code / message which is returned in the email. Sorry for the confusion. ... read »
Mar 21, 2010 at 6:29 AM
ColdFusion CFPOP - My First Look
@Jose Galdamez, Hi Ben and Jose 1st of all.. big thanks to Jose for his Skype chat a few weeks back. Your time was much appreciated. I have come up with a rather unelegant solution to my problem a ... read »
Mar 21, 2010 at 3:42 AM
A New Wrist Pain
Chiropractic treatment is one of the best methods for treating numerous health problems naturally. After years of experience being a chiropractor, I have found that it is a powerful way to solve many ... read »
Mar 20, 2010 at 12:07 PM
Drawing On The iPhone Canvas With jQuery And ColdFusion
Simply awesome. Saved my day. ... read »
Mar 20, 2010 at 9:00 AM
Building A Fixed-Position Bottom Menu Bar (ala FaceBook)
I would like to say thx for an easy way to create a bottom bar. I do have a ?. Is it possible to center the bar if i want to resize it to ex 85%. Regards Offenbach ... read »
Mar 19, 2010 at 7:26 PM
MySQL 3/4 - com.mysql.jdbc.Driver And allowMultiQueries=true
Thank you very much for this post. Adding allowMultiQueries="true" in context.xml didn't help until I added it to url as allowMultiQueries=true Good idea is to use prepared statements and it will he ... read »
Jim
Mar 19, 2010 at 4:49 PM
Nobody Puts Baby In The Corner!
Wow. This is like suddenly finding a support group for your secret shame. I'm not alone! I always liked this movie, even though it is extremely cheesy. I just wish Jennifer Grey hadn't gotten the ... read »
Mar 19, 2010 at 4:47 PM
Application.cfc OnRequest() Method Affects OnError() Arguments
@Jason and @Ben, I've been doing some CF9 refactoring on our systems and noticed an odd occurrence with onError as well. Found a way to work around my problem, but what I saw was... Background: Our ... read »