Ask Ben: Selecting From The Union Of Two SQL Tables

Posted October 23, 2007 at 7:57 AM by Ben Nadel

Tags: SQL, Ask Ben

Hi Ben, it's me again... I have a MYSQL 5.X question for you: I have two tables - Table A: pru_properties and Table B: mls_properties. They have the exact same columns names, but different content inside of them since they get feed by different companies. The only matching column is MLSID. how can I do a CFQUERY for both tables using a GROUP BY MLSID?

I don't know much about MySQL, but I assume that it can do most the basic functions that Microsoft SQL Server can do, which is where I have the majority of my experience. As such, I can show you how to do this (testing it in MS SQL Server) and I am just going to guess that it is mostly transferable; afterall, isn't that why people choose MySQL? Because it is a free yet powerful SQL Server?

If you have two tables that have the same structure that you know is unique to both tables, the easiest method is to use the UNION ALL directive. UNION, on its own, gets the SQL Server to combine two queries into one result set. This can be a very slow process because the server will go through both result sets during the unioning action and try to eliminate duplicate records. By adding the ALL aspect, it signals to the server that duplicate checking is not required. This can greatly increase the speed of the unioning function.

That being said, if you need to run a GROUP BY on the resulting unioned table the easiest thing is probably to get the UNION ALL of two queries (one for each table) as the FROM table in the master query (the one containing the GROUP BY). Since, I don't have two tables in my system that line up like yours, I need to create two. Take a look at the example below - the first half of the code is just creation and population of two tables that simulate yours.

  • <!---
  • Query for girls whose combined pool uses
  • two different source tables.
  • --->
  • <cfquery name="qGirl" datasource="#REQUEST.DSN.Source#">
  • DECLARE
  • @girlA TABLE (
  • id INT,
  • name VARCHAR( 30 ),
  • hair VARCHAR( 10 )
  • )
  • ;
  •  
  • DECLARE
  • @girlB TABLE (
  • id INT,
  • name VARCHAR( 30 ),
  • hair VARCHAR( 10 )
  • )
  • ;
  •  
  •  
  • <!--- Populate the [A] table with three records. --->
  • INSERT INTO @girlA
  • (
  • id,
  • name,
  • hair
  • )(
  • SELECT 1, 'Sarah Vivenzio', 'Brunette' UNION ALL
  • SELECT 2, 'Julia Niles', 'Blonde' UNION ALL
  • SELECT 3, 'Kim Doria', 'Brunette'
  • );
  •  
  •  
  • <!--- Populate the [B] table with three records. --->
  • INSERT INTO @girlB
  • (
  • id,
  • name,
  • hair
  • )(
  • SELECT 7, 'Ashley Thomas', 'Brunette' UNION ALL
  • SELECT 8, 'Maria Bello', 'Blonde' UNION ALL
  • SELECT 9, 'Ellen DeGeneres', 'Blonde'
  • );
  •  
  •  
  • <!---
  • ASSERT: At this point, we have two tables that have
  • the same structure but unique data (especially the
  • ID column). This is how we can simulate two different
  • database tables.
  • --->
  •  
  •  
  • SELECT
  • t.hair,
  • ( COUNT( * ) ) AS girl_count
  • FROM
  • <!---
  • For our FROM clause, we are going to UNION ALL the
  • records from each of our two tables defined above.
  • We are going to use the UNION ALL (as opposed to
  • just UNION) to make sure that the database doesn't
  • waste time trying to eliminate duplicates.
  • --->
  • (
  •  
  • (
  • SELECT
  • ga.id,
  • ga.name,
  • ga.hair
  • FROM
  • @girlA ga
  • )
  •  
  • UNION ALL
  •  
  • (
  • SELECT
  • gb.id,
  • gb.name,
  • gb.hair
  • FROM
  • @girlB gb
  • )
  •  
  • ) AS t <!--- Alias the inline temp table. --->
  • GROUP BY
  • t.hair
  • ORDER BY
  • t.hair ASC
  • </cfquery>
  •  
  •  
  • <!--- Dump out query. --->
  • <cfdump
  • var="#qGirl#"
  • label="Two Table Select"
  • />

Notice that our FROM clause in the master query contains the UNION ALL'd SELECT of our two tables. This temp table is then aliased as "t" and the rest of the parent query can refer to "t." to gain access to that table. In my experience, the aliasing of the temp FROM table is required for this type of code to work.

Once we have this unioned table, I am then using a GROUP BY on the hair color and getting the aggregate count of the girls with that given hair color. Running the above code, we get the following CFDump output:


 
 
 

 
SELECT From Two Tables Being Joined By UNION ALL SQL Directive  
 
 
 

In my experience, both UNION and the UNION ALL actions are slow (with UNION ALL being somewhat faster). I am not sure I would recommend this type of action with any tables that have a ton of data in them. But then again, I hope that other can speak more to this sort of recommendation.



Reader Comments

Oct 23, 2007 at 9:16 AM // reply »
153 Comments

It may be worth your time to try the query both ways: SELECT of a UNION, and UNION of SELECTs.

SELECT a, b FROM (
SELECT a, b FROM t1
UNION ALL
SELECT a, b FROM t2
) AS a
WHERE (c = 1) AND (d = 2)

-- versus --

SELECT a, b FROM t1 WHERE (c = 1) AND (d = 2)
UNION ALL
SELECT a, b FROM t2 WHERE (c = 1) AND (d = 2)

It looks like you are just duplicating the WHERE clause to each table in the UNION, but you are also modifying the order of operations. Do you want to mash everything into one big pile and then pick through it, or do you want to pick through smaller piles and mash the results together at the end? I find the latter to be more efficient in most cases, especially for multi-processor machines, but YMMV.

Many RDBMSes will optimize the query to transform one into the other, but for large queries it's worth the extra few minutes of debugging to find out. (Especially since you aren't doing much more than cut-n-paste.)


Oct 23, 2007 at 9:30 AM // reply »
11,314 Comments

@Rick,

Thanks. I was hoping you might swoop in and drop some good advice. I see exactly what you are saying as to where the filtering is done and where the unioning is done. Good stuff.


Oct 24, 2007 at 12:38 AM // reply »
3 Comments

Thank you guys... nice and elegant way to do it.

I have a question about this JOIN:

I have "Table A" and "Table B" which now is a JOIN-Query with all data from both tables. Now, is there a way in which to tell the Query to overwrite fields of "Table B" with matching fields in "Table A" and no the other way around?.

I need to do this since the data from Table A it's more valuable and complete.

I hope my question makes any sence.

Thank in advance guys.


Oct 24, 2007 at 7:31 AM // reply »
11,314 Comments

@Felipe,

I just recently learned how to do just that - using a JOIN in a SQL UPDATE statement. I blogged about it not so long ago. Take a look here:

http://www.bennadel.com/index.cfm?dax=blog:938.view

Hope that helps.


Oct 24, 2007 at 9:56 AM // reply »
153 Comments

Felipe-

Ben's on the right track if you want to permanently overwrite the data, but what if you want to leave the source data alone and do it all within the query? That's a bit trickier.

Given tables A and B, where A is the more complete and useful table, with primary key PK and useful data column DD, your JOIN looks something like this:

SELECT
COALESCE(a.pk, b.pk) AS pk,
COALESCE(a.dd, b.dd) AS dd
FROM a FULL OUTER JOIN b ON (a.pk = b.pk)

The COALESCE function translates to "take the first value in this list that isn't null". Since you put table A first, it'll use that data if it is available, and only use B if A is not there.

If your system doesn't have FULL OUTER JOIN, you could do it like this:

SELECT pk, COALESCE(MIN(dda),MIN(ddb)) AS dd
FROM (
SELECT pk, dd AS dda, NULL as ddb FROM a
UNION ALL
SELECT pk, NULL AS dda, dd AS ddb FROM b
) AS c
GROUP BY pk

That's doing the same basic thing, but you're letting the GROUP BY handle the key-matching instead of the JOIN. Of course, this only works if you have a unique PK and aren't looking for multiple rows with the same PK.

If you don't like the GROUP BY solution and want to stick with a JOIN, there's a another option if you can't do a FULL OUTER JOIN:

SELECT a.pk, COALESCE(a.dd, b.dd) AS dd
FROM a LEFT OUTER JOIN b ON (a.pk = b.pk)
UNION ALL
SELECT b.pk b.dd
FROM b LEFT OUTER JOIN a ON (b.pk = a.pk)
WHERE (a.pk IS NULL)

That's a bit trickier -- the first query gets everything from A, including all of the matches in B, while the second gets everything in B that doesn't have a match in A.

HTH,
-R


Oct 26, 2007 at 7:46 PM // reply »
3 Comments

Nice!... next question will be "...is there anything, that you don't know how to do it...?" ja!

Thanks again Ben.

Felipe Serrano


Post A Comment

Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.

Please review the following issues:

Author Name:


Author Email:

Author Website:

Comment:

Supported HTML tags for formatting: <strong>bold</strong>   <em>italic</em>   <code>code</code>







  • Help Wanted - Find Your Next ColdFusion Job
Ben Nadel's Company - Epicenter Consulting Recent Blog Comments
seb
Jun 20, 2013 at 2:32 AM
Working With Inherited Collections In AngularJS
@mike, @ben, The best article about scope and prototypal prototypical inheritance in angularjs is http://stackoverflow.com/questions/14049480/what-are-the-nuances-of-scope-prototypal-prototypical- ... read »
Jun 20, 2013 at 2:17 AM
ColdFusion NumberFormat() Exploration
Nice read thanks Ben, Is there a way to mask a negative number? Long story short in the finance sector when you go 'short' on a stock you want the price to fall this is a good thing because you are ... read »
Jun 20, 2013 at 1:09 AM
The Beauty Of The jQuery Each() Method
my html code : <html> <head> <script type="text/javascript" src="jquery.js"></script> <script type="text/javascript" src="nss.js"> ... read »
Jun 19, 2013 at 11:31 PM
Directive Link, $observe, And $watch Functions Execute Inside An AngularJS Context
@Ben, bunch to learn indeed, but thats fun part : ) ... read »
Jun 19, 2013 at 10:41 PM
Referencing ColdFusion Query Columns In A Loop Using Both Array And Dot Notation
Burdock-roots Are you going fat day by day? You need to be good for your family and make some money too. So we bring for you a best product that helps you to be more energetic every day. You will b ... read »
Jun 19, 2013 at 9:52 PM
Working With Inherited Collections In AngularJS
I recognize the applicability of your solution, and how easy it makes to share data across multiple views or even "submodules" of rather simple application. But it seems to me that it creat ... read »
Jun 19, 2013 at 9:38 PM
Directive Link, $observe, And $watch Functions Execute Inside An AngularJS Context
@Alesei, Glad you like it. Even after working with AngularJS for months, I still get a bunch of unexpected, "$digest is already in progress". So hard to debug sometimes! ... read »
Jun 19, 2013 at 9:36 PM
Working With Inherited Collections In AngularJS
@Mike, The relationship of $scope values is definitely an interesting thing! But it's not simple - it really forces you to understand prototypal inheritance, which is not at all a simple topic! Gla ... read »
InVision App - Prototyping Made Beautiful With Prototyping Tools