Ask Ben: Selecting From The Union Of Two SQL Tables

Posted October 23, 2007 at 7:57 AM by Ben Nadel

Tags: SQL, Ask Ben

Hi Ben, it's me again... I have a MYSQL 5.X question for you: I have two tables - Table A: pru_properties and Table B: mls_properties. They have the exact same columns names, but different content inside of them since they get feed by different companies. The only matching column is MLSID. how can I do a CFQUERY for both tables using a GROUP BY MLSID?

I don't know much about MySQL, but I assume that it can do most the basic functions that Microsoft SQL Server can do, which is where I have the majority of my experience. As such, I can show you how to do this (testing it in MS SQL Server) and I am just going to guess that it is mostly transferable; afterall, isn't that why people choose MySQL? Because it is a free yet powerful SQL Server?

If you have two tables that have the same structure that you know is unique to both tables, the easiest method is to use the UNION ALL directive. UNION, on its own, gets the SQL Server to combine two queries into one result set. This can be a very slow process because the server will go through both result sets during the unioning action and try to eliminate duplicate records. By adding the ALL aspect, it signals to the server that duplicate checking is not required. This can greatly increase the speed of the unioning function.

That being said, if you need to run a GROUP BY on the resulting unioned table the easiest thing is probably to get the UNION ALL of two queries (one for each table) as the FROM table in the master query (the one containing the GROUP BY). Since, I don't have two tables in my system that line up like yours, I need to create two. Take a look at the example below - the first half of the code is just creation and population of two tables that simulate yours.

  • <!---
  • Query for girls whose combined pool uses
  • two different source tables.
  • --->
  • <cfquery name="qGirl" datasource="#REQUEST.DSN.Source#">
  • DECLARE
  • @girlA TABLE (
  • id INT,
  • name VARCHAR( 30 ),
  • hair VARCHAR( 10 )
  • )
  • ;
  •  
  • DECLARE
  • @girlB TABLE (
  • id INT,
  • name VARCHAR( 30 ),
  • hair VARCHAR( 10 )
  • )
  • ;
  •  
  •  
  • <!--- Populate the [A] table with three records. --->
  • INSERT INTO @girlA
  • (
  • id,
  • name,
  • hair
  • )(
  • SELECT 1, 'Sarah Vivenzio', 'Brunette' UNION ALL
  • SELECT 2, 'Julia Niles', 'Blonde' UNION ALL
  • SELECT 3, 'Kim Doria', 'Brunette'
  • );
  •  
  •  
  • <!--- Populate the [B] table with three records. --->
  • INSERT INTO @girlB
  • (
  • id,
  • name,
  • hair
  • )(
  • SELECT 7, 'Ashley Thomas', 'Brunette' UNION ALL
  • SELECT 8, 'Maria Bello', 'Blonde' UNION ALL
  • SELECT 9, 'Ellen DeGeneres', 'Blonde'
  • );
  •  
  •  
  • <!---
  • ASSERT: At this point, we have two tables that have
  • the same structure but unique data (especially the
  • ID column). This is how we can simulate two different
  • database tables.
  • --->
  •  
  •  
  • SELECT
  • t.hair,
  • ( COUNT( * ) ) AS girl_count
  • FROM
  • <!---
  • For our FROM clause, we are going to UNION ALL the
  • records from each of our two tables defined above.
  • We are going to use the UNION ALL (as opposed to
  • just UNION) to make sure that the database doesn't
  • waste time trying to eliminate duplicates.
  • --->
  • (
  •  
  • (
  • SELECT
  • ga.id,
  • ga.name,
  • ga.hair
  • FROM
  • @girlA ga
  • )
  •  
  • UNION ALL
  •  
  • (
  • SELECT
  • gb.id,
  • gb.name,
  • gb.hair
  • FROM
  • @girlB gb
  • )
  •  
  • ) AS t <!--- Alias the inline temp table. --->
  • GROUP BY
  • t.hair
  • ORDER BY
  • t.hair ASC
  • </cfquery>
  •  
  •  
  • <!--- Dump out query. --->
  • <cfdump
  • var="#qGirl#"
  • label="Two Table Select"
  • />

Notice that our FROM clause in the master query contains the UNION ALL'd SELECT of our two tables. This temp table is then aliased as "t" and the rest of the parent query can refer to "t." to gain access to that table. In my experience, the aliasing of the temp FROM table is required for this type of code to work.

Once we have this unioned table, I am then using a GROUP BY on the hair color and getting the aggregate count of the girls with that given hair color. Running the above code, we get the following CFDump output:


 
 
 

 
SELECT From Two Tables Being Joined By UNION ALL SQL Directive  
 
 
 

In my experience, both UNION and the UNION ALL actions are slow (with UNION ALL being somewhat faster). I am not sure I would recommend this type of action with any tables that have a ton of data in them. But then again, I hope that other can speak more to this sort of recommendation.



Reader Comments

Oct 23, 2007 at 9:16 AM // reply »
153 Comments

It may be worth your time to try the query both ways: SELECT of a UNION, and UNION of SELECTs.

SELECT a, b FROM (
SELECT a, b FROM t1
UNION ALL
SELECT a, b FROM t2
) AS a
WHERE (c = 1) AND (d = 2)

-- versus --

SELECT a, b FROM t1 WHERE (c = 1) AND (d = 2)
UNION ALL
SELECT a, b FROM t2 WHERE (c = 1) AND (d = 2)

It looks like you are just duplicating the WHERE clause to each table in the UNION, but you are also modifying the order of operations. Do you want to mash everything into one big pile and then pick through it, or do you want to pick through smaller piles and mash the results together at the end? I find the latter to be more efficient in most cases, especially for multi-processor machines, but YMMV.

Many RDBMSes will optimize the query to transform one into the other, but for large queries it's worth the extra few minutes of debugging to find out. (Especially since you aren't doing much more than cut-n-paste.)


Oct 23, 2007 at 9:30 AM // reply »
11,238 Comments

@Rick,

Thanks. I was hoping you might swoop in and drop some good advice. I see exactly what you are saying as to where the filtering is done and where the unioning is done. Good stuff.


Oct 24, 2007 at 12:38 AM // reply »
3 Comments

Thank you guys... nice and elegant way to do it.

I have a question about this JOIN:

I have "Table A" and "Table B" which now is a JOIN-Query with all data from both tables. Now, is there a way in which to tell the Query to overwrite fields of "Table B" with matching fields in "Table A" and no the other way around?.

I need to do this since the data from Table A it's more valuable and complete.

I hope my question makes any sence.

Thank in advance guys.


Oct 24, 2007 at 7:31 AM // reply »
11,238 Comments

@Felipe,

I just recently learned how to do just that - using a JOIN in a SQL UPDATE statement. I blogged about it not so long ago. Take a look here:

http://www.bennadel.com/index.cfm?dax=blog:938.view

Hope that helps.


Oct 24, 2007 at 9:56 AM // reply »
153 Comments

Felipe-

Ben's on the right track if you want to permanently overwrite the data, but what if you want to leave the source data alone and do it all within the query? That's a bit trickier.

Given tables A and B, where A is the more complete and useful table, with primary key PK and useful data column DD, your JOIN looks something like this:

SELECT
COALESCE(a.pk, b.pk) AS pk,
COALESCE(a.dd, b.dd) AS dd
FROM a FULL OUTER JOIN b ON (a.pk = b.pk)

The COALESCE function translates to "take the first value in this list that isn't null". Since you put table A first, it'll use that data if it is available, and only use B if A is not there.

If your system doesn't have FULL OUTER JOIN, you could do it like this:

SELECT pk, COALESCE(MIN(dda),MIN(ddb)) AS dd
FROM (
SELECT pk, dd AS dda, NULL as ddb FROM a
UNION ALL
SELECT pk, NULL AS dda, dd AS ddb FROM b
) AS c
GROUP BY pk

That's doing the same basic thing, but you're letting the GROUP BY handle the key-matching instead of the JOIN. Of course, this only works if you have a unique PK and aren't looking for multiple rows with the same PK.

If you don't like the GROUP BY solution and want to stick with a JOIN, there's a another option if you can't do a FULL OUTER JOIN:

SELECT a.pk, COALESCE(a.dd, b.dd) AS dd
FROM a LEFT OUTER JOIN b ON (a.pk = b.pk)
UNION ALL
SELECT b.pk b.dd
FROM b LEFT OUTER JOIN a ON (b.pk = a.pk)
WHERE (a.pk IS NULL)

That's a bit trickier -- the first query gets everything from A, including all of the matches in B, while the second gets everything in B that doesn't have a match in A.

HTH,
-R


Oct 26, 2007 at 7:46 PM // reply »
3 Comments

Nice!... next question will be "...is there anything, that you don't know how to do it...?" ja!

Thanks again Ben.

Felipe Serrano


Post A Comment

Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.

Please review the following issues:

Author Name:


Author Email:

Author Website:

Comment:

Supported HTML tags for formatting: <strong>bold</strong>   <em>italic</em>   <code>code</code>







  • Help Wanted - Find Your Next ColdFusion Job
Ben Nadel's Company - Epicenter Consulting Recent Blog Comments
May 20, 2013 at 11:45 AM
Using jQuery's Animate() Step Callback Function To Create Custom Animations
This is really useful. I found out that you don't actually have to use a dummy css property (surprisingly). To animate a property in a linear-gradient for instance I did this this.css('someLinearGra ... read »
May 20, 2013 at 10:51 AM
Using A Dynamic Column Name With ValueList() In ColdFusion
@Josh, Oh snap! You're totally right! I'm not sure I've ever tried that. I did know that you can call a number of other array-methods on ColdFusion query columns: http://www.bennadel.com/blog/167 ... read »
May 20, 2013 at 10:45 AM
Using A Dynamic Column Name With ValueList() In ColdFusion
@Ben - I believe you can achieve the same functionality with ColdFusion's built in ArrayToList() function. ArrayToList( users[ "id" ] ); ... read »
May 20, 2013 at 10:21 AM
My Experience With AngularJS - The Super-heroic JavaScript MVW Framework
Is there any error logging and handling framework in angularjs, if not then in what way I can do this. ... read »
May 19, 2013 at 2:31 PM
My Experience With AngularJS - The Super-heroic JavaScript MVW Framework
It's funny really just how well that image describes the way I would imagine most people that go with angular for some project is. I have had a similar roller-coaster ride with it as well, but not qu ... read »
May 17, 2013 at 7:42 PM
HashKeyCopier - An AngularJS Utility Class For Merging Cached And Live Data
Ben - thanks so much for posting these Angular articles and findings, they've been a huge help towards learning one of the more 'complex' JavaScript frameworks out there (IMO). I have been using Angu ... read »
May 16, 2013 at 5:01 PM
UPDATE: Parsing CSV Data Files In ColdFusion With csvToArray()
Your code was the closest thing I've found to obtaining some direction for converting ISO fields to values that CF can translate properly. Thank you for posting! ... read »
May 15, 2013 at 6:07 PM
Making SOAP Web Service Requests With ColdFusion And CFHTTP
Ben, you once again saved my bacon at work. Thank you, thank you, thank you! ... read »
InVision App - Prototyping Made Beautiful With Prototyping Tools