# Using A Rough Box Model To Gather Near-By Zip Codes

By on
Tags:

Yesterday, I explored the mathematics behind finding the approximate distance between two latitude/longitude points on a map. This worked well, but don't be mislead - this calculation is still a rough estimation. Because the static values used in the equation are less accurate as you move away from the equator (the longitudinal lines get closer together), the distance becomes less accurate the closer you get to the poles. However, since we usually only care about these calculations in populated areas (ie. away from the Poles) and in a small radii, we can somewhat discount the outlier inaccuracies.

Originally, I was going to use these mathematical equations to write up a demo on finding all zip codes within an arbitrary radius to a given zip code. But based on some articles that I read and on some comments that were made to my previous post, I am feeling that this is completely unnecessary. Once you realize that a zip code can cover a large area, you realize that getting latitude and longitude from a zip code builds in a high degree of inaccuracy right from the very start. And, since this is going to be somewhat inaccurate to being with, we might as well use a fuzzier calculation that is much easier to use.

This fuzzier calculation is the bounding box model. Imagine that we have a starting zip code (10016): Now, given this origin, rather than worry about a circular radius, we are going to create a bounding box that is plus and minus the radius in each plane of movement: From the graphic, you can see that the areas covered by the circular radius (dotted black line) and the bounding box are roughly the same. Sure, the bounding box covers a greater area, but since the whole latitude/longitude-from-zip-code reading is somewhat inaccurate to begin with, I don't think this larger area is much cause for concern.

Now that we have settled on giving the bounding box model a try, how do we figure out what are our radius is? Well, we have the rough estimation that each degree of latitude on the map represents 69.09 miles. By using some simple algebra, we can find the degree-radius by taking a percentage based on this previously stated static value:

So, let's say we want to get the +/- radius in degrees for a 10 mile radius, the equation would be:

RiD = (10 / 69.09) = 0.14474 degrees

Ok, cool. Now, let's put this to a test - I am going to gather all zip codes in a three mile radius to my origin (10016) and then I'm going to map those using the Google Maps API:

``````<!---
Set the approximate miles per latitude. We will use this to
create a rough box model used to find proximal zip codes.
--->
<cfset flMilesPerDegree = 69.09 />

<!---
Now, let's determine what kind of mileage we want to use to
gather near-by zip codes.
--->

<!---
We need to convert our mile-bases search to a degree-delta
that can be used to find min and max latitude and longitude.
To do so, we are going to find what percentage of the
miles-per-degree equates to our mile radius.
--->

<!---
To calculate the rough box model, we need to get the
latitude and longitude of our start zip code.
--->
<cfquery name="qOrigin" datasource="#REQUEST.DSN.Source#">
SELECT
z.id,
z.city,
z.state,
z.zip,
z.latitude,
z.longitude,
CAST( z.longitude AS DECIMAL(10,8) ) AS test
FROM
zip_code_index z

<!--- Original location in NYC (my office!) --->
WHERE
z.zip = '10016'
</cfquery>

<!---
Now that we have our target zip code's latitude and
longitude value, we can calculate the min and max
longitude and latitude values of our rough box model.
This will be the +/- degree radiues in each plane.
--->
<cfset flMinLatitude = (qOrigin.latitude - flDegreeRadius) />
<cfset flMaxLatitude = (qOrigin.latitude + flDegreeRadius) />
<cfset flMinLongitude = (qOrigin.longitude - flDegreeRadius) />
<cfset flMaxLongitude = (qOrigin.longitude + flDegreeRadius) />

<!---
Using the rough box model, get all of the zip codes that
fall within the box.
--->
<cfquery name="qZipCode" datasource="#REQUEST.DSN.Source#">
SELECT
z.id,
z.city,
z.state,
z.zip,
z.latitude,
z.longitude,

<!--- Set an empty value for distance. --->
( 0 ) AS distance
FROM
zip_code_index z
WHERE
z.latitude >= <cfqueryparam value="#flMinLatitude#" cfsqltype="cf_sql_float" />
AND
z.latitude <= <cfqueryparam value="#flMaxLatitude#" cfsqltype="cf_sql_float" />
AND
z.longitude >= <cfqueryparam value="#flMinLongitude#" cfsqltype="cf_sql_float" />
AND
z.longitude <= <cfqueryparam value="#flMaxLongitude#" cfsqltype="cf_sql_float" />
</cfquery>

<!---
Now that we have the estimated zip codes, let's loop over
them and update the column for the more accurate (but still
just an appoximation) distance. By doing this, we can compare
the limitations of the rough box model to the somewhat more
accurate mathematical calculation.
--->
<cfloop query="qZipCode">

<!--- Store mathematical distnace. --->
<cfset qZipCode[ "distance" ][ qZipCode.CurrentRow ] = JavaCast(
"int",
GetLatitudeLongitudeProximity(
qOrigin.latitude,
qOrigin.longitude,
qZipCode.latitude,
qZipCode.longitude
)
) />

</cfloop>

<!--- Now that we have all of the zip codes, let's map them. --->

<!--- Create Google map holder. --->
<div id="map" style="height: 500px ; width: 540px ;"></div>

<!-- Include the Google Maps script. -->
<script type="text/javascript" src="........"></script>
<script type="text/javascript">

// This Initializes the makeout map.
function InitMap() {
var objMap = null;
var arrLocations = new Array();
var intI = 0;

// Create the Google map using the DIV container.
objMap = new GMap2( document.getElementById( "map" ) );

// Loop over each zip code in our search and create a
// location for each latitude and longitude.
<cfloop query="qZipCode">

arrLocations[ arrLocations.length ] = {
Point: new GLatLng(
#qZipCode.latitude#,
#qZipCode.longitude#
),
Text: "#qZipCode.zip#"
};

</cfloop>

// Update the map controlls and center.

// Center the map on our original location.
objMap.setCenter(
new GLatLng( #qOrigin.latitude#, #qOrigin.longitude# ),
12
);

// Loop over the locations in our queue and actually
// add them to the map as markers.
for (intI = 0 ; intI < arrLocations.length ; intI++){

// Add the marker to the map.
objMap, arrLocations[ intI ].Point,
arrLocations[ intI ].Text
);

}
}

// This adds a marker to the given map.
var objMarker = new GMarker( Point );

// Add the click handler for the marker.
objMarker,
"click",
function() {
objMarker.openInfoWindowHtml( Text );
}
);

}

// Init map after a slight delay. I have found that using a
// slight delay helps the map load for some reason.
setTimeout( InitMap, 500 );

</script>
``````

As you can see, we use our radius to calculate the minimum and maximum latitude and longitude values of our rough box model. Then, rather than messing around with any complicated mathematical formulas, we simply gather all zip codes whose latitude and longitude fall within this fuzzy box model. When we run this code, we get the following map: Now, Manhattan is not the best example as it is surrounded by water (which cannot have zip codes on it obviously); but, I hope you can see that the large majority of the zip codes covered by the box model fall within the more accurate circular radius. This seems pretty darn good to me.

Based on the graphic, we can obviously see that some of the locations fall outside what would have been the mathematically calculated coverage. But, is that true? In the code above, you'll notice that after I gathered the zip codes, I then went and stored the mathematical distance calculation based on latitude and longitude back into the query. What happens when we check our zip code query to get returned zip codes that are outside the more accurate circular radius:

``````<!---
Gather all zip codes whose calcualted distnace (based on our
mathematical formulas) is farther away from our origin than
--->
SELECT
zip,
distance
FROM
qZipCode
WHERE
distance > <cfqueryparam value="#intMileRadius#" cfsqltype="cf_sql_integer" />
ORDER BY
distance DESC
</cfquery>

<!--- Check to see if we have any bad zips. --->

<!--- Output bad zip codes (ones that are too far away). --->

</cfloop>

<cfelse>

<!--- There were no bad zip codes. --->
All zip codes within calculated radius!

</cfif>
``````

When we run this code, checking for bad zip codes, we get the following output:

All zip codes within calculated radius!

Very interesting! Even though our Google map indicates that some of the locations would fall outside our circular area, all zip codes returned by the fuzzy box model (in my particular example) are just as accurate as the mathematical calculations would have been. I think that this demonstrates two things:

1. The mathematical approach is not fully accurate to begin with (at least at our level of mathematical complexity), and should not be thought of as such.

2. The fuzzy box model approach is accurate enough, when compared to the mathematical calculations, to justify using it for the sake of much greater simplicity.

So, in conclusion, when you need to find all the zip codes in proximity to a given zip code, the bounding box approach is going to be accurate enough for most of your everyday use cases, and faster to run than any mathematical calculations.

Want to use code from this post? Check out the license. Fantastic to see the changes, and the google map implementation. Nice findings Ben. I'm even thinking about a nice app involving maps now :) Thanks fellas. It's good to have this knowledge in the tool-belt before I go out and kill myself trying to deal with Cosines and Sines and Tangents (oh my!). I have to wonder how your performance would change if you built denormalized tables for common radius values. That is:

CREATE TABLE Zip_50MI {
start_zip INT NOT NULL,
near_zip INT NOT NULL,
distance_mi TINYINT NOT NULL
}
ALTER TABLE Zip_50MI ADD PRIMARY KEY (start_zip, near_zip)

Then, run a SQL query to get the zip codes within a 50-mile radius of start_zip and populate the table. Yes, the table would be larger than a zip code table, but with that primary key in there it would still probably be faster than doing the math each and every time.

SELECT close_zip, distance_mi
FROM Zip_50mi
WHERE (start_zip = 90036)
ORDER BY distance_mi DESC

Build a Zip_50mi table, a Zip_25mi table, etc, and you are good to go. The Zip_25mi table would be a subset of the Zip_50mi table, etc.

You could try just a Zip_100mi table and no others, then use the distance_mi field as part of the WHERE clause, but I'd be willing to bet that it wouldn't be nearly as fast. Maybe the optimizer is smart enough, but it would take some testing. Derf. Curly braces. Sorry. That's what I get for typing comments directly into the box and not testing them first. But you get my point. Having done extensive work in this area of mapping, keep in mind that Zip codes are postal delivery routes. Specifically, they are street/road segment demarkations to optimize mail delivery. The lat-lon values associated with Zip Codes are the values for the centroid for that mail delivery area, so you do have some zipcodes ("C" shaped & "L" shaped) where it's centroid isn't even within the zipcodes "boundaries" (I use that term loosely with zipcodes).

That being said, if gross approximations of nearby zip-codes is the task at hand, then by all means, the proposed methods are fine. If something more accurate is expected, zipcode centroids shouldn't be relied upon.

Lastly, what is the application? In other words, why use zipcodes? In my opinion zipcode postal delivery routes for locating geographies on a map are now somewhat irrelevant. Unless you really love your postal mail(wo)man, it usually makes a lot more sense to search by neighborhood, subdivision, intersection, etc.

Good work though. One of the signs of best-of-breed software development is to simplify things. Well done.

Paul Cormier
besthomepro.com
Search. Find. Move In. Seriously, I'm amazed! The ColdFusion community needs more people like you. Thank you for contributing your knowledge and time into the CF community. If you put this much effort into an "Ask Ben" question, I could only imagine the work you must do for your company. It really shows how much you love what you do. @Rick,

Hmm, interesting. Yeah, that would be a huge table :) But, I suppose you could leverage it in a bi-directional way; meaning, that if a start zip is 50 miles to a near zip, then we know that the near zip is 50 miles to the start zip. This would not require double storage of this knowledge. Of course, based on indexing, you might want to double up on the storage. Would be interesting to see.

@Paul,

When I think about this type of application, I think about "Store Locators" on vendor web sites; as in, find a Home Depot within 10 miles of me. With that said, I think the gross estimation is totally fine. Especially when you think that people aren't ignorant about their own surroundings. After all, people aren't *really* using these types of applications to calculate distance - rather they are simply using zip code location as a way to find locations. Once they see a location, they can determine on their own if its within a reasonable distance and how to best go about getting there.

As far as why we use zip codes - I think it's the easiest thing that everyone knows that doesn't have to be location specific. For example, in Manhattan, sure it would be nice to say, look up locations by "Midtown" or "Downtown", but that makes sense to me only cause I know the area. If I was in the country, maybe I could look up "County"? But even then, what if I don't know the area? What if I am visiting? It always easy to grab a piece of mail and see the zip code on the address label - the information is so readily available.

@Brian,

Always glad to help out! Plus, I learn something new myself every time :) Awesome, Ben. This is going to be really useful for a project I'm working on. Thanks a million. @Tony,

Sounds good. Please free to let us know how it works out and any insights you may gain along the way. This is code and logic I spent a long time trying to find implemented in CF. I appreciate you taking the time to sort out a public solution. You could also argue that, in many cases, doing any of this on the db server at all is now a moot point. If you use the Google Maps API correctly, you can have it send an AJAX request with the Lat/Lon bounding box. No math needed!

I wrote up some proof-of-concept code for this a while back:

http://nowrists.com/where/

In that app, the database is just a set of Lat/Lon coords and annotations. As the user drills down into the map, the browser requests the correct bounding box, so the SQL query is just a BETWEEN statement -- no icky trig. @Rick,

You gotta love the power of Google :) Way to leverage their tools. @Rick O : Your lazy man's method (which I heartily applaud) reminds me of some advice given to me by my father-in-law many years ago. He told me, "Hal, the person who can do the work of ten men is amazing, but it's far better to be the person who can get ten men to do your work." @RICK O - That is an awesome demo of using CF & Google together. I'd love to see the code behind it if you're open to sharing it.

@BEN - You rock as always. I love your blog as it always challenges viewers to think outside the box (or inside the box in this case). Bravo! @RICK O - Thanks. I appreciate you sharing and will definitely take a look. @Todd,

Thanks a lot man :) Great idea - I was trying to figure out how to make this function faster for my site. This approximation brought my query time from 3 seconds down to about 0.2 seconds, with very little loss of "accuracy".

One thing I changed - distance per degree of latitude will always be about 69.09 miles, but the distance per degree of longitude is different than the distance per degree of latitude. And also, distance per degree of longitude will vary as you move away from the equator. What I found was that in the continental US, it's about 45 miles per degree of longitude in the north and about 63 miles in the south. So I went to the next level of detail and saved the "Miles per degree longitude" on my zip codes table (a one-time UPDATE statement using the "true" distance calculation), and then used that in my queries. @Jay,

That sounds like a good improvement with little overhead! Thanks Ben! Might come in handy with my current project. @John,

Cool man, let us know if it works out. Does this code work for mysql? I seem to be having issues running it.

First the line:
CAST( z.longitude AS DECIMAL(10,8) ) AS test
throws me this nasty error: You have an error in your SQL syntax;

When I remove that line of code, I get the following error:
Variable GETLATITUDELONGITUDEPROXIMITY is undefined.

Any idea why this is happening, I literally copied and pasted the code and changed the necessary sql table/column data and that was it. I would love to be able to get this code working! Thanks! I think it's great. I'm going to use a variation of this in an app I'm building to determine location of kids for sports teams. In other words, if they live close together, put them on the same team to keep travel time to a minimum.

Thanks again because you did it in CF!!! Rarely do I come across something this unique using CF. @Joey,

Glad to be of help - sounds like a really useful application that you are building. A little follow up. This worked great with a yahoo map mashup I have been working on. It has been accurate. @John,

Awesome to hear; it's one thing in theory - another thing to actually know it works in practice. Long time reader, first time poster. Yeah, not to make your head any bigger, but every time I Google a somewhat complicated CF issue...there you are. I just used REMatch() after reading your post on it. (I gave you credit in my blog:)

I can usually adjust your code to make it work for me, but was wondering if you had a "right" solution before I tried.

Is it possible to "reorder" the qZipCode query after you calculate the distances using GetLatitudeLongitudeProximity? I want to show a list of stores ordered by distance. The results would produce several pages. I've done this before with ASP, except instead of using SQL I just shoved the results into an array and sorted that. That worked fine on 1 page, but displaying multiple pages might get messy.

I apologize for not trying it myself first, but I thought your readers might benefit from this too.

Thanks! @Nathan,

Thanks my man - that is really flattering to hear :) As far as the sorting, if you have a query that you want to re-order, a lot of times, I'll go ahead and actually add a new column to the query using queryAddColumn(). Then, I'll loop over the query to populate that new column with the "test" value that I want to sort on. Then, it's simply a matter of query of queries to get the query in the right order.

I hope that helps point you in the right direction if you haven't already solved this (sorry - I'm behind on my blog commenting). Awesome! I was searching for a solution to this and when I saw that you had posted something on your blog, I got a little giddy (admittedly) because your posts are always so clear and concise. Thanks for taking the time to document your discoveries.

With my code, I ran a query of queries to resort the results by distance and included an additional WHERE clause to limit the distance to the original "intMileRadius" value. It ended up being a quick and easy way to refine the accuracy of the rough box model. One more thing to add.... everyone has a different style of coding. I took this code along with the functions from the previous blog post and converted them into a single CFC. Here's an example call to the CFC:

<cfinvoke component="#cfcRoot#.zipcodes" method="getProximityZips" returnvariable="qGetProximityZips">
<cfinvokeargument name="dsn" value="#yourDSN#">
<cfinvokeargument name="originZip" value="#originZip#">
</cfinvoke>

And here's the CFC code:

<!--- --------------------------------------------------------------------------------------- ----
Blog Entry:
Using A Rough Box Model To Gather Near-By Zip Codes

Authors:
Ben Nadel / Kinky Solutions (original coding)
Matt Glaze (conversion to CFC)

Original Post:
Feb 10, 2009 at 9:58 AM
CFC Conversion:
Feb 24, 2011 at 2:02 PM
---- --------------------------------------------------------------------------------------- --->

<cfcomponent displayname="Zipcodes Component">
<cffunction name="getProximityZips" access="public" returntype="query">
<cfargument name="dsn" type="string" required="true">
<cfargument name="originZip" type="numeric" required="true">

<cfset milesPerDegree = 69.09>

<cfinvoke method="getOriginZipDetails" returnvariable="qGetOriginZipDetails">
<cfinvokeargument name="dsn" value="#Arguments.dsn#">
<cfinvokeargument name="originZip" value="#Arguments.originZip#">
</cfinvoke>

<cfset minLatitude = (qGetOriginZipDetails.latitude - degreeRadius)>
<cfset maxLatitude = (qGetOriginZipDetails.latitude + degreeRadius)>
<cfset minLongitude = (qGetOriginZipDetails.longitude - degreeRadius)>
<cfset maxLongitude = (qGetOriginZipDetails.longitude + degreeRadius)>

<cfinvoke method="getMatchingZips" returnvariable="qGetMatchingZips">
<cfinvokeargument name="dsn" value="#Arguments.dsn#">
<cfinvokeargument name="minLatitiude" value="#minLatitude#">
<cfinvokeargument name="maxLatitiude" value="#maxLatitude#">
<cfinvokeargument name="minLongitude" value="#minLongitude#">
<cfinvokeargument name="maxLongitude" value="#maxLongitude#">
</cfinvoke>

<cfloop query="qGetMatchingZips">
<cfinvoke method="getLatLongProximity" returnvariable="zipDistance">
<cfinvokeargument name="fromLatitude" value="#qGetOriginZipDetails.latitude#">
<cfinvokeargument name="fromLongitude" value="#qGetOriginZipDetails.longitude#">
<cfinvokeargument name="toLatitude" value="#qGetMatchingZips.latitude#">
<cfinvokeargument name="toLongitude" value="#qGetMatchingZips.longitude#">
<cfinvokeargument name="milesPerDegree" value="#milesPerDegree#">
</cfinvoke>
<cfset qGetMatchingZips[ "distance" ][ qGetMatchingZips.CurrentRow ] = JavaCast("int",zipDistance) />
</cfloop>

<cfinvoke method="reOrderZips" returnvariable="qGetProximityZips">
<cfinvokeargument name="qGetMatchingZips" value="#qGetMatchingZips#">
</cfinvoke>

<cfreturn qGetProximityZips>
</cffunction>

<cffunction name="getOriginZipDetails" access="private" returntype="query">
<cfargument name="dsn" type="string" required="true">
<cfargument name="originZip" type="numeric" required="true">
<cfquery name="qGetOriginZipDetails" datasource="#Arguments.dsn#">
SELECT latitude,
longitude
FROM zipcodes
WHERE zipcode = <cfqueryparam value="#Arguments.originZip#" cfsqltype="cf_sql_integer" null="false">
</cfquery>
<cfreturn qGetOriginZipDetails>
</cffunction>

<cffunction name="getMatchingZips" access="private" returntype="query">
<cfargument name="dsn" type="string" required="true">
<cfargument name="minLatitiude" type="numeric" required="true">
<cfargument name="maxLatitiude" type="numeric" required="true">
<cfargument name="minLongitude" type="numeric" required="true">
<cfargument name="maxLongitude" type="numeric" required="true">
<cfquery name="qGetMatchingZips" datasource="#Arguments.dsn#">
SELECT zipcode,
city,
statecode,
statename,
latitude,
longitude,
<!--- Set an empty value for distance. --->
( 0 ) AS distance
FROM zipcodes
WHERE latitude >= <cfqueryparam value="#Arguments.minLatitiude#" cfsqltype="cf_sql_float" />
AND latitude <= <cfqueryparam value="#Arguments.maxLatitiude#" cfsqltype="cf_sql_float" />
AND longitude >= <cfqueryparam value="#Arguments.minLongitude#" cfsqltype="cf_sql_float" />
AND longitude <= <cfqueryparam value="#Arguments.maxLongitude#" cfsqltype="cf_sql_float" />
</cfquery>
<cfreturn qGetMatchingZips>
</cffunction>

<cffunction name="reOrderZips" access="private" returntype="query">
<cfargument name="qGetMatchingZips" type="query" required="true">
<cfquery name="qReOrderZips" dbtype="query">
SELECT *
FROM qGetMatchingZips
WHERE distance <= <cfqueryparam value="#Arguments.mileRadius#" cfsqltype="cf_sql_integer" null="false">
ORDER BY distance,
zipcode
</cfquery>
<cfreturn qReOrderZips>
</cffunction>

<cffunction name="getLatLongProximity" access="private" returntype="numeric" output="false">
<cfargument name="fromLatitude" type="numeric" required="true">
<cfargument name="fromLongitude" type="numeric" required="true">
<cfargument name="toLatitude" type="numeric" required="true">
<cfargument name="toLongitude" type="numeric" required="true">
<cfargument name="milesPerDegree" type="numeric" required="true">
ACos(
(
Sin( DegreesToRadians( ARGUMENTS.FromLatitude ) ) *
)
+
(
Cos( DegreesToRadians( ARGUMENTS.FromLatitude ) ) *
Cos( DegreesToRadians( ARGUMENTS.ToLatitude ) ) *
Cos( DegreesToRadians( ARGUMENTS.ToLongitude - ARGUMENTS.FromLongitude ) )
)
)
) />
<cfreturn degreeDistance>
</cffunction>

<cfargument name="Degrees" type="numeric" required="true" hint="I am the degree value to be converted to radians.">
<cfreturn (ARGUMENTS.Degrees * Pi() / 180) />
</cffunction>

<cfargument name="Radians" type="numeric" required="true" hint="I am the radian value to be converted to degrees.">
<cfreturn (ARGUMENTS.Radians * 180 / Pi()) />
</cffunction>
</cfcomponent>  