Yesterday, I explored the mathematics behind finding the approximate distance between two latitude/longitude points on a map. This worked well, but don't be misled - this calculation is still a rough estimation. Because the static values used in the equation become less accurate as you move away from the equator (the longitudinal lines get closer together), the calculated distance becomes less accurate the closer you get to the poles. However, since we usually only care about these calculations in populated areas (i.e., away from the poles) and within a small radius, we can somewhat discount the outlier inaccuracies.
Originally, I was going to use these mathematical equations to write up a demo on finding all zip codes within an arbitrary radius of a given zip code. But based on some articles that I read and on some comments that were made on my previous post, I am feeling that this is completely unnecessary. Once you realize that a zip code can cover a large area, you realize that getting latitude and longitude from a zip code builds in a high degree of inaccuracy right from the very start. And, since this is going to be somewhat inaccurate to begin with, we might as well use a fuzzier calculation that is much easier to use.
This fuzzier calculation is the bounding box model. Imagine that we have a starting zip code (10016):
Now, given this origin, rather than worry about a circular radius, we are going to create a bounding box that is plus and minus the radius in each plane of movement:
From the graphic, you can see that the areas covered by the circular radius (dotted black line) and the bounding box are roughly the same. Sure, the bounding box covers a greater area, but since the whole latitude/longitude-from-zip-code reading is somewhat inaccurate to begin with, I don't think this larger area is much cause for concern.
Now that we have settled on giving the bounding box model a try, how do we figure out what our radius is in degrees? Well, we have the rough estimation that each degree of latitude on the map represents 69.09 miles. By using some simple algebra, we can find the degree-radius by dividing the radius in miles by this static value:
RadiusInDegrees = (RadiusInMiles / MilesPerDegree)
So, if we want to get the +/- radius in degrees for a 10 mile radius, the equation would be:
RiD = (10 / 69.09) = 0.14474 degrees
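The same conversion can be wrapped in a small helper. This is only a sketch: the function name MilesToDegrees and its MilesPerDegree argument are hypothetical, and the default of 69.09 is the rough miles-per-degree figure quoted above.

```cfm
<!---
	Hypothetical helper: converts a radius in miles to a radius in
	degrees using the rough miles-per-degree estimate from this post.
--->
<cffunction name="MilesToDegrees" returntype="numeric" output="false">
	<cfargument name="Miles" type="numeric" required="true" />
	<cfargument name="MilesPerDegree" type="numeric" required="false" default="69.09" />

	<cfreturn (ARGUMENTS.Miles / ARGUMENTS.MilesPerDegree) />
</cffunction>

<!--- Convert a 10 mile radius and show it to five decimal places. --->
<cfoutput>
	#NumberFormat( MilesToDegrees( 10 ), "0.00000" )# degrees
</cfoutput>
```

For a 10 mile radius, this should agree with the 0.14474 degrees worked out by hand above.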
Ok, cool. Now, let's put this to a test - I am going to gather all zip codes within a three mile radius of my origin (10016) and then I'm going to map those using the Google Maps API:
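Here is a minimal sketch of just the lookup step. It assumes a hypothetical datasource named myDSN, a hypothetical zip_code table with zip, latitude, and longitude columns, and placeholder origin coordinates; the variable names other than qZipCode and intMileRadius are illustrative as well. The distance column and the Google Maps output are left out.

```cfm
<!---
	Placeholder origin coordinates (assumed, roughly midtown
	Manhattan). In practice, look these up for the origin zip code.
--->
<cfset flLatitude = 40.745 />
<cfset flLongitude = -73.978 />

<!--- Radius in miles and the rough miles-per-degree estimate. --->
<cfset intMileRadius = 3 />
<cfset flMilesPerDegree = 69.09 />

<!--- Convert the mile radius to a +/- radius in degrees. --->
<cfset flRadiusInDegrees = (intMileRadius / flMilesPerDegree) />

<!--- Build the bounding box: plus and minus the radius in each plane. --->
<cfset flMinLatitude = (flLatitude - flRadiusInDegrees) />
<cfset flMaxLatitude = (flLatitude + flRadiusInDegrees) />
<cfset flMinLongitude = (flLongitude - flRadiusInDegrees) />
<cfset flMaxLongitude = (flLongitude + flRadiusInDegrees) />

<!---
	Gather all zip codes whose coordinates fall inside the box.
	The datasource, table, and column names are hypothetical.
--->
<cfquery name="qZipCode" datasource="myDSN">
	SELECT
		z.zip,
		z.latitude,
		z.longitude
	FROM
		zip_code z
	WHERE
		z.latitude BETWEEN
			<cfqueryparam value="#flMinLatitude#" cfsqltype="cf_sql_double" />
			AND
			<cfqueryparam value="#flMaxLatitude#" cfsqltype="cf_sql_double" />
	AND
		z.longitude BETWEEN
			<cfqueryparam value="#flMinLongitude#" cfsqltype="cf_sql_double" />
			AND
			<cfqueryparam value="#flMaxLongitude#" cfsqltype="cf_sql_double" />
</cfquery>
```

Note that this sketch reuses the latitude figure for both planes, which is an assumption on my part. Since degrees of longitude cover fewer miles as you move away from the equator, a box built this way is narrower east-to-west than the stated radius.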
As you can see, we use our radius to calculate the minimum and maximum latitude and longitude values of our rough box model. Then, rather than messing around with any complicated mathematical formulas, we simply gather all zip codes whose latitude and longitude fall within this fuzzy box model. When we run this code, we get the following map:
Now, Manhattan is not the best example as it is surrounded by water (which cannot have zip codes on it obviously); but, I hope you can see that the large majority of the zip codes covered by the box model fall within the more accurate circular radius. This seems pretty darn good to me.
Based on the graphic, it looks like some of the locations fall outside what would have been the mathematically calculated coverage. But is that true? In the code above, you'll notice that after I gathered the zip codes, I then went and stored the mathematical distance calculation based on latitude and longitude back into the query. Let's check our zip code query for any returned zip codes that fall outside the more accurate circular radius:
<!---
	Gather all zip codes whose calculated distance (based on our
	mathematical formulas) is farther away from our origin than
	our given radius.
--->
<cfquery name="qBadZip" dbtype="query">
	SELECT
		zip,
		distance
	FROM
		qZipCode
	WHERE
		distance > <cfqueryparam value="#intMileRadius#" cfsqltype="cf_sql_integer" />
	ORDER BY
		distance DESC
</cfquery>

<!--- Check to see if we have any bad zips. --->
<cfif qBadZip.RecordCount>

	<!--- Output bad zip codes (ones that are too far away). --->
	<cfloop query="qBadZip">
		#qBadZip.zip# - #qBadZip.distance#<br />
	</cfloop>

<cfelse>

	<!--- There were no bad zip codes. --->
	All zip codes within calculated radius!

</cfif>
When we run this code, checking for bad zip codes, we get the following output:
All zip codes within calculated radius!
Very interesting! Even though our Google map indicates that some of the locations would fall outside our circular area, all zip codes returned by the fuzzy box model (in my particular example) are just as accurate as the mathematical calculations would have been. I think that this demonstrates two things:
The mathematical approach is not fully accurate to begin with (at least at our level of mathematical complexity), and should not be thought of as such.
The fuzzy box model approach is accurate enough, when compared to the mathematical calculations, to justify using it for the sake of much greater simplicity.
So, in conclusion, when you need to find all the zip codes in proximity to a given zip code, the bounding box approach is going to be accurate enough for most of your everyday use cases, and faster to run than any mathematical calculations.