ZIP Code Database Created from the 2000 Census Data
A few months ago, I wrote a blog entry describing how you could use ASP.NET AJAX along with the Virtual Earth Map Control to create a web based map application that locates ZIP Code boundaries by drawing a shaded region on the maps surface. You can view the interactive demo for this here.
The number one comment I received from that post was:
That's cool, but where did you get the ZIP Code dataset from?
Well ... I got this data from the U.S Census Bureau's Cartographic web site. The site contains links that allow you to download the data files for each of the 50 states along with Puerto Rico and District of Columbia. The data is from the 2000 census, so it is already almost eight years old. Nevertheless it makes for a pretty good test dataset (if you need a more up to date dataset there are plently of vendors that specialize in this).
Unfortunately the data files on the Census site are in their own special format, so before I could use them I had to learn the files schema. Below are two screenshots of small portions of each of these files.
The first data file (Figure 1) contains the 5 digit ZIP Code along with a pointer (237 in this example) into the second data file (Figure 2). The second data file contains the actual latitude/longitude values for each of the coordinates that define the ZIP Code's perimeter. The very first latitude/longitude pair for each of the ZIP Code's is the center point of the ZIP Code.
So after I understood the format of both files and how they were related to each other, I created a .Net console application that parsed the two files and inserted the corresponding latitude/longitude pairs into a database on my local SQL Server. Here is the schema for the table that I used ...
After the database and table existed I processed the data files for all 50 states by running my parsing application. After processing the files and inserting the boundary values, I was able to query the table and locate the coordinates for ZIP Code 32224 like so ...
I have exported the zip code data for all 50 states into a csv file that you can download here (it's 70 MB). After you have it downloaded, you can run the following script to import it into your local Microsoft SQL Server. Make sure you replace my ToDo with the path to the csv file you just downloaded.
use master GO -- create the database create database [5digit_zcta] GO use [5digit_zcta] GO -- create the zip_code_boundary table create table [dbo].[zip_code_boundary] ( [id] int not null, [state] varchar(256) not null, [zip_code] char(5) not null, [latitude] float not null, [longitude] float not null, constraint [pk_zip_code_boundary] primary key clustered ( [id] asc ) ) GO -- insert the data bulk insert [5digit_zcta].[dbo].[zip_code_boundary] from '<<ToDo: add path to zip_code_boundary.csv>>' with ( fieldterminator = ',', rowterminator = '\n' ) GO
That's it. Enjoy!
Comments
Awesome thanks for sharing the data Matt!
It will make great test data for showing off polygons in VE, and testing performance etc.
John.
Thanks for the data, Matt. I don't have a current project that needs this, but I *know* that I will.
Very cool Matt. Thanks for posting this. I looked at that schema awhile back and never made time to do the import. Thanks for posting your hard work for us. :-)
I just wanted to add my "Thanks!" also. Great job on this write up.
I downloaded you file and I'm adding it to my MySQL database right now.
It's going to be used to help visualize where job candidates are coming from.
Thanks again for the GREAT work!
Bob
Matt,
Nice post and thank you so much for sharing your thoughts with actual data. You cant make it simpler than this.
Baski
You can get more up to date ZIP Code boundaries from Maponics. There's also a great link there about how ZCTA's are not really ZIP Codes.
DarrinClement:
from Maponics wich package is the one that you recommend? they have just too many choices.
Hi Fernando,
We only offer one product of ZIP Code boundaries, but Maponics does offer a ton of (other) different products. If you are interested in map databases, the best thing to do is call 800-762-5158. Thanks!
Hi Matt,
Great stuff. I have made a virtual earth plotting program that plots census block group maps using the SharpMap toolset. I'm interested in possibly using your approach. There's lots of advantages because of the built in shape support.
Have you done any testing for performance? I'd be interested in how long it takes to plot the zip codes for a large state. Also, would you be willing to either post or email me the parsing scripts you created? Thanks.
Quick question - SQL 2008 has a new "geography" data type that requires polygons to begin and end with the same coordinates. Based on this datta it appears that in some cases (not counting "middle" coordinate)there appear to be "donut" polygons making the conversion to the new data type more cumbersome. The material at http://www.census.gov/geo/www/cob/ascii_info.html references: "Islands or exclusions within a polygon are flagged with an ID number of -99999. The outer cycle of a complex polygon is listed first, followed by any islands."
Does your file account for this?
@darrinclement - Using SQL server 2008 I returned all zip codes for MD in .04 seconds on a Lenovo T60p.
Any chance of getting your console app that you used to parse the census data? It would save me from having to write the same exact thing myself. Thanks!