Dugan Chen's Homepage

Various Things

Django with Yahoo’s Geoplanet Database

Most web applications eventually need to provide the following user experience:

  1. select your country
  2. select your “administrative division” from a list appropriate for your country

Where administrative division could be state, city, province, prefecture, or something else appropriate to your country.

It goes without saying that to pull this off, you need a populated backend database of country and state names. And if you don’t want to enter all 52 American states by hand, you’ll want to find a source of data that you can just import.

After spending several hours looking for recommendations on Stack Overflow and GIS – Stack Exchange, I settled on Yahoo! GeoPlanetâ„¢. I haven’t tried out their web service yet, but I did download and start playing with their Creative Commons-licensed data.

First, I set Django to use MySQL, because MySQL has tools that can import 1GB tab-delimited text files.

I also created a Django project called “geoplanet”, and an application called “example”.

In Django, you create your ORM models first, then generate your database from the models. Here is the only ORM model you need:

class Place(models.Model):

    woe = models.IntegerField(primary_key=True)
    iso = models.CharField(max_length=32)
    name = models.CharField(max_length=32, db_index=True)
    language = models.CharField(max_length=32)
    place_type = models.CharField(max_length=32, db_index=True)
    parent = models.ForeignKey('self', db_index=True, null=True)

The number of database records we’ll import is currently 5653977 (five million six hundred fifty three thousand nine hundred seventy seven), so we index the colums that we’ll need to search on.

Synchronizing the database creates a database called “geoplanet” and a table called “example_place”. To import it, I symlink the Yahoo data file (which, when I downloaded it, was named geoplanet_places_7.6.0.tsv) to example_place and then use MySQL’s tools:

mysqlimport -p geoplanet --ignore-lines=1 --fields-optionally-enclosed-by=\" -L example_place

With the data imported, using the database is easy.

In the Django shell, this will get you Canada’s WOEID (primary key). Of course, you can also just read it from the text file:

Place.objects.filter(name__exact='Canada').get(place_type__exact='Country').woe

This prints out “23424775L”.

One you have the country’s key, listing its provinces is easy:

[x.name for x in Place.objects.filter(place_type__exact='State').filter(parent__exact=23424775L)]

This returns a Python list of the names of Canada’s provinces and territories:

>>> [x.name for x in Place.objects.filter(place_type__exact='State').filter(parent__exact=23424775L)]

[u'Alberta', u'British Columbia', u'Manitoba', u'New Brunswick', u'Newfoundland and Labrador', u'Northwest Territories', u'Nova Scotia', u'Nunavut', u'Ontario', u'Prince Edward Island', u'Qu\xe9bec', u'Saskatchewan', u'Yukon Territory']