niedziela, 6 stycznia 2013

Geocoding places from paper traffic surveys

Long time since the last post - it doesn't mean however that i^2 was relaxing and playing guitar on the beach, contary: big projects - interesting, challenging, innovative.

I hope soon at least few more posts will be published.
Now just snippet from small bit I was playing with yesterday. I'm pretty proud of how clever complex issues can be solved ;)

Problem:
To geocode origins and destinations from their names and place them on the map.
Background:
We got survey results in xls files. There were around 5000 surveys from Małopolska (region in Poland with Kraków being it's capital) - respondents were asked to specify their origin and destination (classic OD survey). Perfect - but what's next? XLS has infinite capacity for data mining, but not for me - I just don't get it: why don't you program if it's that much easier, elegant and straighforward than horrible "vlookups" in excel... Nevermind: I created nice SQLite DB with neat primary, foreign keys, indexes, etc. - works quite fast. But for those trips we needed details: distance, where it started, what country, which województwo ~= county, etc. Adn that was the major problem to be solved. Solution 1: place it on the map, and for each trip determine it's distance (yuck!) Solution 2:
Solution:
~2hrs of python coding and full info on 5000 trips gathered.
a) xlrd to get all the trips data into SQLite
plik = xlrd.open_workbook(path)
arkusze = plik.sheet_names()
for nazwa_arkusza in arkusze:
   arkusz = plik.sheet_by_name(nazwa_arkusza)
    for rownum in range(9,arkusz.nrows):
    rzad=arkusz.row(rownum)
    

b) use python set() to define list of unique places observed during survey
Places.add(str(rzad.value).upper())
c) create 3 tables in SQL: 'places' , 'relations' , 'trips' . relation has two foreign keys from places and trip has one foreign key to relation
d) use googlemaps.pyhttp://pypi.python.org/pypi/googlemaps/ ) very easy, but clever library to use googlemaps data in your code (it uses json to gather data, and probably can be easily expanded to cover more queries):
for Place in Places:
   punkt=gmaps.geocode(Miejsce)
   lng, lat = punkt['Placemark'][0]['Point']['coordinates'][0:2]
   country=result['Placemark'][0]['AddressDetails']['Country'] ['CountryName']
   wojewodztwo=result['Placemark'][0]['AddressDetails']['Country']['AdministrativeArea']['AdministrativeAreaName']
   city=result['Placemark'][0]['AddressDetails']['Country']['AdministrativeArea']['SubAdministrativeArea']['SubAdministrativeAreaName']


e) to get distance I used the same  googlemaps.py with different query:
for relation in relations:
         result=gmaps.directions(relation.from,relation.to)
    dystans = result['Directions']['Distance']['meters']

Results:
See two screenshots below:
1) before (xls file)



Original xls file with place names

2) after (Visum network with origins/detinations placed in proper geocoding)
Geocoded places on Visum network
PS. It shows olny the bright side of the solution. There were many problems with proper geocoding (i.e. first row in DB was: 'Kosocice' suburban area in Kraków, which was suggested by google to be "Kosice" city in slovakia ~400km away. And such problems were numerous. Next: there is around 50 places in PL named: "Nowa Wieś" (means new village in polish).  etc. etc.


Brak komentarzy:

Prześlij komentarz