This tutorial provides an overview of tools for geocoding – converting addresses or names of locations into latitudes and longitudes – using the google maps API. Supplementary information is also provided below on what “API” actually means for those interested in a little demystification of this term, although it is not central to the tutorial.


1. Geo-coding with Google Maps

Google offers a service that allows users to submit requests for the latitudes and longitudes associated with different addresses or place names from within R and to get back results that are easy to work within R. This service is called a google geocoding API (Section 2 below discusses what an API is in general terms).

Basically, the google maps API will accept any query you could type into Google Maps and returns information on Google’s best guess for the latitude and longitude associated with your query. The tool for doing this from within R is found int he ggmap library, and the basic syntax is as follows:

library(ggmap)

addresses <- c("1600 Pennsylvania NW, Washington, DC", "denver, co")

locations <- geocode(addresses, source = "google", output = "more")
locations
         lon      lat           type     loctype
1  -77.03648 38.89768 street_address     rooftop
2 -104.99025 39.73924       locality approximate
                                              address    north    south
1 1600 pennsylvania ave nw, washington, dc 20500, usa 38.89902 38.89633
2                                     denver, co, usa 39.91425 39.61443
        east       west street_number                         route
1  -77.03513  -77.03783          1600 Pennsylvania Avenue Northwest
2 -104.60030 -105.10993          <NA>                          <NA>
          neighborhood   locality administrative_area_level_1
1 Northwest Washington Washington        District of Columbia
2                 <NA>     Denver                    Colorado
        country postal_code administrative_area_level_2
1 United States       20500                        <NA>
2 United States        <NA>               Denver County

Note the output option can be set to “latlon”, “latlona”, “more”, or “all” depending on how much information you want back. I would STRONGLY recommend always using the output="more" option so that you get information on how certain google is about its guess!

1.1 Interpreting geocode results

Geocoding results include two fields that are very important to understand: loctype and type.

type

Google thinks of the world as containing a number of different types of locations. Some are points (like houses), while others are areas (like cities). The type field tells you if google is giving you the location of a house, or just the centroid of a city. A full list of different types is available from google here, but the most common results (in my experience) are:

  • street_address: indicates a precise street address
  • locality: indicates an incorporated city or town political entity
  • point_of_interest: indicates a named point of interest. Typically, these “POI”s are prominent local entities that don’t easily fit in another category, such as “Empire State Building” or “Statue of Liberty.”
  • administrative_area_level_[SOME NUMBER]: these “civil entities”, where 0 is the country, 1 is the first administrative level below that (in the US, states), 2 is below that (in the US, counties), etc.

This is important because if you get a locality or administrative area, the latitude and longitude you get it just the centroid of the locality, and you should interpret it as such!

loctype

The loctype column provides similar but distinct information to type, including more information about street_address results. In order of precision, the possible values for this field are:

  • “ROOFTOP” indicates that the returned result is a precise geocode for which we have location information accurate down to street address precision.
  • “RANGE_INTERPOLATED” indicates that the returned result reflects an approximation (usually on a road) interpolated between two precise points (such as intersections). Interpolated results are generally returned when rooftop geocodes are unavailable for a street address.
  • “GEOMETRIC_CENTER” indicates that the returned result is the geometric center of a result such as a polyline (for example, a street) or polygon (region).
  • “APPROXIMATE” indicates that the returned result is approximate. localities and admin areas have this value!

1.2 Query Limits

Google limits individual (free) users to 2,500 queries per day and 10 queries per second. You can see how many queries you have remaining by typing geocodeQueryCheck(). You can also buy additional requests (up to 100,000 a day) for $0.50 / 1000 requests, and there is a paid subscription service that will provide up to 100,000 requests a day.

Because of this, it is important that you not test your code on your entire dataset or you’ll waste your queries when you’re debugging your code!


Exercise 1

Solutions at end of tutorial, but no cheating!

  1. Load the addresses_to_geocode.csv from the RGIS4_Data folder into a DataFrame using read.csv().

  2. Look at the locations in the data. Which do you think will succeed and which will fail? What types of results do you expect from each?
    • For non-Stanford users: “Encina Hall” is the building for the political science department at Stanford. Entry 3 (Engina Hall) is mis-spelled.
  3. Geocode the addresses using the geocode function. Note that if read.csv imported the addresses as a “factor” variable you may have to convert it to a character vector.

  4. Did you get latitudes and longitudes for all the addresses?

  5. Look at the results for Observation 3, which is mis-spelled. What did google do?

  6. Look at the results for Observation 4 (there is no place called “the zz room 123” at Stanford). What did google do? Do you like this behavior or not?

  7. Why might you want to take the latitude and longitude for Observation 6 with a grain of salt?


2. What is an API?

API is an (uninformative) acronym for “Application Programming Interface”.

It is often the case that a user working in one programming environment (like R) would like to take advantage of tools written in a different language. The goal of an API is to facilitate this by acting like as a cross between a translator and a messenger between two different programs.

Let’s us an example of an API that you are already quite familiar with, even if you didn’t realize it. The rgeos library from the RGIS2 tutorial is actually just an API for a program called GEOS written in a language called C++. When people started realizing that they wanted to do GIS analysis in R, the realized that it didn’t make sense to write a whole new set of tools in R for geometric operations when there was already a really sophisticated program written for this exact purpose. At the same time, however, they didn’t want to require R users to export their data to the harddrive, open a different program in a language they might not know, use that program to execute a calculation, then re-import the data.

In steps the API, rgeos. When you use rgeos, here’s what’s really happening:

That’s it. It’s just a middle man who “speaks” both R and GEOS, and who’s willing to run back and forth between these programs to make your life easier!

(Wanna know a secret? Most good libraries in R are actually just APIs for libraries written in C++!)

2.1 What is a Web API?

A web API is just a special kind of API that stands between an internet browser and the servers of a company like Google or Twitter. It’s job is to accept requests for data written in HTML, convert them into whatever language a company’s servers use, get the result requested, and return it to the user in a format that’s easy to work with (usually something called JSON).

This last point is key – most of the time, what a Web API does is take a web-page you are familiar with (like google maps) and strip away everything that is distracting to a computer program, like nice pictures and fancy formatting.

Web-API Requests

The way your computer accesses information on the internet is by composing a request in the form of a web address, known formally as a URL. Most of the time, however, this happens without you know it. For example, if you type “42” into google and click search, the result just appears. But look in the address bar above, and you can see the exact code used to get you those search results, which is sometimes as simple as https://www.google.com/#q=42 (though, depending on your computer, it may be much longer). Basically, when you click buttons in your browser, the browser converts your clicks into a URL and sends that request out to the internet.

With a Web API, we skip the step where a user clicks a button in the browser with their mouse, and instead just create customized URLs to ask for the data we need. This is a little less intuitive for humans, but is much easier for computers.

URL Components

There are a couple specific components of most URLs, and understanding these components will be helpful for both web APIs, and in other situations like web-scraping. For example, consider the following link to a youtube video: https://www.youtube.com/watch?v=dQw4w9WgXcQ.

  • https : This is the protocol you’re saying you want to use to talk to the server.
  • www.youtube.com : This is the server you want to pass your message to.
  • /watch : is how you tell the computer the general type of request you’re making (in this case, you want to go directly to a video).
  • ?v=dQw4w9WgXcQ : this is what’s called a “Query String”.

Query Strings are how you send extra information to an internet server. They generally start with a ? or a #, and are followed by a number of variable-value pairs. In this case, for example, you’re saying that you want youtube to know that in your request, the variable v has the value dQw4w9WgXcQ" (the internal name for this particular video). So basically, this youtube link says: “Hey, Youtube – I would like to access the program you keep in your /watch directory, and when you call that program, please tell it that the variable v should be set to dQw4w9WgXcQ.”

In this way, a URL is a lot like a function call in R where the query string contains the arguments you are passing to the function!

Web-API Responses

The main value of a Web API is that the response it provides to a query is written in a stripped-down format that’s easy for a computer to understand. Consider the two following results provided by google maps for the search term “Kalamazoo, MI”, one through google maps and one through the google maps API: