Welcome to Spatial Data in R! This first set of tutorials (in three parts) is provides an introduction to the two types of spatial data you will encounter in R: vector data and raster data. By the end of this tutorial, you should have a good sense of how R thinks about spatial data, and how to import and export spatial datasets you may get from other programs or sources.


1 Spatial Data in R: Building Objects from Scratch!

Almost all spatial vector data structures in R are based on the sp package. Even other libraries that may seem independent are usually built on top of sp, even if you can’t see it.

The sp package has three main types of spatial data we’ll work with: points, lines, and polygons. There are some differences between these different types, but they are all very similar.

To help you understand what these data structures are like, in this section we’ll create some spatial data from scratch. This is probably not how you’ll work with data in the future – most of the time you just import spatial data from a source – but this exercise will help give you a good foundation and help you know how to troubleshoot problems in the future.

There are three basic steps to creating spatial data by hand:


Exercise 0

Discuss with your neighbor: What do we need to specify in order to define spatial vector data?


1.1 SpatialPoints: Your First Spatial* Object!

Points are the most basic geometric shape, so we begin by building a SpatialPoints object.

Make Points.

A points is defined by a pair of x-y coordiantes, so we can create a set of points by (a) creating matrix of x-y points, and (b) passing them to the SpatialPoints function to create our first SpatialPoints object:

library(sp)
toy.coordinates <- rbind(c(1.5, 2), c(2.5, 2), c(0.5, 0.5), c(1, 0.25), c(1.5, 
    0), c(2, 0), c(2.5, 0), c(3, 0.25), c(3.5, 0.5))

toy.coordinates
      [,1] [,2]
 [1,]  1.5 2.00
 [2,]  2.5 2.00
 [3,]  0.5 0.50
 [4,]  1.0 0.25
 [5,]  1.5 0.00
 [6,]  2.0 0.00
 [7,]  2.5 0.00
 [8,]  3.0 0.25
 [9,]  3.5 0.50
my.first.points <- SpatialPoints(toy.coordinates)  # ..converted into a spatial object
plot(my.first.points)

To get a summary of how R sees these points, we can ask it for summary information in a couple different ways. Here’s a summary of available commands:

summary(my.first.points)
Object of class SpatialPoints
Coordinates:
          min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
Is projected: NA 
proj4string : [NA]
Number of points: 9
coordinates(my.first.points)
      coords.x1 coords.x2
 [1,]       1.5      2.00
 [2,]       2.5      2.00
 [3,]       0.5      0.50
 [4,]       1.0      0.25
 [5,]       1.5      0.00
 [6,]       2.0      0.00
 [7,]       2.5      0.00
 [8,]       3.0      0.25
 [9,]       3.5      0.50

Add a Coordinate Reference System (CRS)

Unlike a simple geometric object, a SpatialPoints object has the ability to keep track of how the coordinates of its points relate to places in the real world through an associated “Coordinate Reference System” (CRS – the combination of a geographic coordinate system and possibly a projection), which is stored using a code called a proj4string. The proj4string is so important to a SpatialPoints object, that it’s presented right in the summary:

summary(my.first.points)
Object of class SpatialPoints
Coordinates:
          min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
Is projected: NA 
proj4string : [NA]
Number of points: 9

In this case, however, while our SpatialPoints object clearly knows what a CRS is, the Spatial object we just created does not have a projection or geographic coordinate system defined. It is ok to plot, but be aware that for many meaningful spatial operations you will need to define a CRS.

CRS objects can be created by passing the CRS() function the code associated with a known projection. You can find the codes for most commonly used projections from www.spatialreference.org.

Note that the same CRS can often be called in many ways. For example, one of the most commonly used CRS is the WGS84 latitude-longitude projection. You can create a WGS84 lat-long projection object by either passing the reference code for the projection – CRS("+init=EPSG:4326") – or by directly calling all its relevant parameters – CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs").

Note: Some users report this only works if EPSG is lower-case (epsg), so try that if you have issues!

Here’s an illustration of assigning a CRS:

is.projected(my.first.points)  # see if a projection is defined. 
[1] NA
# Returns `NA` if no geographic coordinate system or projection; returns
# FALSE if has geographic coordinate system but no projection.

crs.geo <- CRS("+init=EPSG:32633")  # UTM 33N
proj4string(my.first.points) <- crs.geo  # define projection system of our data
is.projected(my.first.points)
[1] TRUE
summary(my.first.points)
Object of class SpatialPoints
Coordinates:
          min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
Is projected: TRUE 
proj4string :
[+init=EPSG:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs
+ellps=WGS84 +towgs84=0,0,0]
Number of points: 9

When CRS is called with only an EPSG code, R will try to complete the CRS with the parameters looked up in the EPSG table:

crs.geo <- CRS("+init=EPSG:32633")  # looks up UTM 33N
crs.geo  # prints all parameters
CRS arguments:
 +init=EPSG:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs
+ellps=WGS84 +towgs84=0,0,0 

We’ll talk more about managing projections in the next lesson.

Geometry-only objects (objects without attributes) can be subsetted similar to how vectors or lists are subsetted; we can select the first two points by

my.first.points[1:2]
class       : SpatialPoints 
features    : 2 
extent      : 1.5, 2.5, 2, 2  (xmin, xmax, ymin, ymax)
coord. ref. : +init=EPSG:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 

Add Attributes

Moving from a SpatialPoints to a SpatialPointsDataFrame occurs when you add a data.frame of attributes to the points. Let’s just add an arbitrary table to the data – this will label each point with a letter and a number. Note points will merge with the data.frame based on the order of observations.

df <- data.frame(attr1 = c("a", "b", "z", "d", "e", "q", "w", "r", "z"), attr2 = c(101:109))
df
  attr1 attr2
1     a   101
2     b   102
3     z   103
4     d   104
5     e   105
6     q   106
7     w   107
8     r   108
9     z   109
my.first.spdf <- SpatialPointsDataFrame(my.first.points, df)
summary(my.first.spdf)
Object of class SpatialPointsDataFrame
Coordinates:
          min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
Is projected: TRUE 
proj4string :
[+init=EPSG:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs
+ellps=WGS84 +towgs84=0,0,0]
Number of points: 9
Data attributes:
     attr1       attr2    
 z      :2   Min.   :101  
 a      :1   1st Qu.:103  
 b      :1   Median :105  
 d      :1   Mean   :105  
 e      :1   3rd Qu.:107  
 q      :1   Max.   :109  
 (Other):2                

Now that we have attributes, we can also subset our data the same way we would subset a data.frame. Some subsetting:

my.first.spdf[1:2, ]  # row 1 and 2 only
class       : SpatialPointsDataFrame 
features    : 2 
extent      : 1.5, 2.5, 2, 2  (xmin, xmax, ymin, ymax)
coord. ref. : +init=EPSG:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
variables   : 2
names       : attr1, attr2 
min values  :     a,   101 
max values  :     b,   102 
my.first.spdf[1:2, "attr1"]  # row 1 and 2 only, attr1
class       : SpatialPointsDataFrame 
features    : 2 
extent      : 1.5, 2.5, 2, 2  (xmin, xmax, ymin, ymax)
coord. ref. : +init=EPSG:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
variables   : 1
names       : attr1 
min values  :     a 
max values  :     b 
plot(my.first.spdf[which(my.first.spdf$attr2 > 105), ])  # select if attr2 > 5


SpatialPoint from a lon/lat table

A SpatialPointsDataFrame object can be created directly from a data.frame by specifying which columns contain the coordinates. This is interesting, for example if you have a spreadsheet that contains latitude, longitude and some values. You can read this into a data frame with read.table, and then create the object from the data frame in one step by using the coordinates() command. That automatically turns the dataframe object into a SpatialPointsDataFrame.


Exercise 1.1

  1. If you haven’t already, create a new directory R_Workshop on your Desktop.
  2. Download and place the RGIS1_Data folder from coursework in this directory.
  3. read.csv() to read sf_restaurant_inspections.csv into a dataframe in R and name it sf.df.
  4. Use head() to examine the first few lines of the dataframe.
  5. Use class() to examine which object class the table belongs to.
  6. Load the sp library (library(sp))
  7. Convert the table into a spatial object using the coordinates command and passing the names of the columns in the table that correspond to longitude and latitude:

    coordinates(sf.df) <- c("longitude", "latitude")

Note the reverse order! While we usually say “latitude and longitude”, since longitude corresponds to the x-coordinates on a map and latitude corresponds to the y-coodinates and R assumes coordinates to be ordered (x,y), we pass c("longitude","latitude")!

  1. Use class() again to examine which object class the table belongs to now:
    What to you observe?

  2. Plot restaurants with terrible scores by subsetting on Score and using the plot command.


1.2 SpatialPolygons: Your bread and butter

SpatialPolygons are very, very common, especially in political science (think administrative borders, electoral constituencies, etc.), so they’re important to get used to.

Building up a SpatialPolygons from scratch.

SpatialPolygons are a little more complicated than SpatialPoints. With SpatialPoints, we moved directly from x-y coordinates to a SpatialPoints object.

To get a SpatialPolygons object, we have to build it up by (a) creating Polygon objects, (b) combining those into Polygons objects (note the “s” at the end), and finally (c) combining those to create SpatialPolygons. So what are these components?

  • A Polygon object is a single geometric shape (e.g. a square, rectangle, etc.) defined by a single uninterrupted line around the exterior of the shape.
  • A Polygons object consists of one or more simple geometric objects (Polygon objects) that combine to form what you think of as a single unit of analysis (an “observation”). For example, each island in Hawaii would be a Polygon, but Hawaii itself is the Polygons consisting of all the individual island Polygon objects.
  • A SpatialPolygons object is a collection of Polygons objects, where each Polygons object is an “observation”. For example, if we wanted a map of US states, we would make a Polygons object for each state, then combine them all into a single SpatialPolygons. If you’re familiar with shapefiles, SpatialPolygons is basically the R analogue of a shapefile or layer.

One special note: if you want to put a hole in a polygon (e.g. drawing a donut, or if you wanted to draw South Africa and leave a hole in the middle for Lesotho) you do so by (a) creating a Polygon object for the outline, (b) creating a second Polygon object for the hole and passing the argument hole=TRUE, and (c) combine the two into a Polygons object.

Let’s try building up a SpatialPolygon!

# create polyon objects from coordinates.  Each object is a single geometric
# polygon defined by a bounding line.
house1.building <- Polygon(rbind(c(1, 1), c(2, 1), c(2, 0), c(1, 0)))

house1.roof <- Polygon(rbind(c(1, 1), c(1.5, 2), c(2, 1)))

house2.building <- Polygon(rbind(c(3, 1), c(4, 1), c(4, 0), c(3, 0)))

house2.roof <- Polygon(rbind(c(3, 1), c(3.5, 2), c(4, 1)))

house2.door <- Polygon(rbind(c(3.25, 0.75), c(3.75, 0.75), c(3.75, 0), c(3.25, 
    0)), hole = TRUE)

# create lists of polygon objects from polygon objects and unique ID A
# `Polygons` is like a single observation.
h1 <- Polygons(list(house1.building, house1.roof), "house1")
h2 <- Polygons(list(house2.building, house2.roof, house2.door), "house2")

# create spatial polygons object from lists A SpatialPolygons is like a
# shapefile or layer.
houses <- SpatialPolygons(list(h1, h2))
plot(houses)

Adding Attributes to SpatialPolygon

As with SpatialPoints, we can associated a data.frame with SpatialPolygons. There are two things that are important about doing this with SpatialPolygons:

  1. When you first associate a data.frame with a SpatialPolygons object, R will line up rows and polygons by matching Polygons object names with the data.frame row.names.
  2. After the initial association, this relationship is NO LONGER based on row.names! For the rest of the SpatialPolygonsDataFrame’s life, the association between Polygons and rows of your data.frame is based on the order of rows in your data.frame, so don’t try to change the order of the data.frame by hand!

Make attributes and plot. Note how the door – which we created with hole=TRUE is empty!

attr <- data.frame(attr1 = 1:2, attr2 = 6:5, row.names = c("house2", "house1"))
houses.DF <- SpatialPolygonsDataFrame(houses, attr)
as.data.frame(houses.DF)  # Notice the rows were re-ordered!
       attr1 attr2
house1     2     5
house2     1     6
spplot(houses.DF)

Adding a Coordinate Reference Systems (CRS)

As with SpatialPoints, a SpatialPolygons object on Earth doesn’t actually know where it until you set its Coordinate Reference System, which you can do the same way you did with the SpatialPoints objects:

crs.geo <- CRS("+init=EPSG:4326")  # geographical, datum WGS84
proj4string(houses.DF) <- crs.geo  # define projection system of our data

Exercise 1.2 Answer code at end of exercise – but don’t cheat!

  1. Make a (highly stylized) SpatialPolygon object for Lesotho and South Africa based on the following points:

South Africa

  1. Add an attribute Table with each country’s GDP Per Capita (approximately $7,000 for South Africa and $1,000 for Lesotho).

  2. Plot your SpatialPolygons using spplot()

The result should look something like this:

South Africa 2


1.3 SpatialLines: Just like SpatialPolygons

SpatialLines objects are basically like SpatialPolygons, except they’re built up using Line objects (each a single continuous line, like each branch of a river), Lines objects (collection of lines that form a single unit of analysis, like all the parts of a river), to SpatialLines (collection of “observations”, like rivers).

1.4 Recap of Spatial* Objects

Here’s a quick summary of the construction flow of SpatialObjects:

DEM reproject

2. Importing and Exporting Spatial Data using rgdal

Normally we do not create Spatial* objects by hand. It is much more common to work with existing data read from external sources like shapefiles, or databases.

In order to read those into R and turn them into Spatial* family objects we rely on the rgdal package. It provides us direct access to the powerful GDAL library from within R.

We can read in and write out spatial data using:

readOGR() and writeOGR()

The parameters provided for each function varies depending on the exact spatial file type you are reading. We will take the ESRI shapefile as an example. A shapefile - as you know - consists of various files, and R expects all those files to be in one directory.

When reading in a shapefile, readOGR() expects at least the following two arguments:

datasource name (dsn)  # the path to the folder that contains the files
                       # Note that this is a path to the folder
layer name (layer)     # the file name without extension
                       # Note that this is not a path but just the name of the file

For example, if I have a shapefile called sf_incomes.shp and all its associated files (like .dbf, .prj, .shx) in a directory called PH on my desktop, and I have my working directory set to my desktop folder, my command to read this shapefile would look like this:

readOGR(dsn = "shapefiles", layer = "sf_incomes")

or in short:

readOGR("shapefiles", "sf_incomes")

Note you may run into trouble if you set your working directory to the folder holding the shapefile as ‘readOGR()’ doesn’t like it if the first argument is an empty string.


Exercise 2

  1. Load the rgdal package.
  2. Make sure your working directory is set to the R_Workshop folder and it contains the materials you downloaded and unzipped earlier.
  3. Read sf_income into an object called sf. Make sure you understand the directory structure.
  4. Examine the file, for example with summary(), class(), str("sf", max.level = 2),
  5. Take a look at the column names of the attribute data with names(sf)
  6. Take a look at the attribute data with head(as.data.frame(sf))
  7. Note that subsetting works here: sf[sf$MdIncHH > 40000,]
  8. Plot some attribute data: spplot(sf, "MdIncHH")

3. OPTIONAL: Understanding Spatial* Objects

Turns out Spatial* objects are just a collection of things that you can see using the str command:

str(my.first.spdf)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 9 obs. of  2 variables:
  .. ..$ attr1: Factor w/ 8 levels "a","b","d","e",..: 1 2 8 3 4 5 7 6 8
  .. ..$ attr2: int [1:9] 101 102 103 104 105 106 107 108 109
  ..@ coords.nrs : num(0) 
  ..@ coords     : num [1:9, 1:2] 1.5 2.5 0.5 1 1.5 2 2.5 3 3.5 2 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  ..@ bbox       : num [1:2, 1:2] 0.5 0 3.5 2
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+init=EPSG:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"

What this shows is that my.first.spdf is actually a collection of different things – data, coords, bbox, and a proj4string. In the same way we can get many of these things with commands, we can also call them with the @ command!

bbox(my.first.spdf)
          min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
my.first.spdf@bbox
          min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
coordinates(my.first.spdf)
      coords.x1 coords.x2
 [1,]       1.5      2.00
 [2,]       2.5      2.00
 [3,]       0.5      0.50
 [4,]       1.0      0.25
 [5,]       1.5      0.00
 [6,]       2.0      0.00
 [7,]       2.5      0.00
 [8,]       3.0      0.25
 [9,]       3.5      0.50
my.first.spdf@coords
      coords.x1 coords.x2
 [1,]       1.5      2.00
 [2,]       2.5      2.00
 [3,]       0.5      0.50
 [4,]       1.0      0.25
 [5,]       1.5      0.00
 [6,]       2.0      0.00
 [7,]       2.5      0.00
 [8,]       3.0      0.25
 [9,]       3.5      0.50
as.data.frame(my.first.spdf)
  attr1 attr2 coords.x1 coords.x2
1     a   101       1.5      2.00
2     b   102       2.5      2.00
3     z   103       0.5      0.50
4     d   104       1.0      0.25
5     e   105       1.5      0.00
6     q   106       2.0      0.00
7     w   107       2.5      0.00
8     r   108       3.0      0.25
9     z   109       3.5      0.50
my.first.spdf@data
  attr1 attr2
1     a   101
2     b   102
3     z   103
4     d   104
5     e   105
6     q   106
7     w   107
8     r   108
9     z   109

In R, the items with @ at the start of the line are called “slots”. If you are used to working in another object-oriented language like Java or Python, these are analogous to attributes. You can access each of these “slots” using the @ operator, making them equivalent of some function calls. It is a general recommendation to not modify slots by directly assigning values to them unless you know very well what you do.

Answers to Exercise 1.2

sa.outline <- rbind(c(16, -29), c(33, -21), c(33, -29), c(26, -34), c(17, -35))
sa.hole <- rbind(c(29, -28), c(27, -30), c(28, -31))
lesotho <- rbind(c(29, -28), c(27, -30), c(28, -31))

# Make Polygon objects
sa.outline.poly <- Polygon(sa.outline)
sa.hole.poly <- Polygon(sa.hole, hole = TRUE)
lesotho.poly <- Polygon(lesotho)

# Make Polygons objects
sa.polys <- Polygons(list(sa.outline.poly, sa.hole.poly), "SA")
lesotho.polys <- Polygons(list(lesotho.poly), "Lesotho")

# Make spatialy Polygons
map <- SpatialPolygons(list(sa.polys, lesotho.polys))

# Check
plot(map)

# Add data
df <- data.frame(c(7000, 1000), row.names = c("SA", "Lesotho"))
mapDF <- SpatialPolygonsDataFrame(map, df)

# Add WGS 84 coordinate system
proj4string(mapDF) <- CRS("+init=EPSG:4326")

# Plot
spplot(mapDF)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.