This tutorial provides an overview of the various libraries that contain tools for calculating spatial statistics. However, be aware that the focus of this tutorial is on finding the libraries that implement these different statistics, not on defining them or discussing their substantive interpretation. It is strongly recommended that users familiarize themselves with spatial statistical methods from another source before jumping to this tutorial.


The first thing to know about libraries for spatial statistics is that there are lots of them, and there is also a lot of redundancy across libraries. In this tutorial, I will focus on a small subset of these libraries that seem to be the most well-established and comprehensive, but if you find you are missing something, additional resources abound.

1. spatstat for Spatial Point Patterns

The best library for studying the statistical properties of point distributions is probably spatstat, which also comes with a companion textbook, website, and [a set of tutorials](click the “Vignette” links)](https://cran.r-project.org/web/packages/spatstat/). spatstat has also been around for a long time, has been widely used, and repeatedly updated, suggesting most bugs are likely to have been found and patched (which is not always the case for R packages).

Here are a few commonly used methods:

1.1 Using spatstat with Spatial* Objects

spatstat is not directly compatible with the sp library, so to use spatstat we need to do a quick conversion:

library(rgdal)
library(spatstat)
sf <- readOGR("rgis5_data/sfpd_incident_shapefile", "sfpd_incident_2015")
OGR data source with driver: ESRI Shapefile 
Source: "rgis5_data/sfpd_incident_shapefile", layer: "sfpd_incident_2015"
with 124016 features
It has 11 fields
# Syntax is: ppp(x.coordinates, y.coordinates, x.range, y.range)
ss.object <- ppp(sf@coords[, "coords.x1"], sf@coords[, "coords.x2"], sf@bbox[1, 
    ], sf@bbox[2, ])

# Do Kest! Note there are lots of versions of kest...
plot(Kest(ss.object, correction = "good"))

2. spdep for Spatial Econometrics

spatstat only works with points. If you wish to analyze spatial correlation of polygons, spdep can be very helpful. A primary author of spdep is Roger Bivand, who also wrote the sp library, so unlike spatstats, spdep plays very well with sp objects!

2.1 Installing spatstat

If install.packages("spatstat") does not work for you:

  1. Visit the CRAN spatstat site

  2. In the “downloads” section, download the r-release binaries file associated with your operating system.

  3. Set your working directory to wherever you placed the downloaded file.

  4. Run install.packages("spatstat_1.43-0.zip", repos=NULL) (updating the file name to the most recent version and appropriate suffix)

Hopefully you’re set to go!

2.2 Neighbor lists and Spatial weights

We generally creating spatial weighting matrices in two steps:

  1. Create a “neighbor list” (nb object)
  2. Convert it to a spatial weighting matrix (listw object)

An nb object just records which features are “neighbors” of one another (you either are or are not a neighbor). The function allows users to specify the A listw is a full, normalized weighting matrix. In most cases, you start by making an nb object then convert it to a listw using the nb2listw command. Details of methods not shown below – like graph distance or k nearest neighbor methods can be found here.

library(spdep)
library(rgdal)
pa <- readOGR("rgis5_data/palo_alto_demographic_shapefile", "palo_alto")
OGR data source with driver: ESRI Shapefile 
Source: "rgis5_data/palo_alto_demographic_shapefile", layer: "palo_alto"
with 371 features
It has 5 fields
pa$share.hispanic <- pa$hispanc/(pa$hispanc + pa$White)
plot(pa)

# Make continuity NB (neighbors share edges)
continuity.nb <- poly2nb(pa, queen = FALSE)

# Plot neighbors
plot(continuity.nb, coordinates(pa))

# Convert to weights and summarize
continuity.listw <- nb2listw(continuity.nb)
summary(continuity.listw)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 371 
Number of nonzero links: 1944 
Percentage nonzero weights: 1.41237 
Average number of links: 5.239892 
Link number distribution:

  3   4   5   6   7   8   9  10  11  16 
 25 105 114  66  36  10   8   3   3   1 
25 least connected regions:
8 35 87 88 89 93 139 147 159 160 171 178 227 251 257 263 296 298 305 312 315 336 342 353 370 with 3 links
1 most connected region:
313 with 16 links

Weights style: W 
Weights constants summary:
    n     nn  S0       S1      S2
W 371 137641 371 146.7926 1525.55
# Make continuity NB (neighbors share at least one vertex)
continuity.plus.vertex.nb <- poly2nb(pa, queen = TRUE)
continuity.plus.vertex.listw <- nb2listw(continuity.plus.vertex.nb)

# Make all polygons farther than d1 and closer than d2 neighbors Convert
# polygons to centroids
library(rgeos)
pa.in.utm <- spTransform(pa, CRS("+init=EPSG:32610"))
pa.centroids = gCentroid(pa.in.utm, byid = TRUE)

# Neighbors within 10 km
intermediate <- dnearneigh(pa.centroids, d1 = 0, d2 = 10000)
listw.10km <- nb2listw(intermediate)

2.3 Spatial Analysis with spdep

One of the most commonly examined measures of spatial dependence is the Moran’s I, which is easily executed using the weights created above:

moran.test(pa$share.hispanic, continuity.listw)

    Moran's I test under randomisation

data:  pa$share.hispanic  
weights: continuity.listw  

Moran I statistic standard deviate = 24.69, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.798888034      -0.002702703       0.001054051 
moran.plot(pa$share.hispanic, continuity.listw)

One can also calculate local Moran’s I statistics for each polygon rather than just global values:

library(maptools)
result <- localmoran(pa$share.hispanic, continuity.listw)
result
               Ii         E.Ii     Var.Ii          Z.Ii    Pr(z > 0)
0    8.215135e-01 -0.002702703 0.19732293  1.8554633259 3.176505e-02
1    2.516570e-01 -0.002702703 0.16398645  0.6281225207 2.649618e-01
2    4.834256e-02 -0.002702703 0.16398645  0.1260524904 4.498452e-01
3   -1.307524e-02 -0.002702703 0.24732765 -0.0208568490 5.083201e-01
4   -1.111539e-01 -0.002702703 0.16398645 -0.2678121759 6.055781e-01
5   -1.411150e-02 -0.002702703 0.24732765 -0.0229405280 5.091511e-01
6    3.190916e-01 -0.002702703 0.14017468  0.8594953266 1.950336e-01
7   -1.626224e-01 -0.002702703 0.16398645 -0.3949098641 6.535453e-01
8    6.810898e-04 -0.002702703 0.33066885  0.0058844663 4.976525e-01
9    4.769649e-01 -0.002702703 0.19732293  1.0798206295 1.401110e-01
10   5.391738e-01 -0.002702703 0.24732765  1.0895922247 1.379464e-01
11   1.316338e+00 -0.002702703 0.19732293  2.9694052230 1.491884e-03
12   1.011352e+00 -0.002702703 0.24732765  2.0390365941 2.072319e-02
13  -2.181948e-01 -0.002702703 0.16398645 -0.5321418909 7.026861e-01
14   1.824294e+00 -0.002702703 0.24732765  3.6736816706 1.195403e-04
 [ reached getOption("max.print") -- omitted 356 rows ]
attr(,"call")
localmoran(x = pa$share.hispanic, listw = continuity.listw)
attr(,"class")
[1] "localmoran" "matrix"    
pa <- spCbind(pa, as.data.frame(result))

spplot(pa, "Z.Ii")

2.4 Spatial Regressions and Tests

Diagnostics

You can easily run the various Lagrange tests for spatial correction with one command once you have a weighting matrix:

my.model <- lm(share.hispanic ~ PrCpInc, data = pa)

lm.LMtests(my.model, continuity.listw, test = "all")

    Lagrange multiplier diagnostics for spatial dependence

data:  
model: lm(formula = share.hispanic ~ PrCpInc, data = pa)
weights: continuity.listw

LMerr = 232, df = 1, p-value < 2.2e-16


    Lagrange multiplier diagnostics for spatial dependence

data:  
model: lm(formula = share.hispanic ~ PrCpInc, data = pa)
weights: continuity.listw

LMlag = 264.26, df = 1, p-value < 2.2e-16


    Lagrange multiplier diagnostics for spatial dependence

data:  
model: lm(formula = share.hispanic ~ PrCpInc, data = pa)
weights: continuity.listw

RLMerr = 35.081, df = 1, p-value = 3.163e-09


    Lagrange multiplier diagnostics for spatial dependence

data:  
model: lm(formula = share.hispanic ~ PrCpInc, data = pa)
weights: continuity.listw

RLMlag = 67.34, df = 1, p-value = 2.22e-16


    Lagrange multiplier diagnostics for spatial dependence

data:  
model: lm(formula = share.hispanic ~ PrCpInc, data = pa)
weights: continuity.listw

SARMA = 299.34, df = 2, p-value < 2.2e-16

Alternate regressions

And run the regression you decide you want almost as easily!

# Spatial Auto-Regression Linear Model
my.model.sar <- lagsarlm(share.hispanic ~ PrCpInc, data = pa, continuity.listw)

# Error Auto-Regresive
my.model.ear <- errorsarlm(share.hispanic ~ PrCpInc, data = pa, continuity.listw)

3. Spatial Interpolation

Spatial interpolation is the process of using a set of observations of some attribute at specific points and attempts to interpolate the likely values between those points. There are a number of methods for spatial interpolation, including Inverse Distance Weighting (IDW), kernel density estimators, and Kriging, and many libraries for each.

The main library for this purpose is the gstat library, the full manual for which you can find here.

A nice tutorial on spatial interpolation can be found here.

4. Bayesian Methods

Though this author has not worked with theme, there are bayesian spatial libraries available for interested parties!

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.