This tutorial presents an illustrative example of how network analysis tools – especially community detection algorithms – can be combined with geo-spatial tools. For examples of this type of analysis, see:


1. Network Basics

This section provides a brief introduction to network vocabulary, and how networks are represented in computers.

1.1 The Components of a Network

A network – also sometimes called a graph – is composed of two elements: individual agents, and information on how they relate to one another. More formally, these individuals are known as vertices or nodes, and relationships between individuals are known as edges or links.

Examples of networks abound, and are not limited to the most obvious examples like Facebook social networks (where vertices are individual users and edges exist between people who are “friends”). One can also imagine business networks, where each vertex is a business and edges lie between businesses that buy from one another or share investors. Political actors can also be thought of a network where vertices correspond to individuals and edges are placed between individuals who ahve donated to one another.

1.2 Types of Networks

Though all networks have vertices and edges, the nature of edges can also vary. In particular, edges may be “weighted” or “unweighted”, and they can be either “undirected” or “directed”.

Weighted versus Unweighted

In an unweighted network, individuals are either connected or they are not – all edges are binary and equal in importance. However, network edges may also be assigned weights that reflect the strength of a connection between individuals. In a business network, for example, one might assign weights based on the total amount of money exchanged between businesses.

Directed Versus Undirected

In an undirected graph, all relationships are symmetrical – if Ali is friends with Beth, then Beth is necessarily friends with Ali. By constrast, in a directed graph, relationships may not be symmetric. For example, in a business network, Business A may buy a product from Business B, but Business B may not have bought products from Business A.

2. Computer Representation of Networks

When we think of networks, we usually have in mind an image like the following:

as_graph

Unfortunately, while easy for humans to interpret, this type of picture isn’t very helpful for computers, so a number of solutions have been developed for network representation.

2.1 Adjacency Matrices

The most common formal presentation of a network is as an adjacency matrix – a matrix that, for n individuals, is n rows long and n rows wide. The cells of this matrix can then be filled in to indicate when individuals are connected – for an unweighted, undirected network, the cell at row A and column B will take a value of 1 if A and B share an edge, and 0 otherwise, as illustrated below:

as_adjacency

For undirected networks, this matrix will be symmetric; for weighted networks, these values will correspond to edge weights rather than being 1s and 0s.

2.2 Edgelists

Adjacency matrices are very clear, clean representations of networks, but they have on fundamental problem: they quickly become enormous. An adjacency matrix will always be N rows long and N columns wide, giving rise to N^2 entries. Even for a moderate size network of 1000 individuals, this matrix will have one million entries, and for a larger network with 100,000 nodes, this number climbs to ten billion.

Fortunately, there is a more compact way to represent some networks. In most social networks, the majority of individuals are only connected to a few other people, giving rise to what is called a sparse network. In sparse networks, the vast majority of cells in an adjacency matrix will be occupied with zeros, and so it’s actually easier to just write out who is connected to who – a format called an edgelist, as illustrated below:

as_edgelist

The interpretation of an edgelist depends on whether it is known to be directed or not – if a network is directed, then the order in which names appear indicates the direction of connections; if not, order is irrelevant. Weighted networks also often include a third column with the weight associated with each edge.

3. The iGraph library

The best tool currently available for analyzing network data in R is the igraph library.

The igraph library has a number of advantages over other tools that currently exist. First, it is quite comprehensive – it includes a tremendous number of tools for very different types of network analysis. Second, despite it’s cute name, igraph was written by serious programmers, and it is very fast (as it’s actually written in C). This is very important in networks, as many network algorithms are extremely computationally demanding for even moderately sized networks. And finally, it’s multi-platform – you can also use it in Python if you want, and on different computer systems.

Note that iGraph released a new version (version 1.0) in June of 2015 which includes a number of substantial changes to the interface (you can read about them here), so be careful about relying too much on older tutorials. Old function names should still work thanks to careful work by the project author, but many things have changed.

Probably the best resource if you want to learn more is a tutorial by the author of iGraph for R (and one of two authors of the underlying C library) found here..

3.1 Creating a Network Object

The easiest way to create an igraph network is to create an edgelist in a data.frame, then convert that data.frame into a graph object.

library("igraph")

# Make an edgelist. Normally you might read this from a csv or create from
# other data
edge.list.df <- data.frame(list(col1 = c(1, 2, 3, 3, 4), col2 = c(4, 1, 2, 4, 
    2)))
edge.list.df
  col1 col2
1    1    4
2    2    1
3    3    2
4    3    4
5    4    2
# Now convert to a graph!
my.graph <- graph_from_data_frame(edge.list.df, directed = FALSE)

# You can see the vertices:
V(my.graph)
+ 4/4 vertices, named:
[1] 1 2 3 4
# And edges:
E(my.graph)
+ 5/5 edges (vertex names):
[1] 1--4 1--2 2--3 3--4 2--4
# And plot it!
plot(my.graph)