
Setting up

The data we’re going to use here, “ison_algebra”, is included in the {manynet} package. Do you remember how to call the data? Can you find out some more information about it?

# Let's call and load the 'ison_algebra' dataset
data("ison_algebra", package = "manynet")
# Or you can retrieve like this:
ison_algebra <- manynet::ison_algebra
# If you want to learn more about the 'ison_algebra' dataset, use the following function (below)
?manynet::ison_algebra
# If you want to see the network object, you can run the name of the object
ison_algebra
# or wrap the assignment in parentheses to assign and print in one step
(ison_algebra <- manynet::ison_algebra)

We can see after printing the object that the dataset is multiplex, meaning that it contains several different types of ties: friendship (friends), social (social) and task interactions (tasks).

Adding names

The network is also anonymous, but I think it would be nice to add some names, even if it’s just pretend. Luckily, {manynet} has a function for this, to_named(). This makes plotting the network just a wee bit more accessible and interpretable. Let’s try adding names and graphing the network now:

ison_algebra <- to_named(ison_algebra)
autographr(ison_algebra)

Note that you will likely get a different set of names, as they are assigned randomly from a pool of (American) first names.

Separating multiplex networks

As a multiplex network, there are actually three different types of ties (friends, social, and tasks) in this network. We can extract them and graph them separately using to_uniplex():

# to_uniplex extracts ties of a single type,
# focusing on the 'friends' tie attribute here
friends <- to_uniplex(ison_algebra, "friends")
gfriend <- autographr(friends) + ggtitle("Friendship")
# now let's focus on the 'social' tie attribute
social <- to_uniplex(ison_algebra, "social")
gsocial <- autographr(social) + ggtitle("Social")
# and the 'tasks' tie attribute
tasks <- to_uniplex(ison_algebra, "tasks")
gtask <- autographr(tasks) + ggtitle("Task")
# now, let's compare each attribute's graph, side-by-side
gfriend + gsocial + gtask
# if you get an error here, you may need to install and load
# the package 'patchwork'.
# It's highly recommended for assembling multiple plots together.
# Otherwise you can just plot them separately on different lines.

Note also that these are weighted networks. autographr() automatically recognises these different weights and plots them. Where useful (less dense directed networks), autographr() also bends reciprocated arcs. What (else) can we say about these three networks?

Cohesion

Let’s concentrate on the task network for now and calculate a few basic measures of cohesion: density, reciprocity, transitivity, and components.

Density

Because this is a directed network, we can calculate the density as:

# calculating network density manually according to equation
network_ties(tasks)/(network_nodes(tasks)*(network_nodes(tasks)-1))

but we can also just use the {migraph} function…

network_density(tasks)

Note that the various measures in {migraph} print results to three decimal places by default, but the underlying result retains its full precision. So same result…
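
To make the density equation concrete, here is a minimal base R sketch on a made-up four-node directed network. The adjacency matrix and the object names are assumptions for illustration only, not the 'tasks' data:

```r
# Hypothetical 4-node directed network as an adjacency matrix
# (rows send ties, columns receive them); this is not the 'tasks' data
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 0, 0,
                0, 1, 0, 1,
                0, 0, 0, 0),
              nrow = 4, byrow = TRUE)

n_nodes <- nrow(adj)
n_ties  <- sum(adj != 0)   # count ties, ignoring any weights

# directed density: observed ties over the n * (n - 1) possible ties
density_directed <- n_ties / (n_nodes * (n_nodes - 1))
density_directed

# rounding only changes what is displayed, not the stored value
round(density_directed, 3)
```

Here 5 of the 12 possible directed ties are present, so the density is 5/12.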

Closure

Next let’s calculate reciprocity in the task network. While one could do this by hand, it’s more efficient to do this using the {migraph} package. Can you guess the correct name of the function?

network_reciprocity(tasks)
# this function calculates the amount of reciprocity in the whole network

And let’s calculate transitivity in the task network. Again, can you guess the correct name of this function?

network_transitivity(tasks)
# this function calculates the amount of transitivity in the whole network

We have collected measures of the task network’s reciprocity and transitivity, but we still need to interpret these measures. These measures do not speak for themselves.
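
As an aid to interpretation, here is a rough base R sketch of what such measures count, using a made-up network. It assumes one common definition of each measure (reciprocity as the share of ties that are returned, transitivity as closed triples over connected triples in the undirected version); the exact definitions used by {migraph} may differ, so check the help files.

```r
# Hypothetical directed network: 1->2, 1->3, 2->1, 2->3, 3->2, 3->4
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                0, 1, 0, 1,
                0, 0, 0, 0),
              nrow = 4, byrow = TRUE)

# reciprocity: ties that are matched by a tie in the opposite direction,
# as a share of all ties
reciprocity <- sum(adj * t(adj)) / sum(adj)

# transitivity: ignore direction, then compare closed triples to all
# connected triples (paths of length two)
und <- ((adj + t(adj)) > 0) * 1
triangles <- sum(diag(und %*% und %*% und)) / 6
triples   <- sum(choose(rowSums(und), 2))
transitivity <- 3 * triangles / triples

reciprocity
transitivity
```

In this toy case four of the six ties are reciprocated, and one triangle closes three of the five connected triples.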

Components

Now let’s look at the friendship network, ‘friends’. We’re interested here in how many components there are. By default, the network_components() function will return the number of strong components for directed networks. For weak components, you will need to first make the network undirected. Remember the difference between weak and strong components?

network_components(friends)
# note that friends is a directed network
# you can see this by calling the object 'friends'
# or by running `manynet::is_directed(friends)`
# Now let's look at the number of components for objects connected by an undirected edge
# Note: to_undirected() returns an object with all tie direction removed, 
# so any pair of nodes with at least one directed edge 
# will be connected by an undirected edge in the new network.
network_components(to_undirected(friends))
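
If the difference between weak and strong components is hazy, the following base R sketch may help. It uses a made-up five-node network, and the helper names `reachable()` and `count_components()` are hypothetical, written only for this illustration:

```r
# Hypothetical directed network: 1 <-> 2, 2 -> 3, 4 -> 5
adj <- matrix(0, 5, 5)
adj[1, 2] <- 1; adj[2, 1] <- 1
adj[2, 3] <- 1
adj[4, 5] <- 1

# which nodes can reach which others along directed paths?
reachable <- function(a) {
  r <- ((a > 0) | (diag(nrow(a)) > 0)) * 1
  for (i in seq_len(nrow(a))) {
    r <- ((r %*% r) > 0) * 1   # repeated squaring extends path length
  }
  r
}

# two nodes share a component if each can reach the other
count_components <- function(a) {
  r <- reachable(a)
  mutual <- r * t(r)
  nrow(unique(mutual))          # each distinct row is one component
}

strong <- count_components(adj)           # direction respected
weak   <- count_components(adj + t(adj))  # direction ignored
c(strong = strong, weak = weak)
```

Only nodes 1 and 2 can reach each other in both directions, so there are four strong components, but once direction is ignored the network falls into just two weak components.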

So we know how many components there are, but maybe we’re also interested in which nodes are members of which components? node_components() returns a membership vector that can be used to color nodes in autographr():

friends <- friends %>% 
  mutate(weak_comp = node_components(to_undirected(friends)),
         strong_comp = node_components(friends))
# node_components returns a vector of nodes' memberships to components in the network
# here, we are adding the nodes' membership to components as an attribute in the network
# alternatively, we can also use the function `add_node_attribute()`
# eg. `add_node_attribute(friends, "weak_comp", node_components(to_undirected(friends)))`
autographr(friends, node_color = "weak_comp") + ggtitle("Weak components") +
autographr(friends, node_color = "strong_comp") + ggtitle("Strong components")
# by using the 'node_color' argument, we are telling autographr to colour 
# the nodes in the graph according to the values of the 'weak_comp' attribute in the network 

Community Detection

Ok, the friendship network has 3-4 components, but how many ‘groups’ are there? Just visually, it looks like there are two denser clusters within the main component.

Today we’ll use the ‘friends’ subgraph for exploring community detection methods. For clarity and simplicity, we will concentrate on the main component (the so-called ‘giant’ component) and consider friendship undirected. Can you guess how to make these changes to the ‘friends’ network?

# to_giant() returns an object that includes only the main component without any smaller components or isolates
(friends <- to_giant(friends))
(friends <- to_undirected(friends))
# now, let's graph the new network
autographr(friends)

Comparing friends before and after these operations, you’ll notice the number of ties decreases as reciprocated directed ties are consolidated into single undirected ties, and the number of nodes decreases as two isolates are removed.

There is no single best community detection algorithm. Instead there are several, each with its strengths and weaknesses. Since this is a rather small network, we’ll focus on the following methods: walktrap, edge betweenness, and fast greedy. (Others are included in {migraph}/{igraph}.) As you use them, consider how they portray communities and consider which one(s) afford a sensible view of the social world as cohesively organized.

Walktrap

This algorithm detects communities through a series of short random walks, with the idea that nodes encountered on any given random walk are more likely to be within a community than not. It was proposed by Pons and Latapy (2005).

The algorithm initially treats all nodes as communities of their own, then merges them into larger communities, still larger communities, and so on. In each step a new community is created from two other communities, and its ID will be one larger than the largest community ID so far. This means that before the first merge we have n communities (the number of vertices in the graph) numbered from zero to n-1. The first merge creates community n, the second community n+1, etc. This merge history is returned by the function (see ?igraph::cluster_walktrap).

Note the “steps=” argument that specifies the length of the random walks. While {igraph} sets this to 4 by default, which is what is recommended by Pons and Latapy, Waugh et al (2009) found that for many groups (Congresses), these lengths did not provide the maximum modularity score. To be thorough in their attempts to optimize modularity, they ran the walktrap algorithm 50 times for each group (using random walks of lengths 1–50) and selected the network partition with the highest modularity value from those 50. They call this the “maximum modularity partition” and insert the parenthetical “(though, strictly speaking, this cannot be proven to be the optimum without computationally-prohibitive exhaustive enumeration (Brandes et al. 2008)).”

So let’s try and get a community classification using the walktrap algorithm with path lengths of the random walks specified to be 50.

# let's use the node_walktrap() function to run a hierarchical,
# agglomerative algorithm based on random walks, and assign the
# result to an object

friend_wt <- node_walktrap(friends, times=50)
friend_wt # note that it prints pretty, but underlying it is just a vector:
c(friend_wt)

# This says that dividing the graph into 2 communities maximises modularity,
# one with the nodes 
which(friend_wt == 1)
# and the other 
which(friend_wt == 2)
# resulting in a modularity of 
network_modularity(friends, friend_wt)
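
To see what a modularity score summarises, here is a minimal base R sketch using the standard Newman-Girvan formula for undirected, unweighted networks. The toy network (two triangles joined by one tie) and the helper name `modularity_of()` are assumptions for illustration; this is not how {migraph} computes it internally.

```r
# Hypothetical undirected network: triangles 1-2-3 and 4-5-6, joined by 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
adj <- matrix(0, 6, 6)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1   # make the matrix symmetric

modularity_of <- function(adj, membership) {
  two_m    <- sum(adj)                     # twice the number of ties
  deg      <- rowSums(adj)
  expected <- outer(deg, deg) / two_m      # ties expected given degrees
  same     <- outer(membership, membership, "==")
  sum((adj - expected) * same) / two_m
}

q_good <- modularity_of(adj, c(1, 1, 1, 2, 2, 2))  # one group per triangle
q_bad  <- modularity_of(adj, c(1, 2, 1, 2, 1, 2))  # groups cut across triangles
c(good = q_good, bad = q_bad)
```

The partition that follows the triangles scores 5/14, while the one that cuts across them scores below zero: it has fewer within-group ties than would be expected by chance.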

We can also visualise the clusters on the original network. How does the following look? Plausible?

# plot 1: groups by node color

friends <- friends %>% 
  mutate(walk_comm = friend_wt)
autographr(friends, node_color = "walk_comm")
# plot 2: groups by borders

# to be fancy, we could even draw the group borders around the nodes using the node_group argument
autographr(friends, node_group = "walk_comm")
# plot 3: group and node colors

# or both!
autographr(friends, 
           node_color = "walk_comm", 
           node_group = "walk_comm") +
  ggtitle("Walktrap",
    subtitle = round(network_modularity(friends, friend_wt), 3))
# the function `round()` rounds the values to a specified number of decimal places
# here, we are telling it to round the network_modularity score to 3 decimal places,
# but the score is exactly 0.27 so only two decimal places are printed.

Using both colours and borders can be helpful for identifying membership when polygons overlap. Or you can use node color and size to indicate other attributes…

Edge Betweenness

Edge betweenness is like betweenness centrality but for ties not nodes. The edge-betweenness score of an edge measures the number of shortest paths from one vertex to another that go through it.

The idea of the edge-betweenness based community structure detection is that it is likely that edges connecting separate clusters have high edge-betweenness, as all the shortest paths from one cluster to another must traverse through them. So if we iteratively remove the edge with the highest edge-betweenness score we will get a hierarchical map (dendrogram) of the communities in the graph.

The following works similarly to walktrap, but there is no need to set a step length.

friend_eb <- node_edge_betweenness(friends)
friend_eb

How does community membership differ here from that found by walktrap?

We can see how the edge betweenness community detection method works here: http://jfaganuk.github.io/2015/01/24/basic-network-analysis/

To visualise the result:

# add the edge-betweenness memberships to the network as a node attribute

friends <- friends %>% 
  mutate(eb_comm = friend_eb)
# create a graph with a title and subtitle returning the modularity score

autographr(friends, 
           node_color = "eb_comm", 
           node_group = "eb_comm") +
  ggtitle("Edge-betweenness",
    subtitle = round(network_modularity(friends, friend_eb), 3))

For more on this algorithm, see M Newman and M Girvan: Finding and evaluating community structure in networks, Physical Review E 69, 026113 (2004), https://arxiv.org/abs/cond-mat/0308217.

Fast Greedy

This algorithm is the Clauset-Newman-Moore algorithm. Whereas edge betweenness was divisive (top-down), the fast greedy algorithm is agglomerative (bottom-up).

At each step, the algorithm seeks a merge that would most increase modularity. This is very fast, but has the disadvantage of being a greedy algorithm, so it might not produce the best overall community partitioning, although I personally find it both useful and in many cases quite “accurate”.

friend_fg <- node_fast_greedy(friends)
friend_fg # Does this result in a different community partition?
network_modularity(friends, friend_fg) # Compare this to the edge betweenness procedure
# Again, we can visualise these communities in different ways:
friends <- friends %>% 
  mutate(fg_comm = friend_fg)
autographr(friends, 
           node_color = "fg_comm", 
           node_group = "fg_comm") +
  ggtitle("Fast-greedy",
    subtitle = round(network_modularity(friends, friend_fg), 3))
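
To make the greedy merging idea concrete, here is a simplified base R sketch. It is not the Clauset-Newman-Moore implementation (which uses efficient data structures and records the full merge history); it simply keeps merging the pair of groups that most increases modularity and stops when no merge helps. The toy network and the helper names `modularity_of()` and `greedy_communities()` are assumptions for illustration.

```r
# Hypothetical undirected network: triangles 1-2-3 and 4-5-6, joined by 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
adj <- matrix(0, 6, 6)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

modularity_of <- function(adj, membership) {
  two_m <- sum(adj)
  deg   <- rowSums(adj)
  same  <- outer(membership, membership, "==")
  sum((adj - outer(deg, deg) / two_m) * same) / two_m
}

greedy_communities <- function(adj) {
  membership <- seq_len(nrow(adj))       # every node starts alone
  best_q <- modularity_of(adj, membership)
  repeat {
    groups <- unique(membership)
    if (length(groups) < 2) break
    best_gain <- 0
    best_pair <- NULL
    # try every pair of groups and remember the most rewarding merge
    for (pair in utils::combn(groups, 2, simplify = FALSE)) {
      trial <- membership
      trial[trial == pair[2]] <- pair[1]
      gain <- modularity_of(adj, trial) - best_q
      if (gain > best_gain + 1e-12) {
        best_gain <- gain
        best_pair <- pair
      }
    }
    if (is.null(best_pair)) break        # no merge improves modularity
    membership[membership == best_pair[2]] <- best_pair[1]
    best_q <- best_q + best_gain
  }
  list(membership = membership, modularity = best_q)
}

result <- greedy_communities(adj)
result$membership
result$modularity
```

On this toy network the sketch recovers the two triangles. Because each step only looks one merge ahead, a greedy procedure like this can get stuck in a partition that is good but not the best possible on messier networks.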

See A Clauset, MEJ Newman, C Moore: Finding community structure in very large networks, https://arxiv.org/abs/cond-mat/0408187

Two-mode network: Southern women

The next dataset, ‘ison_southern_women’, is also available in {manynet}. Let’s load and graph the data.

# let's load the data and analyze it
data("ison_southern_women")
ison_southern_women
autographr(ison_southern_women, node_color = "type")
autographr(ison_southern_women, "railway", node_color = "type")

Project two-mode network into two one-mode networks

Now what if we are only interested in one part of the network? For that, we can obtain a ‘projection’ of the two-mode network. There are two ways of doing this. The hard way…

twomode_matrix <- as_matrix(ison_southern_women)
women_matrix <- twomode_matrix %*% t(twomode_matrix)
event_matrix <- t(twomode_matrix) %*% twomode_matrix
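
To see what these matrix products contain, here is a small base R sketch on a made-up incidence matrix of three women and two events. The names and data are hypothetical, not taken from 'ison_southern_women':

```r
# Hypothetical incidence matrix: women in rows, events in columns
incidence <- matrix(c(1, 1,
                      1, 0,
                      0, 1),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("Ann", "Bea", "Cat"),
                                    c("E1", "E2")))

# women-by-women: number of events each pair of women attended together
women_proj <- incidence %*% t(incidence)

# event-by-event: number of women each pair of events had in common
event_proj <- t(incidence) %*% incidence

women_proj
event_proj
```

The off-diagonal cells hold the tie weights of the projection, while the diagonal holds each woman's number of events (or each event's number of attendees), which is usually dropped when treating the result as a network.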

Or the easy way:

# women-graph
# to_mode1(): Results in a weighted one-mode object that retains the row nodes from
# a two-mode object, and weights the ties between them on the basis of their joint
# ties to nodes in the second mode (columns)

women_graph <- to_mode1(ison_southern_women)
autographr(women_graph)

# note that projection `to_mode1` involves keeping one type of nodes
# this is different from to_uniplex above, which keeps one type of ties in the network
# event-graph
# to_mode2(): Results in a weighted one-mode object that retains the column nodes from
# a two-mode object, and weights the ties between them on the basis of their joint ties
# to nodes in the first mode (rows)

event_graph <- to_mode2(ison_southern_women)
autographr(event_graph)

{manynet} also includes several other options for how to construct the projection. Please see the help file for more details.

autographr(to_mode2(ison_southern_women, similarity = "jaccard")) + ggtitle("Jaccard") +
autographr(to_mode2(ison_southern_women, similarity = "rand")) + ggtitle("Rand") +
autographr(to_mode2(ison_southern_women, similarity = "pearson")) + ggtitle("Pearson") +
autographr(to_mode2(ison_southern_women, similarity = "yule")) + ggtitle("Yule's Q")
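
As a rough idea of what a similarity-based projection does differently from a plain count, here is a base R sketch of the Jaccard index between events, on made-up data. It assumes the textbook definition (shared attendees divided by attendees of either event); the way {manynet} computes it internally may differ in detail, so see the help file.

```r
# Hypothetical incidence matrix: women in rows, events in columns
incidence <- matrix(c(1, 1, 0,
                      1, 0, 0,
                      0, 1, 1),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("Ann", "Bea", "Cat"),
                                    c("E1", "E2", "E3")))

shared     <- t(incidence) %*% incidence        # attendees in common
sizes      <- colSums(incidence)                # attendees per event
union_size <- outer(sizes, sizes, "+") - shared # attendees of either event

jaccard <- shared / union_size
jaccard
```

E2 and E3 share one attendee in both the count-based and the Jaccard projection, but Jaccard scores that overlap higher (1/2) than the overlap between E1 and E2 (1/3), because E3 is a smaller event.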

Which women/events ‘bind’ which events/women? Let’s return to the question of cohesion.

# network_equivalency(): Calculate equivalence or reinforcement in a (usually two-mode) network

network_equivalency(ison_southern_women)
# network_transitivity(): Calculate transitivity in a network

network_transitivity(women_graph)
network_transitivity(event_graph)

What do we learn from this?

Task/Unit Test

  1. Produce a plot comparing 3 community detection procedures used here on a (women) projection of the ‘ison_southern_women’ dataset. Identify which you prefer, and explain why.
  2. Explain in no more than a paragraph why projection can lead to misleading transitivity measures.
  3. Explain in no more than a paragraph how structural balance might lead to group identity.

Community

by James Hollway