Networks and NetworkX

Python

network analysis

This notebook is an introduction to basic network analysis in Python.

Published

5 October 2024

This notebook introduces key concepts in network analysis pertaining to archeology and provides a hands-on tutorial to using the NetworkX Python library.

0. Prerequisites

Read through and follow the instructions in the pre-reading.
- Have Python and Jupyter set up on your device.
- Have created the networks environment.
If you have not done this, see the pre-reading.

1. What is Network analysis?

Network Analysis is a set of techniques used to study the structure and dynamics of networks. Networks are collections of objects/locations/entities (called nodes) connected by relationships (called edges). Network analysis has applications in many fields, including sociology, biology, economics, computer science, and more.

Network science in archeology is traditional network science used in an archeological context. According to Brighmans and Peeples (2023), archeological network science falls within the discipline of archeology and is not a discipline of its own: “[N]etwork science applied to archaeological research is a subset of archaeological research: it does not happen in isolation, it is not immune to the limitations of archaeological data nor does it replace archaeological theory.”

Archeological network science research process — Archeological network research process. (Brighmans and Peeples, 2023)

1.1 Key terms

Nodes in a network. — An example of a graph with nodes and edges.

Node: A node is a representation of an individual entity or actor in a network. In different contexts, nodes can be people, organizations, cities, or any other unit of analysis.

Edge: An edge represents the relationship or connection between two nodes. Edges can be directed (having a specific direction from one node to another) or undirected (no direction, implying a mutual relationship).

Nodes in a network with degree. — An example of a network with nodes colored by degree.

Degree: The degree of a node is the number of edges connected to it. In directed networks, this can be further divided into in-degree (number of incoming edges) and out-degree (number of outgoing edges).

The network above is an example of a Undirected graph, a graph with no direction. This means that if there is a connection between node A and node B, it is bidirectional - A is connected to B, and B is connected to A.

The example to the left is a directed graph: the edges between nodes have a specific direction. This means that if there is an edge from node A to node B, it does not imply there is an edge from B to A unless explicitly stated.

Density: Density is a measure that indicates how closely connected the nodes in a network are. Specifically, it refers to the ratio of the number of actual edges in the network to the maximum possible number of edges between nodes.

Centrality: Centrality measures the importance, influence, or prominence of nodes (entities) within a network. The centrality of a node tells us how “important” a node is to the aggregate network. There are many different kinds of centrality, but the four most well-known ones are degree, betweenness, closeness, and eigenvector centrality. This notebook will primarily focus on the first three.

1.2 NetworkX

NetworkX is a Python library that is used for the creation, manipulation, and visualization of complex networks. It provides tools to work with both undirected and directed networks, perform network-related calculations, and visualize the results.

A library in Python is a collection of code that makes everyday tasks more efficient. In this case working with networks becomes much simpler when using network X.

If you want to read the NetworkX documentation you can follow the NetworkX documentation link. This link shows what kind of commands exist within the NetworkX library.

1.2.1 Importing NetworkX

We can import NetworkX using the import command. At the same time, we’ll also import the matplotlib.pyplot library, for plotting graphs. Additionally, we’ll import pandas for basic data wrangling, and numpy for math. The as command allows us to use networkx commands without needing to type out networkx each time. Additionally, we’ll import the community_louvain package for the louvain clustering algorithm. Along with some other libraries for our code to function.

import matplotlib.pyplot as plt #allows us to call the matplotlib.pyplot library as 'plt'
import matplotlib.patches as mpatches #imports mpatches matplotlib subpackage 
import networkx as nx #allows us to call the networkx library as 'nx'
import pandas as pd #allows us to call the pandas library as 'pd'
import numpy as np
import geopandas as gpd
import contextily as ctx
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
import community as community_louvain
import random

#Let's us open the datasets in Google Collab easily
import os
os.system("wget https://github.com/ubcecon/ai-workshop/blob/ccss-networks.zip")
os.system("ccss-networks")

csv_path = ("https://github.com/ubcecon/ai-workshop/blob/ccss-networks/data")

1.2.2 Creating simple networks using NetworkX

We’ll start by creating a simple graph:

Below in the code we choose our nodes and edges between them.

G = nx.Graph() #creates an empty network graph

nodes = (1, 2, 3, 4, 5, 6) #our nodes, labeled 1,2,3,4,5,6.
edges = [(1, 2), (2, 3), (3, 1), (1,5), (3,5), (4, 5), (4, 6), (6, 1), (6, 3), (6,4), (4, 3), (5, 5), (3, 5)]
        #the connections between our nodes are stored in an array, containing pairs of numbers called tuples.
G.add_edges_from(edges) #the `add_edges_from()` command adds edges to the network
G.add_nodes_from(nodes) #the `add_nodes_from()` command adds nodes to the network

nx.draw(G, with_labels = True) #renders the graph in the notebook
        #the `with_labels = True` argument specifies that we want labels on the nodes.

Let’s create a directed graph using nx.DiGraph(). We’ll also set our node positions using a seed: this will ensure that each time the nodes are rendered they hold the same position on the graph. You can set the seed to any number.

G = nx.DiGraph() #creates an empty directed graph object
nodes = (1, 2, 3, 4, 5, 6) #our nodes
edges = [(1, 2), (2, 3), (3, 1), (1,5), (3,5), (4, 5), (4, 6), (6, 1), (6, 3), (6,4), (4, 3), (5, 5), (3, 5)] #our tuples stored in an array which represent our nodes
G.add_edges_from(edges) #connects edges to nodes
G.add_nodes_from(nodes) #connects edges to nodes

position = nx.spring_layout(G, seed=100)

#nx.draw plots our network
nx.draw(G, pos = position, with_labels = True) # `pos` argument assigns a position to each node

1.3 Creating Random Graphs

Instead of creating a graph with predetermined positions of nodes and edges we can also generate a random graph with a set amount of nodes and edges. Below you can change the amount of nodes and edges by changing n and d which correspond to the number of nodes and the degree (number of edges) that each node has. Creating a random graph could be more helpful for testing or when you want to try something and don’t wish to spend time plotting a real network and determining paths for all edges and nodes.

The first most basic command we will use is the nx.random_regular_graph command. Which generates a random regular graph.

# Set a seed for reproducibility so that everytime the code runs we get the same random graph
random.seed(42)

# Parameters
n = 20  # number of nodes
d = 3   # degree of each node

# Generate the random regular graph
rr_graph = nx.random_regular_graph(d, n)

# Visualize the graph, you can change the size, color, font and node size. 
plt.figure(figsize=(8, 6)) 
nx.draw(rr_graph, with_labels=True, node_color='lightgreen', node_size=500, font_size=10, font_weight='bold')
plt.title("Random Regular Graph")
plt.show()

# Print some basic information about the graph
print(f"Number of nodes: {rr_graph.number_of_nodes()}")
print(f"Number of edges: {rr_graph.number_of_edges()}")
print(f"Degree of each node: {d}")

Number of nodes: 20
Number of edges: 30
Degree of each node: 3

Another option is using the Erdős-Rényi model which can be accessed using the nx.erdos_renyi_graph(n, p) command. This command has two inputs n and p. N is the number of nodes and p is the probability of edge creation to each node.

The Erdős–Rényi model refers to one of two closely related models for generating random graphs or the evolution of a random network. These models are named after Hungarian mathematicians Paul Erdős and Alfréd Rényi, who introduced one of the models in 1959

# Set a seed for reproducibility so that everytime the code runs we get the same random graph
random.seed(43)

# Parameters
n = 20  # number of nodes
p = 0.2  # probability of edge creation

# Generate the Erdős-Rényi random graph
er_graph = nx.erdos_renyi_graph(n, p)

# Visualize the graph
plt.figure(figsize=(8, 6))
nx.draw(er_graph, with_labels=True, node_color='lightblue', node_size=500, font_size=10, font_weight='bold')
plt.title("Erdős-Rényi Random Graph")
plt.show()

# Print some basic information about the graph
print(f"Number of nodes: {er_graph.number_of_nodes()}")
print(f"Number of edges: {er_graph.number_of_edges()}")
print(f"Average degree: {sum(dict(er_graph.degree()).values()) / n:.2f}")

Number of nodes: 20
Number of edges: 40
Average degree: 4.00

There are more commands in NetworkX to generate random graphs, but the two above demonstrate two common methods of random graph generations. The first being a set number of nodes and edges and the second being a set number of nodes and a probability of edge creation between them.

2. Degrees, Density and Weights

2.1 Degrees

The degree of a node is the number of edges that are connected to a node. The degree of a node \(N\) is denoted as \(deg(N)\). The maximum degree of a network \(G\) is denoted by \(\Delta(G)\) and is the degree of the node with the highest degree in the network. Conversely, the minimum degree is denoted as \(\delta(G)\).

If a node on a graph with \(n\) nodes has degree \(n-1\) it is called a dominating vertex. Not every graph has a dominating vertex.

We can see the degree of each node by running dict(G.degree()). This create a dictionary of key-value pairs for our network, where each key is the name of the node and the value is it’s respective degree.

degrees = dict(G.degree())

If we want to see the degree of node \(n\), we can do so by running print(degrees[n]). For instance:

print(degrees[1])

Let’s color the nodes of our graph based on their degree. We’ll create a function called get_node_colors which takes in the degree dictionary of each node and returns a color. We’ll then create a for-loop that iterates over each nodes in the list of nodes, gets the color of each node using the get_node_colors function we defined earlier, and appends it to an empty list called color_map.

degrees = dict(G.degree())
nodes = list(G.nodes())

def get_node_colors(degree):
    if degree in [1, 2]:
        return 'blue'
    elif degree in [3, 4]:
        return 'green'
    elif degree in [5, 6]:
        return 'yellow'
    else:
        return 'red' 

color_map = [] #`color_map` is an empty list

for node in nodes:
  color = get_node_colors(degrees[node]) # get color of current node using node_colors according to degree of node
  color_map.append(color) # appends color of each node to color_map for each node in nodes

print(degrees)
print(nodes)
print(color_map)

{1: 4, 2: 2, 3: 5, 5: 5, 4: 4, 6: 4}
[1, 2, 3, 5, 4, 6]
['green', 'blue', 'yellow', 'yellow', 'green', 'green']

The \(n\)-th entry in color_map corresponds to the \(n\)-th node in nodes. For instance, color_map[0] returns the color of the first node (1).

color_map[0]

'green'

We can now color the nodes of our graph, using the color map we defined above. The node_color argument takes in an array or list of colors that it uses to color each node.

G = nx.DiGraph() # creates an empty directed graph object
nodes = (1, 2, 3, 4, 5, 6) 
edges = [(1, 2), (2, 3), (3, 1), (1,5), (3,5), (4, 5), (4, 6), (6, 1), (6, 3), (6,4), (4, 3), (5, 5), (3, 5)]
G.add_edges_from(edges) 
G.add_nodes_from(nodes) 

position = nx.spring_layout(G, seed=100)

nx.draw(G, pos = position, node_color=color_map, with_labels=True) 
    # node_color argument colors the nodes based on a given list or array of colors, 
    # with the first color corresponding to the first node, second to the second node, etc.

Let’s also add a legend to our graph, which gives information about the meaning of each color. We’ll do this using the mpatches subpackage we imported earlier.

blue_patch = mpatches.Patch(color='blue', label='1-2 edges') 
green_patch = mpatches.Patch(color='green', label='3-4 edges')
yellow_patch = mpatches.Patch(color='yellow', label='5-6 edges')
plt.legend(handles=[blue_patch, green_patch, yellow_patch]) #adds legend to the plot

nx.draw(G, pos = position, node_color=color_map, with_labels=True)

2.2 Density

Density is defined as:

\[ \text{Density} = \frac{\text{Number of Possible Edges}}{\text{Number of Actual Edges}} \]

In an undirected graph, the total number of edges is \(\frac{V\times(V-1)}{2}\), where V is the total number of nodes. In a directed graph, the total number of edges is \(V\times(V-1)\), because a connection between point A and point B can either be from point A to point B, or to point A from point B (hence multiplying by 2).

Note that self-loops (edges from and to the same node) are counted in the total number of edges but not in the maximum number of edges so graphs can have a density greater than 1.

The formula for undirected graph density is:

\[ \frac{2E}{V(V-1)} \]

And for directed graphs, it is:

\[ \frac{E}{V(V-1)} \]

Where \(E\) is the number of edges in our graph and \(V\) is the number of nodes.

We can calculate the density of our graph:

nx.density(G)

0.4

2.3 Weights

Often times, you may end up working with weighted graphs: for instance, these weights could correspond to popularity of roads in road networks, or the number of artifacts in common between sites.

We’ll standardize our weights to be between 1 and 2 (as otherwise the results are messy). We’ll do this using a for-loop, like we did with the degrees.

G_weights = nx.DiGraph() #creating a new graph object called G_weights
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (3, 1), (1,5), (3,5), (4, 5), (4, 6), (6, 1), (6, 3), (6,4), (4, 3), (5, 5), (3, 5)]
weights = [100, 50, 75, 50, 60, 100, 100, 75, 40, 50, 50, 100, 100] #add list of weights
G_weights.add_edges_from(edges) 
G_weights.add_nodes_from(nodes) 

adjusted_weights = []
for weight in weights:
    adjusted_weight = 1+ (max(weights)-weight)/(max(weights)-min(weights)) #standardizes weights to be between 1 and 2
    adjusted_weights.append(adjusted_weight)

position = nx.spring_layout(G, seed=100)

print(adjusted_weights)
nx.draw(G_weights, pos = position, width = adjusted_weights, with_labels = True) 
    # width argument take in a list or array of numbers corresponding to weights

[1.0, 1.8333333333333335, 1.4166666666666667, 1.8333333333333335, 1.6666666666666665, 1.0, 1.0, 1.4166666666666667, 2.0, 1.8333333333333335, 1.8333333333333335, 1.0, 1.0]

This is great, but the results aren’t very clear. Let’s add a color gradient to the edges to represent different weights.

norm = plt.Normalize(min(weights), max(weights), clip=False) 
    #`plot.normalizes` normalizes the weights such that they are evenly distributed across the gradient spectrum
edge_colors = plt.cm.Greys(norm(weights)) 
    # norm(weights) normalizes the weights 
    # plot.cm.greys() assigns the weights to color values
    # edge_colors is a multidimensional array of RGBA color values corresponding to each edge

fig, ax = plt.subplots() #explicitly specifying figure and axes in order to create a color bar

nx.draw(G_weights, pos=position, edge_color=edge_colors, width=adjusted_weights, with_labels=True, ax=ax) 
    #ax = ax argument needed for color bar

# Adding color bar
sm = plt.cm.ScalarMappable(cmap="Greys", norm=norm) # creates a scalarmappable object which acts 
                                                    # as a bridge between the numerical weight values and color map
plt.colorbar(sm, ax=ax) #plotting color bar

3. Adjacency matrices

An Adjacency matrix is a method of representing graphs in matrix form. In an adjacency matrix, the rows and columns correspond to the vertices (or nodes) of the graph. The entries of the matrix indicate whether pairs of vertices are adjacent or not in the graph. Normally, a value of 1 is assigned to entries where an edge is present, and 0 is assigned to entries where an edge is not. For a weighed graph, the weight of the edge is represented as a numerical value for entries where an edge is present.

We can convert our simple graph to an adjacency matrix:

nx.to_pandas_adjacency(G)

	1	2	3	5	4	6
1	0.0	1.0	0.0	1.0	0.0	0.0
2	0.0	0.0	1.0	0.0	0.0	0.0
3	1.0	0.0	0.0	1.0	0.0	0.0
5	0.0	0.0	0.0	1.0	0.0	0.0
4	0.0	0.0	1.0	1.0	0.0	1.0
6	1.0	0.0	1.0	0.0	1.0	0.0

If we want to use our weighted graph, we can use the following code:

# len(edges) returns the total number of entries in the list of edges.
# range(len(edges)): This generates a sequence of numbers from 0 to n-1 where n is len(edges), 
    #so the for-loop will run n times with i taking each value in that range, one at a time.

for i in range(len(edges)):
    edge = edges[i] # retrieves the edge at position i in the list of edges
    weight = weights[i] # retrieves the weight at position i in the list of weights
    G_weights.add_edge(edge[0], edge[1], weight=weight) # adds an edge with a weight to the graph 
    
nx.to_pandas_adjacency(G_weights, nodelist=nodes, weight='weight') #converts to pandas adjacency matrix with the weights in place

	1	2	3	4	5	6
1	0.0	100.0	0.0	0.0	50.0	0.0
2	0.0	0.0	50.0	0.0	0.0	0.0
3	75.0	0.0	0.0	0.0	100.0	0.0
4	0.0	0.0	50.0	0.0	100.0	100.0
5	0.0	0.0	0.0	0.0	100.0	0.0
6	75.0	0.0	40.0	50.0	0.0	0.0

We can visualize our matrix using the code below. Note that instead of using nx.to_pandas_adjacency we use nx.to_numpy_array: this allows us to store the matrix in the form of an array.

adj_matrix = nx.to_numpy_array(G_weights, nodelist=nodes, weight='weight')


plt.figure(figsize=(8, 8)) #displays data as an image on a 2d raster; in our case, a numpy array

plt.imshow(adj_matrix, cmap='gray_r')

for i in range(adj_matrix.shape[0]): #loops through each row of the matrix
    for j in range(adj_matrix.shape[1]): #for each row, loops through each column of the matrix
        plt.text(j, i, int(adj_matrix[i, j]),
                 ha='center', va='center', color='red', size=30) #prints the value at that position in the matrix on the graph

plt.title('Adjacency Matrix Visualization')
plt.xlabel('Node Index')
plt.ylabel('Node Index')

Text(0, 0.5, 'Node Index')

4. Working with archeological datasets

The ICRATES database of tablewares in the Roman East is a dataset of red slip tablewares between the late Hellenistic and late Roman periods in the east Mediterranean (Bes, 2015). The dataset is in fact comprised of two datasets: the first is a dataset of nodes with their respected connections, as well as weights for each edge and other information. The second dataset contains geographical coordinates for each node in the network. We will use these coordinates to add a geospatial component to our analysis, by placing each node in it’s respective geographical coordinates and adding a map overlay. We begin by demonstrating how to import and visualize the dataset in python, and then introduce various centrality and connectivity measures.

Importantly, the nodes in the first dataset are categorized by time period. We will initially focus on a singular time period, but will showcase how to graph multiple time frames towards the end of the notebook.

4.1 Importing datasets in Python

We can import datasets in python using pandas’ read_ function. In our case, both network and location datasets are stored in excel files. We can thus use the read_excel function.

df_edges = pd.read_excel("network_analysis_datasets/vistorian_network.xls") #loading in dataset

G_directed = nx.from_pandas_edgelist(df_edges, source="SOURCE_LOCATION", target="TARGET_LOCATION", create_using=nx.DiGraph()) #creating a graph object
G = nx.from_pandas_edgelist(df_edges, source="SOURCE_LOCATION", target="TARGET_LOCATION")

4.2 Adding a geospacial component to our graph

We could just plot this network right away, but this wouldn’t convey good information, as the nodes in our dataset represent real places. Instead, let’s bind each node to it’s respective geographical location using the geopandas library.

df_pos = pd.read_excel("network_analysis_datasets/vistorian_locations.xls", index_col=0)
 # loading in coordinates of nodes; index_col specifies that the first column (with index 0) in the Excel sheet should be used as the row labels for the DataFrame

gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326') 
 # creates a geodataframe, a special dataframe from the geopandas library for storing geographic datapoints.
    # crs='EPSG:4326' argument defines the Coordinate Reference System for the geometric data: in our case, the location dataset uses the WGS84 coordinate system

gdf_pos = gdf_pos.to_crs('EPSG:3857') #reprojecting to web mercator for the world map overlay to accurately show up on our graph

gdf_pos #returns a GeoDataframe of all the nodes and their respective positions

	GEONAME	LONGITUDE	LATITUDE	geometry
NODE_NAME
Abdera	Abdera	24.973047	40.933650	POINT (2779986.9 5002559.974)
Aizanoi	Aizanoi	29.609819	39.201047	POINT (3296150.023 4750510.914)
Alexandreia	Alexandreia	29.907915	31.198246	POINT (3329333.832 3658521.969)
Altinum	Altinum	12.390766	45.557134	POINT (1379333.798 5709661.408)
Amathous	Amathous	33.142972	34.711781	POINT (3689458.791 4124781.928)
...	...	...	...	...
Tel_Anafa	Tel_Anafa	35.645039	33.176975	POINT (3967987.579 3918818.063)
Tenos	Tenos	25.152483	37.591633	POINT (2799961.636 4521896.839)
Thasos	Thasos	24.713428	40.779792	POINT (2751086.196 4979915.044)
Troia/Ilion	Troia/Ilion	26.238967	39.957667	POINT (2920908.409 4859792.421)
Veloukovo	Veloukovo	22.179072	38.534158	POINT (2468963.026 4655161.727)

69 rows × 4 columns

gdf_pos.loc['Abdera'] 
# the loc accessor in pandas (and by extension, GeoPandas, which extends pandas) is used to access a group of rows and columns by labels. This will be useful for our next step.

GEONAME                                            Abdera
LONGITUDE                                       24.973047
LATITUDE                                         40.93365
geometry     POINT (2779986.9000867764 5002559.974384146)
Name: Abdera, dtype: object

Great! we now have a GeoDataFrame of each node in our graph, with their respective position. However, networkX’s requires a dictionary of node-location pairs for properly mapping each node. Let’s create a new dictionary using the dataframe above.

positions = {} #empty dictionary which will contain the positions of each node
for location in gdf_pos.index: # iterates over each index in the gdf_pos GeoDataFrame. The index in gdf_pos is the names of all our nodes.
    x = gdf_pos.loc[location].geometry.x  # For each location, this line accesses the `geometry` column to retrieve the x-coordinate (latitude).
    y = gdf_pos.loc[location].geometry.y  # Same thing but for y-coordinate (longitude)
    positions[location] = (x, y)  # Adding to the dictionary with the location as the key

Now we have a dictionary containing each node, and it’s geographic position. We can now plot our graph!

fig, ax = plt.subplots(figsize=(9, 9))# Creating subplots so that we can overlay the network and the map
gdf_pos.plot(ax=ax, alpha=0)

ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)# Adding the basemap: we are using CartoDB for this

# Drawing the network edges and nodes, as usual
nx.draw_networkx_edges(G, pos=positions, ax=ax, alpha=0.05)
nx.draw_networkx_nodes(G, pos=positions, ax=ax, node_size=20, node_color="red", edgecolors="black", alpha=0.8)

We can focus on a specific geographical area by specifying longitudonal and latitudonal limits and filtering the locations dataset for locations that fall within those limits. From now on, let’s focus our analysis on the area surrounding the Aegean Sea.

gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')

# We create a "bonding box" which contains the outer limits of our positions.
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704

gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y] # Filtering the GeoDataFrame to include only points within the bounding box
filtered_nodes = gdf_pos_filtered.index.tolist()

G_sub = G.subgraph(filtered_nodes) # Create a subgraph from the original graph G using only the filtered nodes

# Prepare positions for the filtered subgraph
positions_filtered = {}
for loc in gdf_pos_filtered.index:
    x = gdf_pos_filtered.loc[loc].geometry.x
    y = gdf_pos_filtered.loc[loc].geometry.y
    positions_filtered[loc] = (x, y)

# Plotting the filtered subgraph
fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)

nx.draw_networkx_edges(G_sub, pos=positions_filtered, ax=ax, alpha=0.1)
nx.draw_networkx_nodes(G_sub, pos=positions_filtered, ax=ax, node_size=60, node_color="red", edgecolors="black", alpha=0.8)

plt.show()

4.3 Working with weighed graphs

If we print the df_edges dataframe, we see that it contains WEIGHT and TYPE columns, that correspond to each edge in the network. The TYPE category corresponds to the type of pottery linking two sites, and the WEIGHT cateogry corresponds to the relative importance of each edge.

print(df_edges)

print(max(df_edges['WEIGHT']))
print(min(df_edges['WEIGHT']))

print(df_edges['TYPE'].unique())

             FROM SOURCE_LOCATION                 TO    TARGET_LOCATION  \
0     Alexandreia     Alexandreia              Assos              Assos   
1     Alexandreia     Alexandreia          Anemorion          Anemorion   
2     Alexandreia     Alexandreia  Apollo_Smintheion  Apollo_Smintheion   
3     Alexandreia     Alexandreia          Tel_Anafa          Tel_Anafa   
4     Alexandreia     Alexandreia        Troia/Ilion        Troia/Ilion   
...           ...             ...                ...                ...   
1586       Tarsos          Tarsos             Thasos             Thasos   
1587       Tarsos          Tarsos          Tel_Anafa          Tel_Anafa   
1588        Tenos           Tenos        Troia/Ilion        Troia/Ilion   
1589        Tenos           Tenos             Thasos             Thasos   
1590        Tenos           Tenos             Thasos             Thasos   

      WEIGHT TYPE  PERIOD  
0          3  ESC      10  
1          1  ESC      10  
2          1  ESC      10  
3          2  ESC      10  
4          1  ESC      10  
...      ...  ...     ...  
1586       1  ESB      12  
1587       1  ESB      12  
1588       1  ESB      12  
1589       1  ESB      12  
1590       0  NaN      13  

[1591 rows x 7 columns]
17
0
['ESC' 'ESB' nan]

We can see that the WEIGHT cateory has values that range from 0 to 17, and the TYPE categories has two pottery types: ESB, ESC, as well as undefined pottery types (NaN). Let’s change the width of each edge in the network to reflect it’s weight, and the color of each edge to reflect it’s type.

This is made slightly difficult as we are working with a subgraph and the TYPE and WIDTH categories contain entries for the full graph.

df_edges

	FROM	SOURCE_LOCATION	TO	TARGET_LOCATION	WEIGHT	TYPE	PERIOD
0	Alexandreia	Alexandreia	Assos	Assos	3	ESC	10
1	Alexandreia	Alexandreia	Anemorion	Anemorion	1	ESC	10
2	Alexandreia	Alexandreia	Apollo_Smintheion	Apollo_Smintheion	1	ESC	10
3	Alexandreia	Alexandreia	Tel_Anafa	Tel_Anafa	2	ESC	10
4	Alexandreia	Alexandreia	Troia/Ilion	Troia/Ilion	1	ESC	10
...	...	...	...	...	...	...	...
1586	Tarsos	Tarsos	Thasos	Thasos	1	ESB	12
1587	Tarsos	Tarsos	Tel_Anafa	Tel_Anafa	1	ESB	12
1588	Tenos	Tenos	Troia/Ilion	Troia/Ilion	1	ESB	12
1589	Tenos	Tenos	Thasos	Thasos	1	ESB	12
1590	Tenos	Tenos	Thasos	Thasos	0	NaN	13

1591 rows × 7 columns

G_sub_weighted = G.subgraph(filtered_nodes)

# creating a new column called `edge_tuple` which contains tuples of the source and target locations.
# We such a column for filtering the weight and type categories to only contain edges within G_sub_weighted and not G.
edge_tuples = []
for index, row in df_edges.iterrows(): #Iterating over the rows of df_edges using iterrows() 
    edge_tuple = (row['SOURCE_LOCATION'], row['TARGET_LOCATION'])  # Creating a tuple from the 'SOURCE_LOCATION`` and 'TARGET_LOCATION' columns
    edge_tuples.append(edge_tuple)  # Appending each tuple to the list
df_edges['edge_tuple'] = edge_tuples #  appending edge_tuples to df_edges 

subgraph_edges = list(G_sub_weighted.edges()) #listing all the edges in the subgraph
filtered_edges = df_edges[df_edges['edge_tuple'].isin(subgraph_edges)] #filtering df_edges to only contain edges in the subgraph
filtered_edges.reset_index(drop=True, inplace=True) #the index gets all sliced up when filtering rows, this makes it normal again

edge_colors = [] #create an empty set which contains the colors of the artifacts ESB, ESC
for type in filtered_edges['TYPE']: 
    if type == 'ESB':
        edge_colors.append('maroon')
    elif type == 'ESC':
        edge_colors.append('navy')
    else:
        edge_colors.append('plum') 

# Set the weights of the edges
weights = filtered_edges['WEIGHT']
adjusted_weights = []
for weight in weights:
    adjusted_weight = 0.1 + ((max(weights)- min(weights))/20)  
    adjusted_weights.append(adjusted_weight)

fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)

#Plot the network on the graph
nx.draw_networkx_nodes(G_sub_weighted, pos=positions_filtered, ax=ax, node_size=60, node_color="red", edgecolors="black", alpha=0.8)
nx.draw_networkx_edges(G_sub_weighted, pos=positions_filtered, ax=ax, width=adjusted_weights, edge_color=edge_colors, alpha=0.3)

#Show our graph
patch_maroon = mpatches.Patch(color='maroon', label='ESB') 
patch_navy = mpatches.Patch(color='navy', label='ESC') 
plt.legend(handles=[patch_maroon, patch_navy])
plt.show()

#Print information about the types of edges
print(adjusted_weights)
print(filtered_edges)

[0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85]
                  FROM    SOURCE_LOCATION           TO TARGET_LOCATION  \
0            Amygdalea          Amygdalea        Argos           Argos   
1            Amygdalea          Amygdalea      Knossos         Knossos   
2            Amygdalea          Amygdalea        Stobi           Stobi   
3    Apollo_Smintheion  Apollo_Smintheion  Troia/Ilion     Troia/Ilion   
4    Apollo_Smintheion  Apollo_Smintheion        Argos           Argos   
..                 ...                ...          ...             ...   
534              Stobi              Stobi        Tenos           Tenos   
535              Stobi              Stobi      Tanagra         Tanagra   
536              Stobi              Stobi       Thasos          Thasos   
537              Tenos              Tenos       Thasos          Thasos   
538              Tenos              Tenos       Thasos          Thasos   

     WEIGHT TYPE  PERIOD                        edge_tuple  
0         1  ESC      10                (Amygdalea, Argos)  
1         1  ESC      10              (Amygdalea, Knossos)  
2         1  ESC      10                (Amygdalea, Stobi)  
3         7  ESC      10  (Apollo_Smintheion, Troia/Ilion)  
4         5  ESC      10        (Apollo_Smintheion, Argos)  
..      ...  ...     ...                               ...  
534       2  ESB      12                    (Stobi, Tenos)  
535       1  ESB      12                  (Stobi, Tanagra)  
536       1  ESB      12                   (Stobi, Thasos)  
537       1  ESB      12                   (Tenos, Thasos)  
538       0  NaN      13                   (Tenos, Thasos)  

[539 rows x 8 columns]

This works, but the weightings are hard to see. One alternative is to filter the subgraph by weight, and graph the different weights side-by-side:

# creating a function that creates a list of colors, then apply the function to each subset of edges filtered by weight
def get_edge_colors(filtered_edges):
    edge_colors = []
    for type in filtered_edges['TYPE']: 
        if type == 'ESB':
            edge_colors.append('maroon')
        elif type == 'ESC':
            edge_colors.append('navy')
        else:
            edge_colors.append('green')
    return edge_colors

# Filtering the edges by weight
filtered_edges_w1 = filtered_edges[filtered_edges['WEIGHT'] == 1]
filtered_edges_w2 = filtered_edges[filtered_edges['WEIGHT'] == 2]
filtered_edges_w3 = filtered_edges[filtered_edges['WEIGHT'] == 3]

# applying the get_edge_colors function on each subset of edges
edge_colors_w1 = get_edge_colors(filtered_edges_w1)
edge_colors_w2 = get_edge_colors(filtered_edges_w2)
edge_colors_w3 = get_edge_colors(filtered_edges_w3)

fig, axes = plt.subplots(1, 3, figsize=(18, 6)) #creating a plot; 1, 3 creates 3 subplots spaced horizontally

# Graph 1: weight = 1
axes[0].set_title('Weight 1') #axes[n] specifies which plot to plot the graph in: axes[0] is plot 1.

gdf_pos_filtered.plot(ax=axes[0], alpha=0)
ctx.add_basemap(axes[0], source=ctx.providers.CartoDB.Positron)

nx.draw_networkx_nodes(G_sub_weighted, pos=positions_filtered, ax=axes[0], node_size=60, node_color="red", edgecolors="black", alpha=0.8)
nx.draw_networkx_edges(G_sub_weighted, pos=positions_filtered, ax=axes[0], edgelist=list(filtered_edges_w1['edge_tuple']),
                       edge_color=edge_colors_w1, alpha=0.4) #`list(filtered_edges_w1['edge_tuple'])` extracts the edge_tuple column and converts the entries into a list

# Graph 2: weight = 2
axes[1].set_title('Weight 2')
gdf_pos_filtered.plot(ax=axes[1], alpha=0)
ctx.add_basemap(axes[1], source=ctx.providers.CartoDB.Positron)
nx.draw_networkx_nodes(G_sub_weighted, pos=positions_filtered, ax=axes[1], node_size=60, node_color="red", edgecolors="black", alpha=0.8)
nx.draw_networkx_edges(G_sub_weighted, pos=positions_filtered, ax=axes[1], edgelist=list(filtered_edges_w2['edge_tuple']),
                    edge_color=edge_colors_w2, alpha=0.4)

# Graph 3: weight = 3
axes[2].set_title('Weight 3')
gdf_pos_filtered.plot(ax=axes[2], alpha=0)
ctx.add_basemap(axes[2], source=ctx.providers.CartoDB.Positron)
nx.draw_networkx_nodes(G_sub_weighted, pos=positions_filtered, ax=axes[2], node_size=60, node_color="red", edgecolors="black", alpha=0.8)
nx.draw_networkx_edges(G_sub_weighted, pos=positions_filtered, ax=axes[2], edgelist=list(filtered_edges_w3['edge_tuple']),
                       edge_color=edge_colors_w3, alpha=0.4)

patch_maroon = mpatches.Patch(color='maroon', label='ESB') 
patch_navy = mpatches.Patch(color='navy', label='ESC') 
axes[0].legend(handles=[patch_maroon, patch_navy], loc='upper left')
axes[1].legend(handles=[patch_maroon, patch_navy], loc='upper left')
axes[2].legend(handles=[patch_maroon, patch_navy], loc='upper left')
plt.show()

5. Measures of Centrality

Centrality is defined as the set of metrics used to determine the importance or influence of a particular node within a network. It helps to identify which nodes hold strategic significance in terms of connectivity, information flow, or influence over other nodes. Various centrality metrics, such as degree, betweenness, and eigenvector centrality, provide different perspectives on the role each node plays within the network’s overall structure.

5.1 Network Distance and Eccentricity

Before talking about centrality, we first need to talk a bit about distance. Distance, also known as Geodesic distance, is defined as the number of edges traversed by the shortest path between two nodes.

The distance between a node and itself is 0.
The distance between a node and a node for which no shortest path exists (such as a node that is disconnected from other nodes) is \(\infty\).
The distance between a node and it’s neighbor is 1.

A node’s eccentricity is the maximum distance from said node to all other nodes in the graph. For instance, in the following network, the eccentricity of node \(A\) is 2, but the eccentricity of node \(B\) is 1.

nodes = ("A","B", "C")
edges = [("A","B"), ("B", "C")]

G_example = nx.Graph()
G_example.add_edges_from(edges)
G_example.add_nodes_from(nodes)

color_map = ["salmon", "lightblue", "salmon"]


red_patch = mpatches.Patch(color='salmon', label='eccentricity = 1') 
blue_patch = mpatches.Patch(color='lightblue', label='eccentricity = 2') 
plt.legend(handles=[blue_patch, red_patch])

nx.draw(G_example, node_color=color_map, with_labels=True)

If we color the nodes of our Aegean sea pottery network by eccentricity, we see an interesting result:

#Bring in our data to be used
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')

#Set a bounding box on our data
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704

#Use the bounding box on our data
gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y] 
filtered_nodes = gdf_pos_filtered.index.tolist()

G_sub_eccentricity = G.subgraph(filtered_nodes) 

eccentricities = nx.eccentricity(G_sub_eccentricity) #calculating eccentricities
eccentricities_array = np.array(list(eccentricities.values()))

#Set a colour map for different values of eccentricity in our network
color_map = []
for eccentricity in eccentricities_array:
    if eccentricity == 1:
        color_map.append("palegreen")
    elif eccentricity == 2:
        color_map.append("slateblue")
    elif eccenticity == 3:
        color_map.append("orange")
    else:
        color_map.append("grey")

#Filter oyur graph
positions_filtered = {}
for loc in gdf_pos_filtered.index:
    x = gdf_pos_filtered.loc[loc].geometry.x
    y = gdf_pos_filtered.loc[loc].geometry.y
    positions_filtered[loc] = (x, y)

fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)

#Draw our graph with different colours for eccentricity
nx.draw_networkx_edges(G_sub_eccentricity, pos=positions_filtered, ax=ax, alpha=0.1)
nx.draw_networkx_nodes(G_sub_eccentricity, pos=positions_filtered, node_color=color_map, ax=ax, node_size=60, edgecolors="black", alpha=0.8)

patch = mpatches.Patch(color='slateblue', label='eccentricity = 2') 
plt.legend(handles=[patch])

#show the graph
plt.show()

All nodes are at most 2 edges away from every other other node!

5.2 Degree Centrality

Degree centrality is simple: Recall that the degree of a node is the number of nodes directly connected to it. In degree centrality, the more adjacent nodes, the more important the network is considered to be. Degree centrality is used primarily in social networks, where nodes with higher degrees are commonly major channels of information. A high degree means a node has many direct ties with other nodes, and has better access to resources within the network.

Note that the networkX nx.degree_centrality() function normalizes each node’s degree by dividing by the maximum possible degree in the network. Therefore for graphs without self-loops (such as the one we are working with) the degree centrality is always \(\leq 1\). For educational purposes, we un-normalize the degree values, but this is not common practice.

We can calculate the degree centrality of all our nodes in our network:

#Bring in our dataset
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')

#Set a bounding box to limit the scope of the graph
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704

#Filter our graph using the bounding box 
gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y] 
filtered_nodes = gdf_pos_filtered.index.tolist()

G_sub = G.subgraph(filtered_nodes) 

#Find the centrality values 
centrality = nx.degree_centrality(G_sub)
centrality_values = np.array(list(centrality.values()))
n = len(centrality_values)
original_degrees = centrality_values * (n - 1)

#Colour our map based on the degree centrality 
color_map=[]
for degree in original_degrees:
    if degree == max(original_degrees):
        color_map.append("indianred")
    else:
        normalized_value = degree / max(original_degrees)  # Normalize the degree to the range [0, 1]
        color_map.append(plt.cm.winter(normalized_value))

cmap = plt.cm.winter

#Filter our graph 
positions_filtered = {}
for loc in gdf_pos_filtered.index:
    x = gdf_pos_filtered.loc[loc].geometry.x
    y = gdf_pos_filtered.loc[loc].geometry.y
    positions_filtered[loc] = (x, y)


fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)


edges= nx.draw_networkx_edges(G_sub, pos=positions_filtered, ax=ax, alpha=0.1)
nodes = nx.draw_networkx_nodes(G_sub, pos=positions_filtered, node_color=color_map, cmap=cmap, ax=ax, node_size=60, edgecolors="black", alpha=1)


# Adding the colorbar using the inset_axes function
axins = inset_axes(ax,
                   width="5%", 
                   height="30%",  
                   loc='lower left', 
                   bbox_to_anchor=(0.03, 0.03, 1, 1), 
                   bbox_transform=ax.transAxes, 
                   borderpad=0.3)

# Creating colorbar
norm = plt.Normalize(vmin=min(original_degrees), vmax=max(original_degrees))
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array(original_degrees)
plt.colorbar(sm, cax=axins, orientation="vertical")

patch = mpatches.Patch(color='indianred', label='maximum degree (degree = 36)') 
plt.legend(handles=[patch], loc='center left', bbox_to_anchor=(10, 3))
plt.show()

#Print some information about our graph
print("minimum degree:", min(original_degrees))
print("maximum degree:", max(original_degrees))
print("difference:", max(original_degrees)-min(original_degrees))

D:\Users\alexr\anaconda3\Lib\site-packages\networkx\drawing\nx_pylab.py:457: UserWarning:

No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored

minimum degree: 3.0000000000000004
maximum degree: 39.0
difference: 36.0

5.3 Closeness Centrality

Closeness centrality is a measure of how close a node is to all other nodes in the network. It can be computed as the “sum of the geodesic distances of a node to all other nodes in the network”. A node is important if it is close to all other nodes in the network. One flaw of closeness centrality is that while it is a useful indicator of node importance in small networks, it produces little variation in large networks with many edges. This is particularly evident in our example network:

#Import a library so the code works
from mpl_toolkits.axes_grid1.inset_locator import inset_axes

#Bring in our data
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')

#Bounding Box
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704

#Filter our graph using bounding box
gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y]
filtered_nodes = gdf_pos_filtered.index.tolist()

G_sub = G.subgraph(filtered_nodes) 

# Find centrality 
centrality = nx.closeness_centrality(G_sub)
centrality_values = np.array(list(centrality.values()))

#Set a colour for different values of centrality 
color_map=[]
for degree in centrality_values:
    if degree == max(centrality_values):
        color_map.append("indianred")
    else:
        normalized_value = degree / max(centrality_values)
        color_map.append(plt.cm.winter(normalized_value))

cmap = plt.cm.winter

positions_filtered = {}
for loc in gdf_pos_filtered.index:
    x = gdf_pos_filtered.loc[loc].geometry.x
    y = gdf_pos_filtered.loc[loc].geometry.y
    positions_filtered[loc] = (x, y)


fig, ax = plt.subplots(figsize=(9, 9))

gdf_pos_filtered.plot(ax=ax, alpha=0)

ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)

#Draw our edges
edges= nx.draw_networkx_edges(G_sub, pos=positions_filtered, ax=ax, alpha=0.1)
nodes = nx.draw_networkx_nodes(G_sub, pos=positions_filtered, node_color=color_map, cmap=cmap, ax=ax, node_size=60, edgecolors="black", alpha=1)

axins = inset_axes(ax,
                   width="5%", 
                   height="30%", 
                   loc='lower left',
                   bbox_to_anchor=(0.03, 0.03, 1, 1), 
                   bbox_transform=ax.transAxes, 
                   borderpad=0.3)

norm = plt.Normalize(vmin=min(original_degrees), vmax=max(original_degrees))
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array(original_degrees)
plt.colorbar(sm, cax=axins, orientation="vertical")

patch = mpatches.Patch(color='indianred', label='maximum closeness (closeness = 39.093)') 
plt.legend(handles=[patch], loc='center left', bbox_to_anchor=(7, 3))
plt.show()


#Print information about our graph
print("minumim closeness:", min(original_degrees))
print("maximum closeness:", max(original_degrees))
print("difference:", max(original_degrees)-min(original_degrees))

D:\Users\alexr\anaconda3\Lib\site-packages\networkx\drawing\nx_pylab.py:457: UserWarning:

No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored

minumim closeness: 3.0000000000000004
maximum closeness: 39.0
difference: 36.0

5.4 Betweenness Centrality

Betweenness Centrality is a measure of the importance of a node based on how well it serves as a bridge between nodes in a network. The mathematical representation of the betweeness centrality of a node is the number of times each node has to pass through that node to reach every other node in a network. Nodes with high betweenness thus serve as “bridges” within a network.

The Betweenness centrality of a given node \(i\) is calculated as: \(b(i)=\sum_{j,k}\frac{g_{jik}}{g_{jk}}\), where \(g_{jik}\) is the number of paths from node \(j\) to node \(k\) passing through node \(i\), and \(g_{jk}\) is the number of paths from node \(g\) to node \(j\) (including paths passing through node \(i\)).

Note that, for undirected graphs, two adjacent nodes can only have one path between them (ie, between two adjacent nodes \(A\) and \(B\), if \(A\rightarrow B\) is a path, then \(B \rightarrow A\) is not).

Consider the graph below:

#Define our network
G_betweenness_example = nx.Graph()
edges_list = [(0,1),(0,2),(0,3),(0,4),(1,2),(2,3),(3,4),(1,4),(2,4),(1,3),(4,5),(5,6)]
G_betweenness_example.add_edges_from(edges_list)
pos = nx.spring_layout(G_betweenness_example, seed=1000)

#Draw our graph
nx.draw(G_betweenness_example,pos=pos, with_labels=True, edgecolors="black", node_color="bisque", node_size=800)

Node \(4\) serves as a bridge between nodes 5 and 6 to the rest of the nodes in the network. For a path to be drawn between nodes 6 or 5 to nodes 0,1,2,3, the path must go through node 4. Let’s calculate the betweenness centrality of this network, and label nodes by centrality:

centrality

{'Assos': 0.9534883720930233,
 'Apollo_Smintheion': 0.803921568627451,
 'Troia/Ilion': 0.8367346938775511,
 'Athens': 0.9318181818181818,
 'Ephesos': 0.9534883720930233,
 'Amygdalea': 0.6833333333333333,
 'Argos': 0.7735849056603774,
 'Knossos': 0.9318181818181818,
 'Stobi': 0.82,
 'Gortyn': 0.8367346938775511,
 'Kallion': 0.5694444444444444,
 'Kenchreai': 0.7735849056603774,
 'Samos_(Heraion)': 0.9318181818181818,
 'Sykea': 0.5694444444444444,
 'Veloukovo': 0.5189873417721519,
 'Sparta': 0.6833333333333333,
 'Corinth': 0.8723404255319149,
 'Pergamon': 0.5616438356164384,
 'Tenos': 0.6949152542372882,
 'Kastro_Tigani': 0.7321428571428571,
 'Eretria': 0.7735849056603774,
 'Kydonia': 0.5540540540540541,
 'Tanagra': 0.6949152542372882,
 'Thasos': 0.640625,
 'MS116': 0.5540540540540541,
 'Abdera': 0.803921568627451,
 'Aizanoi': 0.6612903225806451,
 'Delos': 0.5466666666666666,
 'Priene': 0.5942028985507246,
 'Olympia': 0.6307692307692307,
 'Iasos': 0.5857142857142857,
 'Pylos': 0.5324675324675324,
 'Labraunda': 0.82,
 'Emporio': 0.6833333333333333,
 'Isthmia': 0.7321428571428571,
 'Kythera': 0.6833333333333333,
 'Nikopolis': 0.6833333333333333,
 'Siphnos': 0.7592592592592593,
 'Amphipolis': 0.6307692307692307,
 'Kommos': 0.6307692307692307,
 'MS010': 0.5256410256410257,
 'Didyma': 0.6212121212121212}

#Define our network
G_betweenness_example = nx.Graph()
edges_list = [(0,1),(0,2),(0,3),(0,4),(1,2),(2,3),(3,4),(1,4),(2,4),(1,3),(4,5),(5,6)]
G_betweenness_example.add_edges_from(edges_list)
pos = nx.spring_layout(G_betweenness_example, seed=1000)

#Find the centrality values for our nodes
centrality = nx.betweenness_centrality(G_betweenness_example, normalized=False)
centrality_values = np.array(list(centrality.values()))
cmap="BuPu"

#Put labels on our network
labels = {}
for node in G_betweenness_example.nodes():
    labels[node] = centrality_values[node]


#Draw our graph using `nx.draw`
nx.draw(G_betweenness_example,pos=pos, node_color=centrality_values, edgecolors="black", cmap=cmap, node_size=800)
nx.draw_networkx_labels(G_betweenness_example, pos, labels=labels, font_color="orangered")

{0: Text(0.498764757665798, -0.06713690471966567, '0.0'),
 1: Text(0.2550713364952553, 0.20785219968276475, '0.0'),
 2: Text(0.47012617248883504, 0.16863130442291982, '0.0'),
 3: Text(0.29322154425308394, -0.14483930534061298, '0.0'),
 4: Text(0.030284391105298025, 0.0025896144060782622, '8.0'),
 5: Text(-0.5474682020082701, -0.05903340272624547, '5.0'),
 6: Text(-1.0, -0.10806350572523814, '0.0')}

We can see that node 4 does indeed have the highest betweenness centrality. The values of 0 for nodes 0, 1, 2, 3 and 6 indicate that each node can reach every other node without passing through those nodes. The value of 5.0 for node 5 indicates that five nodes must pass through node 5 in order to reach another node. In other words, \(\frac{g_{jik}}{g_{jk}}=1\) for 5 pairs of nodes in the network.

Coloring the nodes in our archeology dataset by betweenness centrality, we get:

#Bring in our data
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')

#Bounding box
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704

#Filter graph using bounding box
gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y] 
filtered_nodes = gdf_pos_filtered.index.tolist()

G_sub = G.subgraph(filtered_nodes) 

#Find centrality values and place them into an array.
centrality = nx.betweenness_centrality(G_sub)
centrality_values = np.array(list(centrality.values()))
n = len(centrality_values)
original_degrees = centrality_values * (n - 1)

#Colour our values based on centrality
color_map=[]
for degree in original_degrees:
    if degree == max(original_degrees):
        color_map.append("indianred")
    else:
        normalized_value = degree / max(original_degrees) 
        color_map.append(plt.cm.winter(normalized_value))

cmap = plt.cm.winter

#Filter our new network
positions_filtered = {}
for loc in gdf_pos_filtered.index:
    x = gdf_pos_filtered.loc[loc].geometry.x
    y = gdf_pos_filtered.loc[loc].geometry.y
    positions_filtered[loc] = (x, y)

fig, ax = plt.subplots(figsize=(9, 9))


gdf_pos_filtered.plot(ax=ax, alpha=0)


ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)

# Create our edges and nodes
edges= nx.draw_networkx_edges(G_sub, pos=positions_filtered, ax=ax, alpha=0.1)
nodes = nx.draw_networkx_nodes(G_sub, pos=positions_filtered, node_color=color_map, cmap=cmap, ax=ax, node_size=60, edgecolors="black", alpha=1)


axins = inset_axes(ax,
                   width="5%", 
                   height="30%",  
                   loc='lower left', 
                   bbox_to_anchor=(0.03, 0.03, 1, 1), 
                   bbox_transform=ax.transAxes,  
                   borderpad=0.3)

norm = plt.Normalize(vmin=min(original_degrees), vmax=max(original_degrees))
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array(original_degrees)
plt.colorbar(sm, cax=axins, orientation="vertical")

#Plot our graph
patch = mpatches.Patch(color='indianred', label='maximum betweenness (betweenness = 2.621)') 
plt.legend(handles=[patch], loc='center left', bbox_to_anchor=(7, 3))
plt.show()

D:\Users\alexr\anaconda3\Lib\site-packages\networkx\drawing\nx_pylab.py:457: UserWarning:

No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored

5.5 Eigenvector centrality

Eigenvector centrality is a measure of the influence of a node in a network by considering not just how many connections it has (as we did with degree centrality), but also the importance of those connections: A node with high eigenvector centrality is connected to many nodes that themselves have high centrality, making it more influential in spreading information or resources. Unlike simpler measures like degree centrality, which only counts connections, eigenvector centrality looks at the overall structure of the network. It helps identify key players in a network who might not have the most connections but are well-connected to other important nodes.

The term “eigenvector” in comes from linear algebra: an eigenvector is a special kind of vector taken from a network’s adjacency matrix. Each element in this eigenvector represents a node’s centrality in the network.

#Bring in data
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')

#Bounding Box
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704

#Filter using bounding box
gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y]
filtered_nodes = gdf_pos_filtered.index.tolist()

G_sub = G.subgraph(filtered_nodes) 

#Find the eigenvector centrality in our graph
centrality = nx.eigenvector_centrality(G_sub)
centrality_values = np.array(list(centrality.values()))

# Colour our nodes
color_map=[]
for degree in centrality_values:
    if degree == max(centrality_values):
        color_map.append("indianred")
    else:
        normalized_value = degree / max(centrality_values) 
        color_map.append(plt.cm.winter(normalized_value))

cmap = plt.cm.winter

#Filter the graph
positions_filtered = {}
for loc in gdf_pos_filtered.index:
    x = gdf_pos_filtered.loc[loc].geometry.x
    y = gdf_pos_filtered.loc[loc].geometry.y
    positions_filtered[loc] = (x, y)


fig, ax = plt.subplots(figsize=(9, 9))

gdf_pos_filtered.plot(ax=ax, alpha=0)


ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)

#Draw our edges and nodes
edges= nx.draw_networkx_edges(G_sub, pos=positions_filtered, ax=ax, alpha=0.1)
nodes = nx.draw_networkx_nodes(G_sub, pos=positions_filtered, node_color=color_map, cmap=cmap, ax=ax, node_size=60, edgecolors="black", alpha=1)


axins = inset_axes(ax,
                   width="5%",  
                   height="30%",  
                   loc='lower left', 
                   bbox_to_anchor=(0.03, 0.03, 1, 1),  
                   bbox_transform=ax.transAxes, 
                   borderpad=0.3)

norm = plt.Normalize(vmin=min(original_degrees), vmax=max(original_degrees))
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array(original_degrees)
plt.colorbar(sm, cax=axins, orientation="vertical")

#Plot our graph 
patch = mpatches.Patch(color='indianred', label='maximum betweenness (betweenness = 2.621)') 
plt.legend(handles=[patch], loc='center left', bbox_to_anchor=(7, 3))
plt.show()

D:\Users\alexr\anaconda3\Lib\site-packages\networkx\drawing\nx_pylab.py:457: UserWarning:

No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored

5.6 Directed graphs and Putting it all together

Lastly, let’s clean up our centrality graphs by putting them together into one graph. We’ll also use a directed version of our graph, to see how that changes the results.

#Bring in our data
df_edges = pd.read_excel("network_analysis_datasets/vistorian_network.xls")
df_pos = pd.read_excel("network_analysis_datasets/vistorian_locations.xls", index_col=0)

#Define the graph
G_directed = nx.from_pandas_edgelist(df_edges, source="SOURCE_LOCATION", target="TARGET_LOCATION", create_using=nx.DiGraph())
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')

#Create a bounding box and filter the graph using it
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704
gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y]
filtered_nodes = gdf_pos_filtered.index.tolist()

#Plot the now directed points
G_filtered = G_directed.subgraph(filtered_nodes)
positions_filtered = {loc: (gdf_pos_filtered.loc[loc].geometry.x, gdf_pos_filtered.loc[loc].geometry.y) for loc in gdf_pos_filtered.index}

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
fig.set_facecolor('lightblue')
for ax in axes.flatten():
    gdf_pos_filtered.plot(ax=ax, alpha=0)
    ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)

#The codes below are from the above sections (more notes can be found there on what each code block does)

# Plotting Degree Centrality
degree_centrality = nx.degree_centrality(G_filtered)
degree_values = np.array(list(degree_centrality.values()))
n = len(degree_values)
original_degrees = degree_values * (n - 1)
color_map=[]

for degree in original_degrees:
    if degree == max(original_degrees):
        color_map.append("indianred")
    else:
        normalized_degree = degree / max(original_degrees)  
        color_map.append(plt.cm.winter(normalized_degree))

nx.draw_networkx_edges(G_filtered, pos=positions_filtered, ax=axes[0, 0], alpha=0.1)
nx.draw_networkx_nodes(G_filtered, 
                       pos=positions_filtered, 
                       node_color=color_map, 
                       cmap=plt.cm.winter, 
                       ax=axes[0, 0], 
                       node_size=60, 
                       edgecolors="black", 
                       alpha=1)

axes[0, 0].set_title("Degree Centrality")
axes[0, 0].set_facecolor("white")

# Add colorbar for Degree Centrality
axins = inset_axes(axes[0, 0], width="5%", height="30%", loc='lower left', 
                   bbox_to_anchor=(0.03, 0.03, 1, 1), 
                   bbox_transform=axes[0, 0].transAxes, 
                   borderpad=0.3)
sm = plt.cm.ScalarMappable(cmap=plt.cm.winter, norm=plt.Normalize(vmin=min(original_degrees), vmax=max(original_degrees )))
sm.set_array(original_degrees)
plt.colorbar(sm, cax=axins, orientation="vertical")
patch = mpatches.Patch(color='indianred', label=f'max degree ({max(original_degrees):.3f})')
axes[0, 0].legend(handles=[patch], loc='upper left')

# Plot Closeness Centrality
closeness_centrality = nx.closeness_centrality(G_filtered)
closeness_values = np.array(list(closeness_centrality.values()))

color_map_closeness=[]
for degree in closeness_values:
    if degree == max(closeness_values):
        color_map_closeness.append("indianred")
    else:
        normalized_degree_closeness = degree / max(closeness_values) 
        color_map_closeness.append(plt.cm.winter(normalized_degree_closeness))

nx.draw_networkx_edges(G_filtered, pos=positions_filtered, ax=axes[1, 0], alpha=0.1)
nx.draw_networkx_nodes(G_filtered, pos=positions_filtered, node_color=color_map_closeness, cmap=plt.cm.winter, ax=axes[1, 0], node_size=60, edgecolors="black", alpha=1)
axes[1, 0].set_title("Closeness Centrality")
axes[1, 0].set_facecolor("white")

# Add colorbar for Closeness Centrality
axins = inset_axes(axes[1, 0], width="5%", height="30%", loc='lower left', 
                   bbox_to_anchor=(0.03, 0.03, 1, 1), 
                   bbox_transform=axes[1, 0].transAxes, 
                   borderpad=0.3)
sm = plt.cm.ScalarMappable(cmap=plt.cm.winter, norm=plt.Normalize(vmin=min(closeness_values), vmax=max(closeness_values)))
sm.set_array(closeness_values)
plt.colorbar(sm, cax=axins, orientation="vertical")
patch = mpatches.Patch(color='indianred', label=f'max closeness ({max(closeness_values):.3f})')
axes[1, 0].legend(handles=[patch], loc='upper left')

# Plot Betweenness Centrality
betweenness_centrality = nx.betweenness_centrality(G_filtered)
betweenness_values = np.array(list(betweenness_centrality.values()))
color_map_betweenness=[]
for degree in betweenness_values:
    if degree == max(betweenness_values):
        color_map_betweenness.append("indianred")
    else:
        normalized_degree_betweenness = degree / max(betweenness_values)  # Normalize the degree to the range [0, 1]
        color_map_betweenness.append(plt.cm.winter(normalized_degree_betweenness))
nx.draw_networkx_edges(G_filtered, pos=positions_filtered, ax=axes[0, 1], alpha=0.1)
nx.draw_networkx_nodes(G_filtered, pos=positions_filtered, node_color=color_map_betweenness, cmap=plt.cm.winter, ax=axes[0, 1], node_size=60, edgecolors="black", alpha=1)
axes[0, 1].set_title("Betweenness Centrality")
axes[0, 1].set_facecolor("white")

# Add colorbar for Betweenness Centrality
axins = inset_axes(axes[0, 1], width="5%", height="30%", loc='lower left', bbox_to_anchor=(0.03, 0.03, 1, 1), bbox_transform=axes[0, 1].transAxes, borderpad=0.3)
sm = plt.cm.ScalarMappable(cmap=plt.cm.winter, norm=plt.Normalize(vmin=min(betweenness_values), vmax=max(betweenness_values)))
sm.set_array(betweenness_values)
plt.colorbar(sm, cax=axins, orientation="vertical")
patch = mpatches.Patch(color='indianred', label=f'max betweenness ({max(betweenness_values):.3f})')
axes[0, 1].legend(handles=[patch], loc='upper left')

# Plot Eigenvector Centrality
eigenvector_centrality = nx.eigenvector_centrality(G_filtered, max_iter=1000)
eigenvector_values = np.array(list(eigenvector_centrality.values()))

color_map_eigenvector=[]
for degree in eigenvector_values:
    if degree == max(eigenvector_values):
        color_map_eigenvector.append("indianred")
    else:
        normalized_degree_eigenvector = degree / max(eigenvector_values)  # Normalize the degree to the range [0, 1]
        color_map_eigenvector.append(plt.cm.winter(normalized_degree_eigenvector))
nx.draw_networkx_edges(G_filtered, pos=positions_filtered, ax=axes[1, 1], alpha=0.1)
nx.draw_networkx_nodes(G_filtered, pos=positions_filtered, node_color=color_map_eigenvector, cmap=plt.cm.winter, ax=axes[1, 1], node_size=60, edgecolors="black", alpha=1)
axes[1, 1].set_title("Eigenvector Centrality")
axes[1, 1].set_facecolor("white")

# Add colorbar for Eigenvector Centrality
axins = inset_axes(axes[1, 1], width="5%", height="30%", loc='lower left', bbox_to_anchor=(0.03, 0.03, 1, 1), bbox_transform=axes[1, 1].transAxes, borderpad=0.3)
sm = plt.cm.ScalarMappable(cmap=plt.cm.winter, norm=plt.Normalize(vmin=min(eigenvector_values), vmax=max(eigenvector_values)))
sm.set_array(eigenvector_values)
plt.colorbar(sm, cax=axins, orientation="vertical")
patch = mpatches.Patch(color='indianred', label=f'max eigenvector centrality ({max(eigenvector_values):.3f})')
axes[0, 1].legend(handles=[patch], loc='upper left')

D:\Users\alexr\anaconda3\Lib\site-packages\networkx\drawing\nx_pylab.py:457: UserWarning:

No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored

6. Group-level analysis: finding subgroups within networks

Cohesive subgroups in network analysis refer to clusters of nodes within a network that are more densely connected to each other than to the rest of the network. These subgroups indicate areas of high interaction or strong relationships within the larger network. Identifying cohesive subgroups helps in understanding the structure and dynamics of the network, such as how information or influence flows within and between these groups. The process of finding cohesive subgroups within networks is called cohesive group analysis.

A clique is a subset of nodes within a graph where every node is directly connected to every other node in the subset. This means that in a clique, all possible edges between the nodes are present, making it a maximally connected subgraph.
- All nodes are that are by themselves are inherently a clique (a 1-clique)
In undirected graphs, a \(k\)-core is a subgraph in which every node is connected to at least \(k\) other nodes within the subgraph. The \(k\)-core is obtained by iteratively removing all nodes with a degree less than \(k\) until no more nodes can be removed.

#Create our network
G_cliques_example = nx.Graph()
edges_list = [(0,1),(0,2),(0,3),(0,4),(1,2),(2,3),(3,4),(1,4),(2,4),(1,3),(4,5),(5,6)]
G_cliques_example.add_edges_from(edges_list)
pos = nx.spring_layout(G_cliques_example, seed=1000)

#Find our cliques and print where they are
cliques = [x for x in nx.find_cliques(G_cliques_example)]
print(cliques)

#Draw our graph      
nx.draw(G_cliques_example,pos=pos, with_labels=True, edgecolors="black", node_color = "bisque", cmap=cmap, node_size=800)

[[4, 0, 1, 2, 3], [4, 5], [6, 5]]

We can see that there are three cliques in this network: \((4, 0, 1, 2, 3)\), \((4, 5)\), and \((6, 5)\):

#Set the figure size (24,6) not (8,6) because we have 3 graphs to show
fig, axes = plt.subplots(1, 3, figsize=(24, 6))
fig.set_facecolor('lightblue')

#Create our network
G_cliques_example = nx.Graph()
edges_list = [(0,1),(0,2),(0,3),(0,4),(1,2),(2,3),(3,4),(1,4),(2,4),(1,3),(4,5),(5,6)]
G_cliques_example.add_edges_from(edges_list)
pos = nx.spring_layout(G_cliques_example, seed=1000)

#Find our cliques and print them
cliques = [x for x in nx.find_cliques(G_cliques_example)]
print(cliques)

#Colour our cliques
clique_1 = ["mediumseagreen", "mediumseagreen", "mediumseagreen", "mediumseagreen", "mediumseagreen", "bisque", "bisque"]
clique_2 = ["bisque", "bisque", "bisque",  "bisque", "mediumseagreen", "mediumseagreen", "bisque"]
clique_3 = ["bisque", "bisque", "bisque", "bisque",  "bisque", "mediumseagreen", "mediumseagreen"]

#Draw our cliques       
nx.draw(G_cliques_example, ax=axes[0], pos=pos, with_labels=True, edgecolors="black", node_color = clique_1, cmap=cmap, node_size=800)
nx.draw(G_cliques_example, ax=axes[1], pos=pos, with_labels=True, edgecolors="black", node_color = clique_2, cmap=cmap, node_size=800)
nx.draw(G_cliques_example, ax=axes[2], pos=pos, with_labels=True, edgecolors="black", node_color = clique_3, cmap=cmap, node_size=800)

[[4, 0, 1, 2, 3], [4, 5], [6, 5]]

#Create our example graph
G_cliques_example = nx.Graph()
edges_list = [(0,1),(0,2),(0,3),(0,4),(1,2),(2,3),(3,4),(1,4),(2,4),(1,3),(4,5),(5,6)]
G_cliques_example.add_edges_from(edges_list)
pos = nx.spring_layout(G_cliques_example, seed=1000)

#Find the cliques
cliques = [x for x in nx.find_cliques(G_cliques_example)]
print(cliques)

#For loop
node_counts = {}
for clique in cliques: #for each clique in the list of cliques...
    for node in clique: # for each node in each clique...
        if node in node_counts: #checks whether the current node already exists as a key in the node_counts dictionary
            node_counts[node] += 1 #if it is in the dictionary, increase it's value by 1
        else:
            node_counts[node] = 1 #if it isn't, dont change

#Colour our nodes
colors = []
for node in G_cliques_example.nodes():
    if node_counts[node] == 1:
        colors.append("lightgreen")
    elif node_counts[node] == 2:
        colors.append("forestgreen")
    elif node_counts[node] == 3:
        colors.append("orange")
    else:
        colors.append("red")

#Draw our network           
nx.draw(G_cliques_example,pos=pos, with_labels=True, edgecolors="black", node_color = colors, cmap=cmap, node_size=800)
patch_green = mpatches.Patch(color='lightgreen', label='node in one clique') 
patch_forest = mpatches.Patch(color='forestgreen', label='node in two cliques') 
plt.legend(handles=[patch_green, patch_forest])
plt.show()

[[4, 0, 1, 2, 3], [4, 5], [6, 5]]

Let’s repeat the same process as the example above on our archeology dataset:

G_sub_cliques = G.subgraph(filtered_nodes) 

#Set the plot size
fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)

#Find our cliques
cliques = [x for x in nx.find_cliques(G_sub_cliques)]
print(cliques)

node_counts = {} #empty dictionary of key-value pairs where keys are nodes and values are the frequency that they show up in a clique
for node in G_sub_cliques.nodes():
    node_counts[node] = 0 # makes dictionary values start at 0 

for clique in cliques: #for each clique in the list of cliques...
    for node in clique: # for each node in each clique...
        node_counts[node] += 1 #if it is in the dictionary, increase it's value by 1

#Set the colours  for different cliques
colors = []
max_values = max(node_counts.values()) 
min_values = min(node_counts.values())
for node in G_sub_cliques.nodes():
    count = node_counts.get(node, 0)
    if count == max_values:
        colors.append("salmon")
    elif count == min_values:
        colors.append("forestgreen")
    else:
        normalized_value = count / max_values
        colors.append(plt.cm.PuBu(normalized_value))

#Draw our network
nx.draw_networkx_edges(G_sub_cliques, pos=positions_filtered, ax=ax, alpha=0.1)
nx.draw_networkx_nodes(G_sub_cliques, pos=positions_filtered, ax=ax, node_size=60, node_color=colors, edgecolors="black", alpha=0.8)

#Make a legend in the corner of the graph with labels
patch_forestgreen = mpatches.Patch(color='forestgreen', label=f"Minimum number of cliques: {min_values}") 
patch_salmon = mpatches.Patch(color='salmon', label=f"Maximum number of cliques: {max_values}") 
plt.legend(handles=[patch_salmon, patch_forestgreen])

axins = inset_axes(ax,
                   width="5%",  
                   height="30%",  
                   loc='lower left',  
                   bbox_to_anchor=(0.03, 0.03, 1, 1), 
                   bbox_transform=ax.transAxes,  
                   borderpad=0.3)

node_counts_values = np.array(list(node_counts.values()))
norm = plt.Normalize(vmin=min_values, vmax=max_values)
sm = plt.cm.ScalarMappable(cmap="PuBu", norm=norm)
sm.set_array(node_counts_values)
plt.colorbar(sm, cax=axins, orientation="vertical")

#Plot our graph
plt.show()

[['MS010', 'Knossos', 'Athens', 'Abdera', 'Apollo_Smintheion'], ['Delos', 'Samos_(Heraion)', 'Troia/Ilion', 'Ephesos', 'Aizanoi', 'Corinth', 'Athens', 'Abdera'], ['Assos', 'Veloukovo', 'Athens', 'Apollo_Smintheion'], ['Assos', 'Ephesos', 'Pylos', 'Knossos', 'Corinth', 'Athens'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Kydonia', 'Troia/Ilion', 'Corinth', 'Argos', 'Eretria'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Priene', 'Aizanoi', 'Kenchreai', 'Troia/Ilion'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Priene', 'Aizanoi', 'Kenchreai', 'Iasos'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Didyma', 'Kenchreai', 'Kastro_Tigani', 'Troia/Ilion', 'Tenos', 'Aizanoi', 'Apollo_Smintheion', 'Isthmia'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Abdera', 'Siphnos', 'Eretria', 'Olympia', 'Amphipolis', 'Tenos', 'Kenchreai', 'Thasos', 'Kommos'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Abdera', 'Siphnos', 'Eretria', 'Kastro_Tigani', 'Troia/Ilion', 'Apollo_Smintheion', 'Isthmia', 'Argos', 'Kenchreai'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Abdera', 'Siphnos', 'Eretria', 'Kastro_Tigani', 'Troia/Ilion', 'Apollo_Smintheion', 'Isthmia', 'Argos', 'Nikopolis', 'Sparta', 'Amygdalea', 'Kythera', 'Emporio', 'Tanagra'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Abdera', 'Siphnos', 'Eretria', 'Kastro_Tigani', 'Troia/Ilion', 'Apollo_Smintheion', 'Isthmia', 'Tenos', 'Kenchreai'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Abdera', 'Siphnos', 'Eretria', 'Thasos', 'Tanagra'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Abdera', 'Aizanoi', 'Kenchreai', 'Kastro_Tigani', 'Troia/Ilion', 'Isthmia', 'Apollo_Smintheion', 'Tenos'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Iasos', 'Kenchreai', 'Argos'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'MS116', 'Troia/Ilion', 'Argos', 'Kenchreai'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Sykea', 'Kenchreai', 'Troia/Ilion', 'Argos', 'Apollo_Smintheion', 'Gortyn', 'Kallion'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Pergamon', 'Troia/Ilion', 'Athens', 'Argos', 'Apollo_Smintheion', 'Gortyn'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Pergamon', 'Troia/Ilion', 'Athens', 'Priene']]

7. Network-level analysis: Clusters and clustering coefficients

A cluster (also known as a community) is a set of nodes in a graph that are densely connected to each other but sparsely connected to nodes in other clusters. For example, in a social network, a cluster might represent a group of people who frequently interact with each other but have fewer interactions with people outside the group. Community detection is the process of finding such communities within nodes.

Before diving into community detection, we first need to understand modularity. Modulaity is a numerical measure for the community structure of a graph: it compares the density of edges within the communities of a network to the density of edges between communities. Mathematically, modularity compares the observed density of edges within a community to the expected density of edges if the connections were random. A positive modularity value suggests a strong community structure, while values closer to zero or negative indicate that the divisions are no better than random.

7.1 Louvain Algorithm

The Louvain algorithm is a community detection method in networks that aims to optimize modularity. By optimizing modularity, the Louvain algorithm effectively uncovers natural divisions in the network where connections are dense within clusters and sparse between them, thus identifying meaningful community structures.

First, each node is assigned to its own community, and nodes are then iteratively moved to neighboring communities if it increases the modularity. In the second phase, the algorithm creates a new network where each community from the first phase is treated as a single node, and the process is repeated. This hierarchical approach continues until no further modularity improvements can be made, resulting in a final set of communities that maximize modularity.

G_sub_louvain = G.subgraph(filtered_nodes) 

#Set our figure size
fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)

partition = community_louvain.best_partition(G_sub_louvain) #applying louvain

#Colour map of possible colors with values associated with it
color_map = {
    0: "coral",
    1: "cornflowerblue",
    2: "limegreen",
    3: "crimson",
    4: "purple"}

#Set node colors
node_colors = []
for node in G_sub_louvain.nodes():
    cluster_id = partition[node]  # Get the cluster ID for the node
    color = color_map[cluster_id]  # Get the corresponding color from the color_map
    node_colors.append(color)

#Draw our network
nx.draw_networkx_edges(G_sub_louvain, pos=positions_filtered, ax=ax, alpha=0.1)
nx.draw_networkx_nodes(G_sub_louvain, pos=positions_filtered, 
                       ax=ax, node_size=60, node_color=node_colors,edgecolors="black", alpha=0.8)
print(partition)

#Make a legend in the corner with color values and show our graph 
patch_coral = mpatches.Patch(color='coral', label="Community 0") 
patch_cornflower = mpatches.Patch(color='cornflowerblue', label="Community 1") 
patch_limegreen = mpatches.Patch(color='limegreen', label="Community 2") 
patch_crimson = mpatches.Patch(color='crimson', label="Community 4") 
plt.legend(handles=[patch_coral, patch_cornflower, patch_limegreen, patch_crimson])
plt.show()

{'Assos': 0, 'Apollo_Smintheion': 1, 'Troia/Ilion': 0, 'Athens': 0, 'Ephesos': 0, 'Amygdalea': 1, 'Argos': 1, 'Knossos': 0, 'Stobi': 2, 'Gortyn': 2, 'Kallion': 0, 'Kenchreai': 2, 'Samos_(Heraion)': 0, 'Sykea': 0, 'Veloukovo': 0, 'Sparta': 1, 'Corinth': 0, 'Pergamon': 0, 'Tenos': 2, 'Kastro_Tigani': 1, 'Eretria': 1, 'Kydonia': 0, 'Tanagra': 1, 'Thasos': 2, 'MS116': 0, 'Abdera': 1, 'Aizanoi': 0, 'Delos': 0, 'Priene': 0, 'Olympia': 2, 'Iasos': 0, 'Pylos': 0, 'Labraunda': 2, 'Emporio': 1, 'Isthmia': 1, 'Kythera': 1, 'Nikopolis': 1, 'Siphnos': 1, 'Amphipolis': 2, 'Kommos': 2, 'MS010': 0, 'Didyma': 0}

clusters = {}
# Iterating through the partition dictionary
for node, cluster in partition.items():
    if cluster not in clusters:     # If the cluster is not already in the dictionary, add it
        clusters[cluster] = []
    clusters[cluster].append(node)    
print("community_0:", clusters[0]) #Prints the names of nodes in each community 0,1,2 
print("community_1:", clusters[1])
print("community_2:", clusters[2])

community_0: ['Assos', 'Troia/Ilion', 'Athens', 'Ephesos', 'Knossos', 'Kallion', 'Samos_(Heraion)', 'Sykea', 'Veloukovo', 'Corinth', 'Pergamon', 'Kydonia', 'MS116', 'Aizanoi', 'Delos', 'Priene', 'Iasos', 'Pylos', 'MS010', 'Didyma']
community_1: ['Apollo_Smintheion', 'Amygdalea', 'Argos', 'Sparta', 'Kastro_Tigani', 'Eretria', 'Tanagra', 'Abdera', 'Emporio', 'Isthmia', 'Kythera', 'Nikopolis', 'Siphnos']
community_2: ['Stobi', 'Gortyn', 'Kenchreai', 'Tenos', 'Thasos', 'Olympia', 'Labraunda', 'Amphipolis', 'Kommos']

7.2 Clustering coefficients

The last thing we’ll cover is clustering coefficients. The clustering coefficient of a graph is the measure of how closely nodes in a graph tend to cluster together. It gives an indication of the degree to which nodes in a graph tend to form clusters. Mathematically, the clustering coefficient is the fraction of a node’s neighbors (nodes connected to that node via an edge), that are also neighbors with each other.

There are two primary types of clustering coefficients: local coefficients and global coefficients. A local coefficient is a measure of how many of the node’s neighbors are also neighbors of each other, relative to the total number of possible connections between those neighbors. Mathematically, it is expressed as: \[C(v)=\frac{2 \cdot\; T(v)}{k_{v}×(k_{v}−1)}\]

Where v is a node in a graph, \(T(v)\) is the actual number of edges between the neighbors of v, and \(k_{v}×(k_{v}−1)\) is the maximum total number of possible connections between the neighbors of v.

The global coefficient is the average of the local coefficients of all the nodes in a graph. Both coefficients range between 0 and 1.

We can calculate the global and local coefficients for our network:

clustering_coefficients = nx.clustering(G_sub_louvain)
print(clustering_coefficients)

{'Assos': 0.5641025641025641, 'Apollo_Smintheion': 0.6924731182795699, 'Troia/Ilion': 0.6590909090909091, 'Athens': 0.5732574679943101, 'Ephesos': 0.5695006747638327, 'Amygdalea': 1.0, 'Argos': 0.7241379310344828, 'Knossos': 0.5846372688477952, 'Stobi': 0.7520161290322581, 'Gortyn': 0.7121212121212122, 'Kallion': 1.0, 'Kenchreai': 0.7019704433497537, 'Samos_(Heraion)': 0.5945945945945946, 'Sykea': 1.0, 'Veloukovo': 1.0, 'Sparta': 1.0, 'Corinth': 0.6554621848739496, 'Pergamon': 0.9166666666666666, 'Tenos': 0.8853754940711462, 'Kastro_Tigani': 0.9015384615384615, 'Eretria': 0.8103448275862069, 'Kydonia': 1.0, 'Tanagra': 0.9604743083003953, 'Thasos': 0.9673202614379085, 'MS116': 1.0, 'Abdera': 0.7440860215053764, 'Aizanoi': 0.8473684210526315, 'Delos': 1.0, 'Priene': 0.8974358974358975, 'Olympia': 1.0, 'Iasos': 0.9696969696969697, 'Pylos': 1.0, 'Labraunda': 0.7520161290322581, 'Emporio': 1.0, 'Isthmia': 0.9015384615384615, 'Kythera': 1.0, 'Nikopolis': 1.0, 'Siphnos': 0.8518518518518519, 'Amphipolis': 1.0, 'Kommos': 1.0, 'MS010': 1.0, 'Didyma': 1.0}

The code above gives the clustering coefficients for every node in the graph. If we want the global coefficient, we can use:

average_clustering_coefficient = nx.average_clustering(G_sub_louvain)
print(average_clustering_coefficient)

0.8616447207095111

Our high clustering coefficient indicates that the nodes in this network tend to create tightly knit groups with a high density of connections.

8. Citations

Al-Taie, M. Z., & Kadry, S. (2017). Python for graph and network analysis. Springer. https://doi.org/10.1007/978-3-319-53004-8

Bes, P. (2015). Once upon a time in the east: The chronological and geographical distribution of terra sigillata and red slip ware in the Roman East (Vol. 6). Archaeopress. (Roman and Late Antique Mediterranean Pottery)

Brughmans, T., & Bach, B. (2018). The Vistorian: Exploring archaeological networks. https://archaeologicalnetworks.wordpress.com/resources/#vistorian

Brughmans, T., & Peeples, M. A. (2023). Network science in archaeology. Cambridge University Press. https://doi.org/10.1017/9781009170659

Mills, B. J., Clark, J. J., Peeples, M. A., Haas, W. R., Roberts, J. M., Hill, J. B., Huntley, D. L., Borck, L., Breiger, R. L., Clauset, A., & Shackley, M. S. (2013). Transformation of social networks in the late pre-Hispanic US Southwest. Proceedings of the National Academy of Sciences, 110(15), 5785–5790. https://doi.org/10.1073/pnas.1219966110

QuantEcon DataScience. (n.d.). Social and economic networks. https://datascience.quantecon.org/applications/networks.html

Sayama, H. (2024, April 30). Generating random graphs. In Introduction to the modeling and analysis of complex systems. Mathematics LibreTexts. https://math.libretexts.org/Bookshelves/Scientific_Computing_Simulations_and_Modeling/ Introduction_to_the_Modeling_and_Analysis_of_Complex_Systems_(Sayama)/15%3A_Basics_of_Networks/15.06%3A_Generating_Random_Graphs