import matplotlib.pyplot as plt #allows us to call the matplotlib.pyplot library as 'plt'
import matplotlib.patches as mpatches #imports mpatches matplotlib subpackage
import networkx as nx #allows us to call the networkx library as 'nx'
import pandas as pd #allows us to call the pandas library as 'pd'
import numpy as np
import geopandas as gpd
import contextily as ctx
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
import community as community_louvain
import randomNetworks and NetworkX
This notebook introduces key concepts in network analysis pertaining to archeology and provides a hands-on tutorial to using the NetworkX Python library.
0. Prerequisites
- Read through and follow the instructions in the pre-reading.
- Have Python and Jupyter set up on your device.
- Have created the
networksenvironment.
- If you have not done this, see the pre-reading.
1. What is Network analysis?
Network Analysis is a set of techniques used to study the structure and dynamics of networks. Networks are collections of objects/locations/entities (called nodes) connected by relationships (called edges). Network analysis has applications in many fields, including sociology, biology, economics, computer science, and more.
Network science in archeology is traditional network science used in an archeological context. According to Brighmans and Peeples (2023), archeological network science falls within the discipline of archeology and is not a discipline of its own: “[N]etwork science applied to archaeological research is a subset of archaeological research: it does not happen in isolation, it is not immune to the limitations of archaeological data nor does it replace archaeological theory.”
1.1 Key terms
Node: A node is a representation of an individual entity or actor in a network. In different contexts, nodes can be people, organizations, cities, or any other unit of analysis.
Edge: An edge represents the relationship or connection between two nodes. Edges can be directed (having a specific direction from one node to another) or undirected (no direction, implying a mutual relationship).
Degree: The degree of a node is the number of edges connected to it. In directed networks, this can be further divided into in-degree (number of incoming edges) and out-degree (number of outgoing edges).
The network above is an example of a Undirected graph, a graph with no direction. This means that if there is a connection between node A and node B, it is bidirectional - A is connected to B, and B is connected to A.
The example to the left is a directed graph: the edges between nodes have a specific direction. This means that if there is an edge from node A to node B, it does not imply there is an edge from B to A unless explicitly stated.
Density: Density is a measure that indicates how closely connected the nodes in a network are. Specifically, it refers to the ratio of the number of actual edges in the network to the maximum possible number of edges between nodes.
Centrality: Centrality measures the importance, influence, or prominence of nodes (entities) within a network. The centrality of a node tells us how “important” a node is to the aggregate network. There are many different kinds of centrality, but the four most well-known ones are degree, betweenness, closeness, and eigenvector centrality. This notebook will primarily focus on the first three.
1.2 NetworkX
NetworkX is a Python library that is used for the creation, manipulation, and visualization of complex networks. It provides tools to work with both undirected and directed networks, perform network-related calculations, and visualize the results.
A library in Python is a collection of code that makes everyday tasks more efficient. In this case working with networks becomes much simpler when using network X.
If you want to read the NetworkX documentation you can follow the NetworkX documentation link. This link shows what kind of commands exist within the NetworkX library.
1.2.1 Importing NetworkX
We can import NetworkX using the import command. At the same time, we’ll also import the matplotlib.pyplot library, for plotting graphs. Additionally, we’ll import pandas for basic data wrangling, and numpy for math. The as command allows us to use networkx commands without needing to type out networkx each time. Additionally, we’ll import the community_louvain package for the louvain clustering algorithm. Along with some other libraries for our code to function.
#Let's us open the datasets in Google Collab easily
import os
os.system("wget https://github.com/ubcecon/ai-workshop/blob/ccss-networks.zip")
os.system("ccss-networks")
csv_path = ("https://github.com/ubcecon/ai-workshop/blob/ccss-networks/data")1.2.2 Creating simple networks using NetworkX
We’ll start by creating a simple graph:
Below in the code we choose our nodes and edges between them.
G = nx.Graph() #creates an empty network graph
nodes = (1, 2, 3, 4, 5, 6) #our nodes, labeled 1,2,3,4,5,6.
edges = [(1, 2), (2, 3), (3, 1), (1,5), (3,5), (4, 5), (4, 6), (6, 1), (6, 3), (6,4), (4, 3), (5, 5), (3, 5)]
#the connections between our nodes are stored in an array, containing pairs of numbers called tuples.
G.add_edges_from(edges) #the `add_edges_from()` command adds edges to the network
G.add_nodes_from(nodes) #the `add_nodes_from()` command adds nodes to the network
nx.draw(G, with_labels = True) #renders the graph in the notebook
#the `with_labels = True` argument specifies that we want labels on the nodes.Let’s create a directed graph using nx.DiGraph(). We’ll also set our node positions using a seed: this will ensure that each time the nodes are rendered they hold the same position on the graph. You can set the seed to any number.
G = nx.DiGraph() #creates an empty directed graph object
nodes = (1, 2, 3, 4, 5, 6) #our nodes
edges = [(1, 2), (2, 3), (3, 1), (1,5), (3,5), (4, 5), (4, 6), (6, 1), (6, 3), (6,4), (4, 3), (5, 5), (3, 5)] #our tuples stored in an array which represent our nodes
G.add_edges_from(edges) #connects edges to nodes
G.add_nodes_from(nodes) #connects edges to nodes
position = nx.spring_layout(G, seed=100)
#nx.draw plots our network
nx.draw(G, pos = position, with_labels = True) # `pos` argument assigns a position to each node1.3 Creating Random Graphs
Instead of creating a graph with predetermined positions of nodes and edges we can also generate a random graph with a set amount of nodes and edges. Below you can change the amount of nodes and edges by changing n and d which correspond to the number of nodes and the degree (number of edges) that each node has. Creating a random graph could be more helpful for testing or when you want to try something and don’t wish to spend time plotting a real network and determining paths for all edges and nodes.
The first most basic command we will use is the nx.random_regular_graph command. Which generates a random regular graph.
# Set a seed for reproducibility so that everytime the code runs we get the same random graph
random.seed(42)
# Parameters
n = 20 # number of nodes
d = 3 # degree of each node
# Generate the random regular graph
rr_graph = nx.random_regular_graph(d, n)
# Visualize the graph, you can change the size, color, font and node size.
plt.figure(figsize=(8, 6))
nx.draw(rr_graph, with_labels=True, node_color='lightgreen', node_size=500, font_size=10, font_weight='bold')
plt.title("Random Regular Graph")
plt.show()
# Print some basic information about the graph
print(f"Number of nodes: {rr_graph.number_of_nodes()}")
print(f"Number of edges: {rr_graph.number_of_edges()}")
print(f"Degree of each node: {d}")Number of nodes: 20
Number of edges: 30
Degree of each node: 3
Another option is using the Erdős-Rényi model which can be accessed using the nx.erdos_renyi_graph(n, p) command. This command has two inputs n and p. N is the number of nodes and p is the probability of edge creation to each node.
The Erdős–Rényi model refers to one of two closely related models for generating random graphs or the evolution of a random network. These models are named after Hungarian mathematicians Paul Erdős and Alfréd Rényi, who introduced one of the models in 1959
# Set a seed for reproducibility so that everytime the code runs we get the same random graph
random.seed(43)
# Parameters
n = 20 # number of nodes
p = 0.2 # probability of edge creation
# Generate the Erdős-Rényi random graph
er_graph = nx.erdos_renyi_graph(n, p)
# Visualize the graph
plt.figure(figsize=(8, 6))
nx.draw(er_graph, with_labels=True, node_color='lightblue', node_size=500, font_size=10, font_weight='bold')
plt.title("Erdős-Rényi Random Graph")
plt.show()
# Print some basic information about the graph
print(f"Number of nodes: {er_graph.number_of_nodes()}")
print(f"Number of edges: {er_graph.number_of_edges()}")
print(f"Average degree: {sum(dict(er_graph.degree()).values()) / n:.2f}")Number of nodes: 20
Number of edges: 40
Average degree: 4.00
There are more commands in NetworkX to generate random graphs, but the two above demonstrate two common methods of random graph generations. The first being a set number of nodes and edges and the second being a set number of nodes and a probability of edge creation between them.
2. Degrees, Density and Weights
2.1 Degrees
The degree of a node is the number of edges that are connected to a node. The degree of a node \(N\) is denoted as \(deg(N)\). The maximum degree of a network \(G\) is denoted by \(\Delta(G)\) and is the degree of the node with the highest degree in the network. Conversely, the minimum degree is denoted as \(\delta(G)\).
- If a node on a graph with \(n\) nodes has degree \(n-1\) it is called a dominating vertex. Not every graph has a dominating vertex.
We can see the degree of each node by running dict(G.degree()). This create a dictionary of key-value pairs for our network, where each key is the name of the node and the value is it’s respective degree.
degrees = dict(G.degree())If we want to see the degree of node \(n\), we can do so by running print(degrees[n]). For instance:
print(degrees[1])4
Let’s color the nodes of our graph based on their degree. We’ll create a function called get_node_colors which takes in the degree dictionary of each node and returns a color. We’ll then create a for-loop that iterates over each nodes in the list of nodes, gets the color of each node using the get_node_colors function we defined earlier, and appends it to an empty list called color_map.
degrees = dict(G.degree())
nodes = list(G.nodes())
def get_node_colors(degree):
if degree in [1, 2]:
return 'blue'
elif degree in [3, 4]:
return 'green'
elif degree in [5, 6]:
return 'yellow'
else:
return 'red'
color_map = [] #`color_map` is an empty list
for node in nodes:
color = get_node_colors(degrees[node]) # get color of current node using node_colors according to degree of node
color_map.append(color) # appends color of each node to color_map for each node in nodes
print(degrees)
print(nodes)
print(color_map){1: 4, 2: 2, 3: 5, 5: 5, 4: 4, 6: 4}
[1, 2, 3, 5, 4, 6]
['green', 'blue', 'yellow', 'yellow', 'green', 'green']
The \(n\)-th entry in color_map corresponds to the \(n\)-th node in nodes. For instance, color_map[0] returns the color of the first node (1).
color_map[0]'green'
We can now color the nodes of our graph, using the color map we defined above. The node_color argument takes in an array or list of colors that it uses to color each node.
G = nx.DiGraph() # creates an empty directed graph object
nodes = (1, 2, 3, 4, 5, 6)
edges = [(1, 2), (2, 3), (3, 1), (1,5), (3,5), (4, 5), (4, 6), (6, 1), (6, 3), (6,4), (4, 3), (5, 5), (3, 5)]
G.add_edges_from(edges)
G.add_nodes_from(nodes)
position = nx.spring_layout(G, seed=100)
nx.draw(G, pos = position, node_color=color_map, with_labels=True)
# node_color argument colors the nodes based on a given list or array of colors,
# with the first color corresponding to the first node, second to the second node, etc.Let’s also add a legend to our graph, which gives information about the meaning of each color. We’ll do this using the mpatches subpackage we imported earlier.
blue_patch = mpatches.Patch(color='blue', label='1-2 edges')
green_patch = mpatches.Patch(color='green', label='3-4 edges')
yellow_patch = mpatches.Patch(color='yellow', label='5-6 edges')
plt.legend(handles=[blue_patch, green_patch, yellow_patch]) #adds legend to the plot
nx.draw(G, pos = position, node_color=color_map, with_labels=True)2.2 Density
Density is defined as:
\[ \text{Density} = \frac{\text{Number of Possible Edges}}{\text{Number of Actual Edges}} \]
In an undirected graph, the total number of edges is \(\frac{V\times(V-1)}{2}\), where V is the total number of nodes. In a directed graph, the total number of edges is \(V\times(V-1)\), because a connection between point A and point B can either be from point A to point B, or to point A from point B (hence multiplying by 2).
- Note that self-loops (edges from and to the same node) are counted in the total number of edges but not in the maximum number of edges so graphs can have a density greater than 1.
The formula for undirected graph density is:
\[ \frac{2E}{V(V-1)} \]
And for directed graphs, it is:
\[ \frac{E}{V(V-1)} \]
Where \(E\) is the number of edges in our graph and \(V\) is the number of nodes.
We can calculate the density of our graph:
nx.density(G)0.4
2.3 Weights
Often times, you may end up working with weighted graphs: for instance, these weights could correspond to popularity of roads in road networks, or the number of artifacts in common between sites.
We’ll standardize our weights to be between 1 and 2 (as otherwise the results are messy). We’ll do this using a for-loop, like we did with the degrees.
G_weights = nx.DiGraph() #creating a new graph object called G_weights
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (3, 1), (1,5), (3,5), (4, 5), (4, 6), (6, 1), (6, 3), (6,4), (4, 3), (5, 5), (3, 5)]
weights = [100, 50, 75, 50, 60, 100, 100, 75, 40, 50, 50, 100, 100] #add list of weights
G_weights.add_edges_from(edges)
G_weights.add_nodes_from(nodes)
adjusted_weights = []
for weight in weights:
adjusted_weight = 1+ (max(weights)-weight)/(max(weights)-min(weights)) #standardizes weights to be between 1 and 2
adjusted_weights.append(adjusted_weight)
position = nx.spring_layout(G, seed=100)
print(adjusted_weights)
nx.draw(G_weights, pos = position, width = adjusted_weights, with_labels = True)
# width argument take in a list or array of numbers corresponding to weights[1.0, 1.8333333333333335, 1.4166666666666667, 1.8333333333333335, 1.6666666666666665, 1.0, 1.0, 1.4166666666666667, 2.0, 1.8333333333333335, 1.8333333333333335, 1.0, 1.0]
This is great, but the results aren’t very clear. Let’s add a color gradient to the edges to represent different weights.
norm = plt.Normalize(min(weights), max(weights), clip=False)
#`plot.normalizes` normalizes the weights such that they are evenly distributed across the gradient spectrum
edge_colors = plt.cm.Greys(norm(weights))
# norm(weights) normalizes the weights
# plot.cm.greys() assigns the weights to color values
# edge_colors is a multidimensional array of RGBA color values corresponding to each edge
fig, ax = plt.subplots() #explicitly specifying figure and axes in order to create a color bar
nx.draw(G_weights, pos=position, edge_color=edge_colors, width=adjusted_weights, with_labels=True, ax=ax)
#ax = ax argument needed for color bar
# Adding color bar
sm = plt.cm.ScalarMappable(cmap="Greys", norm=norm) # creates a scalarmappable object which acts
# as a bridge between the numerical weight values and color map
plt.colorbar(sm, ax=ax) #plotting color bar3. Adjacency matrices
An Adjacency matrix is a method of representing graphs in matrix form. In an adjacency matrix, the rows and columns correspond to the vertices (or nodes) of the graph. The entries of the matrix indicate whether pairs of vertices are adjacent or not in the graph. Normally, a value of 1 is assigned to entries where an edge is present, and 0 is assigned to entries where an edge is not. For a weighed graph, the weight of the edge is represented as a numerical value for entries where an edge is present.
We can convert our simple graph to an adjacency matrix:
nx.to_pandas_adjacency(G)| 1 | 2 | 3 | 5 | 4 | 6 | |
|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 3 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 |
| 6 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
If we want to use our weighted graph, we can use the following code:
# len(edges) returns the total number of entries in the list of edges.
# range(len(edges)): This generates a sequence of numbers from 0 to n-1 where n is len(edges),
#so the for-loop will run n times with i taking each value in that range, one at a time.
for i in range(len(edges)):
edge = edges[i] # retrieves the edge at position i in the list of edges
weight = weights[i] # retrieves the weight at position i in the list of weights
G_weights.add_edge(edge[0], edge[1], weight=weight) # adds an edge with a weight to the graph
nx.to_pandas_adjacency(G_weights, nodelist=nodes, weight='weight') #converts to pandas adjacency matrix with the weights in place| 1 | 2 | 3 | 4 | 5 | 6 | |
|---|---|---|---|---|---|---|
| 1 | 0.0 | 100.0 | 0.0 | 0.0 | 50.0 | 0.0 |
| 2 | 0.0 | 0.0 | 50.0 | 0.0 | 0.0 | 0.0 |
| 3 | 75.0 | 0.0 | 0.0 | 0.0 | 100.0 | 0.0 |
| 4 | 0.0 | 0.0 | 50.0 | 0.0 | 100.0 | 100.0 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 100.0 | 0.0 |
| 6 | 75.0 | 0.0 | 40.0 | 50.0 | 0.0 | 0.0 |
We can visualize our matrix using the code below. Note that instead of using nx.to_pandas_adjacency we use nx.to_numpy_array: this allows us to store the matrix in the form of an array.
adj_matrix = nx.to_numpy_array(G_weights, nodelist=nodes, weight='weight')
plt.figure(figsize=(8, 8)) #displays data as an image on a 2d raster; in our case, a numpy array
plt.imshow(adj_matrix, cmap='gray_r')
for i in range(adj_matrix.shape[0]): #loops through each row of the matrix
for j in range(adj_matrix.shape[1]): #for each row, loops through each column of the matrix
plt.text(j, i, int(adj_matrix[i, j]),
ha='center', va='center', color='red', size=30) #prints the value at that position in the matrix on the graph
plt.title('Adjacency Matrix Visualization')
plt.xlabel('Node Index')
plt.ylabel('Node Index')Text(0, 0.5, 'Node Index')
4. Working with archeological datasets
The ICRATES database of tablewares in the Roman East is a dataset of red slip tablewares between the late Hellenistic and late Roman periods in the east Mediterranean (Bes, 2015). The dataset is in fact comprised of two datasets: the first is a dataset of nodes with their respected connections, as well as weights for each edge and other information. The second dataset contains geographical coordinates for each node in the network. We will use these coordinates to add a geospatial component to our analysis, by placing each node in it’s respective geographical coordinates and adding a map overlay. We begin by demonstrating how to import and visualize the dataset in python, and then introduce various centrality and connectivity measures.
Importantly, the nodes in the first dataset are categorized by time period. We will initially focus on a singular time period, but will showcase how to graph multiple time frames towards the end of the notebook.
4.1 Importing datasets in Python
We can import datasets in python using pandas’ read_ function. In our case, both network and location datasets are stored in excel files. We can thus use the read_excel function.
df_edges = pd.read_excel("network_analysis_datasets/vistorian_network.xls") #loading in dataset
G_directed = nx.from_pandas_edgelist(df_edges, source="SOURCE_LOCATION", target="TARGET_LOCATION", create_using=nx.DiGraph()) #creating a graph object
G = nx.from_pandas_edgelist(df_edges, source="SOURCE_LOCATION", target="TARGET_LOCATION")4.2 Adding a geospacial component to our graph
We could just plot this network right away, but this wouldn’t convey good information, as the nodes in our dataset represent real places. Instead, let’s bind each node to it’s respective geographical location using the geopandas library.
df_pos = pd.read_excel("network_analysis_datasets/vistorian_locations.xls", index_col=0)
# loading in coordinates of nodes; index_col specifies that the first column (with index 0) in the Excel sheet should be used as the row labels for the DataFrame
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
# creates a geodataframe, a special dataframe from the geopandas library for storing geographic datapoints.
# crs='EPSG:4326' argument defines the Coordinate Reference System for the geometric data: in our case, the location dataset uses the WGS84 coordinate system
gdf_pos = gdf_pos.to_crs('EPSG:3857') #reprojecting to web mercator for the world map overlay to accurately show up on our graphgdf_pos #returns a GeoDataframe of all the nodes and their respective positions| GEONAME | LONGITUDE | LATITUDE | geometry | |
|---|---|---|---|---|
| NODE_NAME | ||||
| Abdera | Abdera | 24.973047 | 40.933650 | POINT (2779986.9 5002559.974) |
| Aizanoi | Aizanoi | 29.609819 | 39.201047 | POINT (3296150.023 4750510.914) |
| Alexandreia | Alexandreia | 29.907915 | 31.198246 | POINT (3329333.832 3658521.969) |
| Altinum | Altinum | 12.390766 | 45.557134 | POINT (1379333.798 5709661.408) |
| Amathous | Amathous | 33.142972 | 34.711781 | POINT (3689458.791 4124781.928) |
| ... | ... | ... | ... | ... |
| Tel_Anafa | Tel_Anafa | 35.645039 | 33.176975 | POINT (3967987.579 3918818.063) |
| Tenos | Tenos | 25.152483 | 37.591633 | POINT (2799961.636 4521896.839) |
| Thasos | Thasos | 24.713428 | 40.779792 | POINT (2751086.196 4979915.044) |
| Troia/Ilion | Troia/Ilion | 26.238967 | 39.957667 | POINT (2920908.409 4859792.421) |
| Veloukovo | Veloukovo | 22.179072 | 38.534158 | POINT (2468963.026 4655161.727) |
69 rows × 4 columns
gdf_pos.loc['Abdera']
# the loc accessor in pandas (and by extension, GeoPandas, which extends pandas) is used to access a group of rows and columns by labels. This will be useful for our next step.GEONAME Abdera
LONGITUDE 24.973047
LATITUDE 40.93365
geometry POINT (2779986.9000867764 5002559.974384146)
Name: Abdera, dtype: object
Great! we now have a GeoDataFrame of each node in our graph, with their respective position. However, networkX’s requires a dictionary of node-location pairs for properly mapping each node. Let’s create a new dictionary using the dataframe above.
positions = {} #empty dictionary which will contain the positions of each node
for location in gdf_pos.index: # iterates over each index in the gdf_pos GeoDataFrame. The index in gdf_pos is the names of all our nodes.
x = gdf_pos.loc[location].geometry.x # For each location, this line accesses the `geometry` column to retrieve the x-coordinate (latitude).
y = gdf_pos.loc[location].geometry.y # Same thing but for y-coordinate (longitude)
positions[location] = (x, y) # Adding to the dictionary with the location as the keyNow we have a dictionary containing each node, and it’s geographic position. We can now plot our graph!
fig, ax = plt.subplots(figsize=(9, 9))# Creating subplots so that we can overlay the network and the map
gdf_pos.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)# Adding the basemap: we are using CartoDB for this
# Drawing the network edges and nodes, as usual
nx.draw_networkx_edges(G, pos=positions, ax=ax, alpha=0.05)
nx.draw_networkx_nodes(G, pos=positions, ax=ax, node_size=20, node_color="red", edgecolors="black", alpha=0.8)We can focus on a specific geographical area by specifying longitudonal and latitudonal limits and filtering the locations dataset for locations that fall within those limits. From now on, let’s focus our analysis on the area surrounding the Aegean Sea.
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')
# We create a "bonding box" which contains the outer limits of our positions.
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704
gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y] # Filtering the GeoDataFrame to include only points within the bounding box
filtered_nodes = gdf_pos_filtered.index.tolist()
G_sub = G.subgraph(filtered_nodes) # Create a subgraph from the original graph G using only the filtered nodes
# Prepare positions for the filtered subgraph
positions_filtered = {}
for loc in gdf_pos_filtered.index:
x = gdf_pos_filtered.loc[loc].geometry.x
y = gdf_pos_filtered.loc[loc].geometry.y
positions_filtered[loc] = (x, y)
# Plotting the filtered subgraph
fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
nx.draw_networkx_edges(G_sub, pos=positions_filtered, ax=ax, alpha=0.1)
nx.draw_networkx_nodes(G_sub, pos=positions_filtered, ax=ax, node_size=60, node_color="red", edgecolors="black", alpha=0.8)
plt.show()4.3 Working with weighed graphs
If we print the df_edges dataframe, we see that it contains WEIGHT and TYPE columns, that correspond to each edge in the network. The TYPE category corresponds to the type of pottery linking two sites, and the WEIGHT cateogry corresponds to the relative importance of each edge.
print(df_edges)
print(max(df_edges['WEIGHT']))
print(min(df_edges['WEIGHT']))
print(df_edges['TYPE'].unique()) FROM SOURCE_LOCATION TO TARGET_LOCATION \
0 Alexandreia Alexandreia Assos Assos
1 Alexandreia Alexandreia Anemorion Anemorion
2 Alexandreia Alexandreia Apollo_Smintheion Apollo_Smintheion
3 Alexandreia Alexandreia Tel_Anafa Tel_Anafa
4 Alexandreia Alexandreia Troia/Ilion Troia/Ilion
... ... ... ... ...
1586 Tarsos Tarsos Thasos Thasos
1587 Tarsos Tarsos Tel_Anafa Tel_Anafa
1588 Tenos Tenos Troia/Ilion Troia/Ilion
1589 Tenos Tenos Thasos Thasos
1590 Tenos Tenos Thasos Thasos
WEIGHT TYPE PERIOD
0 3 ESC 10
1 1 ESC 10
2 1 ESC 10
3 2 ESC 10
4 1 ESC 10
... ... ... ...
1586 1 ESB 12
1587 1 ESB 12
1588 1 ESB 12
1589 1 ESB 12
1590 0 NaN 13
[1591 rows x 7 columns]
17
0
['ESC' 'ESB' nan]
We can see that the WEIGHT cateory has values that range from 0 to 17, and the TYPE categories has two pottery types: ESB, ESC, as well as undefined pottery types (NaN). Let’s change the width of each edge in the network to reflect it’s weight, and the color of each edge to reflect it’s type.
This is made slightly difficult as we are working with a subgraph and the TYPE and WIDTH categories contain entries for the full graph.
df_edges| FROM | SOURCE_LOCATION | TO | TARGET_LOCATION | WEIGHT | TYPE | PERIOD | |
|---|---|---|---|---|---|---|---|
| 0 | Alexandreia | Alexandreia | Assos | Assos | 3 | ESC | 10 |
| 1 | Alexandreia | Alexandreia | Anemorion | Anemorion | 1 | ESC | 10 |
| 2 | Alexandreia | Alexandreia | Apollo_Smintheion | Apollo_Smintheion | 1 | ESC | 10 |
| 3 | Alexandreia | Alexandreia | Tel_Anafa | Tel_Anafa | 2 | ESC | 10 |
| 4 | Alexandreia | Alexandreia | Troia/Ilion | Troia/Ilion | 1 | ESC | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1586 | Tarsos | Tarsos | Thasos | Thasos | 1 | ESB | 12 |
| 1587 | Tarsos | Tarsos | Tel_Anafa | Tel_Anafa | 1 | ESB | 12 |
| 1588 | Tenos | Tenos | Troia/Ilion | Troia/Ilion | 1 | ESB | 12 |
| 1589 | Tenos | Tenos | Thasos | Thasos | 1 | ESB | 12 |
| 1590 | Tenos | Tenos | Thasos | Thasos | 0 | NaN | 13 |
1591 rows × 7 columns
G_sub_weighted = G.subgraph(filtered_nodes)
# creating a new column called `edge_tuple` which contains tuples of the source and target locations.
# We such a column for filtering the weight and type categories to only contain edges within G_sub_weighted and not G.
edge_tuples = []
for index, row in df_edges.iterrows(): #Iterating over the rows of df_edges using iterrows()
edge_tuple = (row['SOURCE_LOCATION'], row['TARGET_LOCATION']) # Creating a tuple from the 'SOURCE_LOCATION`` and 'TARGET_LOCATION' columns
edge_tuples.append(edge_tuple) # Appending each tuple to the list
df_edges['edge_tuple'] = edge_tuples # appending edge_tuples to df_edges
subgraph_edges = list(G_sub_weighted.edges()) #listing all the edges in the subgraph
filtered_edges = df_edges[df_edges['edge_tuple'].isin(subgraph_edges)] #filtering df_edges to only contain edges in the subgraph
filtered_edges.reset_index(drop=True, inplace=True) #the index gets all sliced up when filtering rows, this makes it normal again
edge_colors = [] #create an empty set which contains the colors of the artifacts ESB, ESC
for type in filtered_edges['TYPE']:
if type == 'ESB':
edge_colors.append('maroon')
elif type == 'ESC':
edge_colors.append('navy')
else:
edge_colors.append('plum')
# Set the weights of the edges
weights = filtered_edges['WEIGHT']
adjusted_weights = []
for weight in weights:
adjusted_weight = 0.1 + ((max(weights)- min(weights))/20)
adjusted_weights.append(adjusted_weight)
fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
#Plot the network on the graph
nx.draw_networkx_nodes(G_sub_weighted, pos=positions_filtered, ax=ax, node_size=60, node_color="red", edgecolors="black", alpha=0.8)
nx.draw_networkx_edges(G_sub_weighted, pos=positions_filtered, ax=ax, width=adjusted_weights, edge_color=edge_colors, alpha=0.3)
#Show our graph
patch_maroon = mpatches.Patch(color='maroon', label='ESB')
patch_navy = mpatches.Patch(color='navy', label='ESC')
plt.legend(handles=[patch_maroon, patch_navy])
plt.show()
#Print information about the types of edges
print(adjusted_weights)
print(filtered_edges)[0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85]
FROM SOURCE_LOCATION TO TARGET_LOCATION \
0 Amygdalea Amygdalea Argos Argos
1 Amygdalea Amygdalea Knossos Knossos
2 Amygdalea Amygdalea Stobi Stobi
3 Apollo_Smintheion Apollo_Smintheion Troia/Ilion Troia/Ilion
4 Apollo_Smintheion Apollo_Smintheion Argos Argos
.. ... ... ... ...
534 Stobi Stobi Tenos Tenos
535 Stobi Stobi Tanagra Tanagra
536 Stobi Stobi Thasos Thasos
537 Tenos Tenos Thasos Thasos
538 Tenos Tenos Thasos Thasos
WEIGHT TYPE PERIOD edge_tuple
0 1 ESC 10 (Amygdalea, Argos)
1 1 ESC 10 (Amygdalea, Knossos)
2 1 ESC 10 (Amygdalea, Stobi)
3 7 ESC 10 (Apollo_Smintheion, Troia/Ilion)
4 5 ESC 10 (Apollo_Smintheion, Argos)
.. ... ... ... ...
534 2 ESB 12 (Stobi, Tenos)
535 1 ESB 12 (Stobi, Tanagra)
536 1 ESB 12 (Stobi, Thasos)
537 1 ESB 12 (Tenos, Thasos)
538 0 NaN 13 (Tenos, Thasos)
[539 rows x 8 columns]
This works, but the weightings are hard to see. One alternative is to filter the subgraph by weight, and graph the different weights side-by-side:
# creating a function that creates a list of colors, then apply the function to each subset of edges filtered by weight
def get_edge_colors(filtered_edges):
edge_colors = []
for type in filtered_edges['TYPE']:
if type == 'ESB':
edge_colors.append('maroon')
elif type == 'ESC':
edge_colors.append('navy')
else:
edge_colors.append('green')
return edge_colors
# Filtering the edges by weight
filtered_edges_w1 = filtered_edges[filtered_edges['WEIGHT'] == 1]
filtered_edges_w2 = filtered_edges[filtered_edges['WEIGHT'] == 2]
filtered_edges_w3 = filtered_edges[filtered_edges['WEIGHT'] == 3]
# applying the get_edge_colors function on each subset of edges
edge_colors_w1 = get_edge_colors(filtered_edges_w1)
edge_colors_w2 = get_edge_colors(filtered_edges_w2)
edge_colors_w3 = get_edge_colors(filtered_edges_w3)
fig, axes = plt.subplots(1, 3, figsize=(18, 6)) #creating a plot; 1, 3 creates 3 subplots spaced horizontally
# Graph 1: weight = 1
axes[0].set_title('Weight 1') #axes[n] specifies which plot to plot the graph in: axes[0] is plot 1.
gdf_pos_filtered.plot(ax=axes[0], alpha=0)
ctx.add_basemap(axes[0], source=ctx.providers.CartoDB.Positron)
nx.draw_networkx_nodes(G_sub_weighted, pos=positions_filtered, ax=axes[0], node_size=60, node_color="red", edgecolors="black", alpha=0.8)
nx.draw_networkx_edges(G_sub_weighted, pos=positions_filtered, ax=axes[0], edgelist=list(filtered_edges_w1['edge_tuple']),
edge_color=edge_colors_w1, alpha=0.4) #`list(filtered_edges_w1['edge_tuple'])` extracts the edge_tuple column and converts the entries into a list
# Graph 2: weight = 2
axes[1].set_title('Weight 2')
gdf_pos_filtered.plot(ax=axes[1], alpha=0)
ctx.add_basemap(axes[1], source=ctx.providers.CartoDB.Positron)
nx.draw_networkx_nodes(G_sub_weighted, pos=positions_filtered, ax=axes[1], node_size=60, node_color="red", edgecolors="black", alpha=0.8)
nx.draw_networkx_edges(G_sub_weighted, pos=positions_filtered, ax=axes[1], edgelist=list(filtered_edges_w2['edge_tuple']),
edge_color=edge_colors_w2, alpha=0.4)
# Graph 3: weight = 3
axes[2].set_title('Weight 3')
gdf_pos_filtered.plot(ax=axes[2], alpha=0)
ctx.add_basemap(axes[2], source=ctx.providers.CartoDB.Positron)
nx.draw_networkx_nodes(G_sub_weighted, pos=positions_filtered, ax=axes[2], node_size=60, node_color="red", edgecolors="black", alpha=0.8)
nx.draw_networkx_edges(G_sub_weighted, pos=positions_filtered, ax=axes[2], edgelist=list(filtered_edges_w3['edge_tuple']),
edge_color=edge_colors_w3, alpha=0.4)
patch_maroon = mpatches.Patch(color='maroon', label='ESB')
patch_navy = mpatches.Patch(color='navy', label='ESC')
axes[0].legend(handles=[patch_maroon, patch_navy], loc='upper left')
axes[1].legend(handles=[patch_maroon, patch_navy], loc='upper left')
axes[2].legend(handles=[patch_maroon, patch_navy], loc='upper left')
plt.show()5. Measures of Centrality
Centrality is defined as the set of metrics used to determine the importance or influence of a particular node within a network. It helps to identify which nodes hold strategic significance in terms of connectivity, information flow, or influence over other nodes. Various centrality metrics, such as degree, betweenness, and eigenvector centrality, provide different perspectives on the role each node plays within the network’s overall structure.
5.1 Network Distance and Eccentricity
Before talking about centrality, we first need to talk a bit about distance. Distance, also known as Geodesic distance, is defined as the number of edges traversed by the shortest path between two nodes.
- The distance between a node and itself is 0.
- The distance between a node and a node for which no shortest path exists (such as a node that is disconnected from other nodes) is \(\infty\).
- The distance between a node and it’s neighbor is 1.
A node’s eccentricity is the maximum distance from said node to all other nodes in the graph. For instance, in the following network, the eccentricity of node \(A\) is 2, but the eccentricity of node \(B\) is 1.
nodes = ("A","B", "C")
edges = [("A","B"), ("B", "C")]
G_example = nx.Graph()
G_example.add_edges_from(edges)
G_example.add_nodes_from(nodes)
color_map = ["salmon", "lightblue", "salmon"]
red_patch = mpatches.Patch(color='salmon', label='eccentricity = 1')
blue_patch = mpatches.Patch(color='lightblue', label='eccentricity = 2')
plt.legend(handles=[blue_patch, red_patch])
nx.draw(G_example, node_color=color_map, with_labels=True)If we color the nodes of our Aegean sea pottery network by eccentricity, we see an interesting result:
#Bring in our data to be used
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')
#Set a bounding box on our data
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704
#Use the bounding box on our data
gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y]
filtered_nodes = gdf_pos_filtered.index.tolist()
G_sub_eccentricity = G.subgraph(filtered_nodes)
eccentricities = nx.eccentricity(G_sub_eccentricity) #calculating eccentricities
eccentricities_array = np.array(list(eccentricities.values()))
#Set a colour map for different values of eccentricity in our network
color_map = []
for eccentricity in eccentricities_array:
if eccentricity == 1:
color_map.append("palegreen")
elif eccentricity == 2:
color_map.append("slateblue")
elif eccenticity == 3:
color_map.append("orange")
else:
color_map.append("grey")
#Filter oyur graph
positions_filtered = {}
for loc in gdf_pos_filtered.index:
x = gdf_pos_filtered.loc[loc].geometry.x
y = gdf_pos_filtered.loc[loc].geometry.y
positions_filtered[loc] = (x, y)
fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
#Draw our graph with different colours for eccentricity
nx.draw_networkx_edges(G_sub_eccentricity, pos=positions_filtered, ax=ax, alpha=0.1)
nx.draw_networkx_nodes(G_sub_eccentricity, pos=positions_filtered, node_color=color_map, ax=ax, node_size=60, edgecolors="black", alpha=0.8)
patch = mpatches.Patch(color='slateblue', label='eccentricity = 2')
plt.legend(handles=[patch])
#show the graph
plt.show()All nodes are at most 2 edges away from every other other node!
5.2 Degree Centrality
Degree centrality is simple: Recall that the degree of a node is the number of nodes directly connected to it. In degree centrality, the more adjacent nodes, the more important the network is considered to be. Degree centrality is used primarily in social networks, where nodes with higher degrees are commonly major channels of information. A high degree means a node has many direct ties with other nodes, and has better access to resources within the network.
Note that the networkX nx.degree_centrality() function normalizes each node’s degree by dividing by the maximum possible degree in the network. Therefore for graphs without self-loops (such as the one we are working with) the degree centrality is always \(\leq 1\). For educational purposes, we un-normalize the degree values, but this is not common practice.
We can calculate the degree centrality of all our nodes in our network:
#Bring in our dataset
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')
#Set a bounding box to limit the scope of the graph
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704
#Filter our graph using the bounding box
gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y]
filtered_nodes = gdf_pos_filtered.index.tolist()
G_sub = G.subgraph(filtered_nodes)
#Find the centrality values
centrality = nx.degree_centrality(G_sub)
centrality_values = np.array(list(centrality.values()))
n = len(centrality_values)
original_degrees = centrality_values * (n - 1)
#Colour our map based on the degree centrality
color_map=[]
for degree in original_degrees:
if degree == max(original_degrees):
color_map.append("indianred")
else:
normalized_value = degree / max(original_degrees) # Normalize the degree to the range [0, 1]
color_map.append(plt.cm.winter(normalized_value))
cmap = plt.cm.winter
#Filter our graph
positions_filtered = {}
for loc in gdf_pos_filtered.index:
x = gdf_pos_filtered.loc[loc].geometry.x
y = gdf_pos_filtered.loc[loc].geometry.y
positions_filtered[loc] = (x, y)
fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
edges= nx.draw_networkx_edges(G_sub, pos=positions_filtered, ax=ax, alpha=0.1)
nodes = nx.draw_networkx_nodes(G_sub, pos=positions_filtered, node_color=color_map, cmap=cmap, ax=ax, node_size=60, edgecolors="black", alpha=1)
# Adding the colorbar using the inset_axes function
axins = inset_axes(ax,
width="5%",
height="30%",
loc='lower left',
bbox_to_anchor=(0.03, 0.03, 1, 1),
bbox_transform=ax.transAxes,
borderpad=0.3)
# Creating colorbar
norm = plt.Normalize(vmin=min(original_degrees), vmax=max(original_degrees))
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array(original_degrees)
plt.colorbar(sm, cax=axins, orientation="vertical")
patch = mpatches.Patch(color='indianred', label='maximum degree (degree = 36)')
plt.legend(handles=[patch], loc='center left', bbox_to_anchor=(10, 3))
plt.show()
#Print some information about our graph
print("minimum degree:", min(original_degrees))
print("maximum degree:", max(original_degrees))
print("difference:", max(original_degrees)-min(original_degrees))D:\Users\alexr\anaconda3\Lib\site-packages\networkx\drawing\nx_pylab.py:457: UserWarning:
No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
minimum degree: 3.0000000000000004
maximum degree: 39.0
difference: 36.0
5.3 Closeness Centrality
Closeness centrality is a measure of how close a node is to all other nodes in the network. It can be computed as the “sum of the geodesic distances of a node to all other nodes in the network”. A node is important if it is close to all other nodes in the network. One flaw of closeness centrality is that while it is a useful indicator of node importance in small networks, it produces little variation in large networks with many edges. This is particularly evident in our example network:
#Import a library so the code works
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
#Bring in our data
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')
#Bounding Box
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704
#Filter our graph using bounding box
gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y]
filtered_nodes = gdf_pos_filtered.index.tolist()
G_sub = G.subgraph(filtered_nodes)
# Find centrality
centrality = nx.closeness_centrality(G_sub)
centrality_values = np.array(list(centrality.values()))
#Set a colour for different values of centrality
color_map=[]
for degree in centrality_values:
if degree == max(centrality_values):
color_map.append("indianred")
else:
normalized_value = degree / max(centrality_values)
color_map.append(plt.cm.winter(normalized_value))
cmap = plt.cm.winter
positions_filtered = {}
for loc in gdf_pos_filtered.index:
x = gdf_pos_filtered.loc[loc].geometry.x
y = gdf_pos_filtered.loc[loc].geometry.y
positions_filtered[loc] = (x, y)
fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
#Draw our edges
edges= nx.draw_networkx_edges(G_sub, pos=positions_filtered, ax=ax, alpha=0.1)
nodes = nx.draw_networkx_nodes(G_sub, pos=positions_filtered, node_color=color_map, cmap=cmap, ax=ax, node_size=60, edgecolors="black", alpha=1)
axins = inset_axes(ax,
width="5%",
height="30%",
loc='lower left',
bbox_to_anchor=(0.03, 0.03, 1, 1),
bbox_transform=ax.transAxes,
borderpad=0.3)
norm = plt.Normalize(vmin=min(original_degrees), vmax=max(original_degrees))
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array(original_degrees)
plt.colorbar(sm, cax=axins, orientation="vertical")
patch = mpatches.Patch(color='indianred', label='maximum closeness (closeness = 39.093)')
plt.legend(handles=[patch], loc='center left', bbox_to_anchor=(7, 3))
plt.show()
#Print information about our graph
print("minumim closeness:", min(original_degrees))
print("maximum closeness:", max(original_degrees))
print("difference:", max(original_degrees)-min(original_degrees))D:\Users\alexr\anaconda3\Lib\site-packages\networkx\drawing\nx_pylab.py:457: UserWarning:
No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
minumim closeness: 3.0000000000000004
maximum closeness: 39.0
difference: 36.0
5.4 Betweenness Centrality
Betweenness Centrality is a measure of the importance of a node based on how well it serves as a bridge between nodes in a network. The mathematical representation of the betweeness centrality of a node is the number of times each node has to pass through that node to reach every other node in a network. Nodes with high betweenness thus serve as “bridges” within a network.
The Betweenness centrality of a given node \(i\) is calculated as: \(b(i)=\sum_{j,k}\frac{g_{jik}}{g_{jk}}\), where \(g_{jik}\) is the number of paths from node \(j\) to node \(k\) passing through node \(i\), and \(g_{jk}\) is the number of paths from node \(g\) to node \(j\) (including paths passing through node \(i\)).
Note that, for undirected graphs, two adjacent nodes can only have one path between them (ie, between two adjacent nodes \(A\) and \(B\), if \(A\rightarrow B\) is a path, then \(B \rightarrow A\) is not).
Consider the graph below:
#Define our network
G_betweenness_example = nx.Graph()
edges_list = [(0,1),(0,2),(0,3),(0,4),(1,2),(2,3),(3,4),(1,4),(2,4),(1,3),(4,5),(5,6)]
G_betweenness_example.add_edges_from(edges_list)
pos = nx.spring_layout(G_betweenness_example, seed=1000)
#Draw our graph
nx.draw(G_betweenness_example,pos=pos, with_labels=True, edgecolors="black", node_color="bisque", node_size=800)Node \(4\) serves as a bridge between nodes 5 and 6 to the rest of the nodes in the network. For a path to be drawn between nodes 6 or 5 to nodes 0,1,2,3, the path must go through node 4. Let’s calculate the betweenness centrality of this network, and label nodes by centrality:
centrality{'Assos': 0.9534883720930233,
'Apollo_Smintheion': 0.803921568627451,
'Troia/Ilion': 0.8367346938775511,
'Athens': 0.9318181818181818,
'Ephesos': 0.9534883720930233,
'Amygdalea': 0.6833333333333333,
'Argos': 0.7735849056603774,
'Knossos': 0.9318181818181818,
'Stobi': 0.82,
'Gortyn': 0.8367346938775511,
'Kallion': 0.5694444444444444,
'Kenchreai': 0.7735849056603774,
'Samos_(Heraion)': 0.9318181818181818,
'Sykea': 0.5694444444444444,
'Veloukovo': 0.5189873417721519,
'Sparta': 0.6833333333333333,
'Corinth': 0.8723404255319149,
'Pergamon': 0.5616438356164384,
'Tenos': 0.6949152542372882,
'Kastro_Tigani': 0.7321428571428571,
'Eretria': 0.7735849056603774,
'Kydonia': 0.5540540540540541,
'Tanagra': 0.6949152542372882,
'Thasos': 0.640625,
'MS116': 0.5540540540540541,
'Abdera': 0.803921568627451,
'Aizanoi': 0.6612903225806451,
'Delos': 0.5466666666666666,
'Priene': 0.5942028985507246,
'Olympia': 0.6307692307692307,
'Iasos': 0.5857142857142857,
'Pylos': 0.5324675324675324,
'Labraunda': 0.82,
'Emporio': 0.6833333333333333,
'Isthmia': 0.7321428571428571,
'Kythera': 0.6833333333333333,
'Nikopolis': 0.6833333333333333,
'Siphnos': 0.7592592592592593,
'Amphipolis': 0.6307692307692307,
'Kommos': 0.6307692307692307,
'MS010': 0.5256410256410257,
'Didyma': 0.6212121212121212}
#Define our network
G_betweenness_example = nx.Graph()
edges_list = [(0,1),(0,2),(0,3),(0,4),(1,2),(2,3),(3,4),(1,4),(2,4),(1,3),(4,5),(5,6)]
G_betweenness_example.add_edges_from(edges_list)
pos = nx.spring_layout(G_betweenness_example, seed=1000)
#Find the centrality values for our nodes
centrality = nx.betweenness_centrality(G_betweenness_example, normalized=False)
centrality_values = np.array(list(centrality.values()))
cmap="BuPu"
#Put labels on our network
labels = {}
for node in G_betweenness_example.nodes():
labels[node] = centrality_values[node]
#Draw our graph using `nx.draw`
nx.draw(G_betweenness_example,pos=pos, node_color=centrality_values, edgecolors="black", cmap=cmap, node_size=800)
nx.draw_networkx_labels(G_betweenness_example, pos, labels=labels, font_color="orangered"){0: Text(0.498764757665798, -0.06713690471966567, '0.0'),
1: Text(0.2550713364952553, 0.20785219968276475, '0.0'),
2: Text(0.47012617248883504, 0.16863130442291982, '0.0'),
3: Text(0.29322154425308394, -0.14483930534061298, '0.0'),
4: Text(0.030284391105298025, 0.0025896144060782622, '8.0'),
5: Text(-0.5474682020082701, -0.05903340272624547, '5.0'),
6: Text(-1.0, -0.10806350572523814, '0.0')}
We can see that node 4 does indeed have the highest betweenness centrality. The values of 0 for nodes 0, 1, 2, 3 and 6 indicate that each node can reach every other node without passing through those nodes. The value of 5.0 for node 5 indicates that five nodes must pass through node 5 in order to reach another node. In other words, \(\frac{g_{jik}}{g_{jk}}=1\) for 5 pairs of nodes in the network.
Coloring the nodes in our archeology dataset by betweenness centrality, we get:
#Bring in our data
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')
#Bounding box
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704
#Filter graph using bounding box
gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y]
filtered_nodes = gdf_pos_filtered.index.tolist()
G_sub = G.subgraph(filtered_nodes)
#Find centrality values and place them into an array.
centrality = nx.betweenness_centrality(G_sub)
centrality_values = np.array(list(centrality.values()))
n = len(centrality_values)
original_degrees = centrality_values * (n - 1)
#Colour our values based on centrality
color_map=[]
for degree in original_degrees:
if degree == max(original_degrees):
color_map.append("indianred")
else:
normalized_value = degree / max(original_degrees)
color_map.append(plt.cm.winter(normalized_value))
cmap = plt.cm.winter
#Filter our new network
positions_filtered = {}
for loc in gdf_pos_filtered.index:
x = gdf_pos_filtered.loc[loc].geometry.x
y = gdf_pos_filtered.loc[loc].geometry.y
positions_filtered[loc] = (x, y)
fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
# Create our edges and nodes
edges= nx.draw_networkx_edges(G_sub, pos=positions_filtered, ax=ax, alpha=0.1)
nodes = nx.draw_networkx_nodes(G_sub, pos=positions_filtered, node_color=color_map, cmap=cmap, ax=ax, node_size=60, edgecolors="black", alpha=1)
axins = inset_axes(ax,
width="5%",
height="30%",
loc='lower left',
bbox_to_anchor=(0.03, 0.03, 1, 1),
bbox_transform=ax.transAxes,
borderpad=0.3)
norm = plt.Normalize(vmin=min(original_degrees), vmax=max(original_degrees))
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array(original_degrees)
plt.colorbar(sm, cax=axins, orientation="vertical")
#Plot our graph
patch = mpatches.Patch(color='indianred', label='maximum betweenness (betweenness = 2.621)')
plt.legend(handles=[patch], loc='center left', bbox_to_anchor=(7, 3))
plt.show()D:\Users\alexr\anaconda3\Lib\site-packages\networkx\drawing\nx_pylab.py:457: UserWarning:
No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
5.5 Eigenvector centrality
Eigenvector centrality is a measure of the influence of a node in a network by considering not just how many connections it has (as we did with degree centrality), but also the importance of those connections: A node with high eigenvector centrality is connected to many nodes that themselves have high centrality, making it more influential in spreading information or resources. Unlike simpler measures like degree centrality, which only counts connections, eigenvector centrality looks at the overall structure of the network. It helps identify key players in a network who might not have the most connections but are well-connected to other important nodes.
The term “eigenvector” in comes from linear algebra: an eigenvector is a special kind of vector taken from a network’s adjacency matrix. Each element in this eigenvector represents a node’s centrality in the network.
#Bring in data
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')
#Bounding Box
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704
#Filter using bounding box
gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y]
filtered_nodes = gdf_pos_filtered.index.tolist()
G_sub = G.subgraph(filtered_nodes)
#Find the eigenvector centrality in our graph
centrality = nx.eigenvector_centrality(G_sub)
centrality_values = np.array(list(centrality.values()))
# Colour our nodes
color_map=[]
for degree in centrality_values:
if degree == max(centrality_values):
color_map.append("indianred")
else:
normalized_value = degree / max(centrality_values)
color_map.append(plt.cm.winter(normalized_value))
cmap = plt.cm.winter
#Filter the graph
positions_filtered = {}
for loc in gdf_pos_filtered.index:
x = gdf_pos_filtered.loc[loc].geometry.x
y = gdf_pos_filtered.loc[loc].geometry.y
positions_filtered[loc] = (x, y)
fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
#Draw our edges and nodes
edges= nx.draw_networkx_edges(G_sub, pos=positions_filtered, ax=ax, alpha=0.1)
nodes = nx.draw_networkx_nodes(G_sub, pos=positions_filtered, node_color=color_map, cmap=cmap, ax=ax, node_size=60, edgecolors="black", alpha=1)
axins = inset_axes(ax,
width="5%",
height="30%",
loc='lower left',
bbox_to_anchor=(0.03, 0.03, 1, 1),
bbox_transform=ax.transAxes,
borderpad=0.3)
norm = plt.Normalize(vmin=min(original_degrees), vmax=max(original_degrees))
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
sm.set_array(original_degrees)
plt.colorbar(sm, cax=axins, orientation="vertical")
#Plot our graph
patch = mpatches.Patch(color='indianred', label='maximum betweenness (betweenness = 2.621)')
plt.legend(handles=[patch], loc='center left', bbox_to_anchor=(7, 3))
plt.show()D:\Users\alexr\anaconda3\Lib\site-packages\networkx\drawing\nx_pylab.py:457: UserWarning:
No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
5.6 Directed graphs and Putting it all together
Lastly, let’s clean up our centrality graphs by putting them together into one graph. We’ll also use a directed version of our graph, to see how that changes the results.
#Bring in our data
df_edges = pd.read_excel("network_analysis_datasets/vistorian_network.xls")
df_pos = pd.read_excel("network_analysis_datasets/vistorian_locations.xls", index_col=0)
#Define the graph
G_directed = nx.from_pandas_edgelist(df_edges, source="SOURCE_LOCATION", target="TARGET_LOCATION", create_using=nx.DiGraph())
gdf_pos = gpd.GeoDataFrame(df_pos, geometry=gpd.points_from_xy(df_pos['LONGITUDE'], df_pos['LATITUDE']), crs='EPSG:4326')
gdf_pos = gdf_pos.to_crs('EPSG:3857')
#Create a bounding box and filter the graph using it
min_x, min_y = 1858948, 4055442
max_x, max_y = 3336323, 5175704
gdf_pos_filtered = gdf_pos.cx[min_x:max_x, min_y:max_y]
filtered_nodes = gdf_pos_filtered.index.tolist()
#Plot the now directed points
G_filtered = G_directed.subgraph(filtered_nodes)
positions_filtered = {loc: (gdf_pos_filtered.loc[loc].geometry.x, gdf_pos_filtered.loc[loc].geometry.y) for loc in gdf_pos_filtered.index}
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
fig.set_facecolor('lightblue')
for ax in axes.flatten():
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
#The codes below are from the above sections (more notes can be found there on what each code block does)
# Plotting Degree Centrality
degree_centrality = nx.degree_centrality(G_filtered)
degree_values = np.array(list(degree_centrality.values()))
n = len(degree_values)
original_degrees = degree_values * (n - 1)
color_map=[]
for degree in original_degrees:
if degree == max(original_degrees):
color_map.append("indianred")
else:
normalized_degree = degree / max(original_degrees)
color_map.append(plt.cm.winter(normalized_degree))
nx.draw_networkx_edges(G_filtered, pos=positions_filtered, ax=axes[0, 0], alpha=0.1)
nx.draw_networkx_nodes(G_filtered,
pos=positions_filtered,
node_color=color_map,
cmap=plt.cm.winter,
ax=axes[0, 0],
node_size=60,
edgecolors="black",
alpha=1)
axes[0, 0].set_title("Degree Centrality")
axes[0, 0].set_facecolor("white")
# Add colorbar for Degree Centrality
axins = inset_axes(axes[0, 0], width="5%", height="30%", loc='lower left',
bbox_to_anchor=(0.03, 0.03, 1, 1),
bbox_transform=axes[0, 0].transAxes,
borderpad=0.3)
sm = plt.cm.ScalarMappable(cmap=plt.cm.winter, norm=plt.Normalize(vmin=min(original_degrees), vmax=max(original_degrees )))
sm.set_array(original_degrees)
plt.colorbar(sm, cax=axins, orientation="vertical")
patch = mpatches.Patch(color='indianred', label=f'max degree ({max(original_degrees):.3f})')
axes[0, 0].legend(handles=[patch], loc='upper left')
# Plot Closeness Centrality
closeness_centrality = nx.closeness_centrality(G_filtered)
closeness_values = np.array(list(closeness_centrality.values()))
color_map_closeness=[]
for degree in closeness_values:
if degree == max(closeness_values):
color_map_closeness.append("indianred")
else:
normalized_degree_closeness = degree / max(closeness_values)
color_map_closeness.append(plt.cm.winter(normalized_degree_closeness))
nx.draw_networkx_edges(G_filtered, pos=positions_filtered, ax=axes[1, 0], alpha=0.1)
nx.draw_networkx_nodes(G_filtered, pos=positions_filtered, node_color=color_map_closeness, cmap=plt.cm.winter, ax=axes[1, 0], node_size=60, edgecolors="black", alpha=1)
axes[1, 0].set_title("Closeness Centrality")
axes[1, 0].set_facecolor("white")
# Add colorbar for Closeness Centrality
axins = inset_axes(axes[1, 0], width="5%", height="30%", loc='lower left',
bbox_to_anchor=(0.03, 0.03, 1, 1),
bbox_transform=axes[1, 0].transAxes,
borderpad=0.3)
sm = plt.cm.ScalarMappable(cmap=plt.cm.winter, norm=plt.Normalize(vmin=min(closeness_values), vmax=max(closeness_values)))
sm.set_array(closeness_values)
plt.colorbar(sm, cax=axins, orientation="vertical")
patch = mpatches.Patch(color='indianred', label=f'max closeness ({max(closeness_values):.3f})')
axes[1, 0].legend(handles=[patch], loc='upper left')
# Plot Betweenness Centrality
betweenness_centrality = nx.betweenness_centrality(G_filtered)
betweenness_values = np.array(list(betweenness_centrality.values()))
color_map_betweenness=[]
for degree in betweenness_values:
if degree == max(betweenness_values):
color_map_betweenness.append("indianred")
else:
normalized_degree_betweenness = degree / max(betweenness_values) # Normalize the degree to the range [0, 1]
color_map_betweenness.append(plt.cm.winter(normalized_degree_betweenness))
nx.draw_networkx_edges(G_filtered, pos=positions_filtered, ax=axes[0, 1], alpha=0.1)
nx.draw_networkx_nodes(G_filtered, pos=positions_filtered, node_color=color_map_betweenness, cmap=plt.cm.winter, ax=axes[0, 1], node_size=60, edgecolors="black", alpha=1)
axes[0, 1].set_title("Betweenness Centrality")
axes[0, 1].set_facecolor("white")
# Add colorbar for Betweenness Centrality
axins = inset_axes(axes[0, 1], width="5%", height="30%", loc='lower left', bbox_to_anchor=(0.03, 0.03, 1, 1), bbox_transform=axes[0, 1].transAxes, borderpad=0.3)
sm = plt.cm.ScalarMappable(cmap=plt.cm.winter, norm=plt.Normalize(vmin=min(betweenness_values), vmax=max(betweenness_values)))
sm.set_array(betweenness_values)
plt.colorbar(sm, cax=axins, orientation="vertical")
patch = mpatches.Patch(color='indianred', label=f'max betweenness ({max(betweenness_values):.3f})')
axes[0, 1].legend(handles=[patch], loc='upper left')
# Plot Eigenvector Centrality
eigenvector_centrality = nx.eigenvector_centrality(G_filtered, max_iter=1000)
eigenvector_values = np.array(list(eigenvector_centrality.values()))
color_map_eigenvector=[]
for degree in eigenvector_values:
if degree == max(eigenvector_values):
color_map_eigenvector.append("indianred")
else:
normalized_degree_eigenvector = degree / max(eigenvector_values) # Normalize the degree to the range [0, 1]
color_map_eigenvector.append(plt.cm.winter(normalized_degree_eigenvector))
nx.draw_networkx_edges(G_filtered, pos=positions_filtered, ax=axes[1, 1], alpha=0.1)
nx.draw_networkx_nodes(G_filtered, pos=positions_filtered, node_color=color_map_eigenvector, cmap=plt.cm.winter, ax=axes[1, 1], node_size=60, edgecolors="black", alpha=1)
axes[1, 1].set_title("Eigenvector Centrality")
axes[1, 1].set_facecolor("white")
# Add colorbar for Eigenvector Centrality
axins = inset_axes(axes[1, 1], width="5%", height="30%", loc='lower left', bbox_to_anchor=(0.03, 0.03, 1, 1), bbox_transform=axes[1, 1].transAxes, borderpad=0.3)
sm = plt.cm.ScalarMappable(cmap=plt.cm.winter, norm=plt.Normalize(vmin=min(eigenvector_values), vmax=max(eigenvector_values)))
sm.set_array(eigenvector_values)
plt.colorbar(sm, cax=axins, orientation="vertical")
patch = mpatches.Patch(color='indianred', label=f'max eigenvector centrality ({max(eigenvector_values):.3f})')
axes[0, 1].legend(handles=[patch], loc='upper left')D:\Users\alexr\anaconda3\Lib\site-packages\networkx\drawing\nx_pylab.py:457: UserWarning:
No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
6. Group-level analysis: finding subgroups within networks
Cohesive subgroups in network analysis refer to clusters of nodes within a network that are more densely connected to each other than to the rest of the network. These subgroups indicate areas of high interaction or strong relationships within the larger network. Identifying cohesive subgroups helps in understanding the structure and dynamics of the network, such as how information or influence flows within and between these groups. The process of finding cohesive subgroups within networks is called cohesive group analysis.
- A clique is a subset of nodes within a graph where every node is directly connected to every other node in the subset. This means that in a clique, all possible edges between the nodes are present, making it a maximally connected subgraph.
- All nodes are that are by themselves are inherently a clique (a 1-clique)
- In undirected graphs, a \(k\)-core is a subgraph in which every node is connected to at least \(k\) other nodes within the subgraph. The \(k\)-core is obtained by iteratively removing all nodes with a degree less than \(k\) until no more nodes can be removed.
#Create our network
G_cliques_example = nx.Graph()
edges_list = [(0,1),(0,2),(0,3),(0,4),(1,2),(2,3),(3,4),(1,4),(2,4),(1,3),(4,5),(5,6)]
G_cliques_example.add_edges_from(edges_list)
pos = nx.spring_layout(G_cliques_example, seed=1000)
#Find our cliques and print where they are
cliques = [x for x in nx.find_cliques(G_cliques_example)]
print(cliques)
#Draw our graph
nx.draw(G_cliques_example,pos=pos, with_labels=True, edgecolors="black", node_color = "bisque", cmap=cmap, node_size=800)[[4, 0, 1, 2, 3], [4, 5], [6, 5]]
We can see that there are three cliques in this network: \((4, 0, 1, 2, 3)\), \((4, 5)\), and \((6, 5)\):
#Set the figure size (24,6) not (8,6) because we have 3 graphs to show
fig, axes = plt.subplots(1, 3, figsize=(24, 6))
fig.set_facecolor('lightblue')
#Create our network
G_cliques_example = nx.Graph()
edges_list = [(0,1),(0,2),(0,3),(0,4),(1,2),(2,3),(3,4),(1,4),(2,4),(1,3),(4,5),(5,6)]
G_cliques_example.add_edges_from(edges_list)
pos = nx.spring_layout(G_cliques_example, seed=1000)
#Find our cliques and print them
cliques = [x for x in nx.find_cliques(G_cliques_example)]
print(cliques)
#Colour our cliques
clique_1 = ["mediumseagreen", "mediumseagreen", "mediumseagreen", "mediumseagreen", "mediumseagreen", "bisque", "bisque"]
clique_2 = ["bisque", "bisque", "bisque", "bisque", "mediumseagreen", "mediumseagreen", "bisque"]
clique_3 = ["bisque", "bisque", "bisque", "bisque", "bisque", "mediumseagreen", "mediumseagreen"]
#Draw our cliques
nx.draw(G_cliques_example, ax=axes[0], pos=pos, with_labels=True, edgecolors="black", node_color = clique_1, cmap=cmap, node_size=800)
nx.draw(G_cliques_example, ax=axes[1], pos=pos, with_labels=True, edgecolors="black", node_color = clique_2, cmap=cmap, node_size=800)
nx.draw(G_cliques_example, ax=axes[2], pos=pos, with_labels=True, edgecolors="black", node_color = clique_3, cmap=cmap, node_size=800)[[4, 0, 1, 2, 3], [4, 5], [6, 5]]
#Create our example graph
G_cliques_example = nx.Graph()
edges_list = [(0,1),(0,2),(0,3),(0,4),(1,2),(2,3),(3,4),(1,4),(2,4),(1,3),(4,5),(5,6)]
G_cliques_example.add_edges_from(edges_list)
pos = nx.spring_layout(G_cliques_example, seed=1000)
#Find the cliques
cliques = [x for x in nx.find_cliques(G_cliques_example)]
print(cliques)
#For loop
node_counts = {}
for clique in cliques: #for each clique in the list of cliques...
for node in clique: # for each node in each clique...
if node in node_counts: #checks whether the current node already exists as a key in the node_counts dictionary
node_counts[node] += 1 #if it is in the dictionary, increase it's value by 1
else:
node_counts[node] = 1 #if it isn't, dont change
#Colour our nodes
colors = []
for node in G_cliques_example.nodes():
if node_counts[node] == 1:
colors.append("lightgreen")
elif node_counts[node] == 2:
colors.append("forestgreen")
elif node_counts[node] == 3:
colors.append("orange")
else:
colors.append("red")
#Draw our network
nx.draw(G_cliques_example,pos=pos, with_labels=True, edgecolors="black", node_color = colors, cmap=cmap, node_size=800)
patch_green = mpatches.Patch(color='lightgreen', label='node in one clique')
patch_forest = mpatches.Patch(color='forestgreen', label='node in two cliques')
plt.legend(handles=[patch_green, patch_forest])
plt.show()[[4, 0, 1, 2, 3], [4, 5], [6, 5]]
Let’s repeat the same process as the example above on our archeology dataset:
G_sub_cliques = G.subgraph(filtered_nodes)
#Set the plot size
fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
#Find our cliques
cliques = [x for x in nx.find_cliques(G_sub_cliques)]
print(cliques)
node_counts = {} #empty dictionary of key-value pairs where keys are nodes and values are the frequency that they show up in a clique
for node in G_sub_cliques.nodes():
node_counts[node] = 0 # makes dictionary values start at 0
for clique in cliques: #for each clique in the list of cliques...
for node in clique: # for each node in each clique...
node_counts[node] += 1 #if it is in the dictionary, increase it's value by 1
#Set the colours for different cliques
colors = []
max_values = max(node_counts.values())
min_values = min(node_counts.values())
for node in G_sub_cliques.nodes():
count = node_counts.get(node, 0)
if count == max_values:
colors.append("salmon")
elif count == min_values:
colors.append("forestgreen")
else:
normalized_value = count / max_values
colors.append(plt.cm.PuBu(normalized_value))
#Draw our network
nx.draw_networkx_edges(G_sub_cliques, pos=positions_filtered, ax=ax, alpha=0.1)
nx.draw_networkx_nodes(G_sub_cliques, pos=positions_filtered, ax=ax, node_size=60, node_color=colors, edgecolors="black", alpha=0.8)
#Make a legend in the corner of the graph with labels
patch_forestgreen = mpatches.Patch(color='forestgreen', label=f"Minimum number of cliques: {min_values}")
patch_salmon = mpatches.Patch(color='salmon', label=f"Maximum number of cliques: {max_values}")
plt.legend(handles=[patch_salmon, patch_forestgreen])
axins = inset_axes(ax,
width="5%",
height="30%",
loc='lower left',
bbox_to_anchor=(0.03, 0.03, 1, 1),
bbox_transform=ax.transAxes,
borderpad=0.3)
node_counts_values = np.array(list(node_counts.values()))
norm = plt.Normalize(vmin=min_values, vmax=max_values)
sm = plt.cm.ScalarMappable(cmap="PuBu", norm=norm)
sm.set_array(node_counts_values)
plt.colorbar(sm, cax=axins, orientation="vertical")
#Plot our graph
plt.show()[['MS010', 'Knossos', 'Athens', 'Abdera', 'Apollo_Smintheion'], ['Delos', 'Samos_(Heraion)', 'Troia/Ilion', 'Ephesos', 'Aizanoi', 'Corinth', 'Athens', 'Abdera'], ['Assos', 'Veloukovo', 'Athens', 'Apollo_Smintheion'], ['Assos', 'Ephesos', 'Pylos', 'Knossos', 'Corinth', 'Athens'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Kydonia', 'Troia/Ilion', 'Corinth', 'Argos', 'Eretria'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Priene', 'Aizanoi', 'Kenchreai', 'Troia/Ilion'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Priene', 'Aizanoi', 'Kenchreai', 'Iasos'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Didyma', 'Kenchreai', 'Kastro_Tigani', 'Troia/Ilion', 'Tenos', 'Aizanoi', 'Apollo_Smintheion', 'Isthmia'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Abdera', 'Siphnos', 'Eretria', 'Olympia', 'Amphipolis', 'Tenos', 'Kenchreai', 'Thasos', 'Kommos'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Abdera', 'Siphnos', 'Eretria', 'Kastro_Tigani', 'Troia/Ilion', 'Apollo_Smintheion', 'Isthmia', 'Argos', 'Kenchreai'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Abdera', 'Siphnos', 'Eretria', 'Kastro_Tigani', 'Troia/Ilion', 'Apollo_Smintheion', 'Isthmia', 'Argos', 'Nikopolis', 'Sparta', 'Amygdalea', 'Kythera', 'Emporio', 'Tanagra'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Abdera', 'Siphnos', 'Eretria', 'Kastro_Tigani', 'Troia/Ilion', 'Apollo_Smintheion', 'Isthmia', 'Tenos', 'Kenchreai'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Abdera', 'Siphnos', 'Eretria', 'Thasos', 'Tanagra'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Gortyn', 'Abdera', 'Aizanoi', 'Kenchreai', 'Kastro_Tigani', 'Troia/Ilion', 'Isthmia', 'Apollo_Smintheion', 'Tenos'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'Stobi', 'Labraunda', 'Corinth', 'Iasos', 'Kenchreai', 'Argos'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Athens', 'MS116', 'Troia/Ilion', 'Argos', 'Kenchreai'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Knossos', 'Sykea', 'Kenchreai', 'Troia/Ilion', 'Argos', 'Apollo_Smintheion', 'Gortyn', 'Kallion'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Pergamon', 'Troia/Ilion', 'Athens', 'Argos', 'Apollo_Smintheion', 'Gortyn'], ['Assos', 'Ephesos', 'Samos_(Heraion)', 'Pergamon', 'Troia/Ilion', 'Athens', 'Priene']]
7. Network-level analysis: Clusters and clustering coefficients
A cluster (also known as a community) is a set of nodes in a graph that are densely connected to each other but sparsely connected to nodes in other clusters. For example, in a social network, a cluster might represent a group of people who frequently interact with each other but have fewer interactions with people outside the group. Community detection is the process of finding such communities within nodes.
Before diving into community detection, we first need to understand modularity. Modulaity is a numerical measure for the community structure of a graph: it compares the density of edges within the communities of a network to the density of edges between communities. Mathematically, modularity compares the observed density of edges within a community to the expected density of edges if the connections were random. A positive modularity value suggests a strong community structure, while values closer to zero or negative indicate that the divisions are no better than random.
7.1 Louvain Algorithm
The Louvain algorithm is a community detection method in networks that aims to optimize modularity. By optimizing modularity, the Louvain algorithm effectively uncovers natural divisions in the network where connections are dense within clusters and sparse between them, thus identifying meaningful community structures.
First, each node is assigned to its own community, and nodes are then iteratively moved to neighboring communities if it increases the modularity. In the second phase, the algorithm creates a new network where each community from the first phase is treated as a single node, and the process is repeated. This hierarchical approach continues until no further modularity improvements can be made, resulting in a final set of communities that maximize modularity.
G_sub_louvain = G.subgraph(filtered_nodes)
#Set our figure size
fig, ax = plt.subplots(figsize=(9, 9))
gdf_pos_filtered.plot(ax=ax, alpha=0)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)
partition = community_louvain.best_partition(G_sub_louvain) #applying louvain
#Colour map of possible colors with values associated with it
color_map = {
0: "coral",
1: "cornflowerblue",
2: "limegreen",
3: "crimson",
4: "purple"}
#Set node colors
node_colors = []
for node in G_sub_louvain.nodes():
cluster_id = partition[node] # Get the cluster ID for the node
color = color_map[cluster_id] # Get the corresponding color from the color_map
node_colors.append(color)
#Draw our network
nx.draw_networkx_edges(G_sub_louvain, pos=positions_filtered, ax=ax, alpha=0.1)
nx.draw_networkx_nodes(G_sub_louvain, pos=positions_filtered,
ax=ax, node_size=60, node_color=node_colors,edgecolors="black", alpha=0.8)
print(partition)
#Make a legend in the corner with color values and show our graph
patch_coral = mpatches.Patch(color='coral', label="Community 0")
patch_cornflower = mpatches.Patch(color='cornflowerblue', label="Community 1")
patch_limegreen = mpatches.Patch(color='limegreen', label="Community 2")
patch_crimson = mpatches.Patch(color='crimson', label="Community 4")
plt.legend(handles=[patch_coral, patch_cornflower, patch_limegreen, patch_crimson])
plt.show(){'Assos': 0, 'Apollo_Smintheion': 1, 'Troia/Ilion': 0, 'Athens': 0, 'Ephesos': 0, 'Amygdalea': 1, 'Argos': 1, 'Knossos': 0, 'Stobi': 2, 'Gortyn': 2, 'Kallion': 0, 'Kenchreai': 2, 'Samos_(Heraion)': 0, 'Sykea': 0, 'Veloukovo': 0, 'Sparta': 1, 'Corinth': 0, 'Pergamon': 0, 'Tenos': 2, 'Kastro_Tigani': 1, 'Eretria': 1, 'Kydonia': 0, 'Tanagra': 1, 'Thasos': 2, 'MS116': 0, 'Abdera': 1, 'Aizanoi': 0, 'Delos': 0, 'Priene': 0, 'Olympia': 2, 'Iasos': 0, 'Pylos': 0, 'Labraunda': 2, 'Emporio': 1, 'Isthmia': 1, 'Kythera': 1, 'Nikopolis': 1, 'Siphnos': 1, 'Amphipolis': 2, 'Kommos': 2, 'MS010': 0, 'Didyma': 0}
clusters = {}
# Iterating through the partition dictionary
for node, cluster in partition.items():
if cluster not in clusters: # If the cluster is not already in the dictionary, add it
clusters[cluster] = []
clusters[cluster].append(node)
print("community_0:", clusters[0]) #Prints the names of nodes in each community 0,1,2
print("community_1:", clusters[1])
print("community_2:", clusters[2])community_0: ['Assos', 'Troia/Ilion', 'Athens', 'Ephesos', 'Knossos', 'Kallion', 'Samos_(Heraion)', 'Sykea', 'Veloukovo', 'Corinth', 'Pergamon', 'Kydonia', 'MS116', 'Aizanoi', 'Delos', 'Priene', 'Iasos', 'Pylos', 'MS010', 'Didyma']
community_1: ['Apollo_Smintheion', 'Amygdalea', 'Argos', 'Sparta', 'Kastro_Tigani', 'Eretria', 'Tanagra', 'Abdera', 'Emporio', 'Isthmia', 'Kythera', 'Nikopolis', 'Siphnos']
community_2: ['Stobi', 'Gortyn', 'Kenchreai', 'Tenos', 'Thasos', 'Olympia', 'Labraunda', 'Amphipolis', 'Kommos']
7.2 Clustering coefficients
The last thing we’ll cover is clustering coefficients. The clustering coefficient of a graph is the measure of how closely nodes in a graph tend to cluster together. It gives an indication of the degree to which nodes in a graph tend to form clusters. Mathematically, the clustering coefficient is the fraction of a node’s neighbors (nodes connected to that node via an edge), that are also neighbors with each other.
There are two primary types of clustering coefficients: local coefficients and global coefficients. A local coefficient is a measure of how many of the node’s neighbors are also neighbors of each other, relative to the total number of possible connections between those neighbors. Mathematically, it is expressed as: \[C(v)=\frac{2 \cdot\; T(v)}{k_{v}×(k_{v}−1)}\]
Where v is a node in a graph, \(T(v)\) is the actual number of edges between the neighbors of v, and \(k_{v}×(k_{v}−1)\) is the maximum total number of possible connections between the neighbors of v.
The global coefficient is the average of the local coefficients of all the nodes in a graph. Both coefficients range between 0 and 1.
We can calculate the global and local coefficients for our network:
clustering_coefficients = nx.clustering(G_sub_louvain)
print(clustering_coefficients){'Assos': 0.5641025641025641, 'Apollo_Smintheion': 0.6924731182795699, 'Troia/Ilion': 0.6590909090909091, 'Athens': 0.5732574679943101, 'Ephesos': 0.5695006747638327, 'Amygdalea': 1.0, 'Argos': 0.7241379310344828, 'Knossos': 0.5846372688477952, 'Stobi': 0.7520161290322581, 'Gortyn': 0.7121212121212122, 'Kallion': 1.0, 'Kenchreai': 0.7019704433497537, 'Samos_(Heraion)': 0.5945945945945946, 'Sykea': 1.0, 'Veloukovo': 1.0, 'Sparta': 1.0, 'Corinth': 0.6554621848739496, 'Pergamon': 0.9166666666666666, 'Tenos': 0.8853754940711462, 'Kastro_Tigani': 0.9015384615384615, 'Eretria': 0.8103448275862069, 'Kydonia': 1.0, 'Tanagra': 0.9604743083003953, 'Thasos': 0.9673202614379085, 'MS116': 1.0, 'Abdera': 0.7440860215053764, 'Aizanoi': 0.8473684210526315, 'Delos': 1.0, 'Priene': 0.8974358974358975, 'Olympia': 1.0, 'Iasos': 0.9696969696969697, 'Pylos': 1.0, 'Labraunda': 0.7520161290322581, 'Emporio': 1.0, 'Isthmia': 0.9015384615384615, 'Kythera': 1.0, 'Nikopolis': 1.0, 'Siphnos': 0.8518518518518519, 'Amphipolis': 1.0, 'Kommos': 1.0, 'MS010': 1.0, 'Didyma': 1.0}
The code above gives the clustering coefficients for every node in the graph. If we want the global coefficient, we can use:
average_clustering_coefficient = nx.average_clustering(G_sub_louvain)
print(average_clustering_coefficient)0.8616447207095111
Our high clustering coefficient indicates that the nodes in this network tend to create tightly knit groups with a high density of connections.
8. Citations
Al-Taie, M. Z., & Kadry, S. (2017). Python for graph and network analysis. Springer. https://doi.org/10.1007/978-3-319-53004-8
Bes, P. (2015). Once upon a time in the east: The chronological and geographical distribution of terra sigillata and red slip ware in the Roman East (Vol. 6). Archaeopress. (Roman and Late Antique Mediterranean Pottery)
Brughmans, T., & Bach, B. (2018). The Vistorian: Exploring archaeological networks. https://archaeologicalnetworks.wordpress.com/resources/#vistorian
Brughmans, T., & Peeples, M. A. (2023). Network science in archaeology. Cambridge University Press. https://doi.org/10.1017/9781009170659
Mills, B. J., Clark, J. J., Peeples, M. A., Haas, W. R., Roberts, J. M., Hill, J. B., Huntley, D. L., Borck, L., Breiger, R. L., Clauset, A., & Shackley, M. S. (2013). Transformation of social networks in the late pre-Hispanic US Southwest. Proceedings of the National Academy of Sciences, 110(15), 5785–5790. https://doi.org/10.1073/pnas.1219966110
QuantEcon DataScience. (n.d.). Social and economic networks. https://datascience.quantecon.org/applications/networks.html
Sayama, H. (2024, April 30). Generating random graphs. In Introduction to the modeling and analysis of complex systems. Mathematics LibreTexts. https://math.libretexts.org/Bookshelves/Scientific_Computing_Simulations_and_Modeling/ Introduction_to_the_Modeling_and_Analysis_of_Complex_Systems_(Sayama)/15%3A_Basics_of_Networks/15.06%3A_Generating_Random_Graphs