SMA Module2A
Analytics
Module 2
Network Measures
Overview - Network Measures
Network Basics - Degree and Degree Distributions,
Paths, Clustering Coefficient, Connected Components
Node Centrality – Degree centrality, Closeness
Centrality, Betweenness centrality, Edge Betweenness
centrality, Assortativity, Transitivity and Reciprocity,
Similarity.
Properties of Real-World Networks – High Average
Local Clustering Coefficient, Small-world Property,
Scale-free Property.
Random Network Model- Degree Distribution of
Random Network, Evolution of a Random Network,
Average Path Length, Clustering Coefficient, Random
Network vs. Real-world Network
Connectivity in Graphs - Adjacent nodes and Incident Edges
When the graph is directed, edge directions must match for edges to be
incident
An edge in a graph can be traversed when one starts at one of its end-
nodes, moves along the edge, and stops at its other end-node.
Walk, Path, Trail, Tour, and Cycle
Walk: A walk is a sequence of incident edges visited one
after another
– Open walk: A walk does not end where it starts
– Closed walk: A walk returns to where it starts
• Representing a walk:
– A sequence of edges: $e_1, e_2, \dots, e_n$
– A sequence of nodes: $v_1, v_2, \dots, v_n$
• Length of walk:
the number of visited edges
Length of walk= 8
Trail
• A trail is a walk in which no edge is visited more than once, i.e., all edges of the walk are
distinct
• A closed trail (one that ends where it starts) is called a tour or circuit
Path
• A walk where nodes and edges are distinct is called a path and a closed path is called
a cycle
• The length of a path or cycle is the number of edges visited in the path or cycle
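The definitions of walk, trail, tour, path, and cycle above can be checked mechanically. The sketch below is an illustration only: it assumes a simple undirected graph and a walk given as its node sequence, and the function name `classify_walk` is hypothetical.

```python
def classify_walk(nodes):
    """Classify a walk given as its node sequence in a simple undirected graph.

    Returns (kind, length), where length is the number of edges visited.
    """
    # Consecutive node pairs are the visited edges; frozenset ignores direction.
    edges = [frozenset(pair) for pair in zip(nodes, nodes[1:])]
    closed = len(nodes) > 1 and nodes[0] == nodes[-1]

    edges_distinct = len(set(edges)) == len(edges)
    # In a closed walk the start node legitimately reappears at the end.
    inner = nodes[:-1] if closed else nodes
    nodes_distinct = len(set(inner)) == len(inner)

    if edges_distinct and nodes_distinct:
        kind = "cycle" if closed else "path"
    elif edges_distinct:
        kind = "tour" if closed else "trail"
    else:
        kind = "closed walk" if closed else "open walk"
    return kind, len(edges)


print(classify_walk([1, 2, 5, 4]))     # nodes and edges all distinct
print(classify_walk([2, 1, 3, 2, 5]))  # node 2 repeats, edges do not
```

The function reports the most specific label that applies, so every path is also a trail and every trail is also a walk.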
[Figure: a graph with 3 components; a graph with 3 strongly connected components]
Shortest Path
• Shortest Path is the path between two nodes that
has the shortest length.
– We denote the length of the shortest path between
nodes $v_i$ and $v_j$ as $l_{i,j}$
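For an unweighted graph, the shortest path length $l_{i,j}$ can be found with breadth-first search (BFS). The sketch below is a minimal illustration; the edge list `EXAMPLE_EDGES` and the helper names are assumptions made for the example, not part of the slides.

```python
from collections import deque

# Hypothetical 5-node undirected example graph (an assumption for illustration).
EXAMPLE_EDGES = [(1, 2), (1, 3), (2, 3), (2, 5), (3, 4), (3, 5), (4, 5)]


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path_length(adj, source, target):
    """Return the number of edges on a shortest path, or None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for w in adj.get(u, ()):
            if w not in dist:  # first visit is along a shortest path
                dist[w] = dist[u] + 1
                queue.append(w)
    return None


adj = build_adjacency(EXAMPLE_EDGES)
print(shortest_path_length(adj, 1, 4))
```

BFS visits nodes in order of increasing distance from the source, so the first time the target is reached its recorded distance is the shortest one.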
• A hub is a node with a high degree of connectivity, meaning it has many direct links
to others in the network.
• 📌 Examples:
– Instagram Influencers: A celebrity like Cristiano Ronaldo, who has millions of
followers and serves as a central hub in the sports and lifestyle communities.
– Wikipedia Editors: A prolific Wikipedia editor who frequently contributes to
various topics and interacts with many other editors.
– YouTube Educators: Channels like Khan Academy, which has many subscribers and
influences learners worldwide.
• Impact: Hubs play a crucial role in spreading information quickly within a social
network.
Structural Holes in Facebook
Network measures by scale
• Microscopic: Degree, Local clustering coefficient, Node centrality
• Mesoscopic: Connected components, Giant components, Group centralities
• Macroscopic: Degree distribution, Path and diameter, Edge density, Global clustering coefficient, Reciprocity and assortativity
Degree of a Node
For an undirected, unweighted network, the degree of a node v is defined as the number of nodes in the network to which there is an edge from the node v.
In other words, for an undirected, unweighted network, the degree of a node v is the number of edges of the network that are incident on the node v.
Put differently, for an undirected, unweighted network, the degree of a node v is the number of neighbours of the node v.
[Figure: graphs G1 and G2, each on nodes 1 to 5]
In graph G1, the degrees of the nodes 1 through 5 are 2, 3, 4, 2, 3.
In graph G2, the degrees of the nodes 1 through 5 are 3, 5, 7, 5, 4.
Note: A self-loop is counted twice in evaluating the degree of a node.
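A short sketch of the degree computation follows. The edge list for G1 is an assumption: the figure is not part of the text, so the edges were reconstructed from the degrees and distances quoted for G1 in these notes. The helper names are hypothetical.

```python
from collections import Counter

# Assumed edge list for G1 (reconstructed, not taken from the figure).
G1_EDGES = [(1, 2), (1, 3), (2, 3), (2, 5), (3, 4), (3, 5), (4, 5)]


def degrees(edges):
    """Degree of every node in an undirected graph given as an edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1  # for a self-loop (u == v) the node is counted twice
    return dict(deg)


def degree_distribution(deg):
    """Fraction of nodes having each degree value."""
    counts = Counter(deg.values())
    n = len(deg)
    return {k: counts[k] / n for k in sorted(counts)}


g1_degrees = degrees(G1_EDGES)
print(g1_degrees)
print(degree_distribution(g1_degrees))
```

With this edge list the computed degrees of nodes 1 through 5 agree with the values 2, 3, 4, 2, 3 stated for G1.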
Weighted Degree of a Node
For an undirected, weighted network, the weighted degree of a node is defined as the sum of the weights of the edges incident on that node.
[Figure: weighted undirected graph G3 on nodes 1 to 5]
For the weighted undirected graph G3, the weighted degrees of the nodes are as follows:
Weighted degree of node 1 is 11
Weighted degree of node 2 is 22
Weighted degree of node 3 is 26
Weighted degree of node 4 is 16
Weighted degree of node 5 is 22
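The weighted degree is a sum of incident edge weights, as the sketch below shows. The edge weights of G3 cannot be read from the text, so the example uses a small hypothetical weighted graph; `weighted_degrees` and `EXAMPLE_WEIGHTED_EDGES` are names invented for the illustration.

```python
def weighted_degrees(weighted_edges):
    """Sum of incident edge weights per node for (u, v, weight) triples."""
    wdeg = {}
    for u, v, w in weighted_edges:
        wdeg[u] = wdeg.get(u, 0) + w
        wdeg[v] = wdeg.get(v, 0) + w
    return wdeg


# Hypothetical weighted triangle (not the graph G3 from the slides).
EXAMPLE_WEIGHTED_EDGES = [("a", "b", 3), ("a", "c", 1), ("b", "c", 6)]
example_wdeg = weighted_degrees(EXAMPLE_WEIGHTED_EDGES)
print(example_wdeg)
```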
Indegree and Outdegree of a Node
Importance of Degree and Degree Distribution in a Social Graph
• Identifying Influential Nodes
– High-degree nodes (hubs) indicate influential individuals in social networks (e.g., celebrities,
key opinion leaders).
– Helps in viral marketing, influence maximization, and rumor spreading analysis.
• Network Connectivity & Robustness
– Determines how well the network is connected.
– Helps in assessing the impact of removing key nodes (network resilience analysis).
• Community Detection & Social Structure
– Degree distribution provides insights into network structure (random, scale-free, or small-
world).
– Helps in detecting tightly connected groups (e.g., friend circles, business clusters).
• Epidemic Modeling & Information Spread
– Determines how diseases, information, or trends propagate in a network.
– Networks with a power-law degree distribution (scale-free networks) show faster spreading.
• Anomaly Detection & Fraud Detection
–Unusual degree distributions can indicate anomalies (e.g., fake accounts in social
media).
–Useful in cybersecurity, bot detection, and financial fraud analysis.
• Link Prediction & Recommendation Systems
–Nodes with similar degrees are likely to form new connections.
–Used in friend recommendation (Facebook, LinkedIn), content suggestions
(YouTube, Netflix).
• Graph Sampling & Network Evolution
–Degree distribution helps in generating representative samples of large social
networks.
–Useful for studying the evolution of networks over time.
Twitter
• Degree Analysis
–In-Degree: Number of followers a user has.
–Out-Degree: Number of people a user follows.
–High in-degree (e.g., Anand Mahindra) → Influencer, central node.
–High out-degree (e.g., Random User 1) → Potential bot/spammer.
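In-degree and out-degree on a follower graph can be sketched as below. The sketch assumes each directed edge is a pair (follower, followed); the account names and the helper `in_out_degrees` are hypothetical.

```python
from collections import Counter


def in_out_degrees(follows):
    """Return (in_degree, out_degree) counters for (follower, followed) pairs."""
    indeg, outdeg = Counter(), Counter()
    for follower, followed in follows:
        outdeg[follower] += 1  # the follower follows one more account
        indeg[followed] += 1   # the followed account gains one follower
    return indeg, outdeg


# Hypothetical follow relationships for illustration.
FOLLOWS = [
    ("u1", "celebrity"), ("u2", "celebrity"), ("u3", "celebrity"),
    ("bot", "u1"), ("bot", "u2"), ("bot", "u3"), ("bot", "celebrity"),
]
indeg, outdeg = in_out_degrees(FOLLOWS)
print(indeg["celebrity"], outdeg["bot"])
```

In this toy data the account with many followers has a high in-degree, while the account that follows everyone has a high out-degree and no followers.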
[Figure: graph $G_1$ with nodes 1 to 5 and edges $e_1$ to $e_7$]
In graph $G_1$, the distance between nodes 1 and 4 is 2; the distance between nodes 1 and 5 is also 2.
The diameter of a network is defined as the maximum distance between any pair of nodes in the network.
The diameter of the graph $G_1$ is 2.
For a graph $G$ with $n$ nodes, the average path length $l_G$ is defined as the average number of steps along the shortest paths for all possible pairs of nodes in the network:
$$l_G = \frac{\sum_{i \neq j} d_{ij}}{n(n-1)},$$
where $d_{ij}$ is the distance between nodes $v_i$ and $v_j$.
Some Graph Preliminaries…
The density of a graph $G(V, E)$, denoted $\rho(G)$, is defined as the ratio of the number of edges in the graph to the total number of possible edges in the network.
Mathematically,
$$\rho(G) = \frac{2 \times |E|}{|V| \times (|V| - 1)}$$
For the graph $G_1$, the average path length is:
$$\frac{2 \times (1 + 1 + 2 + 2 + 1 + 2 + 1 + 1 + 1 + 1)}{5 \times 4} = \frac{26}{20} = 1.3$$
For the graph $G_1$, the network density is:
$$\frac{2 \times 7}{5 \times 4} = 0.7$$
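The diameter, average path length, and density of G1 can be recomputed with BFS. This is a sketch under stated assumptions: the edge list is reconstructed from the distances quoted for G1 (the figure is not in the text), the graph is assumed connected, and the helper names are hypothetical.

```python
from collections import deque

# Assumed edge list for G1 (reconstructed, not taken from the figure).
G1_EDGES = [(1, 2), (1, 3), (2, 3), (2, 5), (3, 4), (3, 5), (4, 5)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def graph_summary(edges):
    """Return (diameter, average path length, density) of a connected graph."""
    adj = build_adjacency(edges)
    n = len(adj)
    dist = {v: bfs_distances(adj, v) for v in adj}
    # One entry per ordered pair (i, j), i != j, matching the sum over i != j.
    pair_distances = [dist[i][j] for i in adj for j in adj if i != j]
    diameter = max(pair_distances)
    avg_path_length = sum(pair_distances) / (n * (n - 1))
    density = 2 * len(edges) / (n * (n - 1))
    return diameter, avg_path_length, density


diameter, avg_path_length, density = graph_summary(G1_EDGES)
print(diameter, avg_path_length, density)
```

Summing over ordered pairs counts each unordered pair twice, which is where the factor 2 in the worked average path length comes from.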
Importance of Density in Social Graphs
Global Clustering Coefficient
[Figure: graph $G_2$ on nodes 1 to 4, with an example of a closed triplet and of an open triplet]
In the graph $G_2$, there are three closed triplets, viz., (1,2,3), (2,3,1), and (3,1,2).
In the graph $G_2$, there are five triplets in total (closed and open), viz., (1,2,3), (2,3,1), (3,1,2), (2,1,4), and (3,1,4).
Thus, the global clustering coefficient of the graph $G_2$, i.e., the ratio of closed triplets to all triplets, is 3/5.
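The triplet count can be sketched in code. The edge list for this G2 is an assumption inferred from the triplets listed (a triangle on nodes 1, 2, 3 plus an edge between 1 and 4); the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Assumed edge list for G2, inferred from the listed triplets.
G2_EDGES = [(1, 2), (1, 3), (2, 3), (1, 4)]


def global_clustering(edges):
    """Return (closed triplets, all triplets, ratio) for an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    closed = total = 0
    # A triplet is a centre node together with two of its neighbours.
    for centre, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            total += 1
            if b in adj[a]:  # the two outer nodes are linked: closed triplet
                closed += 1
    ratio = Fraction(closed, total) if total else None
    return closed, total, ratio


print(global_clustering(G2_EDGES))
```

Each triangle contributes three closed triplets, one per centre node, which matches the three closed triplets counted for G2.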
In a typical social network, loose links connect the tightly knit clusters.
In an undirected network $G$, two nodes $v_i$ and $v_j$ are said to be connected if
there exists a path between $v_i$ and $v_j$.
An entire network is said to be connected if every pair of nodes in the network is
connected.
The maximal connected subnetworks of a network are called the components of the
network.
In real-world networks, there often exists one giant component (containing a
major share of the nodes) alongside many smaller components.
In a network, connectedness indicates resilience to link breakdowns.
Finding Connected Components
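[Figure: BFS from a node $v$ in a 9-node graph; the counter $n$ grows from 3 to 4 as nodes are coloured blue]
• Apply BFS from node $v$, colour blue all the nodes reached, and increment $n$ each time a node is coloured.
• When no more nodes can be reached from $v$ using BFS, the blue nodes form one component (here $n = 4$).
• Since $n \neq 9$, choose a red (not yet coloured) node as the new $v$ and repeat the steps above to find the other components.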
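The BFS procedure for finding components can be sketched as follows. The 9-node graph is hypothetical (the figure is not in the text); a `seen` set plays the role of the blue colouring.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return the components of an undirected graph as a list of node sets."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()        # nodes already coloured
    components = []
    for start in nodes:
        if start in seen:
            continue    # already belongs to a component found earlier
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:    # BFS from the uncoloured start node
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components


# Hypothetical 9-node graph with three components.
NODES = list(range(1, 10))
EDGES = [(1, 2), (2, 3), (3, 4), (4, 1), (5, 6), (6, 7), (8, 9)]
components = connected_components(NODES, EDGES)
giant = max(components, key=len)
print(components, giant)
```

The largest set returned is the giant component of this toy graph.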
Centrality in a Network
Influential players often play central roles in a network
Defining and identifying influential players always remains hard
– Some players attract the limelight
– Some others play behind the scenes
– Many others provide important linkages
To identify influential players, we need
– to define a notion of influence
– to devise a measure that can capture that influence
In social network analysis, centrality refers to a set of metrics used to determine
the importance or influence of individual nodes within a network.
These metrics help identify key actors who hold significant positions in terms of
connectivity, information flow, or control within the network.
Centrality Measures in a Social Graph
• Degree Centrality: Counts the number of direct connections a node
has, indicating its immediate influence.
• Closeness Centrality: Based on the average shortest-path distance from a
node to all other nodes (a smaller average distance gives a higher score),
reflecting how quickly it can access the entire network.
• Betweenness Centrality: Measures the number of times a node acts
as a bridge along the shortest path between two other nodes,
highlighting its role in information flow.
• Eigenvector Centrality: Assigns relative scores to nodes based on
the principle that connections to high-scoring nodes contribute
more to a node's score, capturing influence within the network.
Centrality Measures in a Social Graph
[Figure: example graph with nodes A to G]
Node  Degree  Degree centrality  Rank
B     3       1/2                5
C     5       5/6                1
D     4       2/3                2
E     3       1/2                5
F     4       2/3                2
G     3       1/2                5
Degree Centrality – Normalization
• Centrality of the simplest kind
• In a sense, captures the popularity of a player within a network
• Quantifies the direct influence of a node on its local neighbourhood
• The degree centrality $C_d(v)$ of a node $v$ in a network $G(V, E)$ is defined as:
$$C_d(v) = \frac{\deg(v)}{\max_{u \in V} \deg(u)}$$
• Particularly useful for marketing scenarios, wherein the detected influential user can
promote a product/service across her followers
• The degree centralities of nodes 1 through 5 in network $G_1$ are 2/4, 3/4, 4/4, 2/4, and 3/4,
respectively; i.e., 0.5, 0.75, 1.0, 0.5, and 0.75, respectively. So, node 3 is most central
according to the degree centrality measure.
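The max-degree normalisation can be sketched as below. The edge list for G1 is an assumption reconstructed from the degrees and distances quoted in these notes, and the function name is hypothetical.

```python
# Assumed edge list for G1 (reconstructed, not taken from the figure).
G1_EDGES = [(1, 2), (1, 3), (2, 3), (2, 5), (3, 4), (3, 5), (4, 5)]


def degree_centrality(edges):
    """Degree of each node divided by the maximum degree in the graph."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    max_deg = max(deg.values())
    return {v: d / max_deg for v, d in deg.items()}


centrality = degree_centrality(G1_EDGES)
most_central = max(centrality, key=centrality.get)
print(centrality, most_central)
```

Dividing by the maximum degree keeps every score in (0, 1], with the highest-degree node scoring exactly 1. Another common normalisation divides by $|V| - 1$ instead.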
Degree Centrality
• Influencer Identification:
– Users with high degree centrality are often considered influencers due to their
extensive direct connections. Identifying these users is crucial for targeted marketing
and outreach campaigns.
• Information Dissemination:
– High degree centrality indicates a user's potential to rapidly disseminate information
across the network, making them pivotal in spreading news, trends, or viral content.
• Community Detection:
– Analyzing degree centrality helps in identifying central figures within communities or
subgroups, providing insights into the structure and cohesion of these clusters.
• Network Robustness Assessment:
– Understanding which nodes have high degree centrality aids in assessing the
network's vulnerability. The removal of such central nodes can significantly impact
the network's connectivity and information flow.
Closeness Centrality
• A means for detecting nodes that can spread information very efficiently through a
graph
• The measure is useful in
– Examining/restricting the spread of fake news/misinformation in social media
– Examining/restricting the spread of a disease in epidemic modelling
– Controlling/restricting the flow of vital information and resources within an
organization (a terrorist network, for example)
• The closeness centrality $C(v)$ of a node $v$ in a network $G(V, E)$ is defined as
$$C(v) = \frac{|V| - 1}{\sum_{u \in V \setminus \{v\}} d(u, v)}$$
Closeness Centrality
In graph $G_1$, the closeness centralities of the nodes are as follows:
$$C(1) = \frac{5-1}{1+1+2+2} = \frac{4}{6} = 0.67$$
$$C(2) = \frac{5-1}{1+1+2+1} = \frac{4}{5} = 0.80$$
$$C(3) = \frac{5-1}{1+1+1+1} = \frac{4}{4} = 1.0$$
$$C(4) = \frac{5-1}{2+2+1+1} = \frac{4}{6} = 0.67$$
$$C(5) = \frac{5-1}{2+1+1+1} = \frac{4}{5} = 0.80$$
Clearly, node 3 is most central according to the closeness centrality
measure.
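The closeness values for G1 can be recomputed with BFS distances. This is a sketch, assuming a connected undirected graph and an edge list for G1 reconstructed from the distances in the worked example (the figure is not in the text); helper names are hypothetical.

```python
from collections import deque

# Assumed edge list for G1 (reconstructed, not taken from the figure).
G1_EDGES = [(1, 2), (1, 3), (2, 3), (2, 5), (3, 4), (3, 5), (4, 5)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """(|V| - 1) divided by the sum of distances from v to all other nodes."""
    dist = bfs_distances(adj, v)
    total = sum(d for u, d in dist.items() if u != v)
    return (len(adj) - 1) / total


adj = build_adjacency(G1_EDGES)
closeness_g1 = {v: round(closeness(adj, v), 2) for v in sorted(adj)}
print(closeness_g1)
```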
Closeness Centrality: Example 1
Closeness Centrality: Example 2 (Undirected)
Node  A  B  C  D  E  F  G  H  I  D_Avg  Closeness Centrality  Rank
A     0  1  2  1  2  1  2  3  2  1.750  0.571                 1
B     1  0  1  2  1  2  3  4  3  2.125  0.471                 3
C     2  1  0  3  2  3  4  5  4  3.000  0.333                 8
D     1  2  3  0  1  2  3  4  3  2.375  0.421                 4
E     2  1  2  1  0  3  4  5  4  2.750  0.364                 7
F     1  2  3  2  3  0  1  2  1  1.875  0.533                 2
G     2  3  4  3  4  1  0  3  2  2.750  0.364                 7
H     3  4  5  4  5  2  3  0  1  3.375  0.296                 9
I     2  3  4  3  4  1  2  1  0  2.500  0.400                 5
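The D_Avg and Closeness Centrality columns follow directly from each row of the distance matrix: D_Avg is the row sum divided by the number of other nodes, and closeness is its reciprocal. The sketch below recomputes them from the Example 2 distances; the variable and function names are hypothetical.

```python
# Shortest-path distances from Example 2 (rows and columns ordered A to I).
DIST = {
    "A": [0, 1, 2, 1, 2, 1, 2, 3, 2],
    "B": [1, 0, 1, 2, 1, 2, 3, 4, 3],
    "C": [2, 1, 0, 3, 2, 3, 4, 5, 4],
    "D": [1, 2, 3, 0, 1, 2, 3, 4, 3],
    "E": [2, 1, 2, 1, 0, 3, 4, 5, 4],
    "F": [1, 2, 3, 2, 3, 0, 1, 2, 1],
    "G": [2, 3, 4, 3, 4, 1, 0, 3, 2],
    "H": [3, 4, 5, 4, 5, 2, 3, 0, 1],
    "I": [2, 3, 4, 3, 4, 1, 2, 1, 0],
}


def closeness_from_row(row):
    """Return (average distance, closeness) from one row of the matrix."""
    d_avg = sum(row) / (len(row) - 1)  # the zero self-distance adds nothing
    return d_avg, 1 / d_avg


table = {node: closeness_from_row(row) for node, row in DIST.items()}
ranking = sorted(DIST, key=lambda node: table[node][1], reverse=True)
print(table["A"], ranking)
```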
Closeness Centrality: Example 3 (Directed)
Node  A  B  C  D  E  F  G  H  I  D_Avg  Closeness Centrality  Rank
A     0  1  2  3  2  2  1  3  3  2.125  0.471                 1
B     3  0  1  2  1  4  4  2  3  2.500  0.400                 2
C     4  5  0  7  6  3  5  1  2  4.125  0.242                 9
D     1  2  3  0  3  3  2  4  5  2.875  0.348                 3
E     2  3  4  1  0  4  3  5  5  3.375  0.296                 6
F     1  2  3  4  3  0  2  4  4  2.875  0.348                 4
G     2  3  4  5  4  1  0  5  2  3.250  0.308                 5
H     4  4  5  6  5  2  4  0  1  3.875  0.258                 8
I     2  3  4  5  4  1  4  5  0  3.500  0.286                 7
Simple Computation – Closeness Centrality
• The number next to each node is the distance from that node to
the square red node as measured by the length of the shortest
path.
• The green edges illustrate one of the two shortest paths
between the red square node and the red circle node.
• The closeness of the red square node is therefore
5/(1+1+1+2+2) = 5/7.
Closeness Centrality - Significance
• In social network analysis, nodes with high closeness centrality can efficiently
disseminate information or influence others due to their strategic positioning.
• Facebook:
– A user connected to various groups and communities can quickly share information
across the platform.
– For instance, community managers or users active in multiple interest groups often
have high closeness centrality, enabling them to disseminate news or updates
efficiently.
• LinkedIn:
– Professionals connected across diverse industries and regions can leverage their high
closeness centrality to access and share job opportunities, industry insights, or
professional recommendations swiftly.
– Recruiters or industry leaders often exhibit high closeness centrality, facilitating
efficient communication within the professional network.
• Instagram:
– Influencers who engage with various communities, such as fashion, travel, and
technology, possess high closeness centrality.
– Their diverse connections allow them to rapidly spread trends, brand promotions,
or messages across different audience segments.
• Twitter:
– A user who actively engages with various communities and topics can achieve high
closeness centrality.
– For instance, a journalist covering diverse subjects may connect with multiple user
groups, enabling them to quickly spread information across the platform.
– Research has utilized centrality measures to identify social media influencers
within specific industries on Twitter.
Betweenness Centrality
• A measure of how often a node lies on the paths between other nodes of the network
• It counts how many (shortest) paths of the network pass through the
node
• Useful in identifying
– the articulation points, i.e., the points in a network which, if removed, may
disconnect the network
– the super spreaders in analyzing disease spreading in epidemiology
– the suspected spies in security networks
• The betweenness centrality $C_B(v)$ of a node $v$ in a network $G(V, E)$ is defined as
$$C_B(v) = \sum_{x, y \in V \setminus \{v\}} \frac{\sigma_{xy}(v)}{\sigma_{xy}}$$
• where $\sigma_{xy}$ denotes the number of shortest paths between nodes $x$ and $y$ in the
network, and $\sigma_{xy}(v)$ denotes the number of those paths passing through $v$. If $x = y$, then $\sigma_{xy} = 1$.
Betweenness Centrality
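• To find the betweenness centrality of node $v = 3$ in graph $G_1$
• The following matrix is of the form $\sigma_{xy}(v) \mid \sigma_{xy}$
[Matrix of $\sigma_{xy}(v) \mid \sigma_{xy}$ entries, with rows and columns indexed by nodes 1 to 5]
• Thus the betweenness centrality of node 3 $= \frac{1}{1} + \frac{1}{2} + \frac{1}{2} + \frac{1}{1} + \frac{1}{2} + \frac{1}{2} = 4$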
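The value 4 for node 3 can be reproduced by counting shortest paths with BFS. The sketch below makes two assumptions: the edge list for G1 is reconstructed from the distances quoted in these notes, and the sum runs over ordered pairs $(x, y)$ with $x \neq y$, which is what the six terms of the worked sum suggest. Helper names are hypothetical.

```python
from collections import deque
from itertools import permutations

# Assumed edge list for G1 (reconstructed, not taken from the figure).
G1_EDGES = [(1, 2), (1, 3), (2, 3), (2, 5), (3, 4), (3, 5), (4, 5)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_counts(adj, source):
    """Return (dist, sigma): distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj, v):
    """Sum of sigma_xy(v) / sigma_xy over ordered pairs x != y, both != v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    dist_v, sigma_v = info[v]
    total = 0.0
    for x, y in permutations([u for u in adj if u != v], 2):
        dist_x, sigma_x = info[x]
        if y not in dist_x or v not in dist_x:
            continue  # x cannot reach y or v
        # v lies on a shortest x-y path iff the distances add up exactly.
        if dist_x[v] + dist_v[y] == dist_x[y]:
            total += sigma_x[v] * sigma_v[y] / sigma_x[y]
    return total


adj = build_adjacency(G1_EDGES)
print(betweenness(adj, 3))
```

Many libraries count each unordered pair once for undirected graphs, which would halve this value.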
Betweenness Centrality
Betweenness Centrality: Example 2
• Homophily
– On platforms like Facebook and Twitter, users often connect with others who share similar political views, hobbies, or cultural backgrounds.
– This can create echo chambers where individuals are primarily exposed to information that reinforces their existing beliefs.
• Heterophily
– On platforms like LinkedIn, professionals often connect with others from different industries, roles, or geographic locations to diversify their networks.
– Such heterophilous connections can lead to new opportunities, insights, and collaborations that wouldn't arise within a more homogenous network.
Assortative Mixing
• Understanding dynamics of Homophily and Heterophily is crucial for
comprehending how information spreads and how communities form on social
media platforms.
• While homophily can lead to echo chambers, where users are exposed primarily
to similar perspectives, heterophily can facilitate exposure to diverse viewpoints
and foster broader discussions.
• Assortativity (or assortative mixing) in social networks refers to the tendency
of nodes in a network to connect with other nodes that have similar attributes
The phenomenon of particular interest is assortative mixing by degree
– High-degree nodes often prefer to connect to other high-degree nodes
– Low-degree nodes tend to connect to other low-degree nodes
Types of Assortativity in Social Networks
• Degree Assortativity:
–Measures whether high-degree nodes tend to connect with other
high-degree nodes, and low-degree nodes with low-degree nodes.
–Example: In a professional networking site, highly connected
influencers tend to connect with other influencers.
• Attribute Assortativity:
–Measures whether nodes with similar categorical or numerical
attributes (e.g., age, gender, profession) tend to be connected.
–Example: In social networks like Facebook, people tend to be friends
with others of the same age, cultural background, or interests.
Assortative Mixing
• Assortative Networks: Nodes prefer to connect with similar nodes.
–Example: Academic collaborations (researchers from the same field
often collaborate).
• Disassortative Networks: Nodes tend to connect with dissimilar nodes.
–Example: Buyer-seller networks (companies connect with customers,
not other companies).
• Applications in Social Network Analysis:
–Understanding information flow and community structure.
–Detecting echo chambers in social media.
–Studying resilience and fragmentation of networks.
Assortative Mixing
A common practice to find similarity between nodes is to use a correlation
coefficient
The Pearson correlation coefficient is a good choice if we want degree-based
assortativity
For two data (degree) sequences $x$ and $y$ of length $N$, the Pearson correlation coefficient
$r_{xy}$ is given by
$$r_{xy} = \frac{N \sum xy - \sum x \sum y}{\sqrt{\left(N \sum x^2 - (\sum x)^2\right)\left(N \sum y^2 - (\sum y)^2\right)}}$$
If $r_{xy} = 1$, then the network is perfectly assortative (homophily)
If $r_{xy} = -1$, then the network is perfectly disassortative (heterophily)
If $r_{xy} = 0$, then the network is non-assortative
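A sketch of degree-based assortativity using the Pearson formula follows. The notes do not say how the sequences $x$ and $y$ are built; the sketch assumes the usual convention in which $x$ and $y$ are the degrees at the two ends of each edge, with every undirected edge counted in both directions. The edge list for G1 is likewise an assumption reconstructed from values quoted elsewhere in these notes.

```python
from math import sqrt

# Assumed edge list for G1 (reconstructed, not taken from the figure).
G1_EDGES = [(1, 2), (1, 3), (2, 3), (2, 5), (3, 4), (3, 5), (4, 5)]


def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1

    # Each undirected edge gives the pairs (deg u, deg v) and (deg v, deg u).
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]

    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    denom = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    if denom == 0:
        return None  # all degrees equal: the correlation is undefined
    return (n * sxy - sx * sy) / denom


r = degree_assortativity(G1_EDGES)
print(r)
```

Under these assumptions G1 comes out disassortative: its highest-degree node is linked to every other node, including the two of lowest degree.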
Structural Metrics – Measuring Relationships
Transitivity
Transitivity refers to the tendency of individuals who are connected to a common
person to also be connected to each other, forming a triangular
relationship.
This concept measures the density of such triangles in the network,
indicating the extent to which friends of friends are likely to become
friends themselves.
A higher transitivity suggests a more interconnected and cohesive
network.
For example, if person A is connected to person B, and person B is
connected to person C, transitivity examines the likelihood that person
A is also connected to person C.
Transitivity
• Mathematical representation:
– For a transitive relation $R$: $aRb \wedge bRc \rightarrow aRc$
• Triple: an ordered set of three nodes,
– connected by two edges (open triple) or
– three edges (closed triple)
• A triangle can miss any of its three edges
– A triangle has 3 triples
• $v_i v_j v_k$ and $v_j v_k v_i$ are different triples
– The same members
– The first is missing edge $e(v_k, v_i)$ and the second is missing $e(v_i, v_j)$
• $v_i v_j v_k$ and $v_k v_j v_i$ are the same triple
[Global] Clustering Coefficient: Example
Local Clustering Coefficient
• Local clustering coefficient measures transitivity at the node level
– Commonly employed for undirected graphs
– Computes how strongly neighbors of a node 𝑣 (nodes adjacent to 𝑣) are
themselves connected
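The local clustering coefficient can be sketched as below. The formula is not given in the text above, so the sketch assumes the standard definition for undirected graphs: the number of links among the neighbours of $v$ divided by the number of possible links, $k(k-1)/2$ for a node of degree $k$. The edge list for G1 is an assumption reconstructed from values quoted elsewhere in these notes.

```python
from fractions import Fraction
from itertools import combinations

# Assumed edge list for G1 (reconstructed, not taken from the figure).
G1_EDGES = [(1, 2), (1, 3), (2, 3), (2, 5), (3, 4), (3, 5), (4, 5)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, v):
    """Fraction of neighbour pairs of v that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return Fraction(0)  # convention assumed here for degree 0 or 1
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return Fraction(2 * links, k * (k - 1))


adj = build_adjacency(G1_EDGES)
local_cc = {v: local_clustering(adj, v) for v in sorted(adj)}
print(local_cc)
```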
Reciprocity