0% found this document useful (0 votes)
12 views121 pages

SMA Module2A

The document covers network measures in social media analytics, focusing on concepts such as node centrality, connectivity, and the properties of real-world networks. It explains various types of walks, paths, and trails in graphs, as well as the importance of bridges and hubs in connecting different communities. Additionally, it discusses the use cases of network measures in identifying influencers, detecting misinformation, and enhancing recommendation systems.

Uploaded by

Mukund Tiwari
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views121 pages

SMA Module2A

The document covers network measures in social media analytics, focusing on concepts such as node centrality, connectivity, and the properties of real-world networks. It explains various types of walks, paths, and trails in graphs, as well as the importance of bridges and hubs in connecting different communities. Additionally, it discusses the use cases of network measures in identifying influencers, detecting misinformation, and enhancing recommendation systems.

Uploaded by

Mukund Tiwari
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 121

Social Media

Analytics
Module 2
Network Measures
Overview - Network Measures
 Network Basics - Degree and Degree Distributions,
Paths, Clustering Coefficient, Connected Components
 Node Centrality – Degree centrality, Closeness
Centrality, Betweenness centrality, Edge Betweenness
centrality, Assortativity, Transitivity and Reciprocity,
Similarity.
 Properties of Real-World Networks – High Average
Local Clustering Coefficient, Small-world Property,
Scale-free Property.
 Random Network Model- Degree Distribution of
Random Network, Evolution of a Random Network,
Average Path Length, Clustering Coefficient, Random
Network vs. Real-world Network
Connectivity in Graphs - Adjacent nodes and Incident Edges

Two nodes are adjacent if they are connected via an edge.

Two edges are incident, if they share on end-point

When the graph is directed, edge directions must match for edges to be
incident

An edge in a graph can be traversed when one starts at one of its end-
nodes, moves along the edge, and stops at its other end-node.
Walk, Path, Trail, Tour, and Cycle
Walk: A walk is a sequence of incident edges visited one
after another
– Open walk: A walk does not end where it starts
– Closed walk: A walk returns to where it starts

• Representing a walk:
– A sequence of edges: 𝑒1, 𝑒2, … , 𝑒𝑛
– A sequence of nodes: 𝑣1, 𝑣2, … , 𝑣𝑛

• Length of walk:
the number of visited edges

Length of walk= 8
Trail
• A trail is a walk where no edge is visited more than once and all walk edges are
distinct

• A closed trail (one that ends where it starts) is called a tour or circuit
Path
• A walk where nodes and edges are distinct is called a path and a closed path is called
a cycle
• The length of a path or cycle is the number of edges visited in the path or cycle

• LinkedIn is a good example of a


social network that uses paths and
path lengths to show how you might
connect to other people.
• When you look at someone’s profile
page, it will calculate the shortest
path from you to them, and show
Length of path= 4 you the first person in that path who
might be able to introduce you.
Random walk
• A walk that in each step the next node is selected randomly
among the neighbors

– The weight of an edge can be used to define the probability of


visiting it

– For all edges that start at 𝑣𝑖 the following equation holds


Connectivity
• A node 𝒗𝒊 is connected to node 𝒗𝒋 (or reachable from 𝑣𝑗) if
it is adjacent to it or there exists a path from 𝑣𝑖 to 𝑣𝑗.

• A graph is connected, if there exists a path between any


pair of nodes in it
– In a directed graph, a graph is strongly connected if there exists a
directed path between any pair of nodes
– In a directed graph, a graph is weakly connected if there exists a
path between any pair of nodes, without following the edge
directions

• A graph is disconnected, if it not connected.


Connectivity: Example
Component
• A component in an undirected graph is a connected subgraph,
i.e., there is a path between every pair of nodes inside the
component

• In directed graphs, we have a strongly connected components


when there is a path from 𝑢 to 𝑣 and one from 𝑣 to 𝑢 for every
pair of nodes 𝑢 and 𝑣.

• The component is weakly connected if replacing directed edges


with undirected edges results in a connected component
Component Examples:

3 components 3 Strongly-connected
components
Shortest Path
• Shortest Path is the path between two nodes that
has the shortest length.
–We denote the length of the shortest path between
nodes 𝑣𝑖 and 𝑣𝑗 as 𝑙𝑖,𝑗

• The concept of the neighborhood of a node can be


generalized using shortest paths. An n-hop
neighborhood of a node is the set of nodes that are
within n hops distance from the node.
Facebook uses Graph data structure.
• The whole of Facebook is a collection of multiple nodes interlinked using edges.
• Anything you do in the app, from posting photos to commenting on someone
else’s picture, all these data will be stored as a node linked to you using edges.
• The edges between various nodes represent check-ins, tags, comments, multiple
relationships, and so on.
• Anything connected to you will have a unique identification number known
as fbid. Suppose,
– let’s say, your friend Ravi has fbid 2001 lives in New Delhi and likes Cold Play. So
this is how these will be mapped:
– Your friend: 1692 —> 2001
– lives in: 8021 —> 2001
– likes: 4892 —> 2001
Facebook uses Graph data structure

• So, according to these mapped indexes,


– your fbid - 1692,
– fbid’s of Delhi and Cold Play are 8021 and 4892.
• And these are the attributes that will be used to match further and show
you people who share the same interests as you or your friend.
• The closeness between two people is measured by the average length of
the shortest path between them.
• In other words, these are strongly connected people to you by these
shared attributes, and various graph theory algorithms make these
mutual friends’ suggestions.
Bridges (cut-edges)
• Bridges are edges whose removal will increase the number of connected
components
Bridges and Hubs
Bridge
Bridge - Example

• A bridge is a node (person, page, or organization) that connects two otherwise


disconnected groups in a network.
• 📌 Examples:
– Twitter Influencers: A journalist who follows and is followed by both tech
entrepreneurs and policymakers, bridging the tech community and government
regulators.
– GitHub Contributors: A developer contributing to both open-source AI projects and
cybersecurity projects, connecting two different tech communities.
– LinkedIn Connections: A consultant who has strong professional networks in both
academia and industry, enabling knowledge transfer between them.
• Impact: Bridges provide access to new information and help in spreading ideas
across different communities.
Hub - Example

• A hub is a node with a high degree of connectivity, meaning it has many direct links
to others in the network.
• 📌 Examples:
– Instagram Influencers: A celebrity like Cristiano Ronaldo, who has millions of
followers and serves as a central hub in the sports and lifestyle communities.
– Wikipedia Editors: A prolific Wikipedia editor who frequently contributes to
various topics and interacts with many other editors.
– YouTube Educators: Channels like Khan Academy, which has many subscribers and
influences learners worldwide.
• Impact: Hubs play a crucial role in spreading information quickly within a social
network.
Structural Holes in Facebook

• 🔹 Example 1: Niche Community Gaps


• A Facebook Group for AI Researchers and a Facebook Group for Startup Founders
may have no direct interaction.
• Someone who is an active member in both groups can bridge the structural hole,
sharing AI research updates with startup founders and business opportunities
with AI researchers.
• Impact: They control the flow of knowledge, acting as a key connector.
• 🔹 Example 2: Local vs. Global Communities
• A local travel group may not have direct links with international travel influencers.
• A travel blogger active in both communities can bridge this structural hole, helping
local groups gain global exposure and influencers tap into local trends.
Structural Holes in Instagram
🔹 Example 1: Fashion & Tech Influencers
– Fashion influencers and tech influencers often have separate audiences.
– A tech-savvy fashion influencer (who covers AI-powered fashion trends) can bridge
this gap, creating new content categories and collaborations.
– Impact: New sponsorship and cross-industry trends emerge.
🔹 Example 2: Regional vs. International Brands
– Many local businesses on Instagram (e.g., small clothing brands) don’t directly
connect with global fashion influencers.
– A digital marketer who collaborates with both can bridge this structural hole,
introducing local brands to global audiences and vice versa.
– Impact: This creates brand growth opportunities and monetization pathways.
Network Measures

• Social Network Measures are quantitative metrics used to analyze the


structure, influence, and dynamics of a social network. These measures
help in understanding how individuals (nodes) interact, the importance
of connections (edges), and the overall structure of the network.
• They are broadly categorized into:
1.Centrality Measures – Identify important or influential nodes.
2.Connectivity Measures – Describe the overall network structure.
3.Community Detection Measures – Identify groups or clusters within a
network.
Why Do We Need Measures?

• Who are the central figures (influential individuals) in the network?


– Centrality
• What interaction patterns are common in friends?
– Reciprocity and Transitivity
– Balance and Status
• Who are the like-minded users and how can we find these similar individuals?
– Similarity
• To answer these and similar questions, one first needs to define measures for quantifying
centrality, level of interactions, and similarity, among others.
• Network measures are crucial for understanding the structure, dynamics, and influence within a
social graph. They help in analyzing connections, information flow, influence, and community
formation, which are essential in various domains
Use Cases - Network Measures
• Identifying Influencers & Key Users
✔ Find influential users who drive conversations and trends.
✔ Helps in influencer marketing by identifying high-impact users.
✔ Example: Brands identify top Instagram influencers for product promotions.
• Understanding Information Spread & Virality
✔ Predict how news, memes, or trends spread through social media.
✔ Analyze the role of highly connected users in spreading viral content.
✔ Example: Twitter analyzes retweet networks to predict trending topics.
• Community Detection & Targeted Marketing
✔ Identify clusters of users with similar interests or behaviors.
✔ Help businesses target specific demographics for ads.
✔ Example: Facebook’s algorithm segments users into interest-based groups for ad
targeting.
Use Cases - Network Measures
• Fake Account & Bot Detection
✔ Detect anomalous behavior by analyzing social network structures.
✔ Find spam bots and fake accounts spreading misinformation.
✔ Example: Twitter uses network analysis to detect bot farms spreading fake news.
• Improving Recommendation Systems
✔ Enhance friend suggestions and content recommendations.
✔ Help personalize feeds based on network-based user interactions.
✔ Example: LinkedIn suggests new connections based on shared network structures.
• Crisis & Emergency Response
✔ Track information flow during disasters or emergencies.
✔ Identify key sources of reliable information.
✔ Example: Social media analysis during COVID-19 helped track misinformation.
Use Cases - Network Measures
• Political & Social Movement Analysis
✔ Understand how social movements gain traction.
✔ Analyze political campaigns, protests, and activism.
• Cybersecurity & Misinformation Control
✔ Detect coordinated disinformation campaigns.
✔ Prevent cyber-attacks on social media platforms.
✔ Example: Facebook uses social network measures to detect fake news
networks.
• Customer Engagement & Sentiment Analysis
✔ Find highly engaged communities and brand advocates.
✔ Measure customer sentiment through network interactions.
Network Measures: Classification

Microscopic Macroscopic
Degree Degree Distribution
Local clustering coefficient
Path and Diameter
Node centrality
Edge density
Mesoscopic
Connected components
Global clustering coefficient
Giant components Reciprocity and Assortativity
Group centralities
Degree of a Node
For an undirected, unweighted network, degree of a
node v is defined as the number of nodes in the network
to which there is an edge from the node v.
In other words, for an undirected, unweighted network,
1 3 1 3
degree of a node v is the number of edges of the
network that are incident on the node v. 5
5
Putting differently, for an undirected, unweighted 2
2
network, degree of a node v is the number of neighbours 4
4
of the node v.
G1
In graph G1, degrees of the nodes 1 through 5 are 2, 3, 4, G2
2, 3.
In graph G2, degrees of the nodes 1 through 5 are 3, 5, 7,
5, 4.
Note: A self-loop is counted twice in evaluating degree of
a node.
Weighted Degree of a Node

3
For an undirected, weighted network, the weighted degree
1 3 of a node is defined as the sum of weights of the edges
6
1 incidents on that node
6 2 9 7 5 For the weighted undirected graph G3, the weighted degrees
4 of the nodes are as follows:
2 7
4
5  Weighted degree of node 1 is 11
 Weighted degree of node 2 is 22
G3  Weighted degree of node 3 is 26
 Weighted degree of node 4 is 16
 Weighted degree of node 5 is 22
Indegree and Outdegree of a Node

In a directed network, the indegree of a node is


1 3 defined as the number of incoming edges to the node
5 In a directed network, the outdegree of a node is
2 defined as the number of outgoing edges from the
4 node
For the directed graph G4, the indegrees and
G4
outdegrees of the nodes are as follows:
 Indegrees of the nodes 1 through 5 are 2, 2, 3, 1, 1
 Outdegrees of the nodes 1 through 5 are 1, 3, 1, 2, 2
Sum of the Degrees…
For an unweighted, undirected network, the sum of the
degrees of the nodes in a graph is twice the number of edges in
the graph
Sum of the weighted degrees of the nodes in an undirected
weighted graph is twice the sum of weights of the edges in the
graph
In a directed network, the sum of indegrees is same as the sum
of outdegrees.
Number of odd-degree nodes in an undirected network is
always even.
Degree Distribution
Degree distribution of a network is the (probability) distribution of the degrees of
nodes over the whole network.
Let a network has 𝑁 = |𝑉| nodes.
Let 𝑃𝑘 denotes the probability that a randomly chosen node has degree 𝑘.
𝑁𝑘
Then, 𝑃𝑘 = , where 𝑁𝑘 refers to the number of nodes of degree 𝑘 in the
𝑁
network.
The distribution 𝑘, 𝑃𝑘 represents the degree distribution of the concerned
graph,

The mean degree, denoted 𝑘 , is given by 𝑘 = σ𝑘 𝑘. 𝑃𝑘 .


Cumulative Degree Distribution

Cumulative degree distribution (CDD) is given by the fraction of


nodes with degree smaller than 𝑘.
σ𝑘′<𝑘 𝑁𝑘′
In other words, it is the distribution 𝑘, 𝐶𝑘 , where 𝐶𝑘 =
𝑁

Complementary cumulative degree distribution (CCDD) is given by


the fraction of nodes with degree greater than or equal to 𝑘.
In other words, it is the distribution 𝑘, 𝐶𝐶𝑘 , where 𝐶𝐶𝑘 = 1 − 𝐶𝑘
Degree Distribution: Example
 For the graph G1, we have the following:
𝑁 = 5, and 𝑁1 = 0, 𝑁2 = 2, 𝑁3 = 2, 𝑁4 = 1.
 The above implies, 𝑃1 = 0, 𝑃2 = 0.4, 𝑃3 = 0.4, 𝑃4 = 0.2,
1 3 𝐶1 = 0, 𝐶2 = 0.4, 𝐶3 = 0.8, 𝐶4 = 1.0,
and 𝐶𝐶1 = 1.0, 𝐶𝐶2 = 0.6, 𝐶𝐶3 = 0.2 𝐶𝐶4 = 0.0.
5  Then the degree distribution, Cumulative degree distribution and
Complementary cumulative degree distribution are as follows:
2
4

G1
Importance of Degree and Degree Distribution in a Social Graph
• Identifying Influential Nodes
– High-degree nodes (hubs) indicate influential individuals in social networks (e.g., celebrities,
key opinion leaders).
– Helps in viral marketing, influence maximization, and rumor spreading analysis.
• Network Connectivity & Robustness
– Determines how well the network is connected.
– Helps in assessing the impact of removing key nodes (network resilience analysis).
• Community Detection & Social Structure
– Degree distribution provides insights into network structure (random, scale-free, or small-
world).
– Helps in detecting tightly connected groups (e.g., friend circles, business clusters).
• Epidemic Modeling & Information Spread
– Determines how diseases, information, or trends propagate in a network.
– Networks with a power-law degree distribution (scale-free networks) show faster spreading.
Importance of Degree and Degree Distribution in a Social Graph
• Anomaly Detection & Fraud Detection
–Unusual degree distributions can indicate anomalies (e.g., fake accounts in social
media).
–Useful in cybersecurity, bot detection, and financial fraud analysis.
• Link Prediction & Recommendation Systems
–Nodes with similar degrees are likely to form new connections.
–Used in friend recommendation (Facebook, LinkedIn), content suggestions
(YouTube, Netflix).
• Graph Sampling & Network Evolution
–Degree distribution helps in generating representative samples of large social
networks.
–Useful for studying the evolution of networks over time.
Twitter
• Degree Analysis
–In-Degree: Number of followers a user has.
–Out-Degree: Number of people a user follows.
–High in-degree (e.g, Anand Mahindra) → Influencer, central node.
–High out-degree (e.g., Random User 1) → Potential bot/spammer.

User In-Degree (Followers) Out-Degree


(Following)
Anand Mahindra 180M 200
Random User 1 150 500
Random User 2 20 30
Twitter
• Degree Distribution
– If we plot the degree distribution (log-log scale), we often see:
• Power-Law Distribution (Scale-Free Network)
– A few nodes (celebrities, brands) have extremely high degrees.
– Most nodes have low degrees (normal users).
• Real Example:
– Studies on Twitter have shown that 80% of users have fewer than 100 followers,
but a small fraction (0.01%) have millions.
• Applications
– Marketing: Identifying influencers with high in-degree for viral campaigns.
– Epidemiology: Predicting the spread of misinformation or disease.
– Anomaly Detection: Finding bots or fake accounts with unusual degree
distributions.
FaceBook
• Degree Analysis
• Degree: Number of friends a user has (since Facebook friendships are bidirectional, this is an
undirected network).
• High-degree users (e.g., Zuckerberg) → Well-connected, influential users.
• Low-degree users (e.g., elderly individuals) → Less active users or new accounts.
• 2. Degree Distribution
• Most Facebook users have between 100-500 friends.
• A small fraction of users have thousands of friends (influencers, highly social users).
• The degree distribution often follows a right-skewed distribution (long tail).

User Degree (Number of Friends)


Mark Zuckerberg 5,000 (Facebook friend limit)
College Student 300
Elderly User 50
FaceBook
• Degree Distribution
– Most Facebook users have between 100-500 friends.
– A small fraction of users have thousands of friends (influencers, highly social users).
– The degree distribution often follows a right-skewed distribution (long tail).
• Real Example:
– A study by Facebook found that the average number of friends per user is around
338, but 50% of users have fewer than 200 friends, showing a highly imbalanced
degree distribution.
• Applications
– Friend Recommendations: Suggesting friends using high-degree nodes (common
connections).
– Community Detection: Identifying social circles and close-knit groups.
– Misinformation Tracking: Understanding how fake news spreads based on user
connectivity.
Some Graph Preliminaries…
In an undirected network,
Two nodes are called adjacent if they are linked by an edge.
Two edges are called incident if they share a common end-node.
𝑒2
1 3  In graph 𝐺1 , the nodes 1 and 2 are adjacent, 1 and 3 are adjacent, and so on.
𝑒6
𝑒1 𝑒3 In graph 𝐺1 , the edges 𝒆𝟏 and 𝒆𝟐 are incident, 𝒆𝟏 and 𝒆𝟑 are incident, and so
𝑒5 5
𝑒4 on
2 𝑒7 A walk in a network is an alternating sequence of nodes and edges, where
4 every consecutive node pair is adjacent, and every consecutive edge pair is
𝐺1 incident.
A walk may pass through a node or an edge more than once. Length of a walk
is the number of edges in the sequence.
In graph 𝐺1 , the sequence {3, 𝒆𝟑 , 2, 𝒆𝟒 , 5, 𝒆𝟔 , 3, 𝒆𝟓 , 4, 𝒆𝟕 , 5, 𝒆𝟒 , 2} is a walk
of length 6.
For a simple graph, the edges from the above sequence may be omitted.
Some Graph Preliminaries…
 A walk in a network is called
 a closed walk if the last node in the sequence is same as the first node;
else it is called an open walk.
𝑒2  a trail if the sequence has no repeated edge.
1 3
𝑒6  a path if the sequence has neither a repeated edge nor a repeated node. In
𝑒1 𝑒3
𝑒5 other words, a path is an open trail having no repeated nodes.
5
𝑒4  a cycle if the sequence has all the edges distinct, and all the nodes, except
2 𝑒7 the first and the last nodes, are also distinct. In other words, a cycle is a
4 closed path with the only repetition of the first and the last nodes in the
𝐺1 sequence.
 In graph 𝐺1 ,
 the sequence {2,5, 4, 3, 2, 1, 3, 4, 5, 2} is a closed walk.
 the sequence {5, 4, 3, 2, 1, 3} is a trail.
 the sequence {5, 4, 3, 2, 1} is a path.
 the sequence {5, 4, 3, 2, 5} is a cycle.
Some Graph Preliminaries…
The distance between nodes 𝑣𝑖 and 𝑣𝑗 in a graph is defined as the
length of the shortest path between the nodes 𝑣𝑖 and 𝑣𝑗 .

𝑒2
In graph 𝐺1 , the distance between 1 and 4 is 2, the same between 1
1 3 and 5 is also 2.
𝑒6
𝑒3
𝑒1 𝑒5 The diameter of a network is defined as the maximum distance
5
𝑒4 between any pair of nodes in the network.
2 𝑒7
4 The diameter of the graph 𝐺1 is 2.
𝐺1 For a graph 𝐺 with 𝑛 nodes, the average path length 𝑙𝐺 is defined as
the average number of steps along the shortest paths for all
possible pairs of nodes in the network.
σ𝑖≠𝑗 𝑑𝑖𝑗
𝑙𝐺 = , where 𝑑𝑖𝑗 is distance between nodes 𝑣𝑖 and 𝑣𝑗
𝑛(𝑛−1)
Some Graph Preliminaries…

The density of a graph 𝐺(𝑉, 𝐸), denoted 𝜌(𝐺), is defined as the ratio of the number
of edges in the graph to the total number of possible edges in the network.
Mathematically,
2 × |𝐸|
𝜌 𝐺 =
𝑉 × ( 𝑉 − 1)
For the graph 𝐺1 , the average path length is:
2 × (1 + 1 + 2 + 2 + 1 + 2 + 1 + 1 + 1 + 1) 26
= = 1.3
5×4 20
For the graph 𝐺1 , the network density is:
2×7
= 0.7
5×4
Importance of Density in Social Graphs

• Information Dissemination: Denser networks facilitate faster


information spread due to more available pathways
• Social Cohesion: High-density networks promote trust and
shared norms, enhancing collaboration among members
• Resource Accessibility: Lower-density networks may contain
structural holes—gaps in the network—that, when bridged,
provide access to new resources and information.
Instagram

• Network Structure: Instagram's overall network exhibits low density,


meaning that while users have numerous connections, the vast
number of possible connections results in a sparse network.
• Community Clusters: Within this sparse network, densely connected
subgroups or communities exist where users interact more
frequently, facilitating rapid information dissemination and strong
social cohesion.
• Understanding these density dynamics on Instagram helps in
analyzing how information spreads, how communities form, and the
overall structure of user interactions on the platform.
Clusters in Social Networks
In social networks, we often find
 tightly-knit groups here and there
 less dense ties away from these
groups

Indicative of friendship structures in social


media

Measure used to capture these


phenomena
 Local clustering coefficient
 Global clustering coefficient
A Facebook Friendship Network Example
https://blog.dhimmel.com/friendship-network/
Local Clustering Coefficient
In a network 𝐺(𝑉, 𝐸), the local clustering coefficient of node 𝑣𝑖 ∈ 𝑉,
denoted 𝐶𝑖 , is defined as
𝑒2
1 3 𝐶𝑖
𝑒6 𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑒𝑑𝑔𝑒𝑠 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝑛𝑒𝑖𝑔ℎ𝑏𝑜𝑟𝑠 𝑜𝑓 𝑣𝑖
𝑒1
𝑒3 =
𝑒5 𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑚𝑎𝑥𝑖𝑚𝑢𝑚 𝑝𝑜𝑠𝑠𝑖𝑏𝑙𝑒 𝑒𝑑𝑔𝑒𝑠 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝑛𝑒𝑖𝑔ℎ𝑏𝑜𝑟𝑠 𝑜𝑓 𝑣𝑖
5
𝑒4
2 𝑒7 In graph 𝐺1 ,
4  the local clustering coefficient of node 2 is 2Τ3
𝐺1  the local clustering coefficient of node 3 is 3Τ6 i.e. 1Τ2
 and so on…
Local Clustering Coefficient: Example

• Thin lines depict connections to neighbors


• Dashed lines are the missing link among neighbors
• Solid lines indicate connected neighbors
– When none of neighbors are connected 𝐶 = 0
– When all neighbors are connected 𝐶 = 1
Global Clustering Coefficient
The global clustering coefficient 𝐶 of a network 𝐺 is
3 defined as
2 3
𝑇𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑐𝑙𝑜𝑠𝑒𝑑 𝑡𝑟𝑖𝑝𝑙𝑒𝑡𝑠 𝑖𝑛 𝐺
𝐶=
2 𝑇𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑡𝑟𝑖𝑝𝑙𝑒𝑡𝑠 (𝑜𝑝𝑒𝑛 & 𝑐𝑙𝑜𝑠𝑒𝑑) 𝑖𝑛 𝐺

1
Closed Triplet
In the graph 𝐺2 , there is three closed triplet viz., 1
[1,2,3], [2,3,1], and [3,1,2].
3 In the graph 𝐺2 , there is five closed triplets, viz.,
(1,2,3), (2,3,1), (3,1,2), (2,1,4), and (3,1,4).
2 4
Thus, the global clustering coefficient of the graph 𝐺2 is
3Τ .
1 5 𝐺2
Open Triplet
Global Clustering Coefficient

The global clustering coefficient may also be written as


3 × 𝑇𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑡𝑟𝑖𝑎𝑛𝑔𝑙𝑒𝑠 𝑖𝑛 𝐺
𝐶=
𝑇𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑡𝑟𝑖𝑝𝑙𝑒𝑡𝑠 (𝑜𝑝𝑒𝑛 & 𝑐𝑙𝑜𝑠𝑒𝑑) 𝑖𝑛 𝐺

In other words, if 𝐴 = (𝐴𝑖𝑗 ) is the adjacency matrix of the


graph 𝐺, then
σ𝑖,𝑗,𝑘(𝐴𝑖𝑗 𝐴𝑗𝑘 𝐴𝑘𝑖 )
𝐶= σ𝑖 𝑘𝑖 (𝑘𝑖 −1)
where 𝑘𝑖 = σ𝑗 𝐴𝑖𝑗
local clustering coefficient -Facebook Friend Networks
• It quantifies the likelihood that two neighbors of a given node are also neighbors of each other.
• On Facebook, users form friendship connections, creating a vast social network. Studies have
shown that these networks exhibit high local clustering coefficients, meaning that if User A is
friends with User B and User C, there's a significant probability that User B and User C are also
friends.
• This high clustering reflects the tendency of individuals to form tightly knit groups, such as
families, colleagues, or interest-based communities.
• Significance:
• Community Detection: High local clustering coefficients indicate the presence of cohesive
subgroups within the larger network.
• Information Dissemination: In highly clustered networks, information can spread rapidly within
these subgroups due to the dense interconnections.
• Understanding the local clustering coefficient in platforms like Facebook helps in analyzing the
structure and dynamics of online social interactions.
LinkedIn Professional Networks
• On LinkedIn, professionals connect based on shared work experiences,
educational backgrounds, or industry affiliations. This often leads to the
formation of tightly-knit clusters within the broader network.
• High Local Clustering: Consider a group of colleagues working in the same
department. If one member connects with another, it's likely that this new
connection is also connected to other members of the group, resulting in a high
local clustering coefficient.
• Low Local Clustering: A professional who connects with individuals from diverse
industries or regions may have connections that are not interconnected, leading
to a lower local clustering coefficient.
• Understanding the local clustering coefficient in platforms like LinkedIn helps in
analyzing the structure and dynamics of professional interactions.
Global clustering coefficient
• measures the overall tendency for nodes in a network to cluster together,
providing insight into the prevalence of tightly knit groups within the entire
network.
• In platforms like Facebook, users often form tightly connected groups based on
shared interests, locations, or affiliations.
• Studies have shown that such social networks exhibit high global clustering
coefficients, indicating a significant presence of these tightly knit communities.
• This high clustering reflects the tendency of individuals to form groups where
members are closely connected, facilitating efficient information dissemination
and reinforcing social cohesion within the platform
• Understanding the global clustering coefficient in social networks helps to analyze
the overall cohesiveness of the network, the efficiency of information spread, and
the robustness of the network's structure.
Global clustering coefficient
• Instagram, allows users to follow each other, creating a complex network of
connections.
• An analysis of an individual's Instagram network revealed a clustering coefficient of
approximately 0.22. This indicates a moderate level of clustering, suggesting that while
there are some tightly knit groups within this user's network, many connections remain
unconnected to each other
• Implications of a Low Global Clustering Coefficient:
– Dispersed Network Structure: suggests that the network lacks tightly knit groups,
meaning that the nodes' neighbors are less likely to be connected to each other.
– Slower Information Dissemination: information may spread more slowly, as there
are fewer densely connected clusters to facilitate rapid dissemination.
– Weaker Community Formation: The absence of tightly knit communities might
imply weaker social cohesion and a lower likelihood of collective action among
members.
Connected Components

In a typical social network, there are loose links that connects the tightly-knit
clusters
In an undirected network 𝐺, two nodes 𝑣𝑖 and 𝑣𝑗 are said to be connected if
there exists a path between 𝑣𝑖 and 𝑣𝑗 .
An entire network is said to be connected if any pair of nodes in the network is
connected.
Connected subnetworks of a network, if exist, are called components of the
network.
In real-world networks, there often exist one giant component (consuming
major chunk of nodes) and many smaller components.
In a network, connectedness shows resilience to link breakdowns.
Finding Connected Components

𝑛=0 The network 𝐺 with all nodes coloured red

𝑛=1 Choose a random node 𝑣, colour it blue, and set 𝑛 to 1


𝑣

Apply BFS from node 𝑣, and colour with blue all the
𝑛=3 nodes reached thereof, and increment 𝑛 each time
𝑣
No more node can be reached from 𝑣 using BFS. We
𝑛=4 get a component in blue.
𝑣
𝑣
𝑛=4 Since 𝑛 ≠ 9, we choose a red node as 𝑣, repeat the
steps above to find other components
Centrality in a Network
Influential players often play central roles in a network
Defining/Identifying influential players always remain hard
 Some players attract limelight
 Some others play behind the scene
 Many others do important linkage
To identify influential players, we require
 to define a notion of influence
 to device measure that can capture that influence
 In social network analysis, centrality refers to a set of metrics used to determine
the importance or influence of individual nodes within a network.
 These metrics help identify key actors who hold significant positions in terms of
connectivity, information flow, or control within the network.
Centrality Measures in a Social Graph
• Degree Centrality: Counts the number of direct connections a node
has, indicating its immediate influence.
• Closeness Centrality: Calculates the average shortest path from a
node to all other nodes, reflecting how quickly it can access the
entire network.
• Betweenness Centrality: Measures the number of times a node acts
as a bridge along the shortest path between two other nodes,
highlighting its role in information flow.
• Eigenvector Centrality: Assigns relative scores to nodes based on
the principle that connections to high-scoring nodes contribute
more to a node's score, capturing influence within the network.
Centrality Measures in a Social Graph

• PageRank: An adaptation of eigenvector centrality used by


search engines, ranking nodes based on the quality and
quantity of their connections.
• Katz Centrality: Considers the total number of walks between
nodes, with longer walks being penalized, to measure influence
that accounts for both direct and indirect connections.
• Harmonic Centrality: A variant of closeness centrality that uses
the harmonic mean of distances, providing a more robust
measure in disconnected networks.
Degree Centrality
• Degree centrality: ranks nodes with more connections higher in terms of
centrality

• 𝑑𝑖 is the degree (number of friends) for node 𝑣𝑖


–i.e., the number of length-1 paths (can be generalized)
In this graph, degree centrality for node
𝑣1 is 𝑑1=8 and for all others is 𝑑𝑗 =
1, 𝑗 ≠ 1
Degree Centrality in Directed Graphs

• In directed graphs, we can either use the in-degree, the out-degree, or


the combination as the degree centrality value:
• In practice, mostly in-degree is used.

𝑑𝑖𝑜𝑢𝑡 is the number of outgoing links for node 𝑣𝑖


Normalized Degree Centrality

• Normalized by the maximum possible degree

• Normalized by the maximum degree

• Normalized by the degree sum


Degree Centrality (Directed Graph)Example
Node In-Degree Out-Degree Centrality Rank
B C F
A 1 3 1/2 1
B 1 2 1/3 3
C 2 3 1/2 1
D
D 3 1 1/6 5
A
E 2 1 1/6 5
F 2 2 1/3 3
E G
G 2 1 1/6 5

Normalized by the maximum possible degree


Degree Centrality (undirected Graph) Example

Node Degree Centrality Rank


B C F
A 4 2/3 2

B 3 1/2 5

C 5 5/6 1
D
D 4 2/3 2
A
E 3 1/2 5

F 4 2/3 2
E G
G 3 1/2 5
Degree Centrality – Normalization
• Centrality of the simplest kind
• In a sense, captures the popularity of a player within a network 1 3
• Quantifies the direct influence of a node on its local neighbourhood
• The degree centrality 𝐶𝑑 𝑣 of a node 𝑣 in a network 𝐺(𝑉, 𝐸) is defined as: 5
2
𝐝𝐞𝐠(𝐯)
• 𝐂𝐝 𝐯 = 4
𝐦𝐚𝐱 𝐝𝐞𝐠(𝐮)
𝐮∈𝐕 𝑮𝟏
• Particularly useful for marketing scenarios, wherein the detected influential user can
promote a product/service across her followers
• Degree centrality of the nodes 1 through 5 in network 𝐺1 are 2Τ4 , 3Τ4 , 4Τ4 , 2Τ4 , and 3Τ4 ,
respectively; i.e., 0.5, 0.75, 1.0, 0.5, and 0.75, respectively. So, node 3 is most central
according to degree centrality measure.
Degree Centrality
• Influencer Identification:
– Users with high degree centrality are often considered influencers due to their
extensive direct connections. Identifying these users is crucial for targeted marketing
and outreach campaigns.
• Information Dissemination:
– High degree centrality indicates a user's potential to rapidly disseminate information
across the network, making them pivotal in spreading news, trends, or viral content.
• Community Detection:
– Analyzing degree centrality helps in identifying central figures within communities or
subgroups, providing insights into the structure and cohesion of these clusters.
• Network Robustness Assessment:
– Understanding which nodes have high degree centrality aids in assessing the
network's vulnerability. The removal of such central nodes can significantly impact
the network's connectivity and information flow.
Closeness Centrality
• A means for detecting nodes that can spread information very efficiently through a
graph
• The measure is useful in
– Examining/restricting the spread of fake news/misinformation in social media
– Examining/restricting the spread of a disease in epidemic modelling
– Controlling/restricting the flow of vital information and resources within an
organization (a terrorist network, for example)
• The closeness centrality 𝐶(𝑣) of a node 𝑣 in a network 𝐺(𝑉, 𝐸) is defined as
𝑉 −1
• 𝐶 𝑣 = σ𝑢∈𝑉∖{𝑣} 𝑑(𝑢,𝑣)

• Where 𝑑(𝑢, 𝑣) denotes the distance of node 𝑢 from node 𝑣


• The measure indicates how close a node from the rest of the network
Closeness Centrality
• The intuition is that influential/central nodes can quickly reach other
nodes

• These nodes should have a smaller average shortest path length to


others

Closeness centrality:
Closeness Centrality
In graph 𝐺1 , the closeness centrality for the nodes are as follows
5−1 4
𝐶 1 = = = 0.67
1 3 1+1+2+2 6
5−1 4
𝐶 2 = = = 0.80
1+1+2+1 5
5 5−1 4
𝐶 3 = = = 1.0
1+1+1+1 4
2 5−1 4
𝐶 4 = = = 0.67
4
2+2+1+1 6
5−1 4
𝐶 1 = = = 0.80
𝑮𝟏 2+1+1+1 5
Clearly, node 3 is most central according to closeness centrality
measure
Closeness Centrality: Example 1
Closeness Centrality: Example 2 (Undirected)

Closeness
Node A B C D E F G H I D_Avg Centrality Rank
A 0 1 2 1 2 1 2 3 2 1.750 0.571 1
B 1 0 1 2 1 2 3 4 3 2.125 0.471 3
C 2 1 0 3 2 3 4 5 4 3.000 0.333 8
D 1 2 3 0 1 2 3 4 3 2.375 0.421 4
E 2 1 2 1 0 3 4 5 4 2.750 0.364 7
F 1 2 3 2 3 0 1 2 1 1.875 0.533 2
G 2 3 4 3 4 1 0 3 2 2.750 0.364 7
H 3 4 5 4 5 2 3 0 1 3.375 0.296 9
I 2 3 4 3 4 1 2 1 0 2.500 0.400 5
Closeness Centrality: Example 3 (Directed)

Closeness
Node A B C D E F G H I D_Avg Centrality Rank
A 0 1 2 3 2 2 1 3 3 2.125 0.471 1
B 3 0 1 2 1 4 4 2 3 2.500 0.400 2
C 4 5 0 7 6 3 5 1 2 4.125 0.242 9
D 1 2 3 0 3 3 2 4 5 2.875 0.348 3
E 2 3 4 1 0 4 3 5 5 3.375 0.296 6
F 1 2 3 4 3 0 2 4 4 2.875 0.348 4
G 2 3 4 5 4 1 0 5 2 3.250 0.308 5
H 4 4 5 6 5 2 4 0 1 3.875 0.258 8
I 2 3 4 5 4 1 4 5 0 3.500 0.286 7
Simple Computation – Closeness Centrality

• The number next to each node is the distance from that node to
the square red node as measured by the length of the shortest
path.
• The green edges illustrate one of the two shortest paths
between the red square node and the red circle node.
• The closeness of the red square node is therefore
5/(1+1+1+2+2) = 5/7.
Closeness Centrality - Significance
• In social network analysis, Nodes with high closeness centrality can efficiently
disseminate information or influence others due to their strategic positioning.
• Facebook:
– A user connected to various groups and communities can quickly share information
across the platform.
– For instance, community managers or users active in multiple interest groups often
have high closeness centrality, enabling them to disseminate news or updates
efficiently.
• LinkedIn:
– Professionals connected across diverse industries and regions can leverage their high
closeness centrality to access and share job opportunities, industry insights, or
professional recommendations swiftly.
– Recruiters or industry leaders often exhibit high closeness centrality, facilitating
efficient communication within the professional network.
Closeness Centrality - Significance
• Instagram:
– Influencers who engage with various communities, such as fashion, travel, and
technology, possess high closeness centrality.
– Their diverse connections allow them to rapidly spread trends, brand promotions,
or messages across different audience segments.
• Twitter:
– A user who actively engages with various communities and topics can achieve high
closeness centrality.
– For instance, a journalist covering diverse subjects may connect with multiple user
groups, enabling them to quickly spread information across the platform.
– Research has utilized centrality measures to identify social media influencers
within specific industries on Twitter.
Betweenness Centrality
• A measure to compute how central a node is in between paths of the network
• A measure to compute how many (shortest) paths of the network pass through the
node
• Useful in identifying
– the articulation points, i.e., the points in a network which, if removed, may
disconnect the network
– The super spreaders in analyzing disease spreading in epidemiology
– the suspected spies in security networks
• The betweenness centrality 𝐶𝐵 (𝑣) of a node 𝑣 in a network 𝐺(𝑉, 𝐸) is defined as
𝜎𝑥𝑦 (𝑣)
• 𝐶𝐵 𝑣 = σ𝑥,𝑦∈𝑉∖{𝑣}
𝜎𝑥𝑦
• where 𝜎𝑥𝑦 denotes the number of shortest paths between nodes 𝑥 and 𝑦 in the
network, 𝜎𝑥𝑦 (𝑣) denotes the same passing though 𝑣. If 𝑥 = 𝑦 , then 𝜎𝑥𝑦 = 1.
Betweenness Centrality
• To find the betweenness centrality of node 𝑣 = 3 in graph 𝐺1
• The following matrix is of the form 𝜎𝑥𝑦 𝑣 |𝜎𝑥𝑦
1 3 𝑣
𝝈𝒙𝒚 𝒗 |𝝈𝒙𝒚 1 2 3 4 5

5 1 0|1 0|1 -- 1|1 1|2


2 2 0|1 0|1 -- 1|2 0|1
4
3 -- -- -- -- --
𝐺1
4 1|1 1|2 -- 0|1 0|1

5 1|2 0|1 -- 0|1 0|1

1 1 1 1 1 1
• Thus the betweenness centrality of node 3 = + + + + + = 4
1 2 2 1 2 2
Betweenness Centrality
Betweenness Centrality: Example 2

Node Betweenness Centrality Rank


A 16 + 1/2 + 1/2 1
B 7+5/2 3
C 0 7
D 5/2 5
E 1/2 + 1/2 6
F 15 + 2 1
G 0 7
H 0 7
I 7 4
The betweenness
centrality (BWC) of a
vertex is the sum
of the fraction of
shortest paths going
through the vertex
between any two
vertices, considered
over all pairs of
vertices.
Significance of Betweenness Centrality in Social Networks
• Information Control: Individuals with high betweenness centrality can
influence the spread of information, as they connect disparate groups
and can control or filter the information passing through them.
• Community Bridging: These nodes often serve as connectors between
different social circles or communities, enabling interactions and
collaborations that might not occur otherwise.
• Network Resilience: Edges with high betweenness centrality are crucial
for maintaining network connectivity. Their removal can disrupt
communication between significant portions of the network, indicating
potential vulnerabilities.
Real World Examples
• Social Media Influencers:
• On platforms like Twitter, users with high betweenness centrality often act as hubs
connecting various user groups.
• They play a pivotal role in disseminating information across different communities,
amplifying messages beyond their immediate followers.
• Facebook:
• In a study analyzing a personal Facebook network, it was observed that certain individuals
connected otherwise separate groups, such as family and coworkers.
• These individuals had high betweenness centrality, acting as bridges and facilitating
information flow between distinct social circles.
• Instagram:
• Research focusing on identifying key opinion leaders within Instagram networks found
that users with high betweenness centrality played crucial roles in disseminating
information across various communities.
• These influencers connected diff user groups, enhancing the spread of content & ideas.
Betweenness Centrality: Variants

The edge betweenness centrality refers to the fraction of all pairs of


shortest paths of the network that pass through a given edge.
Computation is more-or-less similar to that of betweenness
centrality
The flow betweenness centrality the fraction of all paths (not
necessarily the shortest paths) of the network that passes through a
given edge.
Clearly, flow betweenness centrality measure is computationally
expensive than betweenness or edge betweenness centrality
measures.
Edge Betweeness Centrality
A
/\
Steps to Calculate Edge Betweenness
B C
Centrality:
|X|
1.Identify All Pairs of Nodes:
D E
1.List all unique pairs of nodes in the graph:
\/
1. (A, B)
F
2. (A, C)
3. (A, D)
5 nodes (A, B, C, D, E) and 6 edges 4. (A, E)
5. (B, C)
Edges: AB, AC, BC, BD, CE, DE 6. (B, D)
7. (B, E)
8. (C, D)
9. (C, E)
10.(D, E)
2. Determine Shortest Paths:
1. For each pair, identify all shortest paths connecting them. In
A this example, assume all edges have equal weight.
/\ 2. (A, B): Shortest path is [A-B]
B C 3. (A, C): Shortest path is [A-C]
|X| 4. (A, D): Shortest paths are [A-B-D] and [A-C-E-D]
D E 5. (A, E): Shortest paths are [A-C-E] and [A-B-D-E]
\/ 6. (B, C): Shortest paths are [B-A-C] and [B-D-E-C]
F 7. (B, D): Shortest path is [B-D]
8. (B, E): Shortest paths are [B-D-E] and [B-A-C-E]
9. (C, D): Shortest paths are [C-A-B-D] and [C-E-D]
10. (C, E): Shortest path is [C-E]
11. (D, E): Shortest path is [D-E]
3. Count Shortest Paths Passing Through Each Edge:
For each edge, count how many of the shortest paths identified in step 2 pass
through it.
•Edge AB:
A • Passes through: (A, B), (A, D), (B, C), (B, E)
• Total: 4 paths
•Edge AC:
/\ • Passes through: (A, C), (A, E), (B, C)
• Total: 3 paths
B C •Edge BC:
• Passes through: (A, D), (B, C), (C, D)
|X| • Total: 3 paths
•Edge BD:
D E • Passes through: (A, D), (B, D), (B, E)
• Total: 3 paths
\/ •Edge CE:
• Passes through: (A, E), (C, E), (C, D)
F • Total: 3 paths
•Edge DE:
• Passes through: (A, D), (A, E), (B, E), (C, D), (D, E)
• Total: 5 paths
Calculate Edge Betweenness
•Edge AB:
Centrality:
• (A, B): 1 path → Contribution: 1
• The betweenness centrality
• (A, D): 2 paths, 1 passes through AB → Contribution: 1/2
for each edge is the sum of
• (B, C): 2 paths, 1 passes through AB → Contribution: 1/2
the fractions of shortest
• (B, E): 2 paths, 1 passes through AB → Contribution: 1/2
paths between all pairs of
• Total Betweenness: 1 + 1/2 + 1/2 + 1/2 = 2.5
nodes that pass through that
•Edge AC:
edge.
• (A, C): 1 path → Contribution: 1
• If there are multiple shortest
• (A, E): 2 paths, 1 passes through AC → Contribution: 1/2
paths between a pair of
• (B, C): 2 paths, 1 passes through AC → Contribution: 1/2
nodes, each path is assigned
• Total Betweenness: 1 + 1/2 + 1/2 = 2
an equal fraction.
•Edge BC:
• (A, D): 2 paths, 1 passes through BC → Contribution: 1/2
• (B, C): 2 paths, 1 passes through BC → Contribution: 1/2
• (C, D): 2 paths, 1 passes through BC → Contribution: 1/2
• Total Betweenness: 1/2 + 1/2 + 1/2 = 1.5
•Edge BD:
• (A, D): 2 paths, 1 passes through BD → Contribution: 1/2
• (B, D): 1 path → Contribution: 1
• (B, E): 2 paths, 1 passes through BD → Contribution: 1/2
• Total Betweenness: 1/2 + 1 + 1/2 = 2
•Edge CE:
• (A, E): 2 paths, 1 passes through CE → Contribution: 1/2
• (C, E): 1 path → Contribution: 1
• (C, D): 2 paths, 1 passes through CE → Contribution: 1/2
• Total Betweenness: 1/2 + 1 + 1/2 = 2
•Edge DE:
• (A, D): 2 paths, 1 passes through DE → Contribution: 1/2
• (A, E): 2 paths, 1 passes through DE → Contribution: 1/2
• (B, E): 2 paths, 1 passes through DE → Contribution: 1/2
• (C, D): 2 paths, 1 passes through DE → Contribution: 1/2
• (D, E): 1 path → Contribution: 1
• Total Betweenness: 1/2 + 1/2 + 1/2 + 1/2 + 1 = 3
 Social networks exhibit the following
properties:
Assortative Mixing Homophily - individuals often
choose to associate with others
having similar characteristics
Heterophily - tendency of
individuals to form connections with
others who are dissimilar to
themselves in certain attributes,
such as demographics, beliefs, or
behaviors.
https://www.wolfram.com/mathematica/new-in-9/social-
network-analysis/homophily-and-assortativity-mixing.html
Assortativity or assortative mixing is a
measure to gauge these mixing
tendencies
Examples

• Homophily • Heterophily
–On platforms like Facebook and – On platforms like LinkedIn,
Twitter, users often connect professionals often connect with
with others who share similar others from different industries,
political views, hobbies, or roles, or geographic locations to
diversify their networks.
cultural backgrounds.
– Such heterophilous connections can
–This can create echo chambers lead to new opportunities, insights,
where individuals are primarily and collaborations that wouldn't
exposed to information that arise within a more homogenous
reinforces their existing beliefs. network.
Assortative Mixing
• Understanding dynamics of Homophily and Heterophily is crucial for
comprehending how information spreads and how communities form on social
media platforms.
• While homophily can lead to echo chambers, where users are exposed primarily
to similar perspectives, heterophily can facilitate exposure to diverse viewpoints
and foster broader discussions.
• Assortativity (or assortative mixing) in social networks refers to the tendency
of nodes in a network to connect with other nodes that have similar attributes
The phenomenon of particular interest is the assortative mixing by degree
High degree nodes often prefers to connect other high degree nodes
Low degree nodes seen to connect other low degree nodes
Types of Assortativity in Social Networks
• Degree Assortativity:
–Measures whether high-degree nodes tend to connect with other
high-degree nodes, and low-degree nodes with low-degree nodes.
–Example: In a professional networking site, highly connected
influencers tend to connect with other influencers.
• Attribute Assortativity:
–Measures whether nodes with similar categorical or numerical
attributes (e.g., age, gender, profession) tend to be connected.
–Example: In social networks like Facebook, people tend to be friends
with others of the same age, cultural background, or interests.
Assortative Mixing
• Assortative Networks: Nodes prefer to connect with similar nodes.
–Example: Academic collaborations (researchers from the same field
often collaborate).
• Disassortative Networks: Nodes tend to connect with dissimilar nodes.
–Example: Buyer-seller networks (companies connect with customers,
not other companies).
• Applications in Social Network Analysis:
–Understanding information flow and community structure.
–Detecting echo chambers in social media.
–Studying resilience and fragmentation of networks.
Assortative Mixing
A common practice to find similarity between nodes is to use a correlation
coefficient
The Pearson correlation coefficient is a good choice if we want degree-based
assortativity
For two data (degree) distribution 𝑥 and 𝑦, the Pearson correlation coefficient
𝑟𝑥𝑦 is given by
𝑁 σ 𝑥𝑦 − σ 𝑥 σ 𝑦
𝑟𝑥𝑦 =
(𝑁 σ 𝑥 2 − (σ 𝑥)2 )(𝑁 σ 𝑦 2 − (σ 𝑦)2 )
 If 𝑟𝑥𝑦 = 1 , then nodes 𝑥 and 𝑦 are perfectly assortative (homophily)
If 𝑟𝑥𝑦 = −1 , then nodes 𝑥 and 𝑦 are perfectly disassortative (heterophily)
If 𝑟𝑥𝑦 = 0 , then nodes 𝑥 and 𝑦 are non-assortative
Structural Metrics – Measuring Relationships
Transitivity
refers to the tendency of individuals who are connected to a common
person to also be connected to each other, forming a triangular
relationship.
This concept measures the density of such triangles in the network,
indicating the extent to which friends of friends are likely to become
friends themselves.
A higher transitivity suggests a more interconnected and cohesive
network.
For example, if person A is connected to person B, and person B is
connected to person C, transitivity examines the likelihood that person
A is also connected to person C.
Transitivity
• Mathematic representation:
– For a transitive relation 𝑅:

• In a social network: 𝒄𝑹𝒂 or 𝒂𝑹𝒄 ?


– Transitivity is when a friend of my friend is my friend
– Transitivity in a social network leads to a denser graph, which in turn is closer to a
complete graph
– We can determine how close graphs are to the complete graph by measuring
transitivity
Transitivity
• A complete graph is surely transitive
• A measure of transitivity intends to capture how close a
network is to a complete graph
• A network with higher transitivity are likely to form dense
clusters
• Two ways to capture this tendency
–Local clustering coefficient
–Global clustering coefficient
Measuring Transtivity
• Transitivity is commonly measured using the global clustering coefficient, which
evaluates the overall prevalence of interconnected triplets (three-node subgraphs) in
the network.
– Triangles: Sets of three nodes where each node is directly connected to the other
two, forming a closed loop.
– Connected Triplets: Sets of three nodes with at least two connections among them,
which may form open or closed triplets.
• A higher global clustering coefficient indicates a greater tendency for nodes to cluster
together, revealing tightly knit communities within the network.
• Another approach is the local clustering coefficient, which assesses transitivity from
the perspective of individual nodes.
• Understanding and measuring transitivity in social networks help identify the presence
of tightly connected groups, inform community detection algorithms, and enhance
predictions about the spread of information or behaviors through the network.
[Global] Clustering Coefficient
• Clustering coefficient measures transitivity in undirected graphs
– Count paths of length two and check whether the third edge exists

When counting triangles, since every triangle has 6 closed paths


of length 2
Clustering Coefficient and Triples
Or we can rewrite it as

𝑣𝑖 𝑣𝑗 𝑣𝑘 and 𝑣𝑗 𝑣𝑘 𝑣𝑖 are
• Triple: an ordered set of three nodes, different triples
• The same members
– connected by two (open triple) edges or • First missing edge 𝑒(𝑣𝑘 , 𝑣𝑖 )
– three edges (closed triple) and second missing 𝑒(𝑣𝑖 , 𝑣𝑗 )

• A triangle can miss any of its three edges 𝑣𝑖 𝑣𝑗 𝑣𝑘 and 𝑣𝑘 𝑣𝑗 𝑣𝑖 are the same
– A triangle has 3 Triples triple
[Global] Clustering Coefficient: Example
Local Clustering Coefficient
• Local clustering coefficient measures transitivity at the node level
– Commonly employed for undirected graphs
– Computes how strongly neighbors of a node 𝑣 (nodes adjacent to 𝑣) are
themselves connected

In an undirected graph, the denominator


can be rewritten as:

Provides a way to determine


structural holes Structural
Holes
Local Clustering Coefficient: Example
• Thin lines depict connections to neighbors
• Dashed lines are the missing link among neighbors
• Solid lines indicate connected neighbors
– When none of neighbors are connected 𝐶 = 0
– When all neighbors are connected 𝐶 = 1
 The ratio of the number of relations
Reciprocity which are reciprocated (i.e. there is an
edge in both directions) over the total
• Relevant for directed number of relations in the network 1 2
networks
 …where two vertices are said to be
• A measure of the related if there is at least one edge
likelihood of vertices in between them
a directed network to
be mutually linked.  In the example to the right this would
be 2/5=0.4 (whether this is considered 3 4
• Informally, reciprocity
high or low depends on the context)
refers to: “If you would
follow me, most likely I  A useful indicator of the degree of Reciprocity for network = 0.4
shall follow you back” mutuality and reciprocal exchange in a
• May be considered a network, which relate to social
simplified version of cohesion
transitivity  Only makes sense in directed graphs
Reciprocity
• Reciprocity counts the closed loops of length 2
• The reciprocity 𝑅 of a network 𝐺 is defined as
𝑇𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑒𝑐𝑖𝑝𝑟𝑜𝑐𝑎𝑙 𝑝𝑎𝑖𝑟𝑠 𝑖𝑛 𝐺
•𝐶 =
𝑇𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑝𝑎𝑖𝑟𝑠 (𝑟𝑒𝑐𝑖𝑝𝑟𝑜𝑐𝑎𝑙 & 𝑛𝑜𝑛𝑟𝑒𝑐𝑖𝑝𝑟𝑜𝑐𝑎𝑙) 𝑖𝑛 𝐺
1 1
• For graph G, the reciprocity is
3 3

2 𝑮
Reciprocity

• The reciprocity 𝑅 for a graph 𝐺(𝑉, 𝐸) having adjacency matrix 𝐴 = 𝑎𝑖𝑗


is given by
2
–𝑅 = σ (𝑎 . 𝑎 )
|𝐸| 𝑖<𝑗 𝑖𝑗 𝑗𝑖
–On simplification,
2 1 𝑇𝑟𝑎𝑐𝑒(𝐴2 )
–𝑅 = × 𝑇𝑟𝑎𝑐𝑒 𝐴2 =
|𝐸| 2 |𝐸|

• In the above expression, 𝑇𝑟𝑎𝑐𝑒(∙) function denotes the sum of the


diagonal elements of its argument square matrix
Transitivity and Reciprocity in Real Social Networks
• on platforms like Facebook and LinkedIn, users often form groups based on shared
interests or professional backgrounds, leading to high transitivity and robust
community engagement.
• On platforms like Instagram and X, reciprocal interactions—such as mutual follows,
likes, or comments—enhance user engagement and retention by creating a sense of
community and belonging.
• Understanding and analyzing these metrics enable social media platforms to:
– Enhance User Experience: By identifying highly transitive clusters, platforms can
recommend relevant connections or content
– Improve Network Stability: High reciprocity contributes to user retention, as mutual
interactions reinforce users' commitment to the platform.
– Facilitate Targeted Marketing: Brands can leverage information about transitive
groups and reciprocal relationships to design targeted marketing strategies, ensuring
their messages reach the most engaged and relevant audiences.
Structural Equivalence - Similarity
• Two nodes are said to be exactly structurally equivalent if
they have the same relationships to all other nodes
• Two actors must be exactly substitutable in order to be
structurally equivalent
• In the attached network,
– nodes 𝐸 and 𝐹 are structurally equivalent, since these two
nodes have same pattern ties (viz. a single tie) with the
node 𝐵
– Also, nodes 𝐻 and 𝐼 are structurally equivalent, since
https://en.wikipedia.org/wiki/Similarity_(network_scien
ce)#:~:text=Similarity%20in%20network%20analysis%20
these two nodes have same pattern ties (viz. a single tie)
occurs,automorphic%20equivalence%2C%20and%20reg
ular%20equivalence.
with the node 𝐷
• Exact structural equivalence is likely to be rare (particularly
in large networks)
• the degree of structural equivalence is what interests us the
most
Measuring Structural Equivalence
• Common Neighbors
– number of common neighbors shared in the neighborhoods of the nodes 𝑎 and 𝑏
– 𝜎𝐶𝑁 𝑎, 𝑏 = |𝑁(𝑎) ∩ 𝑁(𝑏)|
• Jaccard Similarity
– Normalizes the common neighbors by the combined size of the neighborhoods of
the two nodes
|𝑁(𝑎)∩𝑁(𝑏)|
– 𝜎𝐶𝑁 𝑎, 𝑏 =
|𝑁(𝑎)∪𝑁(𝑏)|
• Cosine Similarity
– normalizes the common neighbors by the individual sizes of the neighborhoods
|𝑁(𝑎)∩𝑁(𝑏)|
– 𝜎𝐶𝑁 𝑎, 𝑏 =
𝑁 𝑎 |𝑁(𝑏)|

You might also like