Introduction to Social Network Analysis

Saeed Roshani
Assistant Professor of Technology Management
Types of Networks
Directed Networks
A network where relationships (edges)
between nodes have a direction. Each
edge points from one node to another,
representing asymmetric relationships.
Use Cases: Communication patterns,
authority structures, citation networks.
Undirected Networks
A network where edges between nodes do
not have a direction. Connections are
bidirectional, representing symmetric
relationships.
Use Cases: Collaboration networks,
mutual relationships, co-authorship
networks.
Weighted Networks
A network where edges carry a weight
representing the strength, frequency, or
capacity of the relationship between
nodes.
Use Cases: Financial networks
(transaction amounts), supply chain
analysis (goods volume), social
interaction intensity.
Bipartite Networks
A network with two distinct sets of nodes
where edges only connect nodes from
different sets.
Example: Movie recommendation
networks (one set of nodes = users, the
other set = movies, with edges
representing who has watched what).

Multilayer Networks
A network where nodes can participate in
multiple layers or types of relationships,
often represented by multiple types of
edges.
Example: Social media networks where
users interact through different platforms
(e.g., Facebook, Instagram, LinkedIn).

Use Cases: Complex systems analysis,
social influence spread across various
platforms, global supply chains that
involve multiple kinds of relationships
(e.g., suppliers, distributors, regulators).
Homophily vs Heterophily in Networks
Homophily Networks: Networks where nodes with
similar attributes are more likely to be connected.
Example: People with similar interests or professions
forming tightly-knit communities.

Heterophily Networks: Networks where nodes with
different attributes are more likely to be connected.
Example: Interdisciplinary research collaborations
between scientists from different fields.
Static vs Dynamic Networks
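Dynamic Networks: Networks where nodes and edges evolve
over time, capturing changing relationships.
Static networks, by contrast, represent a single
fixed snapshot of the nodes and edges.

Use Cases: Longitudinal studies, tracking
social movements, or communication
patterns over time.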
Random vs Scale-Free Networks

Random Networks: Networks where edges between nodes are
created randomly. The degree distribution is binomial
(approximately Poisson for large, sparse networks),
meaning most nodes have a number of connections close to the average.
• Example: A hypothetical network generated for
theoretical studies (e.g., Erdős–Rényi model).
• Use Cases: Theoretical modeling and comparison with
real-world networks.
Scale-Free Networks: Networks where a few nodes (hubs)
have a disproportionately large number of connections,
following a power-law distribution.
• Example: The World Wide Web, where a few websites
(e.g., Google, Facebook) attract a disproportionate
share of links and traffic.
• Use Cases: Understanding the robustness of networks,
identifying key players in systems like airline routes or
communication platforms.
Level of Analysis
Macro-Level Network Analysis
Network Density
Density is the ratio of the number
of actual edges in a network to the
maximum possible number of
edges. It measures how
interconnected the network is.
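
The ratio above can be written out directly. The sketch below assumes a simple network (no self-loops or repeated edges) given as an edge list; the function name and the example edges are hypothetical. For an undirected network the maximum number of edges is N(N−1)/2, and for a directed one it is N(N−1).

```python
def density(num_nodes, edges, directed=False):
    """Ratio of actual edges to the maximum possible number of edges."""
    if num_nodes < 2:
        return 0.0
    possible = num_nodes * (num_nodes - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible //= 2  # each undirected pair is counted once
    return len(edges) / possible


# Hypothetical 4-node undirected network with 3 of the 6 possible edges.
example_edges = [("a", "b"), ("b", "c"), ("c", "d")]
example_density = density(4, example_edges)
```

A density near 1 means almost every possible tie exists; large real-world networks are usually far sparser.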
Degree Distribution

Degree distribution describes the frequency of nodes with a certain
number of connections (degree) within a network.
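
As a minimal sketch, the degree distribution of an undirected network can be tallied from an edge list. The function name and the star-shaped example are hypothetical, and nodes that appear in no edge are ignored here.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the fraction of nodes that have degree k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    num_nodes = len(degree)  # only nodes that appear in at least one edge
    counts = Counter(degree.values())
    return {k: counts[k] / num_nodes for k in sorted(counts)}


# Hypothetical star network: one hub connected to three other nodes.
star_edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
star_distribution = degree_distribution(star_edges)
```

Here three of the four nodes have degree 1 and one node has degree 3, a miniature version of the hub-dominated pattern seen in scale-free networks.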
Average Path Length
The average path length is the average
number of steps along the shortest paths
for all possible pairs of network nodes.

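Formula:
$$L = \frac{1}{N(N-1)} \sum_{i \neq j} d(i, j)$$
where N is the number of nodes and d(i, j) is the length of the shortest path between nodes i and j.

Steps to use the formula:
1. Identify all ordered pairs of nodes i and j (i ≠ j).
2. Find the shortest path d(i, j) between each pair of nodes.
3. Sum all the shortest path distances.
4. Divide the total sum by the number of ordered pairs of nodes, which is N(N−1). In an undirected network this gives the same result as summing over the N(N−1)/2 unique pairs and dividing by N(N−1)/2.

Interpretation
• Small average path lengths indicate that nodes are generally close to each other, often a feature of "small-world" networks.
• A long average path length suggests that communication or interaction between nodes is more difficult or slower.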
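
The four steps can be sketched with a breadth-first search from every node. This assumes an unweighted, connected network stored as an adjacency dictionary; the function names and the 4-node path example are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path length (in steps) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_path_length(adj):
    """Sum d(i, j) over all ordered pairs and divide by N(N-1)."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0
    for source in adj:
        dist = bfs_distances(adj, source)
        if len(dist) != n:
            raise ValueError("network is not connected")
        total += sum(dist.values())
    return total / (n * (n - 1))


# Hypothetical path network 1 - 2 - 3 - 4.
path_graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
path_average = average_path_length(path_graph)
```

For the path example the six unique pairs have distances 1, 2, 3, 1, 2, 1, so the average is 10/6.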
Connected Components
A connected component is a subgraph in
which any two nodes are connected by a
path, and which is connected to no
additional nodes in the network.

Interpretation:
The number and size of connected components give insights into the network's
overall structure. A network can be fragmented into several small components,
or it can have one large "giant component."

A network with many small disconnected components might indicate isolation
between groups or inefficiency in communication.
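
A minimal sketch of finding connected components in an undirected network with a depth-first traversal. The adjacency dictionary and the node names are hypothetical.

```python
def connected_components(adj):
    """Return the components as sets of nodes, largest first."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical fragmented network: a group of three, a pair, and an isolate.
fragmented = {
    "a": ["b"], "b": ["a", "c"], "c": ["b"],
    "d": ["e"], "e": ["d"],
    "f": [],
}
parts = connected_components(fragmented)
```

The first entry of the result plays the role of the "giant component" when one component is much larger than the rest.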
Assortativity
Assortativity is a measure of the tendency for
nodes to connect to other nodes that are similar in
some way (e.g., degree, attribute).

Types:
Degree Assortativity:
Do high-degree nodes tend to connect to other high-degree nodes?
Attribute Assortativity:
Do nodes with similar attributes (e.g., age, profession) tend to be connected?
What about: Robustness? Information Spread?
Interpretation:
• Positive assortativity indicates that similar nodes tend to be connected (e.g., in social networks,
people may tend to form connections with others who are similar in status or wealth).
• Negative assortativity means dissimilar nodes tend to connect (e.g., hierarchical structures where
leaders connect to followers).
Example:
Social networks are often assortative by degree (highly connected individuals connect with other highly
connected individuals).
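
One common way to quantify degree assortativity, not spelled out in the slides, is the Pearson correlation between the degrees found at the two ends of each edge. The sketch below assumes an undirected network given as an edge list; the function name and the star example are hypothetical.

```python
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count every undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so one mean suffices
    covariance = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    if variance == 0:
        return float("nan")  # every edge end has the same degree
    return covariance / variance


# Hypothetical star network: the hub only connects to low-degree nodes.
star_edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
star_assortativity = degree_assortativity(star_edges)
```

The star is the extreme disassortative case: every edge joins the highest-degree node to a lowest-degree node, so the coefficient is −1.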
Clustering Coefficient
The clustering coefficient measures the likelihood that two
neighbors of a node are also connected, reflecting the network's
tendency to form tightly knit groups.

• Local Clustering Coefficient: Measures how likely it is for the
neighbors of a particular node to be connected.

• Global Clustering Coefficient (Transitivity): An overall
measure of the network's tendency to form triangles (closed
triplets of nodes).
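
Both coefficients can be sketched for an undirected network stored as a dictionary of neighbor sets. The function names and the example (a triangle with one extra node attached) are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of the node's neighbors that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def transitivity(adj):
    """Closed triplets divided by all connected triplets in the network."""
    closed = 0
    triplets = 0
    for neighbors in adj.values():
        k = len(neighbors)
        triplets += k * (k - 1) // 2  # triplets centered on this node
        closed += sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return closed / triplets if triplets else 0.0


# Hypothetical network: triangle a-b-c with an extra node d attached to c.
g = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

In this example node a has local clustering 1, node c has 1/3 (only one of its three neighbor pairs is linked), and the transitivity of the whole network is 3/5.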
Micro-Level Network Analysis
[Figure: example network with letter-labeled nodes, with callouts marking the nodes that have the highest betweenness, closeness, eigenvector, and degree centrality.]
Degree Centrality
Degree centrality measures the number of direct
connections (edges) a node has. It quantifies the immediate
influence or popularity of a node in the network.

Interpretation:
Nodes with high degree centrality have more direct connections and
are often considered key players in spreading information or
maintaining communication in the network.
Example:
In a social network, a person with many friends or contacts has a high
degree centrality. In a collaboration network, a researcher with many
co-authors has a high degree centrality.
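
A minimal sketch of degree centrality for an undirected network stored as a dictionary of neighbor sets. Dividing by N−1 (the largest possible degree) is a common normalization that is assumed here rather than taken from the slides; the names in the example are hypothetical.

```python
def degree_centrality(adj):
    """Number of direct connections per node, divided by N - 1."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


# Hypothetical friendship network in which one person knows everyone else.
friends = {
    "ana": {"ben", "cal", "dee"},
    "ben": {"ana"},
    "cal": {"ana"},
    "dee": {"ana"},
}
friend_scores = degree_centrality(friends)
```

With this normalization a score of 1.0 means the node is directly connected to every other node.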
Betweenness Centrality
Betweenness centrality measures how often a
node appears on the shortest path between
other pairs of nodes. It quantifies a node’s role
as a bridge or gatekeeper in the network.

Interpretation:
Nodes with high betweenness centrality control information flow in the
network. They act as intermediaries between otherwise disconnected parts
of the network, giving them strategic importance.
Example:
In a corporate hierarchy, a manager who coordinates between different
departments has high betweenness centrality. In a communication network,
a node that links different subgroups is central in terms of betweenness.
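
The following is a sketch of unnormalized betweenness centrality for an undirected, unweighted network, using the approach usually attributed to Brandes (a breadth-first search from each node followed by a backward accumulation). The slides do not name a specific algorithm, so this choice, the function name, and the example network are assumptions.

```python
from collections import deque


def betweenness_centrality(adj):
    """For each node, the number of shortest paths between other pairs that
    pass through it (fractional when a pair has several shortest paths)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each end.
    return {v: value / 2 for v, value in score.items()}


# Hypothetical company: manager "m" is the only link between two teams.
company = {
    "x1": {"x2", "m"}, "x2": {"x1", "m"},
    "m": {"x1", "x2", "y1", "y2"},
    "y1": {"y2", "m"}, "y2": {"y1", "m"},
}
company_scores = betweenness_centrality(company)
```

All four cross-team pairs must route through "m", so "m" scores 4 while every other node scores 0, matching the manager example above.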
Closeness Centrality
Closeness centrality measures how close a node
is to all other nodes in the network. It is the
inverse of the average distance from the node to
all other nodes.

Is node 4 more central than node 3?

Interpretation:
A node with high closeness centrality can quickly reach all other
nodes in the network, making it efficient for spreading information.
Such nodes are central in terms of their proximity to others.

Example:
In a company's communication network, an employee who can reach
everyone in a few steps (perhaps through email or other
communication tools) has high closeness centrality.
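
A minimal sketch of closeness centrality as the inverse of the average distance, written as (N−1) divided by the sum of shortest-path distances. It assumes an unweighted, connected network; the function name and the 5-node path example are hypothetical.

```python
from collections import deque


def closeness_centrality(adj, node):
    """(N - 1) divided by the sum of shortest-path distances from node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    if len(dist) != len(adj):
        raise ValueError("network is not connected")
    total = sum(dist.values())
    return (len(adj) - 1) / total if total else 0.0


# Hypothetical chain of five employees: a - b - c - d - e.
chain = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d"],
}
middle = closeness_centrality(chain, "c")
end = closeness_centrality(chain, "a")
```

The middle node reaches everyone in at most two steps (score 4/6), while an end node needs up to four steps (score 4/10).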
Eigenvector Centrality
Eigenvector centrality measures a node’s influence
based on the importance of its neighbors. A node
has a high eigenvector centrality if it is connected
to many highly central nodes.

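Formally, the centrality $x_i$ of node i satisfies $x_i = \frac{1}{\lambda} \sum_j A_{ij} x_j$, where A is the adjacency matrix and λ is a constant (the largest eigenvalue of the adjacency matrix).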

Interpretation:
Nodes with high eigenvector centrality are not only well-connected
but are connected to other influential nodes. It extends the idea of
degree centrality by accounting for the quality of connections, not just
the quantity.
Example:
In a social network, a person who is well-connected to influential
people (e.g., celebrities or political leaders) has high eigenvector
centrality, even if they don't have many direct connections themselves.
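
Eigenvector centrality can be approximated by power iteration: repeatedly replace each node's score with the sum of its neighbors' scores and rescale. The sketch below assumes a connected, undirected network; adding each node's own score at every step (iterating with A + I instead of A) is an implementation choice made here to avoid oscillation, and it leaves the leading eigenvector unchanged. Names and the example are hypothetical.

```python
from math import sqrt


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration for the leading eigenvector, scaled to unit length."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Multiply by (A + I): own score plus the scores of all neighbors.
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = sqrt(sum(value * value for value in new.values()))
        new = {v: value / norm for v, value in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


# Hypothetical star network: a hub connected to three other nodes.
star = {
    "hub": {"a", "b", "c"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"},
}
star_scores = eigenvector_centrality(star)
```

For the star, the largest eigenvalue of the adjacency matrix is √3, and the hub's score is √3 times the score of each outer node.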
| Centrality Measure | Definition | Example | Interpretation |
|---|---|---|---|
| Degree Centrality | Measures the number of direct connections a node has. | A person with many friends in a social network. | High values indicate many direct connections. |
| Betweenness Centrality | Measures how often a node lies on the shortest path between other nodes. | A manager who connects different departments in a company. | High values indicate strategic positions for controlling information flow. |
| Closeness Centrality | Measures how close a node is to all other nodes. | An employee who can quickly reach others in a communication network. | High values indicate the ability to quickly reach other nodes. |
| Eigenvector Centrality | Measures influence based on the importance of a node's neighbors. | A person connected to many influential people in a social network. | High values indicate connection to influential or well-connected nodes. |
| Katz Centrality | Measures influence accounting for direct and indirect connections. | A person with few direct but many indirect influential connections. | High values indicate influence through both direct and indirect connections. |
| PageRank Centrality | Ranks nodes based on incoming links from important nodes. | A website with many important backlinks. | High values indicate importance based on backlinks from influential nodes. |
| Harmonic Centrality | Sums the reciprocals of distances to all other nodes. | A node in a transportation network that is centrally located. | High values indicate closeness to many nodes, even in disconnected networks. |
| Percolation Centrality | Measures importance based on contribution to spreading processes. | A node that spreads a disease or information in an epidemiological model. | High values indicate key roles in spreading processes like disease or information. |
| Flow Betweenness Centrality | Measures importance based on contribution to the flow of information. | A router facilitating a significant portion of network traffic. | High values indicate key roles in managing the flow of information or resources. |
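
As one illustration from the table, harmonic centrality can be sketched by summing the reciprocals of shortest-path distances and letting unreachable nodes contribute nothing, which is why it still works on disconnected networks. The function name and the example network are hypothetical.

```python
from collections import deque


def harmonic_centrality(adj, node):
    """Sum of 1 / d(node, other) over every other reachable node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    # Unreachable nodes never enter dist, so they add 0 to the sum.
    return sum(1 / d for other, d in dist.items() if other != node)


# Hypothetical disconnected network: chain a - b - c plus an isolated node d.
network = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
```

In this example node b scores 2.0, node a scores 1.5, and the isolated node d scores 0, whereas closeness centrality would be undefined because some distances are infinite.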
Meso-Level Network Analysis
Understanding Collaboration and Innovation Clusters
Meso-level network analysis focuses on understanding
the intermediate structure of a network. It zooms in
between the large-scale (macro-level) analysis of the
entire network and small-scale (micro-level) interactions
between individual nodes.
This level of analysis helps to identify communities,
subgroups, and collaborative clusters where innovation,
knowledge transfer, or cooperation is particularly concentrated.

Meso-level analysis is vital in technology forecasting because
technological advances often happen in communities or
clusters. These clusters are formed by companies, research
institutions, or even countries that share specific interests or
work on related technological developments.
Community
In network science, a community refers to a group of nodes
(such as people, organizations, or objects) within a network that
are more densely connected to each other than to the rest of the
network. Communities are also sometimes called clusters,
modules, or subgroups.

Community Detection
Community detection is the process of identifying these
communities within a network. The goal is to partition the
network in such a way that nodes are grouped into communities
based on the density of their internal connections.
Louvain Method
The Louvain Method is a widely used algorithm for detecting
communities in large networks by optimizing a metric called modularity.

The Louvain Method begins by assigning each node to its own
community. It then repeatedly moves individual nodes into the
neighboring community that gives the largest modularity gain,
collapses each community into a single node, and repeats the
process on the smaller network, stopping once no further
improvements are possible.

One of the primary advantages of the Louvain Method is that it scales
well to large networks, and it does not require prior knowledge of the
number of communities. This makes it particularly useful in complex
networks, such as corporate alliances or technological collaboration
networks.
Girvan-Newman Algorithm
The Girvan-Newman algorithm works by successively removing the edge
with the highest betweenness centrality from the network, recalculating
edge betweenness after each removal.

Edge betweenness quantifies how often an edge lies on the
shortest paths between pairs of nodes, making it a natural measure for
identifying bridges between communities. By removing these
high-betweenness edges, the algorithm gradually isolates groups of nodes
that form distinct communities.

While Girvan-Newman works well for small to medium-sized
networks, it can be computationally expensive for large networks. This
algorithm is especially useful when the goal is to understand the key
connections that hold different parts of a network together, such as in
small social networks or organizational structures.
Walktrap Algorithm

The Walktrap Algorithm is based on the idea that short random walks
in a network are likely to stay within the same community. It
works by performing short random walks and using the results
to merge nodes into communities.

Nodes that are frequently visited together during these random
walks are assumed to belong to the same community. Starting from
single nodes, the algorithm keeps merging the most similar
communities, building a hierarchy from which the best community
structure (for example, the level with the highest modularity) is selected.

Walktrap works particularly well in networks where
communities are densely connected internally. Because each node
belongs to exactly one community at any level of the hierarchy, it
produces nested rather than overlapping communities. It can be applied
to networks such as biological networks or innovation clusters.
What is Modularity?

Modularity is a measure that quantifies the strength
of division of a network into communities or modules.
It answers the question: How well is a network
clustered into groups?
High modularity means that nodes within a
community are more densely connected to each other
than to the rest of the network.
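
The slides do not give a formula, so the sketch below assumes the standard definition for an undirected, unweighted network: Q is the sum over communities of (internal edges / m) − (total degree of the community / 2m)², where m is the number of edges. The function name and the two-triangle example are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition; community_of maps node -> community label."""
    m = len(edges)
    internal = {}    # edges with both ends in the same community
    degree_sum = {}  # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical network: two triangles joined by the single edge 3 - 4.
two_triangles = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
by_triangle = {1: "left", 2: "left", 3: "left", 4: "right", 5: "right", 6: "right"}
all_together = {node: "all" for node in range(1, 7)}
q_split = modularity(two_triangles, by_triangle)
q_single = modularity(two_triangles, all_together)
```

Splitting the network into its two triangles gives Q = 5/14, while placing every node in one community gives Q = 0, so the split is the better description of the structure.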
Greedy Modularity Optimization

Greedy modularity optimization is a community detection
algorithm that iteratively merges communities in a network to
maximize modularity. It is called "greedy" because it makes
locally optimal decisions at each step to increase modularity.

The algorithm starts with each node as its own community, then
iteratively merges communities based on the
modularity gain. The number of communities is
not predetermined but is the result of the optimization process.
Once the optimal modularity is reached, the final number of
communities is set. This method is simple and efficient for
moderate-sized networks, but may not always find the global
maximum modularity. It can also struggle with large networks
where more sophisticated algorithms may perform better.

Modularity gain
When merging two communities, the modularity gain
represents how much the modularity score Q improves or
changes due to that merge. The algorithm computes the
modularity gain for each possible pair of communities and
chooses the merge that results in the highest increase (or the
smallest decrease, in some cases) in modularity.
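
The merging loop described above can be sketched in a deliberately naive form that recomputes modularity for every candidate merge and stops when no merge increases it. This is an illustration under the standard modularity definition, not an efficient implementation; the function names and the two-triangle example are hypothetical, and node labels are assumed to be sortable.

```python
from itertools import combinations


def modularity(edges, community_of):
    """Q = sum over communities of internal/m - (degree_sum / 2m)^2."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


def greedy_modularity_communities(edges):
    """Start with one community per node; merge the best pair until no gain."""
    nodes = {node for edge in edges for node in edge}
    community_of = {node: node for node in nodes}  # label = a member node
    current = modularity(edges, community_of)
    while True:
        labels = sorted(set(community_of.values()))
        best_gain, best_pair = 0.0, None
        for a, b in combinations(labels, 2):
            trial = {n: (a if c == b else c) for n, c in community_of.items()}
            gain = modularity(edges, trial) - current
            if gain > best_gain + 1e-12:  # keep the first of any tied merges
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge improves modularity
        a, b = best_pair
        community_of = {n: (a if c == b else c) for n, c in community_of.items()}
        current += best_gain
    groups = {}
    for node, label in community_of.items():
        groups.setdefault(label, set()).add(node)
    return sorted(groups.values(), key=min)


# Hypothetical network: two triangles joined by the single edge 3 - 4.
two_triangles = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
found = greedy_modularity_communities(two_triangles)
```

On the two-triangle example the loop merges nodes within each triangle and then stops, because joining the two triangles would lower modularity.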
Cliques are a fundamental concept in graph theory and
network analysis. A clique in a network is a subset of
nodes where every node is directly connected to every
other node in that subset. In other words, a clique is a
complete subgraph of a network, where each node has an
edge to every other node in the clique.

Types of Cliques:

1. Maximal Clique: A clique that cannot be extended by adding an adjacent node. It's a subset of nodes where every node is connected to every other node, and adding any other node from the network would break the clique structure.
2. k-Clique: A clique with exactly k nodes. For example, a 3-clique consists of three nodes, all of which are connected.
3. Maximum Clique: The largest clique (in terms of the number of nodes) found in the network.

Example:
Consider a small social network where nodes represent individuals, and edges represent friendships. A clique in this network would be a group of people where everyone knows each other.
For example, if Alice, Bob, and Charlie are all friends with each other, they form a 3-clique. If Alice, Bob, Charlie, and Dave are all connected (friends), they form a 4-clique.
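
Maximal cliques can be enumerated with the Bron-Kerbosch procedure, sketched here in its basic form without pivoting. The slides do not name an algorithm, so this choice is an assumption, as is the extra person "erin" added to the friendship example so that there is more than one maximal clique.

```python
def maximal_cliques(adj):
    """List every maximal clique of an undirected network (Bron-Kerbosch)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)  # nothing left to add: clique is maximal
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# Four mutual friends, plus hypothetical "erin" who only knows dave.
friendships = {
    "alice": {"bob", "charlie", "dave"},
    "bob": {"alice", "charlie", "dave"},
    "charlie": {"alice", "bob", "dave"},
    "dave": {"alice", "bob", "charlie", "erin"},
    "erin": {"dave"},
}
all_maximal = maximal_cliques(friendships)
maximum_clique = max(all_maximal, key=len)
```

The network has two maximal cliques, the four mutual friends and the pair of dave and erin; the maximum clique is the 4-clique.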
A core-periphery structure is a network model that describes a
specific organization of nodes, where the network is divided
into two distinct groups: the core and the periphery. The core
consists of nodes that are highly interconnected with each other,
while the periphery is made up of nodes that are sparsely
connected to one another but connected to the core.

Key Characteristics of Core-Periphery Structures:
Core:
1. The core nodes have strong, dense connections
among themselves.
2. Core nodes are often central, highly influential, or
important in the network.
3. These nodes usually maintain the cohesion of the
network.
4. Core nodes can be identified by their high degree,
centrality, or high participation in important activities.
Periphery:
1. The periphery nodes have few connections to other
periphery nodes.
2. Periphery nodes are primarily connected to the core
nodes but not strongly connected to each other.
3. They tend to be less central and less influential than
core nodes.
