Introduction to Social Network Analysis (2021)
and Applications
Introduction to Social Network Analysis
C.-L. Yang, Big Data, NTUST IM
Complex Networks
Networks = nodes + links
Neither regular nor random
Scale-free networks
Universal properties
C.-L. Yang, Big Data, NTUST IM System Informatics &
Management Science Lab
Political Blogs
LinkedIn Map
Nodes vs. Links
Actors / nodes / vertices / points
▪ Computers / Telephones
▪ Persons / Employees
▪ Companies / Business Units
▪ Articles / Books
▪ Can have properties (attributes)
Ties / edges / arcs / lines / links
Connect pairs of actors
Types of social relations
Allow different kinds of flows
Dyads, Triads and Relations
▪ actor
▪ dyad
▪ triad
▪ relation:
  ▪ collection of specific ties among members of a group (e.g., friendship, kinship)
Slide Credit: Johannes Putzke
Strength of a Tie
▪ Social network
  ▪ finite set of actors and relation(s) defined on them
  ▪ depicted in graph / sociogram
▪ labeled graph
  ▪ e.g., nodes labeled "Anna, female, 27" and "Ken, male, 34"
▪ Strength of a Tie
  ▪ dichotomous vs. valued
  ▪ depicted in valued graph or signed graph (+/-)
Slide Credit: Johannes Putzke
Strength of a Tie
▪ Strength of a Tie
  ▪ nondirectional vs. directional
  ▪ depicted in directed graphs (digraphs)
  ▪ nodes connected by arcs
  ▪ [Figure: a node adjacent to/from another node; an arc incident to a node]
▪ 3 isomorphism classes of dyads
  ▪ null dyad
  ▪ mutual / reciprocal / symmetrical dyad
  ▪ asymmetric / antisymmetric dyad
▪ converse of a digraph
  ▪ reverse direction of all arcs
Slide Credit: Johannes Putzke
Why Graph Theory?
Graphs are used to model pairwise relations between objects
In general, a network can be represented by a graph
Many practical problems can be represented naturally in terms of graph theory
Graph Theory - History
Began in 1735
Mentioned in Leonhard Euler's paper on the "Seven Bridges of Königsberg"
Problem: walk across all 7 bridges without crossing any bridge twice
Eulerian graph
Gustav Kirchhoff
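Euler's argument can be checked by counting bridge ends per land mass: a connected multigraph has a walk that uses every edge exactly once only if zero or two of its vertices have odd degree. The minimal sketch below assumes our own labelling of the classic map (A = the central island, B and C = the two river banks, D = the eastern land mass); these labels and the function names are illustrative, not taken from the slide.

```python
from collections import Counter

# Hypothetical encoding of the 7 bridges: each tuple joins two land masses.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def has_euler_walk(edges):
    """Euler's condition, assuming the multigraph is connected: 0 or 2 odd-degree vertices."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

print(degrees(BRIDGES))         # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_walk(BRIDGES))  # False
```

All four land masses have odd degree, so no such walk exists.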
Vertex and Edge
Vertex / Node
Basic element
Drawn as a node or a dot
The vertex set of G is usually denoted by V(G), V, or V_G
Edge / Arc
A set of two elements (its end vertices)
Drawn as a line connecting two vertices, called end vertices or endpoints
The edge set of G is usually denoted by E(G), E, or E_G
Neighborhood
For any node v, the set of nodes it is connected to via an edge
is called its neighborhood and is represented as N(v)
Graph: Example
n := 6, m := 7
Vertices V := {1, 2, 3, 4, 5, 6}
Edges E := {{1,2}, {1,5}, {2,3}, {2,5}, {3,4}, {4,5}, {4,6}}
N(4) := neighborhood of 4 = {3, 5, 6}
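The example can be reproduced directly from the definition of N(v). A minimal sketch, assuming edges are stored as Python sets of two nodes; `neighborhood` is an illustrative helper name.

```python
# The example graph: n = 6 nodes, m = 7 undirected edges.
V = {1, 2, 3, 4, 5, 6}
E = [{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}]

def neighborhood(v, edges):
    """N(v): every node that shares an edge with v."""
    return {u for e in edges if v in e for u in e if u != v}

print(len(V), len(E))      # 6 7
print(neighborhood(4, E))  # {3, 5, 6}
```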
Edge Types
Undirected
E.g., distance between two cities, friendships
Directed: ordered pairs of nodes
Directed edges have a source (tail, origin) vertex and a target (head, destination) vertex
Weighted: a weight is associated with each edge
Empty graph / edgeless graph
No edges
Null graph
No nodes
Hence no edges either
Simple Graph (Undirected)
Simple graphs are undirected graphs without loops or multiple edges
$A = A^{T}$ (the adjacency matrix is symmetric)
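The definition translates into three checks on an adjacency matrix: symmetric, zero diagonal (no loops), and only 0/1 entries. The last check assumes the common convention that multiple edges would show up as entries greater than 1; `is_simple_undirected` is an illustrative name.

```python
def is_simple_undirected(A):
    """True if square matrix A is 0/1, symmetric, and has an all-zero diagonal."""
    n = len(A)
    if any(len(row) != n for row in A):
        return False
    for i in range(n):
        if A[i][i] != 0:              # nonzero diagonal entry = loop
            return False
        for j in range(n):
            if A[i][j] not in (0, 1):  # entry > 1 would encode multiple edges
                return False
            if A[i][j] != A[j][i]:     # undirected graphs need A = A^T
                return False
    return True

triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
with_loop = [[1, 1], [1, 0]]
print(is_simple_undirected(triangle), is_simple_undirected(with_loop))  # True False
```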
Directed graph (digraph)
[Figure: a digraph with a labeled node, arc, multiple arc, and loop]
Weighted Graph
A weighted graph is a graph in which each edge has an associated weight
[Figure: two weighted graphs on nodes 1 to 6, each edge labeled with its weight]
Subgraph
A subgraph of G is a graph whose vertex and edge sets are subsets of those of G
A supergraph of a graph G is a graph that contains G as a subgraph
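A small check of the subgraph definition, as a sketch: graphs are assumed to be `(vertex_set, edge_set)` pairs with undirected edges as frozensets, and `is_subgraph` is a hypothetical helper. Besides the two subset tests, every edge of H must join vertices that H itself contains, otherwise H is not a graph.

```python
def is_subgraph(H, G):
    """H, G: (vertex_set, edge_set) pairs; edges are frozensets of two vertices."""
    VH, EH = H
    VG, EG = G
    endpoints_ok = all(e <= VH for e in EH)  # H's edges must join H's own vertices
    return VH <= VG and EH <= EG and endpoints_ok

G = ({1, 2, 3, 4}, {frozenset(e) for e in [(1, 2), (2, 3), (3, 4)]})
H = ({1, 2, 3}, {frozenset((1, 2))})
print(is_subgraph(H, G))  # True: H is a subgraph of G, so G is a supergraph of H
```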
Adjacency Matrix
n-by-n matrix with $A_{uv} = 1$ if (u, v) is an edge
Diagonal entries represent self-links (loops)
Symmetric matrix for undirected graphs
   1 2 3 4 5 6 7 8
1  0 1 1 0 0 0 0 0
2  1 0 1 1 1 0 0 0
3  1 1 0 0 1 0 1 1
4  0 1 0 1 1 0 0 0
5  0 1 1 1 0 1 0 0
6  0 0 0 0 1 0 0 0
7  0 0 1 0 0 0 0 1
8  0 0 1 0 0 0 1 0
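As a sketch of how such a matrix is built, the edge list below was read off the 8-node matrix above (node 4 carries the loop on the diagonal); mirroring every entry is what makes the matrix symmetric for an undirected graph. Nodes are numbered from 1, so indices are shifted by one.

```python
# Undirected edges read off the 8-node matrix; (4, 4) is the loop on the diagonal.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7), (3, 8),
         (4, 4), (4, 5), (5, 6), (7, 8)]

def adjacency_matrix(n, edges):
    """Build an n-by-n 0/1 matrix for nodes numbered 1..n."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1
        A[v - 1][u - 1] = 1  # undirected: mirror every entry
    return A

A = adjacency_matrix(8, EDGES)
for row in A:
    print(*row)
print(A == [list(col) for col in zip(*A)])  # True: A is symmetric
```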
Incidence Matrix
|V| × |E| matrix
Entry [vertex, edge] holds the edge's data; here it is 1 if the vertex is an endpoint of the edge, else 0
   1,2 1,5 2,3 2,5 3,4 4,5 4,6
1   1   1   0   0   0   0   0
2   1   0   1   1   0   0   0
3   0   0   1   0   1   0   0
4   0   0   0   0   1   1   1
5   0   1   0   1   0   1   0
6   0   0   0   0   0   0   1
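A minimal sketch that rebuilds the incidence matrix above from the same 6-node edge list. Two sanity checks follow from the construction: every column sums to 2 (an edge has two endpoints) and every row sums to that vertex's degree.

```python
V = [1, 2, 3, 4, 5, 6]
E = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]  # column order of the table

def incidence_matrix(vertices, edges):
    """|V| x |E| matrix: entry [v][e] = 1 if v is an endpoint of edge e."""
    return [[1 if v in e else 0 for e in edges] for v in vertices]

M = incidence_matrix(V, E)
for v, row in zip(V, M):
    print(v, *row)
print([sum(col) for col in zip(*M)])  # [2, 2, 2, 2, 2, 2, 2]
print([sum(row) for row in M])        # [2, 3, 2, 3, 3, 1] (the degrees)
```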
Adjacency List
Edge list
1 2
1 2
2 3
2 5
3 3
4 3
4 5
5 3
5 4
Adjacency list (node list): each row is a node followed by the nodes it links to
1 2 2
2 3 5
3 3
4 3 5
5 3 4
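Converting an edge list into a node list is a single grouping pass. The sketch below reads the slide's lists as a directed multigraph (1 → 2 appears twice and 3 → 3 is a loop), which is our interpretation of the extracted layout; `to_node_list` is an illustrative name.

```python
from collections import defaultdict

# Directed edge list from this slide (duplicate arc 1 -> 2, loop 3 -> 3).
EDGE_LIST = [(1, 2), (1, 2), (2, 3), (2, 5), (3, 3),
             (4, 3), (4, 5), (5, 3), (5, 4)]

def to_node_list(edges):
    """Group targets by source node, keeping duplicates and loops."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return dict(adj)

for node, targets in sorted(to_node_list(EDGE_LIST).items()):
    print(node, *targets)
```

This prints the node list row by row (`1 2 2`, `2 3 5`, `3 3`, `4 3 5`, `5 3 4`). A node list is more compact than an edge list because each source node is written only once.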
Edge Lists for Weighted Graphs
Edge List
1 2 1.2
2 4 0.2
4 5 0.3
4 1 0.5
5 4 0.5
6 3 1.5
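Each row of a weighted edge list adds a third column. A sketch of loading it into a nested dictionary, assuming each row is `source target weight` with a directed reading (the list contains both 4 5 0.3 and 5 4 0.5, which only makes sense if direction matters); `parse_weighted` is a hypothetical helper.

```python
WEIGHTED_EDGES = """\
1 2 1.2
2 4 0.2
4 5 0.3
4 1 0.5
5 4 0.5
6 3 1.5
"""

def parse_weighted(text):
    """Turn 'source target weight' lines into {source: {target: weight}}."""
    adj = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        u, v, w = line.split()
        adj.setdefault(int(u), {})[int(v)] = float(w)
    return adj

adj = parse_weighted(WEIGHTED_EDGES)
print(adj[4])  # {5: 0.3, 1: 0.5}
```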
Classification of Graph Terms
Global terms refer to a whole graph
Local terms refer to a single node in a graph
Connected and Isolated vertex
Two vertices are connected if there is a path between them
Isolated vertex: a vertex not connected to any other vertex
[Figure: graph on nodes 1 to 6 containing an isolated vertex]
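Both notions can be tested mechanically: "connected" means a search from one vertex reaches the other, and "isolated" means the vertex has no incident edges. A sketch using iterative depth-first search on a hypothetical 6-node graph (not the one in the figure) in which node 4 has no edges.

```python
def reachable(adj, start):
    """Vertices connected to start by some path (iterative depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen

def connected(adj, u, v):
    return v in reachable(adj, u)

def isolated(adj):
    """Vertices with no incident edges."""
    return {v for v, nbrs in adj.items() if not nbrs}

# Hypothetical undirected graph on nodes 1..6; node 4 has no edges.
ADJ = {1: [2, 5], 2: [1, 3, 5], 3: [2, 6], 4: [], 5: [1, 2], 6: [3]}

print(connected(ADJ, 1, 6))  # True (path 1-2-3-6)
print(connected(ADJ, 1, 4))  # False
print(isolated(ADJ))         # {4}
```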
Adjacent nodes
Two nodes are adjacent if they are connected
via an edge.
If edge e={u,v} ∈ E(G), we say that u and v are
adjacent or neighbors
An edge where the two end vertices are the
same is called a loop, or a self-loop
Degree (Undirected Graphs)
Number of edges incident on a node
The degree of 5 is 3
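Degrees follow from one pass over the edge list, because each edge is incident on both of its end vertices. The sketch reuses the 6-node edge set from the "Graph: Example" slide (an assumption: the degree figure may show a different graph, though node 5 has degree 3 there as well) and checks the standard fact that the degrees sum to 2m.

```python
from collections import Counter

# 6-node example edge set from the "Graph: Example" slide.
E = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]

def degree(edges):
    """Number of edges incident on each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1  # each edge counts once for each end vertex
    return deg

deg = degree(E)
print(deg[5])                           # 3
print(sum(deg.values()) == 2 * len(E))  # True: degrees sum to 2m
```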
Degree (Directed Graphs)
In-degree: Number of edges entering
Out-degree: Number of edges leaving
Degree = indeg + outdeg
outdeg(1)=2
indeg(1)=0
outdeg(2)=2
indeg(2)=2
outdeg(3)=1
indeg(3)=4
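For a digraph, out-degree counts arcs by their source and in-degree by their target. The directed edge list from the "Adjacency List" slide reproduces exactly the six values listed above, so it is used here on the assumption that both slides show the same digraph; the loop 3 → 3 adds one to both the in-degree and the out-degree of node 3.

```python
from collections import Counter

# Directed edge list from the "Adjacency List" slide.
ARCS = [(1, 2), (1, 2), (2, 3), (2, 5), (3, 3),
        (4, 3), (4, 5), (5, 3), (5, 4)]

outdeg = Counter(src for src, _ in ARCS)  # arcs leaving each node
indeg = Counter(dst for _, dst in ARCS)   # arcs entering each node

for v in (1, 2, 3):
    print(f"outdeg({v})={outdeg[v]} indeg({v})={indeg[v]} deg({v})={outdeg[v] + indeg[v]}")
```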
Walk and Trail
Walk: like a path, but edges and nodes can be repeated
Paths
Path: a sequence P of nodes $v_1, v_2, \dots, v_{k-1}, v_k$ in which consecutive nodes are joined by an edge
No vertex can be repeated
A closed path (one that returns to its first node) is called a cycle
The length of a path or cycle is the number of edges it contains
[Figure: cycles C3, C4, C5]
Walks, Trails, Paths
▪ (Directed) Walk (W)
▪ sequence of nodes and lines starting and ending with (different) nodes
(called origin and terminus)
▪ Nodes and lines can be included more than once
▪ Inverse of a (directed) walk (W-1)
▪ Walk in opposite order
▪ Length of a walk
▪ Number of lines in the walk (a line that occurs twice counts twice; in weighted graphs, add the line weights)
▪ (Directed) Trail
▪ Is a walk in which all lines are distinct
▪ (Directed) Path
▪ Walk in which all nodes and all lines are distinct
▪ Every path is a trail and every trail is a walk
Slide Credit: Johannes Putzke
Walks, Trails and Paths - Repetition
[Figure: graph with nodes n1 to n6 and lines l1 to l7]
▪ W = n1 l1 n2 l2 n3 l4 n5 l6 n6: path
▪ W = n1 l1 n2 l2 n3 l4 n5 l4 n3: walk
▪ W = n1 l1 n2 l2 n3 l4 n5 l5 n4 l3 n3: trail
▪ origin: n1 in all three; terminus: n6 in the path, n3 in the walk and the trail
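The three definitions nest (every path is a trail, every trail is a walk), so a sequence can be classified by testing the stricter conditions first. In this sketch, the line endpoints in `LINES` are inferred from the three example sequences (l7 is not used by them and is omitted); this mapping and the function name are assumptions, not part of the slide.

```python
# Line endpoints inferred from the three example sequences (assumption).
LINES = {"l1": {"n1", "n2"}, "l2": {"n2", "n3"}, "l3": {"n3", "n4"},
         "l4": {"n3", "n5"}, "l5": {"n4", "n5"}, "l6": {"n5", "n6"}}

def classify(walk):
    """walk alternates node, line, node, ...; return its most specific type."""
    nodes, lines = walk[0::2], walk[1::2]
    for i, line in enumerate(lines):
        if LINES[line] != {nodes[i], nodes[i + 1]}:
            raise ValueError(f"{line} does not join {nodes[i]} and {nodes[i + 1]}")
    if len(set(lines)) < len(lines):
        return "walk"   # some line is repeated
    if len(set(nodes)) < len(nodes):
        return "trail"  # lines distinct, but a node repeats
    return "path"       # all nodes and all lines distinct

W1 = "n1 l1 n2 l2 n3 l4 n5 l6 n6".split()
W2 = "n1 l1 n2 l2 n3 l4 n5 l4 n3".split()
W3 = "n1 l1 n2 l2 n3 l4 n5 l5 n4 l3 n3".split()
for w in (W1, W2, W3):
    print(classify(w), "length", len(w) // 2)
# path length 4
# walk length 4
# trail length 5
```

The length counts lines with repetition, so l4 contributes twice to the length of the second sequence.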
Reachability, Distances and Diameter
▪ Reachability
▪ If there is a path between nodes $n_i$ and $n_j$
▪ Geodesic
▪ Shortest path between two nodes
▪ (Geodesic) Distance d(i,j)
▪ Length of a geodesic (also called "degrees of separation")
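In an unweighted graph, breadth-first search visits nodes in order of distance, so the first time a node is reached gives its geodesic distance; nodes never reached are not reachable. The sketch uses the 6-node graph from the "Graph: Example" slide and takes the usual definition of diameter (largest geodesic distance over all pairs), assuming the graph is connected.

```python
from collections import deque

# 6-node example graph from the "Graph: Example" slide, as adjacency lists.
ADJ = {1: [2, 5], 2: [1, 3, 5], 3: [2, 4], 4: [3, 5, 6], 5: [1, 2, 4], 6: [4]}

def distances_from(adj, source):
    """Geodesic distances d(source, j) by breadth-first search (unweighted)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:  # first visit = shortest path length
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def diameter(adj):
    """Largest geodesic distance over all pairs (graph assumed connected)."""
    return max(max(distances_from(adj, s).values()) for s in adj)

print(distances_from(ADJ, 1))  # {1: 0, 2: 1, 5: 1, 3: 2, 4: 2, 6: 3}
print(diameter(ADJ))           # 3
```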
Six degrees of separation
"The small-world problem", Stanley Milgram, 1967
"An experimental study of the small world problem", J. Travers and S. Milgram, 1969
Stanley Milgram’s 1969 experiment
Results of Stanley Milgram’s 1969 experiment
Reached the target: N = 64 (29%)
Average chain length <L> = 5.2
Channels: hometown <L> = 6.1
Business contacts <L> = 4.6
From Boston <L> = 4.4
From Nebraska <L> = 5.7
Small World
Email graph:
D. Watts (2001), 48,000 senders, <L> ≈ 6
MSN Messenger graph:
J. Leskovec et al. (2007), 240 million users, <L> = 6.6
Facebook graph:
L. Backstrom et al. (2012), 721 million users (≈ 69 billion friendship links), <L> = 4.74
Cayley Tree
Bethe lattice: an infinite cycle-free graph in which every node connects to z neighbors (z is the coordination number)
Nodes per shell: $N_k = z(z-1)^{k-1}$
Total: $N = 1 + \sum_{k=1}^{L} z(z-1)^{k-1}$
Estimates:
$z^L \approx N$, so $L \approx \log N / \log z$
N ≈ 6.7 billion, z = 50 friends, L ≈ 5.8
[Figure: Cayley tree with z = 3]
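The shell formula, the total, and the crude estimate can be evaluated directly. A minimal sketch (function names are illustrative) that reproduces the slide's L ≈ 5.8 for N ≈ 6.7 billion and z = 50; the estimate ignores the (z - 1) factor, which is why it is only an order-of-magnitude argument.

```python
import math

def shell_size(z, k):
    """N_k = z (z - 1)^(k - 1): nodes exactly k steps from the root."""
    return z * (z - 1) ** (k - 1)

def total_nodes(z, L):
    """N = 1 + sum_{k=1}^{L} N_k: the root plus shells 1..L."""
    return 1 + sum(shell_size(z, k) for k in range(1, L + 1))

def estimated_L(N, z):
    """Crude estimate from z^L = N, i.e. L = log N / log z."""
    return math.log(N) / math.log(z)

print([shell_size(3, k) for k in range(1, 4)])  # [3, 6, 12]
print(total_nodes(3, 3))                        # 22
print(round(estimated_L(6.7e9, 50), 1))         # 5.8
```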