Introduction to Social Network Analysis (2021)

The document introduces social network analysis, emphasizing its importance in understanding relationships among individuals or organizations through nodes and links. It discusses various concepts such as graph theory, types of graphs, and the significance of social networks in modern society. The content is structured to provide foundational knowledge for analyzing complex networks in fields like finance, transportation, and biology.

IM5211701 – Big Data Analytics and Applications
Introduction to Social Network Analysis

Instructor: Chao-Lung Yang, Ph.D.


Department of Industrial Management
National Taiwan University of Science and Technology

1
C.-L. Yang, Big Data, NTUST IM
Complex Networks
▪ Networks = nodes + links
▪ Not regular, but not random
▪ Scale-free networks
▪ Universal properties

Slide Credit: Leonid E. Zhukov


2
C.-L. Yang, Big Data, NTUST IM System Informatics &
Management Science Lab
Financial Sector

Slide Credit: Leonid E. Zhukov


Transportation

Image credit: https://www.metro.taipei/cp.aspx?n=91974F2B13D997F1


Biology – Protein-Protein
Interactions

Slide Credit: Leonid E. Zhukov


Graphing the history of philosophy

Image Credit: www.coppelia.io


Why Study Social Networks?
▪ Social network = social media + networking
▪ 2/3 of the global internet population visit social networks
  ▪ Nielsen, Global Faces & Networked Places, 2009
▪ 78% of people trust the recommendations of other consumers
  ▪ Nielsen, "Trust in Advertising" report, October 2007
▪ Time spent on social networks is growing at 3x the rate of overall internet use
  ▪ Nielsen, Global Faces & Networked Places, 2009

Social Network Analysis
▪ A set of relational methods for systematically understanding and identifying connections among actors. Social network analysis:
  ▪ is motivated by a structural intuition based on ties linking social actors
  ▪ is grounded in systematic empirical data
  ▪ draws heavily on graphic imagery
  ▪ relies on the use of mathematical and/or computational models
▪ Social network analysis embodies a range of theories relating types of observable social spaces to individual and group behavior.

Political Blogs

Slide Credit: Leonid E. Zhukov


Social Network – Relations among People

LinkedIn Map

Slide Credit: Yi-Lin Chen


Facebook Network

Social graph of 500 million people, Paul Butler, 2010


What is a network?
▪ Terminology
  ▪ Many of the terms in use are near-synonyms

Slide Credit: Leonid E. Zhukov


What is a Social Network?
▪ A social structure made up of individuals (or organizations) called "nodes", which are tied (connected) by one or more specific types of interdependency, such as friendship or common interest

Nodes vs. Links
▪ Actors / nodes / vertices / points
  ▪ Computers / Telephones
  ▪ Persons / Employees
  ▪ Companies / Business Units
  ▪ Articles / Books
  ▪ Can have properties (attributes)
▪ Ties / edges / arcs / lines / links
  ▪ Connect pairs of actors
  ▪ Types of social relations
  ▪ Allow different kinds of flows

Dyads, Triads and Relations
▪ actor
▪ dyad (a pair of actors)
▪ triad (three actors)
▪ relation: collection of specific ties among members of a group (e.g., friendship, kinship)
Slide Credit: Johannes Putzke

Strength of a Tie
▪ Social network
  ▪ finite set of actors and relation(s) defined on them
  ▪ depicted in a graph / sociogram
  ▪ labeled graph
▪ Strength of a tie
  ▪ dichotomous vs. valued
  ▪ depicted in a valued graph or a signed graph (+/-)
[Figure: labeled graph with nodes "Anna, female, 27" and "Ken, male, 34"; edge labels 5 and 2]
Slide Credit: Johannes Putzke

Strength of a Tie
▪ Strength of a tie
  ▪ nondirectional vs. directional
  ▪ depicted in directed graphs (digraphs)
  ▪ nodes connected by arcs
  ▪ 3 isomorphism classes
    ▪ null dyad
    ▪ mutual / reciprocal / symmetrical dyad
    ▪ asymmetric / antisymmetric dyad
  ▪ converse of a digraph
    ▪ reverse direction of all arcs
[Figure labels: adjacent node to/from; incident node to; +, -]
Slide Credit: Johannes Putzke

Why Graph Theory?
▪ Graphs are used to model pairwise relations between objects
▪ Generally, a network can be represented by a graph
▪ Many practical problems can be easily represented in terms of graph theory

Graph Theory - History
▪ Began in 1735
  ▪ Mentioned in Leonhard Euler's paper on the "Seven Bridges of Königsberg"
  ▪ Problem: walk across all 7 bridges without crossing any bridge twice
  ▪ Eulerian graph

Slide Credit: Prem Sankar C


Graph Theory - History
▪ Cycles in polyhedra: a polyhedron with no Hamiltonian cycle

Thomas P. Kirkman, William R. Hamilton

Hamiltonian cycles in Platonic graphs

Slide Credit: Prem Sankar C

Graph Theory - History
Trees in Electric Circuits

Gustav Kirchhoff

Slide Credit: Prem Sankar C


Graph Theory - History
Four Colors of Maps

Arthur Cayley, Augustus De Morgan

Slide Credit: Yang Mu


Graph Definition
▪ A graph is a collection of nodes and edges
▪ Denoted by G = (V, E)
  ▪ V = nodes (vertices, points)
  ▪ E = edges (links, arcs) between pairs of nodes
▪ Graph size parameters: n = |V|, m = |E|

Vertex and Edge
▪ Vertex / Node
  ▪ Basic element
  ▪ Drawn as a node or a dot
  ▪ The vertex set of G is usually denoted by V(G), V, or V_G
▪ Edge / Arc
  ▪ A set of two elements
  ▪ Drawn as a line connecting two vertices, called end vertices or endpoints
  ▪ The edge set of G is usually denoted by E(G), E, or E_G
▪ Neighborhood
  ▪ For any node v, the set of nodes it is connected to via an edge is called its neighborhood and is denoted N(v)

Graph: Example
▪ n := 6, m := 7
▪ Vertices (V) := {1,2,3,4,5,6}
▪ Edges (E) := {{1,2},{1,5},{2,3},{2,5},{3,4},{4,5},{4,6}}
▪ N(4) := Neighborhood(4) = {3,5,6}
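As a minimal sketch, the example graph can be written down in plain Python. The names `V`, `E`, and `neighborhood` are illustrative choices rather than anything defined in the slides, and edges are treated as unordered pairs.

```python
# Example graph from the slide: 6 nodes and 7 undirected edges.
V = {1, 2, 3, 4, 5, 6}
E = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]


def neighborhood(v, edges):
    """Return N(v): every node joined to v by an edge (undirected)."""
    nbrs = set()
    for a, b in edges:
        if a == v:
            nbrs.add(b)
        elif b == v:
            nbrs.add(a)
    return nbrs


n, m = len(V), len(E)          # graph size parameters n = |V|, m = |E|
print(n, m)
print(neighborhood(4, E))      # the set containing 3, 5 and 6
```

Storing `E` as a list of pairs mirrors the set notation on the slide; for large graphs an adjacency list is usually a better fit for neighborhood queries.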

Edge Types
▪ Undirected: unordered pairs of nodes
  ▪ E.g., distance between two cities, friendships…
▪ Directed: ordered pairs of nodes
  ▪ A directed edge has a source (tail, origin) and a target (head, destination) vertex
▪ Weighted: a weight is usually associated with each edge

Empty Graph / Edgeless graph
▪ No edges
▪ Null graph
  ▪ No nodes
  ▪ Obviously no edges

Simple Graph (Undirected)
▪ Simple graphs are undirected graphs without loops or multiple edges
▪ A = A^T (the adjacency matrix is symmetric)

Directed Graph (digraph)
▪ Edges have directions
▪ A ≠ A^T (in general, the adjacency matrix is not symmetric)
[Figure labels: loop, multiple arc, arc, node]

Weighted Graph
▪ A weighted graph is a graph in which each edge has an associated weight
[Figure: two weighted graphs on nodes 1-6, one with decimal edge weights and one with integer edge weights]

Subgraph
▪ A subgraph of G is a graph whose vertex and edge sets are subsets of those of G
▪ A supergraph of a graph G is a graph that contains G as a subgraph

Adjacency Matrix
▪ n-by-n matrix with A_uv = 1 if (u, v) is an edge, and 0 otherwise
▪ Nonzero diagonal entries are self-links (loops)
▪ Symmetric matrix for undirected graphs

1 2 3 4 5 6 7 8
1 0 1 1 0 0 0 0 0
2 1 0 1 1 1 0 0 0
3 1 1 0 0 1 0 1 1
4 0 1 0 1 1 0 0 0
5 0 1 1 1 0 1 0 0
6 0 0 0 0 1 0 0 0
7 0 0 1 0 0 0 0 1
8 0 0 1 0 0 0 1 0
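
The matrix above can be rebuilt from an edge list, as in the following sketch. The edge list was read off the matrix itself, and `adjacency_matrix` and `is_symmetric` are hypothetical helper names used only for illustration.

```python
def adjacency_matrix(n, edges):
    """Build the n-by-n adjacency matrix of an undirected graph (labels 1..n)."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1
        A[v - 1][u - 1] = 1  # undirected: mirror the entry across the diagonal
    return A


def is_symmetric(A):
    """Check A == A^T entry by entry."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))


# Edges read off the matrix above; (4, 4) is the self-loop on the diagonal.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5),
         (3, 7), (3, 8), (4, 4), (4, 5), (5, 6), (7, 8)]
A = adjacency_matrix(8, edges)
for row in A:
    print(*row)
print(is_symmetric(A))
```

A dense list-of-lists matrix is fine at this size; for sparse networks with many nodes, an adjacency list or a sparse matrix type is normally preferred.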

Incidence Matrix
▪ |V| x |E| matrix (one row per vertex, one column per edge)
▪ Entry [vertex, edge] is 1 if the vertex is an endpoint of the edge, and 0 otherwise

    1,2  1,5  2,3  2,5  3,4  4,5  4,6
1    1    1    0    0    0    0    0
2    1    0    1    1    0    0    0
3    0    0    1    0    1    0    0
4    0    0    0    0    1    1    1
5    0    1    0    1    0    1    0
6    0    0    0    0    0    0    1
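
A short sketch of how this incidence matrix could be generated from the edge list of the example graph. The function name `incidence_matrix` is an assumption made for illustration.

```python
def incidence_matrix(nodes, edges):
    """Rows are vertices, columns are edges; entry is 1 if the vertex is an endpoint."""
    return [[1 if v in e else 0 for e in edges] for v in nodes]


nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
B = incidence_matrix(nodes, edges)
for v, row in zip(nodes, B):
    print(v, *row)

# Each column of an undirected, loop-free incidence matrix sums to 2,
# and each row sum equals the degree of that vertex.
column_sums = [sum(col) for col in zip(*B)]
row_sums = [sum(row) for row in B]
```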

Adjacency List
▪ Edge list
▪ Adjacency list (node list)

Edge List    Node List
1 2          1: 2 2
1 2          2: 3 5
2 3          3: 3
2 5          4: 3 5
3 3          5: 3 4
4 3
4 5
5 3
5 4
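
The conversion from edge list to node list can be sketched as below. It assumes each pair is a directed edge (source, target) and keeps repeated edges, which is what the node list on the slide suggests; `edge_list_to_adjacency_list` is a hypothetical name.

```python
def edge_list_to_adjacency_list(edges):
    """Group the targets of each source node, preserving duplicates and order."""
    adj = {}
    for source, target in edges:
        adj.setdefault(source, []).append(target)
    return adj


edge_list = [(1, 2), (1, 2), (2, 3), (2, 5), (3, 3),
             (4, 3), (4, 5), (5, 3), (5, 4)]
node_list = edge_list_to_adjacency_list(edge_list)
for node, targets in node_list.items():
    print(node, *targets)
```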

Edge Lists for Weighted Graphs

Edge List
1 2 1.2
2 4 0.2
4 5 0.3
4 1 0.5
5 4 0.5
6 3 1.5

Classification of Graph Terms
▪ Global terms refer to a whole graph
▪ Local terms refer to a single node in a graph

Connected and Isolated Vertex
▪ Two vertices are connected if there is a path between them
▪ Isolated vertex: a vertex with no incident edges, so it is not connected to any other vertex
[Figure: graph on nodes 1-6 with one isolated vertex]

Adjacent Nodes
▪ Adjacent nodes
  ▪ Two nodes are adjacent if they are connected via an edge
  ▪ If edge e = {u,v} ∈ E(G), we say that u and v are adjacent, or neighbors
▪ An edge whose two end vertices are the same is called a loop, or a self-loop

Degree (Undirected Graphs)
▪ Number of edges incident on a node

The degree of node 5 is 3

Degree (Directed Graphs)
▪ In-degree: number of edges entering
▪ Out-degree: number of edges leaving
▪ Degree = indeg + outdeg

outdeg(1)=2, indeg(1)=0
outdeg(2)=2, indeg(2)=2
outdeg(3)=1, indeg(3)=4
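
A minimal sketch of both degree definitions. The undirected part uses the example graph from the "Graph: Example" slide; the arc list `arcs` is a made-up digraph for illustration only, since the arcs of the figure on this slide are not recoverable from the text.

```python
from collections import Counter


def degrees(edges):
    """Undirected degree: number of edge endpoints at each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def in_out_degrees(arcs):
    """Directed degrees: arcs are (source, target) pairs."""
    indeg, outdeg = Counter(), Counter()
    for source, target in arcs:
        outdeg[source] += 1
        indeg[target] += 1
    return indeg, outdeg


# Undirected example graph from the earlier slide.
E = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
deg = degrees(E)
print(deg[5])  # degree of node 5

# Hypothetical digraph, NOT the one drawn on the slide.
arcs = [(1, 2), (1, 3), (2, 3), (2, 4), (4, 3)]
indeg, outdeg = in_out_degrees(arcs)
total = {v: indeg[v] + outdeg[v] for v in (1, 2, 3, 4)}  # degree = indeg + outdeg
print(total)
```

`Counter` returns 0 for nodes that never appear as a source or target, so nodes with no incoming or outgoing arcs need no special handling.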

Walk and Trail
▪ walk: a sequence of adjacent nodes in which edges/nodes can be repeated
▪ trail: a walk in which no edge is repeated, e.g. a-b-c-d-e-b-d

Paths
▪ Path: a sequence P of nodes v1, v2, …, vk-1, vk
  ▪ No vertex can be repeated
  ▪ A closed path is called a cycle
  ▪ The length of a path or cycle is the number of edges visited in the path or cycle

Walks and Paths
▪ 1,2,5,2,3,4: walk of length 5
▪ 1,2,5,2,3,2,1: closed walk of length 6
▪ 1,2,3,4,6: path of length 4

Cycle
▪ Cycle: a closed path, e.g. a-b-c-d-a; a path from x to y is closed if x = y
▪ Cycles are denoted by C_k, where k is the number of nodes in the cycle
[Figure: cycles C3, C4, C5]

Walks, Trails, Paths
▪ (Directed) walk (W)
  ▪ Sequence of nodes and lines starting and ending with (different) nodes (called origin and terminus)
  ▪ Nodes and lines can be included more than once
▪ Inverse of a (directed) walk (W^-1)
  ▪ Walk in opposite order
▪ Length of a walk
  ▪ Number of lines that occur in the walk (a line traversed twice counts twice; in weighted graphs, add the line weights)
▪ (Directed) trail
  ▪ A walk in which all lines are distinct
▪ (Directed) path
  ▪ A walk in which all nodes and all lines are distinct
▪ Every path is a trail and every trail is a walk
Slide Credit: Johannes Putzke

Walks, Trails and Paths - Repetition
[Figure: graph with nodes n1-n6 and lines l1-l7]
▪ W = n1 l1 n2 l2 n3 l4 n5 l6 n6: path
▪ n1: origin
▪ n3: terminus
▪ W = n1 l1 n2 l2 n3 l4 n5 l4 n3: walk
▪ W = n1 l1 n2 l2 n3 l4 n5 l5 n4 l3 n3: trail

Slide Credit: Johannes Putzke
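
The walk / trail / path distinction can be checked mechanically. The sketch below assumes a simple undirected graph (no loops or multiple edges) given as an edge list and a node sequence; `classify` is a hypothetical helper, and the test sequences are taken on the example graph with edges {1,2},{1,5},{2,3},{2,5},{3,4},{4,5},{4,6}.

```python
def classify(seq, edges):
    """Return the most specific label for a node sequence: path, trail, or walk."""
    edge_set = {frozenset(e) for e in edges}
    steps = [frozenset(pair) for pair in zip(seq, seq[1:])]
    if any(step not in edge_set for step in steps):
        return "not a walk"      # some consecutive nodes are not adjacent
    if len(set(steps)) < len(steps):
        return "walk"            # an edge is repeated
    if len(set(seq)) < len(seq):
        return "trail"           # edges distinct, but a node is repeated
    return "path"                # all nodes and all edges distinct


E = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
walk_example = classify([1, 2, 5, 2, 3, 4], E)
trail_example = classify([1, 2, 5, 4, 3, 2], E)
path_example = classify([1, 2, 3, 4, 6], E)
print(walk_example, trail_example, path_example)

# Length = number of edges traversed = number of nodes in the sequence minus one.
length = len([1, 2, 5, 2, 3, 4]) - 1
```

Because every path is a trail and every trail is a walk, the function reports only the most restrictive category that applies.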


Shortest Path
▪ The shortest path is the path between two nodes that has the shortest length
  ▪ Length: number of edges
▪ The distance between u and v is the length of a shortest path between them
▪ The diameter of a graph is the length of the longest shortest path between any pair of nodes in the graph
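
For unweighted graphs, distances can be computed with breadth-first search. The following is a sketch under the assumption that the graph is connected and undirected; it uses the example graph from the "Graph: Example" slide, and the helper names are illustrative.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """Longest shortest path over all pairs; assumes a connected graph."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


E = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
adj = build_adjacency(E)
d_1_6 = bfs_distances(adj, 1)[6]   # e.g. via 1-5-4-6
print(d_1_6, diameter(adj))
```

Running one BFS per node costs O(n(n + m)) overall, which is acceptable for small graphs; very large networks typically rely on sampling or approximation instead.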

Reachability, Distances and Diameter
▪ Reachability
  ▪ Node nj is reachable from node ni if there is a path between them
▪ Geodesic
  ▪ Shortest path between two nodes
▪ (Geodesic) distance d(i,j)
  ▪ Length of the geodesic (also called "degrees of separation")

Slide Credit: Johannes Putzke


Six degrees of separation
▪ "Any two people are, on average, separated by no more than six intermediate connections"
  ▪ John Guare's play (1990) and movie (1993), "Six Degrees of Separation"

Six degrees of separation

Slide Credit: Leonid E. Zhukov


Small World

▪ "The small-world problem", Stanley Milgram, 1967
▪ "An experimental study of the small world problem", J. Travers, S. Milgram, 1969

Stanley Milgram's 1969 experiment

▪ 296 volunteers, 217 sent
  ▪ 196 in Nebraska (1300 miles)
  ▪ 100 in Boston (25 miles)
▪ Target in Boston
  ▪ Name, address, occupation, job, hometown

Result of Stanley Milgram's 1969 experiment
▪ Reached the target: N = 64 (29%)
▪ Average chain length <L> = 5.2
▪ Channels:
  ▪ Hometown: <L> = 6.1
  ▪ Business contacts: <L> = 4.6
▪ From Boston: <L> = 4.4
▪ From Nebraska: <L> = 5.7

Small World
▪ Email graph:
  ▪ D. Watts (2001), 48,000 senders, <L> ≈ 6
▪ MSN Messenger graph:
  ▪ J. Leskovec et al. (2007), 240 million users, <L> ≈ 6.6
▪ Facebook graph:
  ▪ L. Backstrom et al. (2012), 721 million users (≈ 69 billion friendship links), <L> ≈ 4.74

Cayley Tree
▪ Bethe lattice: an infinite cycle-free graph in which every node connects to z neighbors (z is the coordination number)
  ▪ nodes per shell: $N_k = z(z-1)^{k-1}$
  ▪ total: $N = 1 + \sum_{k=1}^{L} z(z-1)^{k-1}$
▪ Estimates:
  ▪ $z^L = N$, so $L = \log N / \log z$
  ▪ N ≈ 6.7 billion, z = 50 friends, L ≈ 5.8

[Figure: Cayley tree with z = 3]
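
The estimate can be reproduced numerically. This is a small sketch; the function names are illustrative, and the rough estimate uses the simplification z^L = N from the slide rather than the exact shell sum.

```python
import math


def shell_size(z, k):
    """Number of nodes in shell k of a Cayley tree: N_k = z (z - 1)^(k - 1)."""
    return z * (z - 1) ** (k - 1)


def total_nodes(z, L):
    """Root plus all shells up to depth L: N = 1 + sum_{k=1..L} N_k."""
    return 1 + sum(shell_size(z, k) for k in range(1, L + 1))


def estimated_depth(N, z):
    """Rough estimate from z^L = N, i.e. L = log N / log z."""
    return math.log(N) / math.log(z)


print(total_nodes(3, 2))                       # z = 3: 1 root + 3 + 6 nodes
print(round(estimated_depth(6.7e9, 50), 1))    # N ≈ 6.7 billion, z = 50
```

The estimate ignores overlapping friendship circles (real social networks contain many cycles), so it should be read as an order-of-magnitude argument for why short chains are plausible.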