
Data Mining and BI: Social Network Analytics: Random Graphs

This document discusses random graph models and how they can be used to model real-world networks. It introduces the Erdős-Rényi random graph model and describes its key properties like the binomial degree distribution and emergence of a giant component. It then discusses more realistic models like preferential attachment models, which incorporate growth over time and preferential attachment to high-degree nodes, resulting in power-law degree distributions often seen in real networks.

Uploaded by marouli90
Copyright © All Rights Reserved


Credits: Lada Adamic


Source: https://github.com/ladamalina/coursera-sna/tree/master/Week%202.%20Random%20Graph%20Models
Outline
● Introduction to random graphs
● Degree Distribution
● Giant Component
● Average Shortest Path
Network models
● Why model?
○ simple representation of complex network
○ can derive properties mathematically
○ predict properties and outcomes
● Also: to have a strawman
○ In what ways is your real-world network different from the hypothesized model?
○ What insights can be gleaned from this?
Erdős and Rényi
Erdős-Rényi: the simplest network model
● Assumptions
○ nodes connect at random
○ network is undirected
● Key parameter (besides the number of nodes N): p or M
○ p = probability that any two nodes share an edge
○ M = total number of edges in the graph
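The two parameterizations can be sketched in plain Python. This is a minimal illustration, not code from the course: the function names `gnp` and `gnm` are our own, and edges are stored as node pairs (i, j) with i < j, which assumes an undirected simple graph as stated above.

```python
import itertools
import random


def gnp(n, p, seed=None):
    """G(N, p): keep each of the N(N-1)/2 possible edges independently with probability p."""
    rng = random.Random(seed)
    return {(i, j) for i, j in itertools.combinations(range(n), 2) if rng.random() < p}


def gnm(n, m, seed=None):
    """G(N, M): pick exactly M distinct edges uniformly at random from all possible pairs."""
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(range(n), 2))
    return set(rng.sample(all_pairs, m))


n, p = 200, 0.05
expected_edges = round(p * n * (n - 1) / 2)  # 995 edges expected under G(N, p)
# G(N, p) edge counts fluctuate around 995; G(N, M) always has exactly M edges.
print(len(gnp(n, p, seed=1)), len(gnm(n, expected_edges, seed=1)))
```

Choosing M = p·N(N-1)/2 makes the two models comparable: they then have the same expected number of edges.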
What they look like
Binomial degree distribution
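● (N-1, p) binomial model: for each of a node's N-1 potential edges we flip a biased coin
○ with probability p we add the edge
○ with probability (1-p) we don't
● So the probability that a node has degree k is
$p(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k}$
● For large N and small p, this can be approximated by a Poisson distribution with mean z = (N-1)p:
$p(k) \approx \frac{z^k e^{-z}}{k!}$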
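A quick numerical check of the Poisson approximation, sketched with Python's standard library only. `binomial_pmf` and `poisson_pmf` are hypothetical helper names, and N = 1000, p = 0.004 (so z ≈ 4) are illustrative values, not numbers from the slides.

```python
import math


def binomial_pmf(k, trials, p):
    """P(degree = k) in G(N, p), where trials = N - 1 potential neighbors."""
    return math.comb(trials, k) * p**k * (1 - p) ** (trials - k)


def poisson_pmf(k, z):
    """Poisson approximation with mean degree z = (N - 1) p."""
    return math.exp(-z) * z**k / math.factorial(k)


N, p = 1000, 0.004
z = (N - 1) * p
for k in range(9):
    # The two columns should agree to about three decimal places.
    print(k, round(binomial_pmf(k, N - 1, p), 4), round(poisson_pmf(k, z), 4))
```

The approximation improves as N grows with z held fixed, which is the regime of sparse networks.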
Degree Distribution
● What is the probability that a node has 0,1,2,3, … edges?
● Probabilities sum to 1
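The same normalization applies to an observed graph: dividing the number of nodes with each degree by N gives fractions that sum to 1. A minimal sketch (the helper `degree_distribution` is a hypothetical name) applied to a toy five-node graph:

```python
from collections import Counter


def degree_distribution(n, edges):
    """Map each degree k to the fraction of the n nodes that have degree k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree[node] for node in range(n))  # isolated nodes count as degree 0
    return {k: c / n for k, c in sorted(counts.items())}


# Toy graph: the path 0-1-2-3 plus an isolated node 4.
dist = degree_distribution(5, [(0, 1), (1, 2), (2, 3)])
print(dist)  # {0: 0.2, 1: 0.4, 2: 0.4}
```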
Quiz
● The maximum degree of a node in a simple N-node graph (no multiple edges between the same two nodes) is
○ N
○ N-1
○ N-2
Fact
● In an Erdős-Rényi random graph the maximum degree does not vary much from the average
○ The degrees of the nodes tend to be similar
Fact
● Random networks do not have large hubs
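To see how little the maximum strays from the average, one can sample a single G(N, p) graph. This is a sketch with illustrative values N = 1000 and p = 0.01 (average degree z ≈ 10); the exact numbers depend on the random seed, so none are quoted here.

```python
import itertools
import random


def er_degrees(n, p, seed=0):
    """Degree of every node in one G(N, p) sample."""
    rng = random.Random(seed)
    degree = [0] * n
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            degree[i] += 1
            degree[j] += 1
    return degree


deg = er_degrees(1000, 0.01)  # expected average degree z = 999 * 0.01, about 10
avg = sum(deg) / len(deg)
print(f"average degree {avg:.2f}, maximum degree {max(deg)}")
```

Because the Poisson tail decays faster than exponentially, even the best-connected node typically has only about twice the average degree here, rather than orders of magnitude more.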
Giant Component
● As the network densifies (the average degree z = (N-1)p increases), a giant component emerges
○ i.e. a connected subgraph containing a finite fraction of all nodes, so its size grows in proportion to N
● What is the average degree z at which the giant component starts to emerge?
○ 0
○ 1
○ 3/2
○ 3
Percolation threshold
● Percolation threshold: how many edges need to be added before the giant component appears?
● As the average degree increases past z = 1, a giant component suddenly appears; since z = 2M/N, this corresponds to about M = N/2 edges
(Plot: size of the giant component vs. average degree)
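The threshold can be checked numerically. The sketch below is our own plain-Python illustration, not code from the course: it assumes p = z/(N-1), uses a small union-find helper to track components, and reports the share of nodes in the largest connected component for several average degrees.

```python
import random


def largest_component_fraction(n, z, seed=0):
    """Sample G(N, p) with p = z / (N - 1) and return the largest component's share of nodes."""
    rng = random.Random(seed)
    p = z / (n - 1)
    parent = list(range(n))  # union-find forest: each node starts as its own component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j  # merge the two components

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


for z in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"z = {z}: largest component holds {largest_component_fraction(1000, z):.1%} of nodes")
```

Below z = 1 the largest component stays a tiny sliver of the graph; above it, the share jumps quickly toward most of the nodes.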
Giant component: Another angle
● How many other friends besides you does each of your friends have?
○ Because the Erdős-Rényi degree distribution is Poisson, the average number of friends each of your friends has, excluding you, is also z
○ So at z = 1, each of your friends is expected to have one other friend, who in turn has another friend, and so on
○ Beyond this point the chain keeps expanding and the giant component emerges
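The branching argument can be made quantitative. For large Erdős-Rényi graphs, a standard result is that the fraction S of nodes in the giant component satisfies S = 1 - exp(-zS): a node stays outside the giant component only if none of its Poisson(z) neighbors leads into it. A small fixed-point solver, sketched with a hypothetical function name:

```python
import math


def giant_component_fraction(z, iterations=2000):
    """Solve S = 1 - exp(-z * S) by fixed-point iteration.

    Starting from S = 1 picks out the non-zero root when z > 1; for z < 1 the
    iteration decays to the only solution, S = 0. Near z = 1 convergence is slow.
    """
    s = 1.0
    for _ in range(iterations):
        s = 1.0 - math.exp(-z * s)
    return s


for z in (0.5, 1.5, 2.0, 3.0):
    print(z, round(giant_component_fraction(z), 4))  # z = 2 gives S of about 0.797
```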
Why just one giant component?
● Suppose there were two large components. How long could they stay separate as the network densifies?
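One way to answer: suppose two components A and B held fractions S_1 and S_2 of the N nodes. There are S_1 S_2 N^2 node pairs between them, each linked independently with probability p = z/(N-1), so the chance that no edge joins them vanishes as N grows:

```latex
\Pr[\text{no edge between } A \text{ and } B]
  = (1-p)^{S_1 S_2 N^2}
  \approx \exp\!\left(-\frac{z\, S_1 S_2 N^2}{N-1}\right)
  \approx e^{-z S_1 S_2 N}
  \;\xrightarrow{\;N \to \infty\;}\; 0
```

The expected number of edges between them, about z S_1 S_2 N, grows linearly with N, so two giant components would almost surely merge into one.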
Average Shortest Path
● How many hops on average between each pair of nodes?
● Again, each of your friends has z (the average degree) friends besides you
● Ignoring loops, the number of people at distance l from you is $z^l$
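Friends at distance l
$N_l = z^l$

Scaling: average shortest path $l_{av}$
● Everyone is reached once $z^l \approx N$, i.e. $l \approx \ln N / \ln z$, so
$l_{av} \sim \frac{\ln N}{\ln z}$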
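The scaling estimate can be compared against breadth-first search on a sampled graph. This sketch is our own: it assumes p = z/(N-1), uses illustrative values N = 1000 and z = 10, and runs BFS from 100 sources rather than all pairs to keep it quick. Since ln N / ln z is only a scaling estimate, expect agreement to within roughly a hop, not exact equality.

```python
import math
import random
from collections import deque


def er_adjacency(n, z, seed=0):
    """Adjacency lists of a G(N, p) sample with p = z / (N - 1)."""
    rng = random.Random(seed)
    p = z / (n - 1)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def average_shortest_path(adj, sources):
    """Mean BFS hop count from each source to every node it can reach."""
    total, pairs = 0, 0
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs


n, z = 1000, 10
measured = average_shortest_path(er_adjacency(n, z), sources=range(0, n, 10))
estimate = math.log(n) / math.log(z)
print(f"measured {measured:.2f} hops, ln N / ln z = {estimate:.2f}")
```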
What does it mean in practice
● Erdős-Rényi networks can grow very large, yet their nodes stay just a few hops apart
Logarithmic axes
● Powers of a number are uniformly spaced ($2^0, 2^1, 2^2, 2^3, 2^4, \ldots$)
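A tiny illustration of why: on a logarithmic axis a value x sits at position log10(x), so consecutive powers of 2 are always log10(2) apart.

```python
import math

powers = [2**k for k in range(6)]  # 1, 2, 4, 8, 16, 32
positions = [math.log10(x) for x in powers]  # where each value sits on a log axis
gaps = [b - a for a, b in zip(positions, positions[1:])]
print([round(g, 3) for g in gaps])  # [0.301, 0.301, 0.301, 0.301, 0.301]
```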
Erdős-Rényi average shortest path on log-log axes
Realism
● Consider alternative mechanisms of constructing a network that are also fairly
“random”.
● How do they compare with Erdős-Rényi?
Other models
Introduction model
● prob-link is the familiar p (the probability that any two nodes share an edge)
● However, with probability prob-intro the new partner is chosen among our friends' friends rather than completely at random
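A minimal sketch of the introduction mechanism in Python. It is not the course's simulation: the function names, the rule of adding edges one at a time until `num_edges` exist, and the triangle count used for comparison are all our own assumptions. Setting prob_intro = 0 recovers plain random linking, so the two runs below differ only in how partners are chosen.

```python
import random


def introduction_model(n, num_edges, prob_intro, seed=0):
    """Hypothetical sketch: repeatedly pick a random node u and give it a new partner v.
    With probability prob_intro, v is drawn from u's friends-of-friends (an introduction);
    otherwise v is a uniformly random node, as in Erdős-Rényi.
    Requires num_edges <= n*(n-1)/2, otherwise the loop cannot finish."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    added = 0
    while added < num_edges:
        u = rng.randrange(n)
        friends_of_friends = {w for f in adj[u] for w in adj[f]} - adj[u] - {u}
        if friends_of_friends and rng.random() < prob_intro:
            v = rng.choice(sorted(friends_of_friends))  # introduced by a mutual friend
        else:
            v = rng.randrange(n)  # plain random partner
        if v != u and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj


def count_triangles(adj):
    """Each triangle is seen once from each of its three edges, hence the division by 3."""
    return sum(len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v) // 3


random_like = introduction_model(200, 600, prob_intro=0.0)
introduced = introduction_model(200, 600, prob_intro=0.9)
print(count_triangles(random_like), count_triangles(introduced))
```

Every introduction closes a triangle (u, the mutual friend, and the new partner), so the introduction run should show far more triangles than the purely random one.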
Static Geographical model
● Each node connects to its num-neighbors closest neighbors
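A sketch of the static geographical model under assumptions of ours: nodes are scattered uniformly in the unit square, distance is Euclidean, and num-neighbors becomes the Python parameter `num_neighbors`.

```python
import math
import random


def geographic_model(n, num_neighbors, seed=0):
    """Hypothetical sketch: place n nodes uniformly at random in the unit square and link
    each node to its num_neighbors nearest other nodes. Links are undirected, so a node
    can end up with more than num_neighbors links when it is also close to others."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    edges = set()
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(pos[i], pos[j]))[:num_neighbors]
        for j in nearest:
            edges.add((min(i, j), max(i, j)))
    return edges


edges = geographic_model(300, num_neighbors=4)
degree = [0] * 300
for i, j in edges:
    degree[i] += 1
    degree[j] += 1
print(len(edges), min(degree), max(degree))
```

Unlike Erdős-Rényi, links here are local: neighbors of a node tend to be neighbors of each other, and no node can be far below num_neighbors in degree.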
Random encounter
● People move around randomly and connect to people they bump into
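The random-encounter idea can be sketched as follows. The unit torus (a square whose edges wrap around), the step size and the contact radius are hypothetical choices for illustration; only the idea of moving randomly and linking on contact comes from the slide.

```python
import random


def random_encounter(n, steps, radius, step_size=0.02, seed=0):
    """Hypothetical sketch: n people wander on a unit torus; at each time step everyone
    moves a small random amount, and any two people within `radius` of each other
    become linked. Links persist once formed."""
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    edges = set()
    for _ in range(steps):
        for p in pos:
            p[0] = (p[0] + rng.uniform(-step_size, step_size)) % 1.0
            p[1] = (p[1] + rng.uniform(-step_size, step_size)) % 1.0
        for i in range(n):
            for j in range(i + 1, n):
                dx = abs(pos[i][0] - pos[j][0])
                dy = abs(pos[i][1] - pos[j][1])
                dx, dy = min(dx, 1 - dx), min(dy, 1 - dy)  # wrap-around distance
                if dx * dx + dy * dy < radius * radius:
                    edges.add((i, j))
    return edges


for steps in (10, 50, 200):
    print(steps, len(random_encounter(100, steps, radius=0.05)))
```

Because links are never removed, the edge set only grows the longer the simulation runs.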
Growth model
● Instead of starting out with a fixed number of nodes, nodes are added over
time
Conclusion
● In some instances the ER model is plausible
● If the dynamics are different, the ER model may be a poor fit
Growth and preferential attachment models
Example: online Q&A site
Uneven participation
● Many people have replied only a few times vs. few people who have replied many times
Real-world degree distributions
● Sexual networks
● Great variation in contact numbers
● Many people with a small number of partners vs. few people with a high number of partners
Power-law distribution
● High skew (asymmetry)
● Straight line on a log-log plot (right panel; the left panel uses linear axes)
Poisson distribution
● Little skew (asymmetry)
● Curved on a log-log plot (right panel; the left panel uses linear axes)
Power law distribution
● Straight line on a log-log plot:
$\ln p(k) = c - \alpha \ln k$
● Exponentiating both sides gives p(k), the probability of observing a node of degree k:
$p(k) = C k^{-\alpha}$, where $C = e^{c}$
● C: normalization constant (probabilities over all k must sum to 1)
● α: power-law exponent
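Both bullets can be made concrete in a short sketch of our own. It builds a discrete power law on an assumed finite range k = 1..10,000, computes C so the probabilities sum to 1, and then fits the straight line ln p(k) = c - α ln k by ordinary least squares, which recovers α and C = e^c exactly because the input has no noise.

```python
import math


def power_law_pmf(alpha, k_min=1, k_max=10_000):
    """Discrete p(k) = C k^(-alpha) on k_min..k_max; C makes the probabilities sum to 1."""
    ks = range(k_min, k_max + 1)
    C = 1.0 / sum(k ** -alpha for k in ks)
    return {k: C * k ** -alpha for k in ks}


def fit_loglog_line(pmf):
    """Ordinary least squares for ln p(k) = c - alpha ln k; returns (c, alpha)."""
    xs = [math.log(k) for k in pmf]
    ys = [math.log(prob) for prob in pmf.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, -slope


pmf = power_law_pmf(2.5)
c, alpha = fit_loglog_line(pmf)
print(round(alpha, 6), round(math.exp(c), 6), round(pmf[1], 6))  # alpha is 2.5; e^c equals C = p(1)
```

On real degree data the tail of the histogram is noisy, and a straight-line fit on log-log axes is known to give biased exponents; maximum-likelihood estimators are generally preferred in practice.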
2 ingredients in generating power-law networks
● nodes appear over time (growth)
● nodes prefer to attach to nodes with many connections (preferential
attachment, cumulative advantage)
Ingredient # 1: growth over time
Nodes appear one by one, each selecting m other nodes at random to connect to

(Illustration: m = 2)
Random network growth
● one node is born at each time tick
● at time t there are t nodes
● The degree $k_i$ of node i (born at time i, with 0 < i < t) changes at rate
$\frac{dk_i(t)}{dt} = \frac{m}{t}$

● There are m new edges being added per unit time (with 1 new node)
● The m edges are being distributed among t nodes
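Integrating this rate gives the expected degree as a function of age. Assuming node i arrives with its own m edges (so k_i(i) = m) and treating t as continuous:

```latex
\frac{dk_i}{dt} = \frac{m}{t}, \quad k_i(i) = m
\quad\Longrightarrow\quad
k_i(t) = m + m \int_i^t \frac{dt'}{t'} = m\left(1 + \ln\frac{t}{i}\right)
```

Because ln(t/i) shrinks as the birth time i grows, older nodes end up with a higher expected degree, but only logarithmically so.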
Age and degree
● For i < j (node i is older), on average $k_i(t) > k_j(t)$
● Older nodes on average have higher degrees
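A simulation sketch of random growth (no preferential attachment yet) makes the age effect visible. The starting graph, a complete graph on m + 1 nodes, and the parameter values are our assumptions; for comparison, integrating dk_i/dt = m/t predicts k_i(t) of about m(1 + ln(t/i)).

```python
import random


def random_growth(t_max, m, seed=0):
    """Random growth without preferential attachment.

    Assumption: the process starts from a complete graph on m + 1 nodes. Each later
    node arrives with m edges to distinct existing nodes chosen uniformly at random.
    Returns the degree of every node, indexed by birth time.
    """
    rng = random.Random(seed)
    degree = [m] * (m + 1)  # complete graph K_{m+1}: every seed node has degree m
    for new in range(m + 1, t_max):
        for v in rng.sample(range(new), m):
            degree[v] += 1
        degree.append(m)  # the newcomer's own m edges
    return degree


deg = random_growth(10_000, m=2)
print("100 oldest nodes:", sum(deg[:100]) / 100)
print("100 youngest nodes:", sum(deg[-100:]) / 100)
```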
Ingredient #2: preferential attachment
● Preferential attachment
○ new nodes prefer to attach to well-connected nodes over less-well connected nodes
● Process also known as:
○ Cumulative advantage
○ Rich-get-richer
○ Matthew effect
Price's preferential attachment model for citation networks

● [Price 65]
○ each new paper makes m citations on average
○ new papers cite previous papers with probability proportional to their indegree (citations)
○ what about papers without any citations?
■ each paper is considered to have a “default” citation
■ the probability of citing a paper with in-degree k is proportional to k+1
● Power law with exponent α = 2+1/m
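A sketch of Price's mechanism in Python, with two simplifying assumptions of ours: every paper cites exactly m distinct earlier papers (rather than m on average), and the process starts from a single uncited paper. The "default citation" becomes an urn in which each paper appears once for itself plus once per citation received, so a uniform draw picks a paper with probability proportional to k+1.

```python
import random


def price_model(n_papers, m, seed=0):
    """Hypothetical sketch of Price's model; returns the in-degree (citations) of each paper."""
    rng = random.Random(seed)
    in_degree = [0]
    urn = [0]  # paper 0 enters once for its default citation
    for new in range(1, n_papers):
        cited = set()
        while len(cited) < min(m, new):  # draw until m distinct earlier papers are chosen
            cited.add(rng.choice(urn))
        for j in cited:
            in_degree[j] += 1
            urn.append(j)  # one more ticket per citation received
        in_degree.append(0)
        urn.append(new)  # the new paper's default ticket
    return in_degree


citations = price_model(5000, m=3)
mean = sum(citations) / len(citations)
print(f"mean citations {mean:.2f}, most-cited paper {max(citations)}")
```

With m = 3 the slide's formula gives α = 2 + 1/3, so a few early papers collect far more citations than the mean of about 3.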
Cumulative advantage: how?
● Copying mechanism
● Visibility
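The copying mechanism can be illustrated with a redirection rule, shown here as a hypothetical sketch (the parameter r and the one-link-per-node setup are our assumptions, not details from the slide): a newcomer picks a random existing node and, with probability r, links to whatever that node links to instead. A node with k incoming links can be reached this way through any of its k citers, so its chance of being chosen grows linearly with k, which is cumulative advantage without anyone knowing global degrees.

```python
import random


def copying_model(n, r, seed=0):
    """Each new node makes one link. It picks a uniformly random existing node x;
    with probability r it copies x's link (links to x's target), otherwise it links to x.
    Returns the in-degree of every node."""
    rng = random.Random(seed)
    target = [None]  # node 0 has no earlier node to link to
    in_degree = [0]
    for new in range(1, n):
        x = rng.randrange(new)
        if target[x] is not None and rng.random() < r:
            y = target[x]  # copy: follow x's link
        else:
            y = x  # no copying: link to the randomly chosen node itself
        target.append(y)
        in_degree[y] += 1
        in_degree.append(0)
    return in_degree


uniform_only = copying_model(5000, r=0.0)
with_copying = copying_model(5000, r=0.8)
print(max(uniform_only), max(with_copying))
```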
Barabasi-Albert model
● First used to describe the skewed degree distribution of the World Wide Web
● Each node connects to other nodes with probability proportional to their degree
○ the process starts with some initial subgraph
○ each new node comes in with m edges
○ probability of connecting to node i: $\Pi(k_i) = \frac{k_i}{\sum_j k_j}$
● Results in a power law with exponent α = 3
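A compact Barabasi-Albert sketch in Python. Two choices here are our assumptions: the initial subgraph is a complete graph on m + 1 nodes, and duplicate targets are redrawn so each newcomer gets m distinct neighbors. Preferential selection uses a list holding every edge endpoint, so node i appears k_i times and a uniform draw matches Π(k_i) = k_i / Σ_j k_j.

```python
import random
from collections import deque


def barabasi_albert(n, m, seed=0):
    """Grow an n-node graph where each newcomer attaches m edges preferentially by degree."""
    rng = random.Random(seed)
    # Assumed seed subgraph: a complete graph on nodes 0..m.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Node i appears here once per incident edge, i.e. k_i times.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # redraw duplicates so the m targets are distinct
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def is_connected(n, edges):
    """Breadth-first search from node 0; True if every node is reached."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n


edges = barabasi_albert(2000, m=3)
print(len(edges), is_connected(2000, edges))
```

For real work, NetworkX ships a ready-made `barabasi_albert_graph` generator; the hand-rolled version above only exposes the mechanism.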
Random Vs Preferential
Properties of the BA graph
● The distribution is scale-free with exponent α = 3

$P(k) = \frac{2m^2}{k^3}$

●  The graph is connected


○ Every new vertex is born with a link or several links (depending on whether m = 1 or m > 1)
○ It then connects to an “older” vertex, which itself connected to another vertex when it was
introduced
○ And we started from a connected core
● The older are richer
○ Nodes accumulate links as time goes on, which gives older nodes an advantage: newer nodes attach preferentially, and older nodes have a higher degree to tempt them with than some new kid on the block
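As a consistency check on P(k) = 2m²/k³, treat k as continuous on [m, ∞), as in the mean-field derivation of this result. The formula is then properly normalized, and its mean is 2m, which matches the m edges each new node brings (every edge adds 2 to the total degree):

```latex
\int_m^{\infty} \frac{2m^2}{k^3}\,dk
  = 2m^2 \left[-\frac{1}{2k^2}\right]_m^{\infty} = 1,
\qquad
\langle k \rangle = \int_m^{\infty} k\,\frac{2m^2}{k^3}\,dk
  = 2m^2 \left[-\frac{1}{k}\right]_m^{\infty} = 2m
```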
Visualization
Summary: growth models
● Most networks aren't 'born', they are made
● Nodes being added over time means that older nodes can have more time to
accumulate edges
● Preference for attaching to 'popular' nodes further skews the degree
distribution toward a power-law
