Module VI_Mining Social Network Graph (2).pptx

Chapter 11 discusses mining social-network graphs, emphasizing the structure of social networks as graphs with nodes representing individuals and edges representing relationships. It explores various types of social networks, clustering methods, and community detection techniques, including the Clique Percolation Method (CPM) and the Girvan-Newman algorithm. The chapter highlights the significance of social network analysis in fields such as epidemiology, marketing, and intelligence gathering.

Chapter 11

Mining Social-Network Graphs


Contents
• Social Networks as Graphs
• Clustering of Social-Network Graphs
• Direct Discovery of Communities
• SimRank
• Counting triangles using Map-Reduce
What is a Social Network?
• A dedicated website or other application
that enables users to communicate with
each other by posting information, comments,
messages, images, etc.
Example of Social Networking Site
Technology : LinkedIn
• What is Your Network?
When your connections invite their connections, your
Network starts to grow.
Your Network is your connections, their connections, and so
on out from you at the center.
• How do you classify users?
Your Network contains professionals out to “three degrees”,
that is, friends-of-friends-of-friends. If each person had
10 connections (and some have many more), then your
network would contain about 1,000 professionals
(10 + 100 + 1,000 = 1,110 if no connections overlap).
How do you see who is in your Network?
• LinkedIn lets you see your network as one large group of
searchable professional profiles.
Social Network: Is It Big Data?
• Volume
• Velocity
• Variety
Facebook Graph
Twitter Graph
Varieties of Social Networks
• There are many examples of social networks other than
“friends” networks. Here, let us enumerate some of the
other examples of networks that also exhibit locality of
relationships.
• Telephone Networks : Here the nodes represent phone
numbers, which are really individuals. There is an edge
between two nodes if a call has been placed between those
phones in some fixed period of time, such as last month, or
“ever.” The edges could be weighted by the number of calls
made between these phones during the period.
Communities in a telephone network will form from groups
of people that communicate frequently: groups of friends,
members of a club, or people working at the same
company, for example.
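A minimal sketch of how such a weighted telephone graph could be built. The call log, the phone numbers and the function name `build_call_graph` are hypothetical and only illustrate the idea of weighting an edge by the number of calls in the chosen period.

```python
from collections import Counter

# Hypothetical call log for one fixed period: (caller, callee) pairs.
calls = [
    ("555-0101", "555-0102"),
    ("555-0102", "555-0101"),
    ("555-0101", "555-0103"),
    ("555-0101", "555-0102"),
]

def build_call_graph(call_log):
    """Return {frozenset({a, b}): number_of_calls} for an undirected graph."""
    weights = Counter()
    for caller, callee in call_log:
        if caller != callee:  # ignore calls from a number to itself
            # The direction of the call is dropped: the edge is unordered.
            weights[frozenset((caller, callee))] += 1
    return weights

graph = build_call_graph(calls)
for edge, weight in graph.items():
    print(sorted(edge), weight)
```

Storing each edge as a `frozenset` makes the pair unordered, so calls in either direction add to the same weight.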
Varieties of Social Networks
• Email Networks: The nodes represent email addresses,
which are again individuals.
• An edge represents the fact that there was at least one
email in at least one direction between the two addresses.
• Alternatively, we may only place an edge if there were
emails in both directions.
• Label edges as weak or strong. Strong edges represent
communication in both directions, while weak edges
indicate that the communication was in one direction only.
• The communities seen in email networks come from the
same sorts of groupings we mentioned in connection with
telephone networks.
• A similar sort of network involves people who text other
people through their cell phones.
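The weak/strong labelling of email edges can be sketched as follows. The addresses and the function name `classify_email_edges` are made up for illustration; the only rule taken from the text is that two-way communication is strong and one-way communication is weak.

```python
def classify_email_edges(messages):
    """Label each pair of addresses 'strong' (mail in both directions)
    or 'weak' (mail in one direction only)."""
    directed = set(messages)  # distinct (sender, recipient) pairs
    labels = {}
    for sender, recipient in directed:
        if sender == recipient:
            continue  # ignore mail sent to oneself
        pair = frozenset((sender, recipient))
        labels[pair] = "strong" if (recipient, sender) in directed else "weak"
    return labels

# Hypothetical message log: (sender, recipient).
messages = [
    ("ann@example.com", "bob@example.com"),
    ("bob@example.com", "ann@example.com"),
    ("ann@example.com", "cat@example.com"),
]

labels = classify_email_edges(messages)
for pair, label in labels.items():
    print(sorted(pair), label)
```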
Varieties of Social Networks
• Collaboration Networks: Nodes represent individuals
who have published research papers.
• There is an edge between two individuals who
published one or more papers jointly.
• Optionally, we can label edges by the number of joint
publications.
• The communities in this network are authors working
on a particular topic.
• An alternative view of the same data is as a graph in
which the nodes are papers. Two papers are
connected by an edge if they have at least one author
in common.
Social Network As a Graph
• A social network is a social structure, normally
represented as a graph with:
• Individuals (or organisations) as nodes
• Relationships as edges
More about Social Network as a Graph
• A graph is a set of nodes (or vertices) V together with a
set of edges (or lines) E
• If an edge {a,b} exists, then we can say that nodes
a and b are related to each other
• The edges themselves can be
1. unordered pairs of nodes, or
2. in a directed graph (digraph), ordered
pairs of nodes, where each edge has a direction and is
sometimes called an arc.
In this case (a,b) is an arc from a to b
• Graphs are (generally) non-reflexive; nodes are
not related to themselves
• Order is the number of nodes; size is the number of edges
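As a small illustration of these definitions, the sketch below stores unordered edges as `frozenset` pairs and arcs as ordered tuples. The node names and the helper `related` are assumptions chosen for the example.

```python
# Undirected graph: each edge is an unordered pair, stored as a frozenset.
nodes = {"a", "b", "c", "d"}
undirected_edges = {
    frozenset(("a", "b")),
    frozenset(("b", "c")),
    frozenset(("c", "d")),
}

# Directed graph: each arc is an ordered pair; ("a", "b") goes from a to b.
arcs = {("a", "b"), ("b", "a"), ("c", "d")}

order = len(nodes)            # order = number of nodes
size = len(undirected_edges)  # size = number of edges

def related(x, y):
    """True if the undirected edge {x, y} exists."""
    return frozenset((x, y)) in undirected_edges

print(order, size)
print(related("b", "a"))   # unordered: same edge as {a, b}
print(("d", "c") in arcs)  # ordered: there is an arc c -> d but not d -> c
```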
Uses of Social Network Analysis
We can usually learn a lot about people by analysing
their social networks.
• Epidemiology: to understand how patterns of human
contact affect the spread of diseases
• Marketing and fashion: to uncover new trends and
major influencers
• Networking: finding an optimal way of constructing a
computer network and locating points of failure and
bottlenecks
• Intelligence: identifying insurgent networks and
determining leaders and active cells
• Collaborative filtering: if your friends like something
then there’s a good chance you will too
• ... and many more
Social Network Analysis of 9/11 Terrorists
(www.orgnet.com)

Early in 2000, the CIA was informed of two terrorist
suspects linked to al-Qaeda. Nawaf Alhazmi and
Khalid Almihdhar were photographed attending a
meeting of known terrorists in Malaysia. After the
meeting they returned to Los Angeles, where they
had already set up residence in late 1999.
Social Network Analysis of 9/11 Terrorists
What do you do with these suspects? Arrest or deport them
immediately? No, we need to use them to discover more of
the al-Qaeda network.

Once suspects have been discovered, we can use their daily
activities to uncloak their network. Just like they used our
technology against us, we can use their planning process
against them. Watch them, and listen to their conversations
to see...
• who they call / email
• who visits with them locally and in other cities
• where their money comes from
The structure of their extended network begins to emerge as
data is discovered via surveillance.
Social Network Analysis of 9/11 Terrorists
We now have enough data for two key conclusions:
• All 19 hijackers were within 2 steps of the two original
suspects uncovered in 2000!
• Social network metrics reveal Mohammed Atta
emerging as the local leader
Clustering Social Networks
• Social networks have gained popularity recently with
the advent of sites such as MySpace, Friendster,
Facebook, etc. The number of users participating in
these networks is large, e.g., a hundred million in
MySpace, and growing.
• These networks are a rich source of data as users
populate their sites with personal information.
• A fundamental problem related to these networks is
the discovery of clusters or communities. Intuitively,
a cluster is a collection of individuals with dense
friendship patterns internally and sparse friendships
externally.
Distance Measures for Social-Network
Graphs
• When the edges of the graph have labels, these labels might
be usable as a distance measure, depending on what they
represented.
• But when the edges are unlabelled, as in a “friends” graph,
there is not much we can do to define a suitable distance.
• Nodes are close if they have an edge between them and
distant if not.
• Thus, we could say that the distance d(x, y) is 0 if there is an
edge (x, y) and 1 if there is no such edge.
• We could use any other two values, such as 1 and ∞, as long
as the distance is smaller when there is an edge.
Distance Measures for Social-Network
Graphs
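• Neither of these two-valued “distance
measures” (0 and 1, or 1 and ∞) is a true
distance measure.
• Each violates the triangle inequality when three nodes
have only two edges among them: if edges (A, B) and (B, C)
exist but (A, C) does not, then d(A, C) is larger than
d(A, B) + d(B, C).
• Clustering methods considered next:
– Hierarchical Method
– Point Assignment Method
– Betweenness Method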
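The triangle-inequality violation can be checked on a three-node example. The sketch below assumes a hypothetical path A - B - C with no edge between A and C, and tests only that one triple; the value pair (1, 1.5) is included purely as a contrast that passes this particular check.

```python
import math

# Hypothetical graph: edges (A, B) and (B, C) exist, (A, C) does not.
edges = {frozenset(("A", "B")), frozenset(("B", "C"))}

def two_valued_distance(x, y, with_edge, without_edge):
    """Distance is `with_edge` if {x, y} is an edge, else `without_edge`."""
    if x == y:
        return 0
    return with_edge if frozenset((x, y)) in edges else without_edge

def satisfies_triangle_inequality(with_edge, without_edge):
    """Check d(A, C) <= d(A, B) + d(B, C) for this one triple only."""
    def d(x, y):
        return two_valued_distance(x, y, with_edge, without_edge)
    return d("A", "C") <= d("A", "B") + d("B", "C")

print(satisfies_triangle_inequality(0, 1))         # 1 <= 0 + 0 fails
print(satisfies_triangle_inequality(1, math.inf))  # inf <= 1 + 1 fails
print(satisfies_triangle_inequality(1, 1.5))       # 1.5 <= 1 + 1 holds
```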
Applying Standard Clustering Methods
Hierarchical methods for a social-network graph:
• Suppose we use as the intercluster distance the
minimum distance between nodes of the two clusters.
Hierarchical clustering of a social-network graph starts
by combining some two nodes that are connected by
an edge. Successively, edges that are not between two
nodes of the same cluster would be chosen randomly
to combine the clusters to which their two nodes
belong. The choices would be random, because all
distances represented by an edge are the same.
Hierarchical methods for a social-network graph

At the highest level, it appears that there are two
communities {A, B, C} and {D, E, F, G}. However, we could
also view {D, E, F} and {D, F, G} as two sub-communities of
{D, E, F, G}. Finally, we could consider each pair of
individuals that are connected by an edge as a community
of size 2, although such communities are uninteresting.
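A rough sketch of hierarchical clustering with random edge choices is given below. The edge list is an assumption, reconstructed so that it matches the communities named in the text ({A, B, C}, {D, E, F}, {D, F, G}); the function name and the fixed random seed are likewise illustrative.

```python
import random

# Assumed edge list, reconstructed to be consistent with the text.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def random_edge_hierarchy(edges, seed=0):
    """Merge clusters by visiting edges in random order.

    An edge whose endpoints are already in the same cluster is skipped;
    otherwise the two clusters it connects are combined.
    Returns the list of merges, each as a pair of clusters.
    """
    rng = random.Random(seed)
    cluster_of = {n: frozenset([n]) for edge in edges for n in edge}
    order = list(edges)
    rng.shuffle(order)  # all edge distances are equal, so any order is valid
    merges = []
    for a, b in order:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca != cb:
            merged = ca | cb
            for n in merged:
                cluster_of[n] = merged
            merges.append((ca, cb))
    return merges

merges = random_edge_hierarchy(EDGES)
for left, right in merges:
    print(sorted(left), "+", sorted(right))
```

Because the order is random, an edge such as (B, D) may be chosen early, which would merge nodes from the two natural communities before either community is complete.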
Point Assignment method for a social-network graph

Suppose we try a k-means approach to clustering. As we want two
clusters, we pick k = 2. If we pick two starting nodes at random, they
might both be in the same cluster. If we start with one randomly chosen
node and then pick another as far away as possible, we do not do much
better, because that could select any pair of nodes not connected by an
edge. However, suppose we do get two suitable starting nodes, such as
B and F. We shall then assign A and C to the cluster of B and assign
E and G to the cluster of F. But D is as close to B as it is to F, so it
could go either way, even though it is “obvious” that D belongs with F.
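The tie at D can be reproduced with a short sketch. It assumes the same reconstructed edge list for the A to G example and uses the number of hops as the distance, which is an illustrative choice rather than something the slides prescribe.

```python
from collections import deque

# Assumed edge list, reconstructed to be consistent with the text.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def hop_distances(adj, start):
    """Breadth-first search: number of edges from `start` to every node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def assign_to_seeds(adj, seed_1, seed_2):
    """Assign every other node to the nearer seed, or mark it as a tie."""
    d1, d2 = hop_distances(adj, seed_1), hop_distances(adj, seed_2)
    assignment = {}
    for node in sorted(adj):
        if node in (seed_1, seed_2):
            continue
        if d1[node] < d2[node]:
            assignment[node] = seed_1
        elif d2[node] < d1[node]:
            assignment[node] = seed_2
        else:
            assignment[node] = "tie"
    return assignment

assignment = assign_to_seeds(build_adjacency(EDGES), "B", "F")
print(assignment)  # {'A': 'B', 'C': 'B', 'D': 'tie', 'E': 'F', 'G': 'F'}
```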
Betweenness for Social-Network Graphs
• Since there are problems with standard clustering
methods, several specialized clustering techniques
have been developed to find communities in social
networks.
• One of the simplest techniques uses a measure called
betweenness and is based on finding the edges that are
least likely to be inside a community.
Betweenness for Social-Network Graphs
• The betweenness of an edge (a, b) is defined as the number of
pairs of nodes x and y such that the edge (a, b) lies
on the shortest path between x and y.
• To be more precise, since there can be several
shortest paths between x and y, edge (a, b) is
credited with the fraction of those shortest paths
that include the edge (a, b). As in golf, a high score is
bad. It suggests that the edge (a, b) runs between
two different communities; that is, a and b do not
belong to the same community.
Girvan-Newman Algorithm for Betweenness
• Divisive hierarchical clustering based on the
notion of edge betweenness (the number of
shortest paths passing through the edge)
1. Repeat until no edges are left:
   a. Calculate the betweenness of the edges
   b. Remove the edges with the highest betweenness
2. The connected components are the communities
3. This gives a hierarchical decomposition of the
network
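The loop above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the slides' own code: the edge list is the reconstructed A to G example, betweenness is computed with a breadth-first search from every node, and all edges tied for the highest score are removed together.

```python
from collections import defaultdict, deque

# Assumed edge list, reconstructed to be consistent with the text.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def edge_betweenness(adj):
    """Betweenness of every edge, counting each unordered node pair once."""
    score = defaultdict(float)
    for s in adj:
        dist, paths = {s: 0}, defaultdict(float)
        paths[s] = 1.0               # number of shortest paths from s
        preds, order = defaultdict(list), []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
                    preds[v].append(u)
        credit = defaultdict(float)
        for v in reversed(order):    # farthest nodes first
            for u in preds[v]:
                share = (1.0 + credit[v]) * paths[u] / paths[v]
                score[frozenset((u, v))] += share
                credit[u] += share
    return {e: b / 2.0 for e, b in score.items()}  # each pair was seen twice

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps

def girvan_newman(edges):
    """Return the successive partitions produced by removing edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    levels = [components(adj)]
    while any(adj.values()):                  # repeat until no edges are left
        scores = edge_betweenness(adj)        # recomputed at every step
        top = max(scores.values())
        for edge, value in scores.items():
            if abs(value - top) < 1e-9:
                a, b = tuple(edge)
                adj[a].discard(b)
                adj[b].discard(a)
        comps = components(adj)
        if len(comps) > len(levels[-1]):
            levels.append(comps)
    return levels

levels = girvan_newman(EDGES)
for level in levels:
    print(sorted(sorted(c) for c in level))
```

On the assumed graph, the first removal splits the network into {A, B, C} and {D, E, F, G}, and later removals break those groups down further.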
Girvan-Newman Algorithm: Example
[Figure: example network; the edge betweenness values shown include 12, 1, 33 and 49]
Betweenness must be recomputed at every step.
Source: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Girvan-Newman: Example
[Figure: panels showing Step 1, Step 2, Step 3 and the resulting hierarchical network decomposition]
Betweenness for Social-Network Graphs

In the example graph with communities {A, B, C} and {D, E, F, G},
the edge (B, D) has the highest betweenness, as should surprise
no one. In fact, this edge is on every shortest path between any of
A, B, and C to any of D, E, F, and G. Its betweenness is therefore
3 × 4 = 12. In contrast, the edge (D, F) is on only four shortest paths:
those from A, B, C, and D to F.
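These two values can be checked directly from the definition. The sketch below enumerates every shortest path between every pair of nodes and credits each edge with its fraction; the edge list is an assumption, reconstructed to match the text, and the brute-force enumeration is chosen for clarity rather than speed.

```python
from collections import deque
from itertools import combinations

# Assumed edge list, reconstructed to be consistent with the text.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def shortest_paths(adj, x, y):
    """All shortest paths from x to y, each as a list of nodes."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    def walk(node):
        # Step back from `node` along edges that reduce the distance by one.
        if node == x:
            return [[x]]
        return [path + [node]
                for prev in adj[node] if dist[prev] == dist[node] - 1
                for path in walk(prev)]

    return walk(y) if y in dist else []

def edge_betweenness(adj):
    score = {}
    for x, y in combinations(sorted(adj), 2):
        paths = shortest_paths(adj, x, y)
        for path in paths:
            for u, v in zip(path, path[1:]):
                edge = frozenset((u, v))
                # Each path contributes an equal fraction for this pair.
                score[edge] = score.get(edge, 0.0) + 1.0 / len(paths)
    return score

score = edge_betweenness(build_adjacency(EDGES))
print(score[frozenset(("B", "D"))])  # 12.0
print(score[frozenset(("D", "F"))])  # 4.0
```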
Exercise1: MU IT Dec2016
Direct Discovery of Communities in a
Social Graph
• A community in a social network is a closely knit
group of entities. An entity can belong strictly
to a single community or to more than one
community.
• Single community: disjoint communities
• More than one: overlapping communities
Direct Discovery of Communities in
a Social Graph
• Overlapping communities are more natural
• How do we identify communities?
• Method for community detection:
- Clique Percolation Method (CPM)
Clique Percolation Method
(CPM)
What is CPM?
• Method to find overlapping communities

• Based on concept:

– internal edges of community likely to form cliques

– Intercommunity edges unlikely to form cliques


Clique
• Clique: Complete graph

• k-clique: Complete graph with k vertices

[Figures: examples of a 3-clique, a 4-clique and a 5-clique]
k-Clique Communities
• Adjacent k-cliques

Two k-cliques are adjacent when they share k-1 nodes

[Figure: example with k = 3 highlighting three 3-cliques, labelled Clique 1, Clique 2 and Clique 3]
k-Clique Communities
• k-clique community
Union of all k-cliques that can be reached from each
other through a series of adjacent k-cliques

[Figure: for k = 3, Clique 1 and Clique 2 merge into Community 1, and Clique 3 forms Community 2]
CPM Algorithm
• Input: the social graph G, representing a network,
and a clique size k.
• Output: the set of discovered communities C.
• Step 1: Extract all k-cliques present in G.
• Step 2: Form a new graph, the clique graph Gc,
in which each node represents an identified clique
and two nodes of Gc are connected by an edge if
their cliques have k-1 vertices in common.
• Step 3: Identify the connected components of Gc.
• Step 4: Each connected component of Gc represents a
community (the union of its cliques).
• Step 5: Let C be the set of communities formed for G.
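The five steps can be sketched as below. The example graph is hypothetical, and k-cliques are found by brute force over all k-subsets of nodes, which is an assumption made for brevity and is only practical for small graphs.

```python
from itertools import combinations

def k_cliques(adj, k):
    """Step 1: every set of k nodes that are pairwise adjacent."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]

def cpm(edges, k):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    cliques = k_cliques(adj, k)

    # Step 2: clique graph; cliques are adjacent if they share k-1 nodes.
    neighbours = {c: set() for c in cliques}
    for c1, c2 in combinations(cliques, 2):
        if len(c1 & c2) == k - 1:
            neighbours[c1].add(c2)
            neighbours[c2].add(c1)

    # Steps 3-5: each connected component of the clique graph is one
    # community, formed as the union of the cliques in that component.
    communities, seen = [], set()
    for start in cliques:
        if start in seen:
            continue
        members, stack = set(), [start]
        while stack:
            clique = stack.pop()
            if clique in seen:
                continue
            seen.add(clique)
            members |= clique
            stack.extend(neighbours[clique] - seen)
        communities.append(frozenset(members))
    return communities

# Hypothetical graph: triangles {1,2,3} and {2,3,4} share two nodes,
# while triangle {4,5,6} shares only node 4 with them.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
communities = cpm(edges, 3)
print([sorted(c) for c in communities])
```

Node 4 ends up in both communities, which illustrates how CPM produces overlapping communities.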
CPM Algorithm in short
• Locate maximal cliques
– Largest possible clique size can be determined from
degrees of vertices
– Starting from this size, find all cliques, then reduce size by
1 and repeat
• Convert from cliques to k-clique communities
CPM
Example 1
• Find all cliques for the given graph using CPM
showing all steps.
Exercise1: MU COMP May2016
Exercise 2: MU IT May2016
SimRank: A Measure of
Structural-Context Similarity
Motivation
• Many applications require a measure of
“similarity” between objects.
– Web search
– Shopping Recommendations
– Search for “Related Works” among scientific
papers
• But “similarity” may be domain-dependent
• Can we define a generic model for similarity?
Problem Statement
• Given a Graph G = (V, E), for each pair of
vertices a,b ∈ V, compute a similarity
(ranking) score s(a,b) based on the concept of
structural-context similarity.
Basic Graph Model
• Directed Graph G = (V,E)
– V = set of objects
– E = set of unweighted edges
– Edge (u,v) exists if there is a relation u → v
– I(v) = set of in-neighbors of vertex v
– O(v) = set of out-neighbors of vertex v
SimRank Similarity
• Recursive Model
– “Two objects are similar if they are referenced by
similar objects”
– That is, a ~ b if
• c → a and d → b, and
• c ~ d
– An object is equivalent to itself (score = 1)
• Example
1. ProfA ~ ProfB because both are
referenced by Univ.
2. StudentA ~ StudentB because they
are referenced by similar nodes
{ProfA, ProfB}
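A minimal iterative sketch of this recursive model follows. It assumes the usual SimRank update, in which the score of (a, b) is a decay constant times the average similarity of their in-neighbor pairs; the decay value 0.8, the iteration count and the simplified five-node graph are all assumptions for illustration.

```python
def simrank(in_neighbors, decay=0.8, iterations=10):
    """Iteratively compute s(a, b) for all pairs of nodes.

    s(a, a) = 1; s(a, b) = 0 if either node has no in-neighbors;
    otherwise s(a, b) = decay * average of s(c, d) over c in I(a), d in I(b).
    """
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif not in_neighbors[a] or not in_neighbors[b]:
                    new[(a, b)] = 0.0
                else:
                    total = sum(sim[(c, d)]
                                for c in in_neighbors[a]
                                for d in in_neighbors[b])
                    pairs = len(in_neighbors[a]) * len(in_neighbors[b])
                    new[(a, b)] = decay * total / pairs
        sim = new
    return sim

# Simplified, assumed graph: Univ -> ProfA, Univ -> ProfB,
# ProfA -> StudentA, ProfB -> StudentB. Values are the in-neighbor sets I(v).
in_neighbors = {
    "Univ": [],
    "ProfA": ["Univ"],
    "ProfB": ["Univ"],
    "StudentA": ["ProfA"],
    "StudentB": ["ProfB"],
}

scores = simrank(in_neighbors)
print(round(scores[("ProfA", "ProfB")], 4))
print(round(scores[("StudentA", "StudentB")], 4))
```

In this simplified graph the professors score higher than the students, because the students' similarity is inherited from the professors and decays by one more step.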
SimRank: Example
SimRank

• We observe from the above that in the limit, the
walker is more than twice as likely to be at Picture 3
than at Picture 2. This analysis confirms the intuition
that Picture 3 is more like Picture 1 than Picture 2 is.
Counting Triangles
Counting Triangles in Social Networks

• In a social network, identifying small communities and
counting their occurrences is very important.
• Small community → small subgraph = triangle
(3-clique)
• Why triangles?
1. Homophily: the tendency of individuals to associate and
form groups with similar individuals
2. Transitivity: if two individuals share a common friend,
they are likely to be connected to each other as well
Triangle Counting Problem
1) Given a graph G = (V, E), how many triangles does it have?
Here a “triangle” is a set of three vertices that are
mutually adjacent in G.
2) Given a graph G = (V, E), for every v ∈ V, how many
triangles in G include vertex v?

A solution to the second problem immediately yields a
solution to the first problem: just add up the n per-vertex
counts and divide by 3 (since each triangle is counted exactly
once for each of the 3 vertices it contains).
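The relationship between the two problems can be illustrated with a short sketch. It checks every group of three vertices, so it runs in O(n^3) time; the edge list is the assumed A to G example used for illustration only.

```python
from itertools import combinations

# Assumed example graph (triangles: ABC, DEF, DFG).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def triangles_per_vertex(edges):
    """Problem 2: for every vertex, the number of triangles containing it."""
    edge_set = {frozenset(e) for e in edges}
    nodes = sorted({n for e in edges for n in e})
    per_vertex = {v: 0 for v in nodes}
    for x, y, z in combinations(nodes, 3):  # every group of three vertices
        needed = {frozenset((x, y)), frozenset((x, z)), frozenset((y, z))}
        if needed <= edge_set:              # all three edges are present
            for v in (x, y, z):
                per_vertex[v] += 1
    return per_vertex

per_vertex = triangles_per_vertex(EDGES)
# Problem 1: each triangle was counted once at each of its 3 vertices.
total = sum(per_vertex.values()) // 3
print(per_vertex)
print(total)  # 3
```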
Triangle Counting Algorithm
1. The brute-force method
1. Count the triangles in a graph by simply checking
every group of three vertices.
2. O(n^3): high computation cost
Triangle Counting Algorithm
2. Smarter approach
1. First list all two-edge paths that are formed in the
graph.
2. Then check only the vertex triplets in which we already
know that two edges are present.

For every two-edge path x - y - z in the graph, check
whether the edge (x, z) also exists; if so, (x, y, z)
forms a triangle, so add it to the count of
triangles.
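A sketch of this two-edge-path approach is shown below, again on the assumed A to G example. Each triangle is discovered three times, once with each of its vertices as the middle of the path, so the number of closed paths is divided by 3; the function name is illustrative.

```python
from itertools import combinations

# Assumed example graph (triangles: ABC, DEF, DFG).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def count_triangles_by_paths(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    closed_paths = 0
    for y, neighbours in adj.items():
        # Every pair of neighbours of y gives a two-edge path x - y - z.
        for x, z in combinations(sorted(neighbours), 2):
            if z in adj[x]:  # the third edge (x, z) closes the triangle
                closed_paths += 1
    # A triangle is found once for each choice of middle vertex.
    return closed_paths // 3

triangle_count = count_triangles_by_paths(EDGES)
print(triangle_count)  # 3
```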
Triangle Counting Algorithm
Optimized way:
-- Count each triangle only once.
Step 1
• Compute the degree of each node.
This part requires only that we examine each
edge and add 1 to the count of each of its two
nodes. The total time required is O(m), where m
is the number of edges.
Triangle Counting Algorithm
• Step2 :Create an index on edges, with the pair
of nodes at its ends as the key.
• That is, the index allows us to determine,
given two nodes, whether the edge between
them exists.
• A hash table suffices. It can be constructed in
O(m) time, and the expected time to answer a
query about the existence of an edge is a
constant
Triangle Counting Algorithm
Step 3:
- Create another index of edges, this one with
key equal to a single node.
- Given a node v, we can retrieve the nodes
adjacent to v in time proportional to the number
of those nodes.
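The three preparation steps can be sketched as below. The final counting loop is an assumption added to show how the indexes might be used: it orders nodes by (degree, name) and counts a triangle only at its lowest-ranked node, so that each triangle is counted once. The graph is the assumed A to G example, with no duplicate edges or self-loops.

```python
from collections import defaultdict
from itertools import combinations

# Assumed example graph (triangles: ABC, DEF, DFG).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def count_triangles_indexed(edges):
    degree = defaultdict(int)      # Step 1: degree of each node
    edge_index = set()             # Step 2: hash index keyed by node pair
    adjacency = defaultdict(list)  # Step 3: index keyed by a single node
    for a, b in edges:             # one pass over the edges, O(m)
        degree[a] += 1
        degree[b] += 1
        edge_index.add(frozenset((a, b)))
        adjacency[a].append(b)
        adjacency[b].append(a)

    def rank(v):
        # Assumed ordering: by degree, ties broken by node name.
        return (degree[v], v)

    count = 0
    for v in adjacency:
        # Only look at neighbours ranked after v, so the triangle is
        # counted at its lowest-ranked node and nowhere else.
        later = [u for u in adjacency[v] if rank(u) > rank(v)]
        for u, w in combinations(later, 2):
            if frozenset((u, w)) in edge_index:  # constant expected time
                count += 1
    return count

indexed_count = count_triangles_indexed(EDGES)
print(indexed_count)  # 3
```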
Finding Triangles Using MapReduce
• Let G be a social graph.
• A relation E(A, B) on V × V represents the edges.
• To avoid duplication, store each edge only once,
as the tuple E(A, B) with A < B.
• Triangles are nothing but the result of a three-way join:
E(X, Y) ⋈ E(X, Z) ⋈ E(Y, Z)

SELECT e1.A, e1.B, e2.B
FROM E e1, E e2, E e3
WHERE e1.A = e2.A AND e1.B = e3.A AND e2.B = e3.B
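The three-way join can be mimicked in a single Python process, as sketched below. This is not an actual MapReduce job; it only mirrors the join conditions of the query on the assumed A to G example, with each edge stored once as (A, B) where A < B.

```python
# Assumed example graph (triangles: ABC, DEF, DFG).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def triangles_by_join(edges):
    # Relation E: one tuple (A, B) per edge, with A < B to avoid duplicates.
    relation = sorted({tuple(sorted(e)) for e in edges})
    by_a = {}  # index on attribute A: all B values for a given A
    for a, b in relation:
        by_a.setdefault(a, []).append(b)
    edge_set = set(relation)

    result = []
    for x, y in relation:            # e1 = (x, y)
        for z in by_a.get(x, []):    # e2 = (x, z): e1.A = e2.A
            if (y, z) in edge_set:   # e3 = (y, z): e1.B = e3.A, e2.B = e3.B
                result.append((x, y, z))
    return result

triangles = triangles_by_join(EDGES)
print(triangles)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('D', 'F', 'G')]
```

Because every stored tuple has A < B, a triangle is produced only in the order x < y < z, so it appears exactly once in the result.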
Summary
Social Networks as Graphs
Clustering of Social-Network Graphs
Direct Discovery of Communities
SimRank
Counting triangles using Map-Reduce
Thank you ! ! !
