BDA Module 5 COMP

Uploaded by

Siddhesh Mestry
SSPM’s College Of Engineering, Kankavli

Class: BE COMP Subject: BDA

Module 5
Real-Time Big Data Models

What is the Use of Recommender System?


For business, the recommender system can increase sales. The customer service can
be personalized and thereby gain customer trust and loyalty. It increases the knowledge
about the customers. The recommender system can also give opportunities to persuade the
customers and decide on the discount offers.
For customers, the recommender system can help to narrow down their choices, find
things of interest, navigate the list easily, and discover new things.

Where Do We See Recommendations?


Recommendations are commonly seen in e-commerce systems, LinkedIn, friend
recommendation on Facebook, song recommendation at FM, news recommendations at
Forbes.com, etc.

What Does Recommender System Do?


It takes as input the user model (ratings, preferences, demographics, etc.) and the
items with their descriptions, computes a relevance score that is used for ranking, and finally
recommends the items that are most relevant to the user. By estimating relevance, information
overload can be reduced. Relevance, however, can be context-dependent.
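
The score-then-rank flow described above can be sketched as follows. This is only an illustration: the tag sets, the item names and the choice of Jaccard overlap as the relevance score are assumptions made for the example, not something the notes prescribe.

```python
def relevance(user_prefs, item_tags):
    """Jaccard overlap between the user's preferred tags and an item's tags."""
    union = user_prefs | item_tags
    return len(user_prefs & item_tags) / len(union) if union else 0.0


def recommend(user_prefs, items, top_n=2):
    """Score every item, drop irrelevant ones, and return the top_n names."""
    scored = [(relevance(user_prefs, tags), name) for name, tags in items.items()]
    scored = [(score, name) for score, name in scored if score > 0]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


# Hypothetical user model and item descriptions.
user_prefs = {"sci-fi", "space", "thriller"}
items = {
    "Movie A": {"sci-fi", "space"},
    "Movie B": {"romance", "drama"},
    "Movie C": {"thriller", "crime"},
    "Movie D": {"sci-fi", "space", "thriller"},
}

ranked = recommend(user_prefs, items, top_n=3)
print(ranked)
```

Items with a zero score are dropped before ranking, which is one simple way to cut down the list shown to the user.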

A Model for Recommendation Systems


A recommendation system is a facility that predicts user responses to options in web
applications.
We have seen recommendations such as the following:
1. “You may also like these…”, “People who liked this also liked…”.
2. If you download presentations from SlideShare, it says “similar content you can save and
browse later”.

These suggestions are from the recommender system. The paradigms used are as follows:
1. Collaborative-filtering system:
It uses community data from peer groups for recommendations, surfacing the things
that are popular among the peers. Collaborative filtering systems recommend items based on
similarity measures between users and/or items. The items recommended to a user are those
preferred by similar users (community data). Here, the user profile and contextual parameters, along with
the community data, are used by the recommender system to personalize the recommendation list.

2. Content-based systems:
They examine the properties of the items recommended. For example, if a user has watched many
“science fiction” movies, then the recommender system will recommend a movie classified in the
database as having the “science fiction” genre. Content-based systems take input from the user
profile and the contextual parameters along with product features to make the recommendation list.
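
As a rough illustration of the collaborative-filtering paradigm, the sketch below predicts ratings for items a user has not seen, using the ratings of similar users. The ratings table, the user names and the use of cosine similarity are assumptions chosen for the example; other similarity measures would work equally well.

```python
from math import sqrt

# Hypothetical community data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "D": 5},
    "cat": {"A": 1, "C": 5, "E": 4},
}


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend_cf(target, ratings, top_n=2):
    """Rank unseen items by the similarity-weighted average of peer ratings."""
    totals, weights = {}, {}
    for other, their_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], their_ratings)
        if sim <= 0:
            continue
        for item, rating in their_ratings.items():
            if item in ratings[target]:
                continue  # only recommend items the target has not rated
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: totals[item] / weights[item] for item in totals}
    return sorted(predicted, key=lambda item: (-predicted[item], item))[:top_n]


print(recommend_cf("ann", ratings))
```

A content-based variant would replace the user-to-user similarity with a comparison between item features and the user profile.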

1 BDA Module 4 notes by P A Ghadigaonkar


Applications of Social Network Mining


Applications of social network mining encompass numerous fields. A few popular ones are
enumerated here:
1. Viral marketing is an application of social network mining that explores how individuals
can influence the buying behaviour of others. Viral marketing aims to optimize the
positive word-of-mouth effect among customers. Social network mining can identify
strong communities and influential nodes. A business can then choose to spend more money
marketing to an individual if that person has many social connections.
2. Similarly in the e-commerce domain, the grouping together of customers with similar
buying profiles enables more personalized recommendation engines. Community
discovery in mobile ad-hoc networks can enable efficient message routing and posting.
3. Social network analysis is used extensively in a wide range of applications, which include
data aggregation and mining, network propagation modelling, network modelling and
sampling, user attribute and behaviour analysis, community-maintained resource
support, location-based interaction analysis, social sharing and filtering, recommender
systems, etc.
4. Many businesses use social network analysis to support activities such as customer
interaction and analysis, information system development analysis, targeted marketing,
etc.

Social Networks as a Graph


A social network can be represented by a graph. The graph is typically very large, with
nodes corresponding to the objects and edges corresponding to the links representing
relationships or interactions between objects. One node indicates one person or a group of
persons.
As an example, consider a set of individuals on a social networking site like LinkedIn.
LinkedIn allows users to create profiles and “connections” with each other in an online social
network which may represent real-world professional relationships. Thus, finding a person on
a LinkedIn network is like finding a path in a graph representing the social network.
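
The idea of finding a person by following a path through the graph can be sketched with a breadth-first search. The names and connections below are hypothetical, and the adjacency-set representation is just one convenient choice.

```python
from collections import deque

# Hypothetical undirected "connections" graph: person -> set of direct connections.
connections = {
    "Asha": {"Ben", "Chen"},
    "Ben": {"Asha", "Dev"},
    "Chen": {"Asha", "Dev"},
    "Dev": {"Ben", "Chen", "Esha"},
    "Esha": {"Dev"},
    "Farid": set(),
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None if unreachable."""
    if start == goal:
        return [start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(graph[node]):  # sorted for a deterministic result
            if neighbour in parent:
                continue
            parent[neighbour] = node
            if neighbour == goal:
                # Walk the parent links back to the start.
                path = [neighbour]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


print(shortest_path(connections, "Asha", "Esha"))
print(shortest_path(connections, "Asha", "Farid"))
```

The length of the returned path minus one corresponds to what LinkedIn-style sites present as the degree of a connection.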

Social Networks
The web-based dictionary Webopedia defines a social network as “A social structure made of
nodes that are generally individuals or organizations”. A social network represents relationships and
flows between people, groups, organizations, computers or other information/knowledge processing
entities. The term “Social Network” was coined in 1954 by J. A. Barnes. Examples of social networks
include Facebook, LinkedIn, Twitter, Reddit, etc.
The following are the typical characteristics of any social network:
1. In a social network scenario, the nodes are typically people. But there could be other
entities like companies, documents, computers, etc.
2. A social network can be considered as a heterogeneous and multi-relational dataset
represented by a graph. Both nodes and edges can have attributes. Objects may have class
labels.
3. There is at least one relationship between entities of the network. For example, social
networks like Facebook connect entities through a relationship called friends. In LinkedIn,
one relationship is “endorse” where people can endorse other people for their skills.

4. In many social networks, this relationship need not be yes or no (binary) but can have a
degree. For example, in LinkedIn, an endorsement of a skill can have a degree such as
novice, expert, etc. The degree can also be a real number.
5. We assume that social networks exhibit the property of non-randomness, often called
locality. Locality is the property of social networks that says nodes and edges of the graph
tend to cluster in communities. This condition is the hardest to formalize, but the intuition
is that relationships tend to cluster. That is, if entity A is related to both B and C, then
there is a higher probability than average that B and C are related. The idea is that most
relationships in the real world tend to cluster around a small set of individuals.
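
One common way to put a number on the locality intuition is the local clustering coefficient: the fraction of a node's neighbour pairs that are themselves connected. The notes do not name this measure, so treat the sketch below, including its small hypothetical graph, as one possible formalization.

```python
from itertools import combinations


def local_clustering(graph, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(sorted(neighbours), 2) if b in graph[a])
    return linked / (k * (k - 1) / 2)


# Hypothetical undirected graph: A is related to B, C and D; B and C are related.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

for node in sorted(graph):
    print(node, round(local_clustering(graph, node), 3))
```

Values close to 1 mean that a node's contacts mostly know each other, which is the clustering behaviour that item 5 describes.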

Examples of Graph
Example 1.
Figure 1 below shows a small graph of the “followers” network of Twitter. The relationship
represented by the edges is the “follows” relationship. Jack follows Kris and Pete, as shown by the
direction of the edges. Jack and Mary follow each other, as shown by the bi-directional edges.
Bob and Tim follow each other, as do Bob and Kris, and Eve and Tim. Pete follows Eve and Bob,
Mary follows Pete, and Bob follows Alex. Notice that the edges are not labeled; thus “follows”
is a binary connection. Either a person follows somebody or does not.

Figure 1: A social graph depicting follows relationship in Twitter.
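
The edges described in Example 1 can be written down as a directed graph. The edge list below is transcribed from the text of the example; the dictionary layout and helper names are choices made for this sketch.

```python
# Directed "follows" edges (follower, followed), taken from Example 1.
follows = [
    ("Jack", "Kris"), ("Jack", "Pete"),
    ("Jack", "Mary"), ("Mary", "Jack"),
    ("Bob", "Tim"), ("Tim", "Bob"),
    ("Bob", "Kris"), ("Kris", "Bob"),
    ("Eve", "Tim"), ("Tim", "Eve"),
    ("Pete", "Eve"), ("Pete", "Bob"),
    ("Mary", "Pete"),
    ("Bob", "Alex"),
]

following = {}  # person -> set of people they follow (out-edges)
followers = {}  # person -> set of people following them (in-edges)
for src, dst in follows:
    following.setdefault(src, set()).add(dst)
    followers.setdefault(dst, set()).add(src)

# A bi-directional edge exists when both (a, b) and (b, a) are present.
mutual = sorted(
    {tuple(sorted((a, b))) for a, b in follows if a in following.get(b, set())}
)

print("Bob is followed by:", sorted(followers["Bob"]))
print("Mutual follows:", mutual)
```

Because the relationship is binary, a plain set of neighbours per node is enough; no edge labels or weights are needed.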

Example 2.
Consider LiveJournal, which is a free online blogging community where users declare
friendship with each other. LiveJournal also allows users to form a group which other members
can then join. Figure 2 shows a portion of such a graph. Notice that the
edges are undirected, indicating that the “friendship” relation is symmetric.


Figure 2: Social graph of LiveJournal.

Example 3.
As a third example, we consider DBLP (Digital Bibliography & Library Project).
This computer science bibliography provides a comprehensive list of research papers in computer
science. As depicted in Figure 3, the graph shows a co-authorship relationship where two
authors are connected if they have published at least one paper together. The edge is labeled with the
number of papers these authors have co-authored.

Figure 3: Social graph of DBLP.
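
A co-authorship graph with edges labeled by paper counts can be built directly from a list of author lists. The papers and author names below are made up for illustration and are not DBLP data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical paper list: each entry holds the authors of one paper.
papers = [
    ["Rao", "Iyer"],
    ["Rao", "Iyer", "Khan"],
    ["Khan", "Mehta"],
]


def coauthor_graph(papers):
    """Return {(author_a, author_b): number of papers written together}."""
    weights = Counter()
    for authors in papers:
        # Every unordered pair of authors on a paper shares one co-authorship.
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


weights = coauthor_graph(papers)
for (a, b), count in sorted(weights.items()):
    print(f"{a} -- {b}: {count}")
```

Each key is stored with the two names in alphabetical order, so an undirected edge appears only once.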

Types of Social Networks


While Facebook, Twitter and LinkedIn might be the first examples that come to mind when
discussing social networks, these examples in no way represent the full scope of social networks that
exist. In this section, we give a brief overview of different scenarios that may result in social networks.
This is by no means an exhaustive list. The interested reader can refer to
“http://snap.stanford.edu/data” for a rather large list of social network examples.

1. Collaboration Graph: Collaboration graphs display interactions which can be termed
collaborations in a specific setting; co-authorships among scientists and co-appearance in
movies by actors and actresses are two examples of collaboration graphs. Other examples
include the Wikipedia collaboration graph (connecting two Wikipedia editors if they have
ever edited the same article), or in sport the “NBA graph” whose vertices are players
where two players are joined by an edge if they have ever played together on the same
team.

2. Who-Talks-to-Whom Graphs: These model a type of social network also called a
“communication” network. The best example is the Microsoft Instant Messenger (IM)
graph which captures a snapshot of a large group of users engaged in several billion
conversations in a time period. The edges between the two nodes of the graph, say, A and
B denote that A “talked-to” B in this time period. Edges can be labeled with the strength
of the communication too. Examples of similar networks include the email
communication networks with edges representing communication between the entities.
3. Information Linkage Graphs: Snapshots of the web are central examples of such network
datasets. Nodes are Web pages and directed edges represent links from one page to
another. The web consists of hundreds of millions of personal pages on social networking
and blogging sites, each connecting to web pages of interest to form communities.
Examples include product co-purchasing networks (nodes represent products and edges
link commonly co-purchased products), Internet networks (nodes represent computers
and edges communication), road networks (nodes represent intersections and edges
roads connecting the intersections), location-based online social networks, Wikipedia,
online communities like Reddit and Flickr, and many more. The list is endless.
4. Heterogeneous Social Networks: Finally, a social network may have heterogeneous
nodes and links. This means that the network may consist of different types of entities
(multi-mode network) and relationships (multi-relational network). For example, a simple
co-purchasing network may have product nodes and customer nodes. Edges may exist if
a customer purchases a product. Similarly, in a typical e-commerce platform like Amazon,
it is reasonable to assume the existence of distributor information to add a third type of
node. We now have edges between a customer and a product, between a product and the
distributor selling it, and between a customer and a distributor. Different customers could buy the same
product from different distributors.
The natural way to represent such information is as a k-partite graph for some k > 1,
where k is the number of different node types. In general, a k-partite graph consists of k
disjoint sets of nodes, with no edges between nodes of the same set. One special case is
the bipartite graph, where k = 2.
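
The k-partite condition (no edge joins two nodes of the same type) is easy to check in code. The customers, products and distributor below are hypothetical and only mirror the e-commerce example in the text.

```python
# Hypothetical heterogeneous network with three node types (k = 3).
node_type = {
    "alice": "customer",
    "bob": "customer",
    "book": "product",
    "lamp": "product",
    "acme": "distributor",
}

edges = [
    ("alice", "book"),  # customer buys product
    ("bob", "book"),
    ("bob", "lamp"),
    ("book", "acme"),   # product sold by distributor
    ("lamp", "acme"),
    ("alice", "acme"),  # customer deals with distributor
]


def is_k_partite(node_type, edges):
    """True if no edge connects two nodes belonging to the same node set."""
    return all(node_type[u] != node_type[v] for u, v in edges)


k = len(set(node_type.values()))
print("k =", k, "k-partite:", is_k_partite(node_type, edges))

# Adding a customer-customer edge breaks the property.
print(is_k_partite(node_type, edges + [("alice", "bob")]))
```

With only customer and product nodes the same check describes the bipartite case, k = 2.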

Clustering of Social Graphs


‐ The discovery of “communities” in a social network is one fundamental requirement in
several social network applications.
‐ Fundamentally, communities allow us to discover groups of interacting objects (i.e.,
nodes) and the relations between them.
‐ Typically in a social network we can define a “community” as a collection of individuals
with dense relationship patterns within a group and sparse links outside the group.
‐ Communities: A community is a group of nodes that have similar roles or functions within the network.
‐ It is also defined as a cluster of nodes that share similar attributes or characteristics.
‐ For example, in co-authorship networks, communities
correspond to a set of common scientific disciplines and a closely knit set of researchers
on that topic.
‐ One simple method for finding communities in a social graph is to use
clustering techniques on the social graph.
‐ For big data there is a popular graph clustering technique called the Girvan−Newman
Algorithm.

‐ For the Girvan−Newman algorithm, you can refer to the following video link:
https://youtu.be/JxFf_oLRq9o
‐ Example


Girvan−Newman Algorithm
Girvan and Newman proposed a hierarchical divisive clustering technique for social
graphs that uses edge betweenness (EB) as the distance measure. The EB of an edge is the number
of shortest paths between pairs of nodes that pass through that edge. The basic intuition behind
this algorithm is that edges with high EB are the most “vital” edges for connecting different dense
regions of the network, and by removing these edges we can naturally discover dense communities. The
algorithm is as follows:
1. Calculate the EB score for all edges in the graph. We can store it in a distance matrix as usual.
2. Identify the edge with the highest EB score and remove it from the graph. If there are
several edges with the same highest EB score, all of them can be removed in one step. If this
step causes the graph to separate into disconnected sub-graphs, these form the first-level
communities.
3. Re-compute the EB score for all the remaining edges.
4. Repeat from step 2. Continue until the graph is partitioned into as many communities as
desired or the highest EB score is below a pre-defined threshold value.
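
The four steps above can be sketched in plain Python. This is a minimal illustration for small graphs, not a tuned implementation: it computes EB with a breadth-first, Brandes-style accumulation (an implementation choice made here), stops on the "desired number of communities" criterion only, and uses a made-up graph of two triangles joined by one bridge.

```python
from collections import deque


def edge_betweenness(graph):
    """EB per undirected edge: shortest paths through it, split among ties."""
    eb = {frozenset((u, v)): 0.0 for u in graph for v in graph[u]}
    for s in graph:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:  # BFS from s, counting shortest paths (sigma)
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):  # push path credit back towards s
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += credit
                delta[v] += credit
    return {e: score / 2.0 for e, score in eb.items()}  # each pair seen twice


def components(graph):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(graph[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman(graph, target):
    """Remove highest-EB edges until at least `target` communities exist."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}  # work on a copy
    while len(components(g)) < target and any(g.values()):
        eb = edge_betweenness(g)                      # steps 1 and 3
        top = max(eb.values())
        for edge in [e for e, s in eb.items() if abs(s - top) < 1e-9]:
            u, v = tuple(edge)                        # step 2: drop all ties
            g[u].discard(v)
            g[v].discard(u)
    return components(g)


# Hypothetical graph: triangles A-B-C and D-E-F joined by the bridge C-D.
graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
communities = sorted(sorted(c) for c in girvan_newman(graph, 2))
print(communities)
```

In this graph every shortest path between the two triangles crosses the bridge C-D, so it has the highest EB and is removed first.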


Direct Discovery of Communities in a social graph

Overlapping communities: Communities may be disjoint or overlapping. Many examples are
available for both these types of communities, but in the social network arena, where nodes are
individuals, it is natural that an individual can belong to several different communities at a
time, and thus overlapping communities would be more natural.
The last section discussed a few algorithms which resulted in mutually disjoint
communities. Further these algorithms used graph partitioning techniques for identifying
communities. In this section we give a very brief bird’s eye view of several other community
detection techniques that can also identify overlapping communities.

Clique Percolation Method (CPM)

For cliques, communities and the CPM method, you can refer to the following video link:

https://youtu.be/kZ9pd59_ToU
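
As a rough companion to the video, here is a small sketch of the usual formulation of CPM: find all k-cliques, treat two k-cliques as adjacent when they share k − 1 nodes, and take each community to be the union of a chain of adjacent k-cliques. The brute-force clique search is only suitable for tiny graphs, and the example graph is hypothetical.

```python
from itertools import combinations


def k_cliques(graph, k):
    """All sets of k nodes that are pairwise connected (brute force)."""
    return [
        frozenset(group)
        for group in combinations(sorted(graph), k)
        if all(b in graph[a] for a, b in combinations(group, 2))
    ]


def clique_percolation(graph, k):
    """Communities = unions of k-cliques linked by sharing k - 1 nodes."""
    cliques = k_cliques(graph, k)
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical graph: triangles ABC and BCD share edge B-C; triangle DEF
# touches them only at node D.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E"},
}

print(clique_percolation(graph, 3))
```

Node D ends up in both communities, which is exactly the kind of overlap that partitioning methods such as Girvan−Newman cannot express.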
