BDA Module 5 COMP
Module 5
Real-Time Big Data Models
Suggestions of this kind are produced by a recommender system. The main paradigms used are as follows:
1. Collaborative-filtering system:
It uses community data from peer groups for recommendations, surfacing the items that are
popular among a user's peers. Collaborative-filtering systems recommend items based on
similarity measures between users and/or items: the items recommended to a user are those
preferred by similar users (community data). The recommender system uses the user profile and
contextual parameters along with the community data to personalize the recommendation list.
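The similarity-based idea can be illustrated with a minimal sketch of user-based collaborative filtering. The ratings table, the choice of cosine similarity, and the function names are assumptions made for illustration only; the notes do not prescribe a particular similarity measure.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}); invented for illustration.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "cat": {"A": 1, "C": 5, "E": 4},
}

def cosine(u, v):
    """Cosine similarity between two users' rating vectors."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user, ratings, k=1):
    """Rank items the user has not rated by similarity-weighted peer ratings."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, rating in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("ann", ratings))
```

Here "ann" rates items much like "bob", so the item that only "bob" has rated is ranked ahead of the item that only "cat" has rated.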
2. Content-based systems:
They examine the properties of the items to be recommended. For example, if a user has watched many
“science fiction” movies, then the recommender system will recommend a movie classified in the
database as having the “science fiction” genre. Content-based systems take input from the user
profile and the contextual parameters along with product features to make the recommendation list.
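The genre example can be sketched as follows. The catalogue, the titles, and the rule "recommend from the most-watched genre" are hypothetical simplifications; a real content-based system would use richer item features.

```python
from collections import Counter

# Hypothetical catalogue (title -> genre); invented for illustration.
catalogue = {
    "Movie 1": "science fiction",
    "Movie 2": "science fiction",
    "Movie 3": "romance",
    "Movie 4": "science fiction",
    "Movie 5": "comedy",
}

def recommend_by_genre(watched, catalogue):
    """Recommend unwatched titles from the user's most-watched genre."""
    genre_counts = Counter(catalogue[title] for title in watched)
    if not genre_counts:
        return []
    top_genre = genre_counts.most_common(1)[0][0]
    return [t for t, g in catalogue.items() if g == top_genre and t not in watched]

print(recommend_by_genre(["Movie 1", "Movie 2", "Movie 3"], catalogue))
```

Unlike collaborative filtering, this sketch never looks at other users; it relies only on the item features and the user's own history.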
Social Networks
The web-based dictionary Webopedia defines a social network as “A social structure made of
nodes that are generally individuals or organizations”. A social network represents relationships and
flows between people, groups, organizations, computers or other information/knowledge processing
entities. The term “Social Network” was coined in 1954 by J. A. Barnes. Examples of social networks
include Facebook, LinkedIn, Twitter, Reddit, etc.
The following are the typical characteristics of any social network:
1. In a social network scenario, the nodes are typically people. But there could be other
entities like companies, documents, computers, etc.
2. A social network can be considered as a heterogeneous and multi-relational dataset
represented by a graph. Both nodes and edges can have attributes. Objects may have class
labels.
3. There is at least one relationship between entities of the network. For example, social
networks like Facebook connect entities through a relationship called friends. In LinkedIn,
one relationship is “endorse” where people can endorse other people for their skills.
Examples of Graphs
Example 1.
Figure 1 below shows a small graph of the “followers” network of Twitter. The edges represent
the “follows” relationship. Jack follows Kris and Pete, as shown by the
direction of the edges. Jack and Mary follow each other, as shown by the bi-directional edges.
Bob and Tim follow each other, as do Bob and Kris, and Eve and Tim. Pete follows Eve and Bob,
Mary follows Pete, and Bob follows Alex. Notice that the edges are not labeled; thus “follows”
is a binary connection: either a person follows somebody or does not.
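A directed graph like this one can be stored as an adjacency mapping. The sketch below transcribes the edges from the description above (the figure itself is not reproduced here, so the edge list is an assumption based on the text); the helper names are hypothetical.

```python
# follows[x] is the set of people that x follows (directed, unlabeled edges).
follows = {
    "Jack": {"Kris", "Pete", "Mary"},
    "Mary": {"Jack", "Pete"},
    "Bob": {"Tim", "Kris", "Alex"},
    "Tim": {"Bob", "Eve"},
    "Kris": {"Bob"},
    "Eve": {"Tim"},
    "Pete": {"Eve", "Bob"},
    "Alex": set(),
}

def followers_of(person):
    """Everyone who has an edge pointing at `person`."""
    return {u for u, out in follows.items() if person in out}

def mutual(a, b):
    """True when the edge between a and b is bi-directional."""
    return b in follows[a] and a in follows[b]

print(sorted(followers_of("Bob")))
print(mutual("Jack", "Mary"), mutual("Jack", "Kris"))
```

Because the relationship is binary, a set of neighbours per node is enough; no edge labels or weights need to be stored.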
Example 2.
Consider LiveJournal, which is a free online blogging community where users declare
friendship to each other. LiveJournal also allows users to form a group which other members
can then join. Figure 2 shows a portion of such a graph. Notice that the
edges are undirected, indicating that the “friendship” relation is symmetric.
Example 3.
As a third example, we consider DBLP (Digital Bibliography & Library Project), a
computer science bibliography that provides a comprehensive list of research papers in computer
science. As depicted in Fig. 11.3, the graph shows a co-authorship relationship where two
authors are connected if they have published at least one paper together. The edge is labeled with the
number of papers these authors have co-authored.
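A co-authorship graph with labeled edges can be derived from a list of papers. The sketch below uses a hypothetical paper list with placeholder author names; it is not DBLP data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical papers, each given as its list of authors.
papers = [
    ["A", "B"],
    ["A", "B", "C"],
    ["B", "C"],
    ["D"],
]

def coauthor_graph(papers):
    """Map each unordered author pair to the number of papers they share."""
    weights = Counter()
    for authors in papers:
        # sorted() gives each pair one canonical orientation, e.g. ("A", "B")
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights

graph = coauthor_graph(papers)
for (a, b), n in sorted(graph.items()):
    print(a, b, n)
```

Single-author papers contribute no edges, so author "D" does not appear in the edge list at all.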
Girvan-Newman Algorithm
Girvan and Newman proposed a hierarchical divisive clustering technique for social
graphs that uses edge betweenness (EB) as the distance measure. The EB of an edge is the number
of shortest paths between pairs of nodes that pass through that edge. The basic intuition behind
this algorithm is that edges with high EB are the most “vital” edges for connecting different dense
regions of the network, and by removing these edges we can naturally discover dense communities.
The algorithm is as follows:
1. Calculate the EB score for all edges in the graph. We can store it in a distance matrix as usual.
2. Identify the edge with the highest EB score and remove it from the graph. If there are
several edges with the same high EB score, all of them can be removed in one step. If this
step causes the graph to separate into disconnected sub-graphs, these form the first-level
communities.
3. Re-compute the EB score for all the remaining edges.
4. Repeat from step 2. Continue until the graph is partitioned into as many communities as
desired or the highest EB score is below a pre-defined threshold value.
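The steps above can be sketched in plain Python for an undirected, unweighted graph. This is a minimal illustration, not a reference implementation: EB (edge betweenness) is computed with a Brandes-style breadth-first accumulation, the stopping rule is the "desired number of communities" variant, and the function names and the example graph (two triangles joined by one bridge) are assumptions.

```python
from collections import deque

def edge_betweenness(adj):
    """EB score of every edge: number of shortest paths passing through it."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist, order = {s: 0}, []
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge on the way.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += credit
                delta[v] += credit
    # Every node pair was counted from both endpoints, so halve the totals.
    return {e: score / 2.0 for e, score in eb.items()}

def components(adj):
    """Connected components of the graph, each as a set of nodes."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj, n_communities):
    """Remove highest-EB edges until the graph has n_communities components."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    while len(components(adj)) < n_communities:
        eb = edge_betweenness(adj)          # steps 1 and 3: (re)compute EB
        if not eb:
            break                           # no edges left to remove
        top = max(eb.values())
        for e in [e for e, sc in eb.items() if abs(sc - top) < 1e-9]:
            u, v = tuple(e)                 # step 2: remove all tied top edges
            adj[u].discard(v)
            adj[v].discard(u)
    return components(adj)

# Two triangles (1-2-3 and 4-5-6) joined by the bridge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(girvan_newman(graph, 2))
```

In this example every shortest path between the two triangles crosses the bridge 3-4, so it has the highest EB and is removed first, leaving the two triangles as communities. Recomputing EB after every removal is what makes the method expensive on large graphs.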
Overlapping communities: Examples exist of both disjoint and overlapping communities,
but in the social network arena, where nodes are individuals, an individual
can naturally belong to several different communities at a time, so overlapping communities
are the more natural model.
The last section discussed a few algorithms which resulted in mutually disjoint
communities. Further, these algorithms used graph partitioning techniques for identifying
communities. In this section we give a very brief bird's-eye view of several other community
detection techniques that can also identify overlapping communities.
For cliques, communities, and the clique percolation method (CPM), refer to the following video link:
https://youtu.be/kZ9pd59_ToU