Domain-Independent Candidate Selection Techniques To Handle Heterogeneous Datasets in Semantic Web
P. SUDHAKAR - 511813104010
J. SABARISH - 511813104008
M. VIGNESH - 511813104012
Mr. G. RAJASEKARAN, AP/CSE
DOMAIN: DATA MINING
Data mining is defined as extracting information from huge
sets of data.
In other words, it is the procedure of mining knowledge from
data.
APPLICATIONS:
Market Analysis
Fraud Detection
Production Control
Corporate Analysis
Risk Management
3/9/17 2
ABSTRACT
Due to the decentralized nature of the Semantic Web, the same
ABSTRACT (CONT.)
First, we propose HistSim, which uses the matching
histories of the instances to prune instance pairs that are not
sufficiently similar to the same pool of other instances. A
sigmoid-function-based thresholding method is proposed to automatically
adjust the threshold for such commonality on the fly. We also propose
DisNGram, which selects candidate instance pairs by computing a
character-level similarity metric on discriminating literal values
that are chosen using domain-independent unsupervised
EXISTING SYSTEM
Keyword-based search is the most common form of text search
on the Web.
Most search engines perform text query and retrieval
using keywords.
Keyword-based searches usually return results from blogs
or other discussion boards. Users are often dissatisfied
with these results because such sources are not trusted,
and because the results have low precision and a high recall rate.
DISADVANTAGES OF EXISTING SYSTEM:
PROPOSED SYSTEM
TECHNIQUES:
HistSim
DisNGram
HistSim:
HistSim is a candidate selection algorithm. Given a list of
instances, we compare each instance to every instance after
it, so that no pair of instances is compared twice.
Each instance is associated with a set of paths as its context,
where a path starts from the given instance and ends on
another node in the entire graph.
The HistSim algorithm relies on an actual entity matching
algorithm to determine whether one instance should be
added to the matching history of another instance.
These matching histories are then used to decide whether
a pair of instances should be retained as a candidate pair or
not.
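The loop described above can be sketched as follows. This is only one plausible reading of the slide, not the authors' exact procedure: the overlap coefficient used to compare two matching histories, the fixed `min_overlap` threshold (the abstract describes an adaptive sigmoid-based threshold, which is not modeled here), and the names `histsim_candidates` and `is_match` are all assumptions made for illustration. The `is_match` callable stands in for the actual entity matching algorithm.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def overlap(a: Set[str], b: Set[str]) -> float:
    """Overlap coefficient of two matching histories (an assumed measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def histsim_candidates(
    instances: Sequence[str],
    is_match: Callable[[str, str], bool],
    min_overlap: float = 0.5,  # assumed fixed threshold, not the sigmoid one
) -> List[Tuple[str, str]]:
    """Return the instance pairs retained as candidates.

    Assumes instance identifiers are unique.
    """
    history: Dict[str, Set[str]] = {inst: set() for inst in instances}
    retained: List[Tuple[str, str]] = []
    for i, a in enumerate(instances):
        # Compare each instance only to the instances after it.
        for b in instances[i + 1:]:
            # Prune the pair when both histories exist but barely overlap,
            # i.e. the two instances matched different pools of instances.
            if history[a] and history[b] and overlap(history[a], history[b]) < min_overlap:
                continue
            retained.append((a, b))
            # The actual matcher decides whether the histories are updated.
            if is_match(a, b):
                history[a].add(b)
                history[b].add(a)
    return retained


# Toy matcher (hypothetical): two identifiers match if they share a first letter.
same_prefix = lambda a, b: a[0] == b[0]
pairs = histsim_candidates(["a1", "a2", "b1", "a3", "b2"], same_prefix)
```

In this toy run the pair `("a3", "b2")` is pruned, because by the time it is reached the two instances have matched disjoint sets of other instances.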
DisNGram:
We propose another candidate selection algorithm,
DisNGram.
Unlike HistSim, DisNGram does not depend on
any actual entity matching algorithm for filtering
non-matching
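As a rough illustration of the character-level similarity that DisNGram is described as computing on literal values, the sketch below compares literals by the Jaccard similarity of their character bigrams. The n-gram size, the Jaccard measure, the `threshold` value, and the function names are assumptions for illustration only; the step that chooses which literal values are discriminating is not modeled.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def char_ngrams(text: str, n: int = 2) -> Set[str]:
    """Set of character n-grams of a lowercased literal value."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity of the two n-gram sets (an assumed metric)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def ngram_candidates(
    literals: Dict[str, str],
    threshold: float = 0.5,  # assumed cutoff
) -> List[Tuple[str, str]]:
    """Keep instance pairs whose literal values are similar enough."""
    return [
        (x, y)
        for x, y in combinations(literals, 2)
        if ngram_similarity(literals[x], literals[y]) >= threshold
    ]


# Hypothetical instances, each with one literal value.
names = {"i1": "John Smith", "i2": "Jon Smith", "i3": "Mary Jones"}
candidates = ngram_candidates(names)
```

Here only `("i1", "i2")` is kept: the two spellings share 7 of 10 distinct bigrams, while "Mary Jones" shares at most two bigrams with either of them.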
ADVANTAGES OF PROPOSED SYSTEM:
SYSTEM ARCHITECTURE:
DATA FLOW DIAGRAM:
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
System: Pentium IV, 2.4 GHz
Hard Disk: 40 GB
Floppy Drive: 1.44 MB
Monitor: 15-inch VGA Colour
SOFTWARE REQUIREMENTS:
Operating System: Windows 7
Coding Language: .NET
Database: MySQL 5
MODULE DESCRIPTION:
Profile-Based Personalization
Privacy Protection
Online Decision
Profile-Based Personalization:
Privacy Protection:
We propose a PWS framework that can generalize profiles
for each query according to user-specified privacy
requirements.
Two predictive metrics are proposed to evaluate the
privacy breach risk and the query utility for a hierarchical
user profile.
We develop two simple but effective generalization
algorithms for user profiles, allowing for query-level
customization using our proposed metrics.
We also provide an online prediction mechanism based on
query utility for deciding whether to personalize a query.
Generalizing User Profile:
The generalization process must meet specific conditions
to handle the user profile.
This is achieved by preprocessing the user profile.
First, the process initializes the user profile by taking the
indicated parent user profile into account.
The process then adds the inherited properties to the properties
of the local user profile. After that, the process loads the
data for the foreground and the background of the map
according to the selection described in the user profile.
Online Decision:
Profile-based personalization may contribute little to, or even
reduce, the search quality, while exposing the profile to a
server would certainly put the user's privacy at risk. To address
this problem, we develop an online mechanism to decide
whether to personalize a query.
The basic idea is straightforward: if a distinct query is
identified during generalization, the entire runtime
profiling is aborted and the query is sent to the
server without a user profile.
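A minimal sketch of this decision, assuming a hypothetical `is_distinct` predicate that flags a distinct query (the slides do not define how that test is performed) and a hypothetical `prepare_request` helper that returns what would be sent to the server:

```python
from typing import Callable, Optional, Tuple


def prepare_request(
    query: str,
    profile: dict,
    is_distinct: Callable[[str], bool],  # hypothetical test, not defined in the slides
) -> Tuple[str, Optional[dict]]:
    """Return the query and the profile to send, or None for no profile."""
    if is_distinct(query):
        # Abort profiling: the query goes to the server without a user profile.
        return query, None
    # Otherwise the query is personalized with the profile.
    return query, profile


# Toy predicate (assumption): treat any quoted query as distinct.
quoted = lambda q: q.startswith('"') and q.endswith('"')
plain = prepare_request('"semantic web"', {"topic": "research"}, quoted)
personalized = prepare_request("semantic web", {"topic": "research"}, quoted)
```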
QUERIES???
THANK YOU