Domain-Independent Candidate Selection Techniques To Handle Heterogeneous Datasets in Semantic Web

The document proposes two candidate selection algorithms, HistSim and DisNGram, to improve the scalability of entity matching systems in semantic web datasets. HistSim utilizes matching histories of instances to prune non-similar instance pairs, using a threshold adjusted dynamically. DisNGram selects candidate instance pairs by computing a character-level similarity on discriminating literal values chosen through unsupervised learning. The algorithms aim to efficiently filter instance pairs to speed up the entity matching process in heterogeneous semantic web data.

Uploaded by

arunasekaran
Domain-Independent Candidate Selection Techniques to Handle Heterogeneous Datasets in Semantic Web
FIRST REVIEW DATE : 25.02.2017

SUBMITTED BY:
P.SUDHAKAR - 511813104010
J.SABARISH - 511813104008
M.VIGNESH - 511813104012

GUIDED BY:
Mr.G.RAJASEKARAN, AP/CSE

DOMAIN: DATA MINING
Data mining is defined as extracting information from huge sets of data.
Data mining is the procedure of mining knowledge from data.
APPLICATIONS:
Market Analysis
Fraud Detection
Production Control
Corporate Analysis
Risk Management

3/9/17 2
ABSTRACT
Due to the decentralized nature of the Semantic Web, the same
real-world entity may be described in various data sources with
different ontologies and assigned syntactically distinct
identifiers. In order to facilitate data utilization and
consumption in the Semantic Web, without compromising the
freedom of people to publish their data, one critical problem is
to appropriately interlink such heterogeneous data. This
interlinking process is sometimes referred to as Entity
Matching, i.e., finding which identifiers refer to the same
real-world entity. In this paper, we propose two candidate selection
algorithms to improve the scalability of entity matching
systems.

ABSTRACT (CONTD.)
First, we propose HistSim, which utilizes the matching
histories of the instances to prune instance pairs that are not
sufficiently similar to the same pool of other instances. A sigmoid
function based thresholding method is proposed to automatically
adjust the threshold for such commonality on-the-fly. Second, we propose
DisNGram, which selects candidate instance pairs by computing a
character-level similarity metric on discriminating literal values
that are chosen using domain-independent unsupervised learning.

EXISTING SYSTEM
Keyword-based search is the most common form of text search on the Web.
Most search engines perform text query and retrieval
using keywords.
Keyword-based searches usually return results from blogs
or other discussion boards. Users are not satisfied with
these results because such sources are not trusted, and
the results show low precision and a high recall rate.

DISADVANTAGES OF EXISTING SYSTEM:

Early search engines did not interpret search terms clearly.
The user has to play an important role in an intelligent semantic
search engine.

PROPOSED SYSTEM

We propose two candidate selection algorithms to
improve the scalability of entity matching systems.
The proposed techniques aim at speeding up the entity
matching process by efficiently filtering instance pairs.

TECHNIQUES:

HistSim

DisNGram

HistSim:
Given a list of instances, the HistSim candidate selection
algorithm compares each instance to every instance after
it (to avoid comparing the same pair twice).
Each instance is associated with a set of paths as its context,
where a path starts from the given instance and ends on
another node in the entire graph.
HistSim relies on an actual entity matching
algorithm to determine whether one instance should be
added to the matching history of another instance.
These matching histories are then used to decide whether
a pair of instances should be retained as a candidate pair.
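
The history-based pruning described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `histsim_candidates`, the `is_match` callback (standing in for the actual entity matching algorithm), and the overlap rule are all assumptions. The slides mention a sigmoid-based threshold that is adjusted on-the-fly, but its exact form is not given here, so the sketch uses a fixed `min_overlap` constant instead.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple


def histsim_candidates(
    instances: List[Hashable],
    is_match: Callable[[Hashable, Hashable], bool],
    min_overlap: float = 0.5,  # assumption: fixed value, not the sigmoid-based threshold
) -> List[Tuple[Hashable, Hashable]]:
    """Return candidate pairs, skipping pairs whose matching histories disagree."""
    # Matching history: the instances the matcher has already judged similar.
    history: Dict[Hashable, Set[Hashable]] = {inst: set() for inst in instances}
    candidates: List[Tuple[Hashable, Hashable]] = []

    for i, a in enumerate(instances):
        # Compare each instance only to the instances after it.
        for b in instances[i + 1:]:
            ha, hb = history[a], history[b]
            if ha and hb:
                # Both instances have a history: require enough commonality.
                overlap = len(ha & hb) / min(len(ha), len(hb))
                if overlap < min_overlap:
                    continue  # pruned without calling the matcher
            if is_match(a, b):
                ha.add(b)
                hb.add(a)
                candidates.append((a, b))
    return candidates
```

With a toy matcher that treats identifiers sharing a first letter as the same entity, the list `["a1", "a2", "b1", "b2", "a3"]` yields the pairs within each group, and the two comparisons between `a3` and the `b` instances are skipped because their histories do not overlap.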
DisNGram:
We propose another candidate selection algorithm,
DisNGram.
Unlike HistSim, DisNGram does not depend on
any actual entity matching algorithm for filtering
non-matching instance pairs.
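
A rough sketch of the DisNGram idea follows, under stated assumptions. The slides do not specify how discriminating literal values are chosen or which character-level metric is used, so this example assumes a simple ratio of distinct values to occurrences for picking a property, and Jaccard similarity over character bigrams as the metric. The names `char_ngrams`, `most_discriminating_key`, and `disngram_candidates`, and the record layout, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Records = Dict[str, Dict[str, str]]  # instance id -> {property: literal value}


def char_ngrams(text: str, n: int = 2) -> Set[str]:
    """Character n-grams of a lowercased literal (whole string if too short)."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)} or {s}


def most_discriminating_key(records: Records) -> str:
    """Assumed heuristic: property with the highest distinct-value ratio."""
    values: Dict[str, Set[str]] = defaultdict(set)
    counts: Dict[str, int] = defaultdict(int)
    for props in records.values():
        for key, value in props.items():
            values[key].add(value)
            counts[key] += 1
    return max(values, key=lambda k: len(values[k]) / counts[k])


def disngram_candidates(
    records: Records, n: int = 2, threshold: float = 0.5
) -> List[Tuple[str, str]]:
    """Select pairs whose discriminating literals share enough n-grams."""
    key = most_discriminating_key(records)
    grams = {rid: char_ngrams(p[key], n) for rid, p in records.items() if key in p}

    # Inverted index: only instances sharing at least one n-gram are compared.
    index: Dict[str, Set[str]] = defaultdict(set)
    for rid, gs in grams.items():
        for g in gs:
            index[g].add(rid)

    pairs: Set[Tuple[str, str]] = set()
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            if (a, b) in pairs:
                continue
            jaccard = len(grams[a] & grams[b]) / len(grams[a] | grams[b])
            if jaccard >= threshold:
                pairs.add((a, b))
    return sorted(pairs)
```

No entity matcher is called here: the filter relies only on the literal values, which is the property that distinguishes DisNGram from HistSim in the description above.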

ADVANTAGES OF PROPOSED SYSTEM:

It enhances the stability of the search quality.

It avoids the unnecessary exposure of the user profile.

SYSTEM ARCHITECTURE:
DATA FLOW DIAGRAM:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

System : Pentium IV 2.4 GHz
Hard Disk : 40 GB
Floppy Drive : 44 Mb
Monitor : 15 VGA Colour

SOFTWARE REQUIREMENTS:

Operating System : Windows 7
Coding Language : .NET
Database : MySQL 5

MODULE DESCRIPTION:
Profile-Based Personalization

Privacy Protection

Generalizing User Profile

Online Decision

Profile-Based Personalization:

This paper introduces an approach to personalize digital
multimedia content based on user profile information.
Two main mechanisms were developed: a profile
generator that automatically creates user profiles
representing the user preferences, and a content-based
recommendation algorithm that estimates the user's
interest in unknown content by matching her profile to
metadata descriptions of the content. Both features are
integrated into a personalization system.

Privacy Protection:
We propose a personalized web search (PWS) framework that can
generalize profiles for each query according to user-specified
privacy requirements.
Two predictive metrics are proposed to evaluate the
privacy breach risk and the query utility for hierarchical
user profiles.
We develop two simple but effective generalization
algorithms for user profiles, allowing for query-level
customization using our proposed metrics.
We also provide an online prediction mechanism based on
query utility for deciding whether to personalize a query.

Generalizing User Profile:
The generalization process has to meet specific conditions
to handle the user profile.
This is achieved by preprocessing the user profile.
First, the process initializes the user profile by taking the
indicated parent user profile into account.
The process then adds the inherited properties to the properties
of the local user profile. Thereafter, the process loads the
data for the foreground and the background of the map
according to the selection described in the user profile.

Online Decision:
Profile-based personalization may contribute little to, or even
reduce, the search quality, while exposing the profile to a
server would certainly put the user's privacy at risk. To address
this problem, we develop an online mechanism to decide
whether to personalize a query.
The basic idea is straightforward: if a distinct query is
identified during generalization, the entire runtime
profiling will be aborted and the query will be sent to the
server without a user profile.

QUERIES???

THANK YOU
