Domain-Independent Candidate Selection Techniques To Handle Heterogeneous Datasets in Semantic Web

The document proposes two candidate selection algorithms, HistSim and DisNGram, to improve the scalability of entity matching systems in semantic web datasets. HistSim utilizes matching histories of instances to prune non-similar instance pairs, using a threshold adjusted dynamically. DisNGram selects candidate instance pairs by computing a character-level similarity on discriminating literal values chosen through unsupervised learning. The algorithms aim to efficiently filter instance pairs to speed up the entity matching process in heterogeneous semantic web data.

Uploaded by

arunasekaran

Domain-Independent Candidate Selection Techniques to Handle Heterogeneous Datasets in Semantic Web

FIRST REVIEW DATE: 25.02.2017

SUBMITTED BY:
P. SUDHAKAR - 511813104010
J. SABARISH - 511813104008
M. VIGNESH - 511813104012

GUIDED BY:
Mr. G. RAJASEKARAN, AP/CSE

DOMAIN: DATA MINING
Data Mining is defined as extracting information from huge
sets of data.
Data mining is the procedure of mining knowledge from
data.
APPLICATION:
Market Analysis
Fraud Detection
Production Control
Corporate Analysis
Risk Management

3/9/17 2
ABSTRACT
Due to the decentralized nature of the Semantic Web, the same
real-world entity may be described in various data sources with
different ontologies and assigned syntactically distinct
identifiers. In order to facilitate data utilization and
consumption in the Semantic Web, without compromising the
freedom of people to publish their data, one critical problem is
to appropriately interlink such heterogeneous data. This
interlinking process is sometimes referred to as Entity
Matching, i.e., finding which identifiers refer to the same
real-world entity. In this paper, we propose two candidate selection
algorithms to improve the scalability of entity matching
systems.

ABSTRACT (CONT.)
First, we propose HistSim, which utilizes the matching
histories of the instances to prune instance pairs that are not
sufficiently similar to the same pool of other instances. A
sigmoid-function-based thresholding method is proposed to automatically
adjust the threshold for such commonality on the fly. Second, we propose
DisNGram, which selects candidate instance pairs by computing a
character-level similarity metric on discriminating literal values
that are chosen using domain-independent unsupervised learning.
EXISTING SYSTEM
Keyword-based search is the most common form of text search on the Web.
Most search engines perform text query and retrieval
using keywords.
Keyword-based searches usually return results from blogs
or other discussion boards. Users are often dissatisfied
with these results because they do not trust such sources,
and because of the low precision and high recall rate.

DISADVANTAGES OF EXISTING SYSTEM:

Early search engines did not offer clarity on search terms.
The user plays an important role in an intelligent semantic
search engine.

PROPOSED SYSTEM

We propose two candidate selection algorithms to
improve the scalability of entity matching systems.
The proposed techniques aim at speeding up the entity
matching process by efficiently filtering out non-matching
instance pairs.

TECHNIQUES:

HistSim

DisNGram
HistSim:
Given a list of instances, the HistSim candidate selection
algorithm compares each instance to every instance after
it (to avoid comparing the same two instances twice).
Each instance is associated with a set of paths as its context,
where a path starts from the given instance and ends on
another node in the graph.
The HistSim algorithm relies on an actual entity matching
algorithm to determine whether one instance should be
added to the matching history of another instance.
These matching histories are then used to decide whether
a pair of instances should be retained as a candidate pair.
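
The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the matcher `is_match` is a caller-supplied placeholder for the "actual entity matching algorithm", the fixed `min_overlap` stands in for the dynamically adjusted threshold, and path-based context is omitted.

```python
from typing import Callable, Dict, List, Set, Tuple


def histsim_candidates(
    instances: List[str],
    is_match: Callable[[str, str], bool],  # assumed: external entity matcher
    min_overlap: float = 0.5,              # assumed: fixed threshold for this sketch
) -> List[Tuple[str, str]]:
    """Return candidate pairs, pruning pairs whose matching histories disagree."""
    history: Dict[str, Set[str]] = {inst: set() for inst in instances}
    candidates: List[Tuple[str, str]] = []
    for i, a in enumerate(instances):
        # Compare each instance only to the instances after it.
        for b in instances[i + 1:]:
            ha, hb = history[a], history[b]
            if ha and hb:
                # Fraction of the smaller history shared with the other one.
                overlap = len(ha & hb) / min(len(ha), len(hb))
                if overlap < min_overlap:
                    continue  # histories point to different pools: prune
            candidates.append((a, b))
            if is_match(a, b):
                # Record the match in both instances' histories.
                history[a].add(b)
                history[b].add(a)
    return candidates


if __name__ == "__main__":
    same_prefix = lambda a, b: a[0] == b[0]  # toy matcher for demonstration
    print(histsim_candidates(["x1", "y1", "x2", "y2"], same_prefix))
```

In this toy run the pair `("x2", "y2")` is skipped without calling the matcher, because by then `x2` has only matched `x1` and `y2` has only matched `y1`.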
DisNGram:
We propose another candidate selection algorithm,
DisNGram.
Unlike HistSim, DisNGram does not depend on
any actual entity matching algorithm for filtering
non-matching instance pairs.
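
As an illustration of character-level candidate selection, here is a minimal sketch that indexes instances by character bigrams of one literal value and keeps pairs whose Jaccard similarity passes a threshold. It assumes the discriminating literal for each instance has already been chosen; the unsupervised selection step is not shown, and the similarity metric, the function names, and the `threshold` value are assumptions rather than the metric used by the authors.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def char_ngrams(text: str, n: int = 2) -> Set[str]:
    """Set of lowercase character n-grams; short strings yield themselves."""
    s = text.lower()
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disngram_candidates(
    literals: Dict[str, str],  # assumed: instance id -> chosen discriminating literal
    n: int = 2,
    threshold: float = 0.5,    # assumed cut-off for this sketch
) -> List[Tuple[str, str]]:
    """Return instance pairs that share n-grams and are similar enough."""
    grams = {inst: char_ngrams(value, n) for inst, value in literals.items()}
    # Inverted index: n-gram -> instances containing it, so that only
    # pairs sharing at least one n-gram are ever compared.
    index: Dict[str, Set[str]] = defaultdict(set)
    for inst, gs in grams.items():
        for g in gs:
            index[g].add(inst)
    pairs: Set[Tuple[str, str]] = set()
    for bucket in index.values():
        pairs.update(combinations(sorted(bucket), 2))
    return sorted(p for p in pairs if jaccard(grams[p[0]], grams[p[1]]) >= threshold)


if __name__ == "__main__":
    data = {"a": "John Smith", "b": "Jon Smith", "c": "Mary Jones"}
    print(disngram_candidates(data))
```

The inverted index is what keeps this cheaper than an all-pairs comparison: instances with no n-gram in common never meet.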

ADVANTAGES OF PROPOSED SYSTEM:

It enhances the stability of the search quality.

It avoids the unnecessary exposure of the user profile.

SYSTEM ARCHITECTURE:
DATA FLOW DIAGRAM:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

System           : Pentium IV 2.4 GHz
Hard Disk        : 40 GB
Floppy Drive     : 1.44 MB
Monitor          : 15" VGA Colour

SOFTWARE REQUIREMENTS:

Operating system : Windows 7
Coding Language  : .NET
Database         : MySQL 5
MODULE DESCRIPTION:
Profile-Based Personalization

Privacy Protection

Generalizing User Profile

Online Decision

Profile-Based Personalization:
This paper introduces an approach to personalize digital
multimedia content based on user profile information.
Two main mechanisms were developed: a profile
generator that automatically creates user profiles
representing the user preferences, and a content-based
recommendation algorithm that estimates the user's
interest in unknown content by matching her profile to
metadata descriptions of the content. Both features are
integrated into a personalization system.
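
One common way to match a profile against content metadata is cosine similarity over weighted terms. The sketch below assumes both the profile and the metadata are represented as term-to-weight dictionaries; that representation and the name `interest_score` are illustrative assumptions, since the slides do not specify the recommendation algorithm.

```python
import math
from typing import Dict


def interest_score(profile: Dict[str, float], metadata: Dict[str, float]) -> float:
    """Cosine similarity between a user profile and a content description."""
    dot = sum(weight * metadata.get(term, 0.0) for term, weight in profile.items())
    norm_profile = math.sqrt(sum(w * w for w in profile.values()))
    norm_metadata = math.sqrt(sum(w * w for w in metadata.values()))
    if norm_profile == 0.0 or norm_metadata == 0.0:
        return 0.0  # nothing to compare against
    return dot / (norm_profile * norm_metadata)


if __name__ == "__main__":
    user = {"jazz": 0.9, "documentary": 0.4}
    clip = {"jazz": 1.0, "live": 0.5}
    print(round(interest_score(user, clip), 3))
```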

Privacy Protection:
We propose a personalized web search (PWS) framework that can
generalize profiles for each query according to user-specified
privacy requirements.
Two predictive metrics are proposed to evaluate the
privacy breach risk and the query utility for a hierarchical
user profile.
We develop two simple but effective generalization
algorithms for user profiles, allowing for query-level
customization using our proposed metrics.
We also provide an online prediction mechanism based on
query utility for deciding whether to personalize a query.

Generalizing User Profile:
The generalization process must meet specific conditions
to handle the user profile.
This is achieved by preprocessing the user profile. First,
the process initializes the user profile by taking the
indicated parent user profile into account.
The process adds the inherited properties to the properties
of the local user profile. Thereafter, the process loads the
data for the foreground and the background of the map
according to the selection described in the user profile.

Online Decision:
Profile-based personalization may contribute little to, or even
reduce, the search quality, while exposing the profile to a
server would certainly put the user's privacy at risk. To address
this problem, we develop an online mechanism to decide
whether to personalize a query.
The basic idea is straightforward: if a distinct query is
identified during generalization, the entire runtime
profiling is aborted and the query is sent to the
server without a user profile.
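
The decision rule can be sketched as a small guard around request construction. The slides do not define how a "distinct query" is detected, so the predicate `is_distinct` is a caller-supplied placeholder, and `prepare_request` and the request layout are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List, Optional


def prepare_request(
    query: str,
    generalized_profile: List[str],
    is_distinct: Callable[[str], bool],  # assumed: detector for distinct queries
) -> Dict[str, Optional[object]]:
    """Attach the generalized profile only when personalization is worthwhile."""
    if is_distinct(query):
        # Abort profiling: send the bare query, exposing no profile data.
        return {"query": query, "profile": None}
    return {"query": query, "profile": generalized_profile}


if __name__ == "__main__":
    # Toy detector: treat single-word queries as distinct.
    single_word = lambda q: len(q.split()) == 1
    print(prepare_request("wikipedia", ["sports"], single_word))
    print(prepare_request("apple pie recipe", ["cooking"], single_word))
```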

QUERIES???

THANK YOU
