Hierarchical Link Analysis For Ranking W
June 1, 2010
Introduction Web Data Model The DING Model Experimental Results Scalability Conclusion
Introduction
Web of Data
The number of web data sources keeps growing:
Linked Open Data cloud;
Open Graph protocol;
e-commerce (GoodRelations), e-government, ...
Link Analysis
Given a directed graph, determine the popularity of its nodes using
link information
A link from node i to node j is considered evidence of the
importance of node j
Dataset Graph
Linkset
Figure: Linkset
Two-Layer Model
Datasets
DBpedia: 17.7 million entities
Citeseer (RKBExplorer): 2.48 million entities
Geonames: 13.8 million entities
Sindice: 60 million entities across 50,000 datasets
DING Principles
DING performs entity ranking in three steps:
1. dataset ranks are computed by performing link analysis on the top layer (i.e. the dataset graph);
2. for each dataset, entity ranks are computed by performing link analysis on the local entity collection;
3. the popularity of the dataset is propagated to its entities and combined with their local ranks to estimate a global entity rank.
Intuition
TF-IDF applied to link labels (LF-IDF):
$$
w_{\sigma,i,j} = LF(L_{\sigma,i,j}) \times IDF(\sigma) = \frac{|L_{\sigma,i,j}|}{\sum_{L_{\tau,i,k}} |L_{\tau,i,k}|} \times \log \frac{N}{1 + freq(\sigma)}
$$
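A minimal sketch of how the LF-IDF weight could be computed. The slide does not define N or freq(σ); the code assumes N is the number of datasets and freq(σ) is the number of datasets that emit at least one link labelled σ. The function name `lf_idf_weights` and the example labels and counts are hypothetical.

```python
import math
from collections import defaultdict

def lf_idf_weights(linksets, num_datasets):
    """linksets maps (label, source, target) -> number of links in that linkset.

    Assumption: freq(label) is the number of distinct source datasets using it.
    """
    out_total = defaultdict(int)      # total links leaving each source dataset
    datasets_using = defaultdict(set)  # source datasets that use each label
    for (label, src, _dst), size in linksets.items():
        out_total[src] += size
        datasets_using[label].add(src)

    weights = {}
    for (label, src, dst), size in linksets.items():
        lf = size / out_total[src]
        idf = math.log(num_datasets / (1 + len(datasets_using[label])))
        weights[(label, src, dst)] = lf * idf
    return weights

# Hypothetical linksets between three datasets out of a collection of ten.
example = {
    ("owl:sameAs", "A", "B"): 30,
    ("foaf:knows", "A", "C"): 10,
    ("owl:sameAs", "B", "C"): 5,
    ("owl:sameAs", "C", "A"): 5,
}
w = lf_idf_weights(example, num_datasets=10)
```

Under these assumptions a label used by few datasets gets a higher IDF than a ubiquitous one, so a small linkset with a rare label is not automatically outweighed by a large linkset with a common label.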
Assumption
Dataset surfing behaviour is the same as the web page surfing
behaviour in PageRank
DatasetRank
Weighted PageRank on the weighted dataset graph
Distribution factor wσ,i,j is defined by LF-IDF
Probability of random jump is proportional to the size of a
dataset
$$
r^{k}(D_j) = \alpha \sum_{L_{\sigma,i,j}} r^{k-1}(D_i)\, w_{\sigma,i,j} + (1 - \alpha) \frac{|E_{D_j}|}{\sum_{D \in G} |E_D|}
$$
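The DatasetRank iteration can be sketched as a simple power iteration. This is an illustration under stated assumptions, not the authors' implementation: the function name `dataset_rank`, the default α = 0.85, the fixed iteration count, and the choice to start from the size-proportional jump distribution are all assumptions, and the linkset weights are taken as given.

```python
def dataset_rank(weights, sizes, alpha=0.85, iterations=50):
    """weights maps (label, source, target) -> w; sizes maps dataset -> |E_D|.

    alpha and iterations are illustrative defaults, not values from the slides.
    """
    total = sum(sizes.values())
    # Random-jump probability proportional to dataset size.
    jump = {d: n / total for d, n in sizes.items()}
    rank = dict(jump)  # assumed starting point
    for _ in range(iterations):
        nxt = {d: (1 - alpha) * jump[d] for d in sizes}
        # Each linkset passes a weighted share of its source's rank to its target.
        for (_label, src, dst), w in weights.items():
            nxt[dst] += alpha * rank[src] * w
        rank = nxt
    return rank

# Hypothetical two-dataset graph with a single linkset from A to B.
ranks = dataset_rank({("p", "A", "B"): 1.0}, {"A": 100, "B": 300},
                     alpha=0.5, iterations=10)
```

The sketch does not renormalise the ranks after each step; whether they sum to one depends on how the weights are scaled, which the slide does not specify.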
Generic Algorithms
Weighted EntityRank: weighted PageRank applied to the internal
entities and intra-links of a dataset
Weighted LinkCount: weighted in-degree (link counting) computed over the
internal entities and intra-links of a dataset
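As a rough illustration of the second algorithm, weighted LinkCount can be sketched as a single pass that sums the weights of each entity's incoming intra-dataset links. The function name `weighted_link_count`, the per-link weights, and the final normalisation to a sum of one are assumptions made for this example.

```python
from collections import defaultdict

def weighted_link_count(links):
    """links is an iterable of (source_entity, target_entity, weight) triples,
    all within one dataset. Returns entity -> normalised weighted in-degree."""
    score = defaultdict(float)
    for src, dst, w in links:
        score[src] += 0.0  # entities with no in-links still get an entry
        score[dst] += w
    total = sum(score.values())
    # Normalising to sum to one is an assumption, so that scores from
    # different datasets are on a comparable scale.
    return {e: (s / total if total else 0.0) for e, s in score.items()}

# Hypothetical intra-dataset links.
local = weighted_link_count([("a", "b", 1.0), ("c", "b", 1.0), ("a", "c", 2.0)])
```

Because it needs only one pass over the links, this variant avoids the repeated iterations that a PageRank-style computation requires.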
Naive approach
Purely probabilistic point of view: joint probability
Assumption: independent events
Global score: $r_g(e) = P(e \cap D) = r(e) \cdot r(D)$
Problem: favours smaller datasets
DING Approach
Add a local entity rank factor;
Normalise local ranks to a common average based on dataset size:
$$
r_g(e) = r(D) \cdot r(e) \cdot \frac{|E_D|}{\sum_{D' \in G} |E_{D'}|}
$$
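A small sketch of the combination step, to make the size bias concrete. The function name `global_entity_ranks` and the toy datasets are hypothetical; local ranks are assumed to sum to one within each dataset.

```python
def global_entity_ranks(dataset_rank, local_ranks, sizes):
    """dataset_rank: dataset -> r(D); local_ranks: dataset -> {entity: r(e)};
    sizes: dataset -> |E_D|. Returns (dataset, entity) -> r_g(e)."""
    total = sum(sizes.values())
    return {
        (d, e): dataset_rank[d] * r * sizes[d] / total
        for d, entities in local_ranks.items()
        for e, r in entities.items()
    }

# Two equally ranked hypothetical datasets with uniform local ranks.
d_rank = {"small": 0.5, "large": 0.5}
l_rank = {
    "small": {"s1": 0.5, "s2": 0.5},
    "large": {"l1": 0.25, "l2": 0.25, "l3": 0.25, "l4": 0.25},
}
g = global_entity_ranks(d_rank, l_rank, {"small": 2, "large": 4})
```

With the naive product r(e) · r(D), entity `s1` would score 0.25 and `l1` only 0.125, purely because `small` spreads its local rank over fewer entities. With the size factor, both entities receive the same global score.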
Experimental Results
Overview
User Study
SemSearch 2010
Experiments
1. User study to qualitatively evaluate each method;
2. Semantic Search challenge.
Exp-A
Local entity ranking (LER & LLC) on DBpedia dataset
31 participants
Exp-B
DING (DR-LER & DR-LLC) on Sindice’s page-repository
58 participants
Task
10 queries (keyword and SPARQL queries)
One result list (top-10) per algorithm
Rate each algorithm relative to GER: worse (W), slightly worse (SW), similar (S), slightly better (SB), better (B)
Table: Chi-square test for Exp-A. The column %χ2 gives, for each
modality, its contribution to χ2 (in relative value).
Conclusion
LER and LLC provide results similar to those of GER. However, a
larger proportion of the participants considers LER
similar to GER.
Table: Chi-square test for Exp-B. The column %χ2 gives, for each
modality, its contribution to χ2 (in relative value).
Conclusion
DR-LLC appears to be the more effective method. A large
proportion of the participants finds it slightly better than GER, and
only a few participants find it worse.
SemSearch 2010
First semantic search evaluation;
Focus on entity search.
Experiment Design
Billion Triple Challenge 2009 dataset;
92 keyword queries;
Relevance judgement on top 10 entities.
DatasetRank
1 iteration ≈ 200ms;
Good-quality ranks within a few seconds.
Power-law distribution;
The majority of the datasets contain fewer than 1,000 nodes.
EntityRank
55 iterations of 1 minute each (for the DBpedia dataset).
LinkCount
requires only 1 iteration;
can be computed on the fly with an appropriate data index.
Conclusion
DING Method
Hierarchical Link Analysis for web data;
Quality comparable to, or even better than, standard approaches;
Lower computational complexity;
Dataset-dependent local entity ranking.
Future Work
Investigate how to detect the appropriate local entity ranking
method for a dataset;
Study query-dependent ranking and how it can be combined
with DING ranking.