Hierarchical Link Analysis for Ranking Web Data

Renaud Delbru, Nickolai Toupikov, Michele Catasta, Giovanni Tummarello, and Stefan Decker
Digital Enterprise Research Institute, Galway

June 1, 2010
Introduction Web Data Model The DING Model Experimental Results Scalability Conclusion

Introduction

Web of Data
The number of web data sources keeps growing:
Linked Open Data cloud;
Open Graph protocol;
e-commerce (GoodRelations), e-government, ...

How can relevant information be searched and retrieved?

A single query can return millions of entities ...
... and users expect only the most relevant ones.
Web data search engines (e.g., Sindice) need an effective way to
rank entities.
Partial solution: popularity-based entity ranking.


Link Analysis on the Web

Link Analysis
Given a directed graph, determine the popularity of its nodes using
link information
A link from a node i to a node j is considered evidence of the
importance of node j

Link Analysis for Web Documents
PageRank considers only the link structure
Hierarchical link analysis considers both the link structure and
the hierarchical structure

Link Analysis for Web Data
Current approaches consider only the link structure
Sindice: dataset/entity-centric view
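As a point of reference for the link analysis described above, here is a minimal sketch of plain (unweighted) PageRank computed by power iteration. The damping factor 0.85, the fixed iteration count and the uniform redistribution of score from nodes without out-links are common choices assumed here. They are not values taken from these slides.

```python
def pagerank(nodes, edges, alpha=0.85, iterations=50):
    """Plain PageRank by power iteration.

    nodes: list of node ids; edges: list of (source, target) pairs.
    A node with no out-links (dangling node) spreads its score uniformly.
    """
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        # Random jump plus redistributed dangling mass, shared uniformly.
        new_rank = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        # Each node passes alpha * its rank equally along its out-links.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += alpha * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Node c receives links from both a and b, so it should rank highest.
ranks = pagerank(["a", "b", "c"], [("a", "c"), ("b", "c"), ("c", "a")])
```

Each link acts as a vote for its target. This is why c, which receives two links, ends up above a. Node b receives no links and keeps only the random-jump share.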


Outline: Web Data Model

Web Data Model


Web Data Graph
Dataset Graph
Internal and External Node
Intra and Inter-Dataset Edge
Linkset
Two-Layer Model
Quantifying the Two-Layer Model


Web Data Graph

Figure: Web data graph


Dataset Graph

Figure: Dataset graph


Internal and External Node

Figure: Internal (red) and external (blue) nodes


Intra and Inter-Dataset Edge

Figure: Inter-dataset (orange) and intra-dataset (black) edges


Linkset

Figure: Linkset
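The figures above show the model only pictorially. The Python sketch below illustrates one way to derive both layers from a flat list of entity links. It assumes the dataset of every entity is already known (for instance, from the host of its URI), and it treats the subject of each link as belonging to the dataset that publishes it. The mapping `dataset_of`, the function name and the sample identifiers are all hypothetical. Links whose two ends lie in the same dataset become intra-dataset edges. The remaining links are grouped by (source dataset, label, target dataset) into linksets, which form the labelled edges of the dataset graph.

```python
from collections import defaultdict

def build_dataset_graph(links, dataset_of):
    """Split entity links into intra-dataset edges and inter-dataset linksets.

    links: iterable of (subject, label, object) entity links.
    dataset_of: dict entity -> dataset (hypothetical; assumed complete).
    Returns (intra, linksets), where linksets maps
    (source_dataset, label, target_dataset) -> set of (subject, object).
    """
    intra = []
    linksets = defaultdict(set)
    for subj, label, obj in links:
        src, dst = dataset_of[subj], dataset_of[obj]
        if src == dst:
            intra.append((subj, label, obj))                # intra-dataset edge
        else:
            linksets[(src, label, dst)].add((subj, obj))    # inter-dataset edge
    return intra, dict(linksets)

# Hypothetical sample: two DBpedia entities and two Geonames entities.
dataset_of = {"dbp:A": "dbpedia", "dbp:B": "dbpedia",
              "geo:X": "geonames", "geo:Y": "geonames"}
links = [("dbp:A", "owl:sameAs", "geo:X"),
         ("dbp:A", "dbo:country", "dbp:B"),
         ("dbp:B", "owl:sameAs", "geo:Y")]
intra, linksets = build_dataset_graph(links, dataset_of)
```

Here the dataset graph has a single labelled edge, dbpedia to geonames via owl:sameAs. That edge is backed by a linkset of two entity links, and the dbo:country link stays inside the DBpedia layer.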


Two-Layer Model

Figure: Two-layer model of the Web of Data


Quantifying the two-layer model

Datasets
DBpedia: 17.7 million entities
Citeseer (RKBExplorer): 2.48 million entities
Geonames: 13.8 million entities
Sindice: 60 million entities across 50,000 datasets

Dataset     Intra-dataset links   Inter-dataset links
DBpedia     88M (93.2%)           6.4M (6.8%)
Citeseer    12.9M (77.7%)         3.7M (22.3%)
Geonames    59M (98.3%)           1M (1.7%)
Sindice     287M (78.8%)          77M (21.2%)

Table: Ratio of intra-dataset to inter-dataset links
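Each percentage in the table is a share of that dataset's own links. The short Python check below recomputes the shares from the raw counts (in millions). It also shows that Sindice's 287M + 77M = 364M links match the web data graph size reported in the scalability section. The function name is only illustrative.

```python
# Intra- and inter-dataset link counts (millions), copied from the table.
link_counts = {
    "DBpedia": (88.0, 6.4),
    "Citeseer": (12.9, 3.7),
    "Geonames": (59.0, 1.0),
    "Sindice": (287.0, 77.0),
}

def link_shares(intra, inter):
    """Percentage of a dataset's links that are intra- and inter-dataset."""
    total = intra + inter
    return round(100 * intra / total, 1), round(100 * inter / total, 1)

for name, (intra, inter) in link_counts.items():
    print(name, link_shares(intra, inter))
```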


Outline: The DING Model

The DING Model


Overview
Unsupervised Link Weighting
Computing DatasetRank
Computing Local EntityRank
Combining Dataset Rank and Entity Rank


The DING Model: Overview

DING Principles
DING performs entity ranking in three steps:
1 dataset ranks are computed by performing link analysis on the
top layer (i.e. the dataset graph);
2 for each dataset, entity ranks are computed by performing link
analysis on the local entity collection;
3 the popularity of the dataset is propagated to its entities and
combined with their local ranks to estimate a global entity
rank.


Unsupervised Link Weighting

Intuition
TF-IDF applied to link labels

Link Frequency - Inverse Dataset Frequency (LF-IDF)

Link weighting factor $w_{\sigma,i,j}$
Assigns a low weight to very common links, such as rdfs:seeAlso

$$w_{\sigma,i,j} = \mathrm{LF}(L_{\sigma,i,j}) \times \mathrm{IDF}(\sigma) = \frac{|L_{\sigma,i,j}|}{\sum_{L_{\tau,i,k}} |L_{\tau,i,k}|} \times \log \frac{N}{1 + \mathrm{freq}(\sigma)}$$
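The LF-IDF formula can be sketched in Python as follows. LF is the size of a linkset divided by the total size of all linksets leaving the same source dataset. IDF penalises labels that many datasets use. The slides do not define N or freq(σ), so this sketch assumes the following: N is the number of datasets, and freq(σ) is the number of datasets that publish at least one linkset labelled σ. It also assumes the natural logarithm, since the base is not given. The function name and the sample data are hypothetical.

```python
import math
from collections import defaultdict

def lf_idf_weights(linkset_sizes, num_datasets):
    """LF-IDF weight w[sigma, i, j] for every linkset L[sigma, i, j].

    linkset_sizes: dict (D_i, sigma, D_j) -> |L_sigma,i,j|, the number of
        links with label sigma from dataset D_i to dataset D_j.
    num_datasets: N.
    Assumptions: freq(sigma) = number of datasets publishing at least one
    linkset labelled sigma; natural logarithm.
    """
    out_total = defaultdict(int)      # sum over tau, k of |L_tau,i,k|
    publishers = defaultdict(set)     # datasets using each label
    for (src, label, _dst), size in linkset_sizes.items():
        out_total[src] += size
        publishers[label].add(src)
    weights = {}
    for (src, label, dst), size in linkset_sizes.items():
        lf = size / out_total[src]                                    # LF
        idf = math.log(num_datasets / (1 + len(publishers[label])))   # IDF
        weights[(src, label, dst)] = lf * idf
    return weights

# Hypothetical collection of N = 10 datasets: rdfs:seeAlso is used by
# three of them and owl:sameAs by one, so seeAlso gets the lower IDF.
sizes = {
    ("A", "rdfs:seeAlso", "B"): 8,
    ("A", "owl:sameAs", "C"): 2,
    ("B", "rdfs:seeAlso", "C"): 5,
    ("C", "rdfs:seeAlso", "A"): 1,
}
weights = lf_idf_weights(sizes, num_datasets=10)
```

With this form of IDF, a label used by N − 1 or more datasets gets a non-positive weight. Any code that treats the weights as transition probabilities would therefore need to guard against that case.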


Computing Dataset Rank

Assumption
Dataset surfing behaviour is the same as the web page surfing
behaviour in PageRank

DatasetRank
Weighted PageRank on the weighted dataset graph
Distribution factor $w_{\sigma,i,j}$ is defined by LF-IDF
Probability of a random jump to a dataset is proportional to the
size of that dataset

$$r^k(D_j) = \alpha \sum_{L_{\sigma,i,j}} r^{k-1}(D_i)\, w_{\sigma,i,j} + (1 - \alpha) \frac{|E_{D_j}|}{\sum_{D \in G} |E_D|}$$
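The iteration above can be sketched in Python as a weighted PageRank over the dataset graph, with a random jump proportional to dataset size. Several choices here are assumptions that the slide does not state:

- Each dataset's outgoing weights are renormalised to sum to 1, so the scores stay a probability distribution. Raw LF-IDF values do not guarantee this.
- The score of datasets without outgoing linksets follows the same size-proportional jump.
- The weights are assumed positive.
- The damping factor α = 0.85 and the stopping rule are also assumed.

The function name and the example graph are hypothetical.

```python
from collections import defaultdict

def dataset_rank(weights, sizes, alpha=0.85, max_iter=100, tol=1e-10):
    """Weighted PageRank on the dataset graph (a DatasetRank sketch).

    weights: dict (D_i, sigma, D_j) -> w_sigma,i,j, e.g. LF-IDF values,
        assumed positive.
    sizes: dict D -> |E_D|, the number of entities in dataset D.
    Assumptions: outgoing weights of each dataset are renormalised to sum
    to 1; mass of datasets without outgoing linksets follows the random jump.
    """
    total = sum(sizes.values())
    jump = {d: sizes[d] / total for d in sizes}   # |E_Dj| / sum_D |E_D|
    out_sum = defaultdict(float)
    for (src, _label, _dst), w in weights.items():
        out_sum[src] += w
    rank = dict(jump)
    for _ in range(max_iter):
        dangling = sum(rank[d] for d in sizes if out_sum[d] == 0)
        # Random jump (plus dangling mass), proportional to dataset size.
        new = {d: (1 - alpha + alpha * dangling) * jump[d] for d in sizes}
        # Propagate alpha * rank along weighted, renormalised linksets.
        for (src, _label, dst), w in weights.items():
            new[dst] += alpha * rank[src] * w / out_sum[src]
        delta = sum(abs(new[d] - rank[d]) for d in sizes)
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical graph: A and B link to each other, small C links to A.
weights = {("A", "ex:link", "B"): 1.0, ("B", "ex:link", "A"): 1.0,
           ("C", "ex:link", "A"): 2.0}
ranks = dataset_rank(weights, {"A": 100, "B": 100, "C": 10})
```

Because the dataset graph is small (tens of thousands of nodes), each iteration of a loop like this is cheap compared with ranking the full entity graph.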


Computing Local EntityRank

Generic Algorithms
Weighted EntityRank: weighted PageRank applied to the internal
entities and intra-links of a dataset
Weighted LinkCount: weighted in-degree count applied to the
internal entities and intra-links of a dataset
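Weighted LinkCount can be sketched in a few lines of Python: it counts, for each internal entity, the weighted links it receives from other internal entities. The slides do not specify the link weights used locally, so `label_weight` is a hypothetical per-label weight table, and every link counts 1 when no weight is given. Normalising the scores to sum to 1 is also an assumption, made so that they can serve as local ranks r(e). Weighted EntityRank would run a PageRank-style power iteration over the same intra-links instead of a single counting pass.

```python
def local_link_count(links, internal, label_weight=None):
    """Weighted LinkCount: weighted in-degree over intra-dataset links.

    links: iterable of (subject, label, object) entity links.
    internal: set of the dataset's internal entities.
    label_weight: optional dict label -> weight (hypothetical scheme;
        a link counts 1 when the dict is omitted or lacks its label).
    Scores are normalised to sum to 1 over the internal entities.
    """
    score = {e: 0.0 for e in internal}
    for subj, label, obj in links:
        if subj in internal and obj in internal:   # keep intra-links only
            weight = 1.0 if label_weight is None else label_weight.get(label, 1.0)
            score[obj] += weight
    total = sum(score.values())
    if total == 0:   # no intra-links: fall back to a uniform distribution
        return {e: 1.0 / len(score) for e in score}
    return {e: s / total for e, s in score.items()}

links = [("a", "p", "b"), ("c", "p", "b"), ("b", "q", "a"),
         ("a", "owl:sameAs", "x")]          # x is external: ignored
plain = local_link_count(links, {"a", "b", "c"})
weighted = local_link_count(links, {"a", "b", "c"}, {"q": 0.5})
```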


Combining Dataset Rank and Entity Rank

Naive approach
Purely probabilistic point of view: joint probability
Assumption: independent events
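Global score $r_g(e) = P(e \cap D) = r(e) \cdot r(D)$
Problem: favours smaller datasets, whose local ranks are shared
among few entities and are therefore higher on average

DING Approach
Add a local entity rank factor;
Normalise local ranks to the same average based on dataset size:
$$r_g(e) = r(D) \cdot r(e) \cdot \frac{|E_D|}{\sum_{D' \in G} |E_{D'}|}$$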
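The DING combination can be sketched in Python as below. It assumes each dataset's local ranks cover all of its entities (so |E_D| is the number of ranked entities) and sum to 1. It also assumes entity identifiers are unique across datasets, as URIs would be. The function and sample names are hypothetical.

```python
def global_scores(dataset_rank, local_ranks):
    """DING combination: r_g(e) = r(D) * r(e) * |E_D| / sum_D' |E_D'|.

    dataset_rank: dict D -> r(D).
    local_ranks: dict D -> {entity: r(e)}; each inner dict is assumed to
        cover all entities of D (so |E_D| = its length) with ranks summing to 1.
    Entity ids are assumed unique across datasets (e.g. URIs).
    """
    total_entities = sum(len(ranks) for ranks in local_ranks.values())
    scores = {}
    for d, ranks in local_ranks.items():
        size_factor = len(ranks) / total_entities   # |E_D| / sum |E_D'|
        for entity, r in ranks.items():
            scores[entity] = dataset_rank[d] * r * size_factor
    return scores

# Two equally ranked datasets, a small one (2 entities) and a larger one
# (8 entities), each with uniform local ranks.
d_rank = {"small": 0.5, "big": 0.5}
local = {"small": {"s1": 0.5, "s2": 0.5},
         "big": {f"b{k}": 0.125 for k in range(8)}}
scores = global_scores(d_rank, local)
```

With the naive product r(e)·r(D), s1 would score 0.25 and each b entity only 0.0625, purely because the small dataset has fewer entities to share its local rank. With the size factor, both score 0.05.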


Outline: Experimental Results

Experimental Results
Overview
User Study
SemSearch 2010


Experimental Results: Overview

Link Analysis Methods


Global EntityRank (GER);
Local LinkCount (LLC) and Local EntityRank (LER);
Local algorithms combined with DatasetRank (DR-LLC and
DR-LER).

Experiments
1 A user study to qualitatively evaluate each method;
2 The Semantic Search Challenge (SemSearch 2010).


User Study: Design

Exp-A
Local entity ranking (LER & LLC) on the DBpedia dataset
31 participants

Exp-B
DING (DR-LER & DR-LLC) on Sindice’s page repository
58 participants

Task
10 queries (keyword and SPARQL queries)
One result list (top-10) per algorithm
Rate each algorithm relative to GER: worse (W), slightly worse (SW),
similar (S), slightly better (SB), better (B)


User Study: Questionnaire

Figure: One of the questionnaires given to the participants


User Study A: Results

(a) LER
Rate     Oi    Ei     %χ2
B         0    6.2    −13%
SB        7    6.2    +0%
S        21    6.2    +71%
SW        3    6.2    −3%
W         0    6.2    −13%
Totals   31    31

(b) LLC
Rate     Oi    Ei     %χ2
B         3    6.2    −12%
SB        8    6.2    +4%
S        13    6.2    +53%
SW        6    6.2    −0%
W         1    6.2    −31%
Totals   31    31

Table: Chi-square test for Exp-A. Oi and Ei are the observed and
expected numbers of participants giving each rate. The column %χ2
gives, for each modality, its contribution to χ2 (in relative value).
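The %χ2 columns can be reproduced with the Python sketch below. It assumes a uniform expectation over the five rates (Ei = 31 / 5 = 6.2 here, and 58 / 5 = 11.6 for Exp-B). Each modality's term (Oi − Ei)² / Ei is then expressed as a percentage of the total χ2 and signed + when the rate was chosen more often than expected. The function name is only illustrative.

```python
def chi2_contributions(observed):
    """Pearson chi-square against a uniform expectation, plus the signed
    relative contribution (in %) of each modality.

    observed: counts per rate, e.g. in the order B, SB, S, SW, W.
    The sign is + when a rate was chosen more often than expected.
    """
    expected = sum(observed) / len(observed)        # e.g. 31 / 5 = 6.2
    terms = [(o - expected) ** 2 / expected for o in observed]
    chi2 = sum(terms)
    contributions = [
        (100 if o >= expected else -100) * t / chi2
        for o, t in zip(observed, terms)
    ]
    return chi2, contributions

# Observed counts for LER in Exp-A (order B, SB, S, SW, W).
chi2, contrib = chi2_contributions([0, 7, 21, 3, 0])
print([round(c) for c in contrib])  # [-13, 0, 71, -3, -13]
```

Passing the other three columns of observed counts reproduces the LLC, DR-LER and DR-LLC percentages in the same way.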

Conclusion
LER and LLC both provide results similar to those of GER. However, a
larger proportion of the participants considers LER to be similar
to GER.


User Study B: Results

(a) DR-LER
Rate     Oi    Ei      %χ2
B        12    11.6    +0%
SB       12    11.6    +0%
S        22    11.6    +57%
SW        9    11.6    −4%
W         3    11.6    −39%
Totals   58    58

(b) DR-LLC
Rate     Oi    Ei      %χ2
B         7    11.6    −9%
SB       24    11.6    +65%
S        13    11.6    +1%
SW       10    11.6    −1%
W         4    11.6    −24%
Totals   58    58

Table: Chi-square test for Exp-B. Oi and Ei are the observed and
expected numbers of participants giving each rate. The column %χ2
gives, for each modality, its contribution to χ2 (in relative value).

Conclusion
DR-LLC appears to be more effective than GER. A large proportion of
the participants find it slightly better than GER, and this is
reinforced by the small number of participants finding it worse.


SemSearch 2010: Entity Search Track

SemSearch 2010
First semantic search evaluation;
Focus on entity search.

Experiment Design
Billion Triple Challenge 2009 dataset;
92 keyword queries;
Relevance judgement on top 10 entities.


SemSearch 2010: Experiment Results

Figure: SemSearch 2010 evaluation results


Scalability: Computing Dataset Rank

Graph       Nodes   Edges
Web data    60M     364M
Dataset     50K     1.2M
Table: Graph size

DatasetRank
One iteration takes ≈ 200 ms;
A good-quality ranking is obtained in a few seconds.


Scalability: Dataset size distribution

Power-law distribution;
Most datasets contain fewer than 1,000 nodes.


Scalability: Computing Entity Rank

EntityRank
55 iterations of 1 minute each (on the DBpedia dataset).

LinkCount
Requires only 1 iteration;
Can be computed on the fly with an appropriate data index.


Dataset-Dependent Local EntityRank

Dataset Specific Algorithms


There is no reason to use one generic algorithm for all datasets;
An appropriate entity ranking algorithm could be chosen for each
dataset.

Graph structure       Dataset                 Algorithm
Generic, controlled   DBpedia                 LinkCount
Generic, open         Social communities      EntityRank
Hierarchical          Geonames, taxonomies    DHC
Bipartite             DBLP                    CiteRank
Table: List of various graph structures with appropriate algorithms


Conclusion

DING Method
Hierarchical Link Analysis for web data;
Quality comparable to, or even better than, standard approaches;
Lower computational complexity;
Dataset-dependent local entity ranking.

Future Work
Investigate how to detect the appropriate local entity ranking
method for a dataset;
Study query-dependent ranking and how it can be combined
with DING ranking.
