An Empirical Evaluation of Cost-based
Federated SPARQL Query Processing Engines
Umair Qudus
Muhammad Saleem
Axel-Cyrille Ngonga Ngomo
Young-koo Lee
INTRODUCTION
• Finding a good query plan is a key step in optimizing
query runtime.
• Different metrics have been proposed to measure the quality of a query plan,
including query runtime, result set completeness and correctness,
number of sources selected, and number of requests sent.
• Although informative, these metrics are generic and cannot
quantify or evaluate the accuracy of the cardinality estimators of
cost-based federation engines.
• We present novel evaluation metrics targeted at fine-grained
benchmarking of cost-based federated SPARQL query engines.
Motivating Example
Motivation (2)
We need methods to measure the quality of cost
estimations for better query planning.
RELATED WORK
Current Performance Metrics
METRICS:
Definitions (1)
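• q-error: the larger of the two ratios between the real cardinality Cr
and the estimated cardinality Ce, i.e., max(Ce/Cr, Cr/Ce)
• Example
– Cr(TP1) = 100, Ce(TP1) = 90
– q-error(TP1) = max(90/100, 100/90) = 1.11
– q-error of all TPs = max(1.11, 1.25, 1) = 1.25
– q-error of the whole plan (TPs + joins) = max(1.11, 1.25, 1, 1.3, 3) = 3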
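The arithmetic in the example above can be reproduced with a minimal Python sketch. The function names `q_error` and `max_q_error` are hypothetical helpers introduced here for illustration, not part of any engine discussed in the slides, and the sketch assumes all cardinalities are strictly positive. The per-operator values 1.25, 1, 1.3 and 3 are taken directly from the slide, since their underlying cardinalities are not given.

```python
from typing import Iterable


def q_error(real: float, estimated: float) -> float:
    """q-error of one operator: the larger of estimated/real and real/estimated.

    Assumes both cardinalities are strictly positive (an assumption of this
    sketch; the slides do not say how zero cardinalities are handled).
    """
    if real <= 0 or estimated <= 0:
        raise ValueError("cardinalities must be positive")
    return max(estimated / real, real / estimated)


def max_q_error(errors: Iterable[float]) -> float:
    """q-error of a set of operators: the maximum of the individual q-errors."""
    return max(errors)


# Triple pattern TP1 from the slide: real = 100, estimated = 90.
tp1 = q_error(100, 90)

# Remaining per-operator q-errors as listed on the slide.
triple_pattern_errors = [round(tp1, 2), 1.25, 1.0]
join_errors = [1.3, 3.0]

print(round(tp1, 2))                                     # 1.11
print(max_q_error(triple_pattern_errors))                # 1.25
print(max_q_error(triple_pattern_errors + join_errors))  # 3.0
```

Because the plan-level value is a maximum, a single badly estimated join (here, 3) determines the q-error of the whole plan regardless of how accurate the other estimates are.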
Definitions (2)
• Proposed Similarity Error:
• real = (100, 200, 300, 50, 50), estimated = (90, 250, 300, 65, 150)
– Ep(engine 1) = 2*0.1391 = 0.2784
– Ep(engine 2) = 2*0.3838 = 0.7676
EXPERIMENTS AND RESULTS
Experimental Settings
• Federated Query Engines
– CostFed
– SPLENDID
– SemaGrow
– LHD
– Odyssey
• Queries and datasets
– FedBench and LargeRDFBench benchmarks. 13 Virtuoso endpoints.
• Technical Specifications: Each Virtuoso endpoint was deployed on a physical machine
(32 GB RAM, Core i7 processor and 500 GB hard disk). We ran the selected
federation engines on a local client machine with the same specifications.
Overall Plan Error (Similarity Error vs. q-error)
Join Error (Similarity Error vs. q-error)
Triple pattern error (Similarity Error vs. q-error)
Correlating metrics with runtime
Regression Experiments
Query Runtime (1/3)
Query Runtime (2/3)
Query Runtime (3/3)
Conclusion
• The estimation errors correlate positively with the query runtimes.
• Cosine-based errors yield higher correlation coefficients (R values)
than q-error.
• Cosine-based errors yield smaller p-values than
q-error.
• Errors in join cardinality estimation correlate more strongly with runtimes
than errors in the cardinality estimation of triple patterns.
• On average, the CostFed engine produces the smallest estimation
errors and has the smallest execution time for the majority of the
LargeRDFBench queries.
Twitter: @UQudus
Paper Link: http://www.semantic-web-journal.net/system/files/swj2604.pdf
https://dice-research.org/UmairQudus