May 15th, 2019
HOW REPRESENTATIVE IS A SPARQL
BENCHMARK? AN ANALYSIS OF RDF
TRIPLESTORE BENCHMARKS
Muhammad Saleem, Gábor Szárnyas, Felix Conrads,
Syed Ahmad Chan Bukhari, Qaiser Mehmood, Axel-
Cyrille Ngonga Ngomo
The Web Conference 2019, San Francisco
MOTIVATION
 Various RDF Triplestores
 e.g., Virtuoso, Fuseki, Blazegraph, Stardog, RDF-3X, etc.
 Various Triplestore benchmarks
 e.g., WatDiv, FEASIBLE, LDBC, BSBM, SP2Bench, etc.
 Varying workloads on triplestores
 Various important SPARQL query features
 Which benchmark is more representative?
 Which benchmark is most suitable for testing a given triplestore?
 How do SPARQL features affect query runtimes?
QUERYING BENCHMARK
COMPONENTS
 Dataset(s)
 Queries
 Performance metrics
 Execution rules
IMPORTANT RDF DATASET
FEATURES
RDF datasets used in a querying benchmark should vary in:
 Number of triples
 Number of classes
 Number of resources
 Number of properties
 Number of objects
 Average properties per class
 Average instances per class
 Average in-degree and out-degree
 Structuredness or coherence
 Relationship specialty
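Most of the dataset features above can be derived directly from the triples. A minimal sketch, using an invented toy graph of (subject, predicate, object) tuples purely for illustration:

```python
# Sketch: computing a few dataset features from a toy RDF graph.
# The triples below are invented for illustration only.
from collections import defaultdict

RDF_TYPE = "rdf:type"

triples = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:acme", RDF_TYPE, "ex:Company"),
]

def dataset_features(triples):
    classes = {o for s, p, o in triples if p == RDF_TYPE}
    properties = {p for _, p, _ in triples}
    subjects = {s for s, _, _ in triples}
    objects = {o for _, _, o in triples}
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for s, p, o in triples:
        out_deg[s] += 1   # triples leaving the subject
        in_deg[o] += 1    # triples arriving at the object
    return {
        "triples": len(triples),
        "classes": len(classes),
        "properties": len(properties),
        "objects": len(objects),
        "avg_out_degree": sum(out_deg.values()) / len(subjects),
        "avg_in_degree": sum(in_deg.values()) / len(objects),
    }

print(dataset_features(triples))
```

Structuredness and relationship specialty are more involved aggregate measures and are discussed on the analysis slides below.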
IMPORTANT SPARQL QUERY FEATURES
 Number of triple patterns
 Number of projection variables
 Number of BGPs
 Number of join vertices
 Mean join vertex degree
 Query result set sizes
 Mean triple pattern selectivity
 BGP-restricted triple pattern selectivity
 Join-restricted triple pattern selectivity
 Overall diversity score (average coefficient of variation)
 Join vertex types ('star', 'path', 'hybrid', 'sink')
 SPARQL clauses used (e.g., LIMIT, UNION, OPTIONAL, FILTER etc.)
SPARQL queries as directed hypergraph
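Several of these query features fall out of the BGP structure itself. A minimal sketch, assuming a BGP represented as a list of triple patterns where strings starting with `?` are variables (the example query is invented for illustration):

```python
# Sketch: join-related query features from a BGP.
# A join vertex is a variable shared by >= 2 triple patterns;
# its degree is the number of patterns it appears in.
from collections import Counter

bgp = [
    ("?person", "rdf:type", "ex:Person"),
    ("?person", "ex:knows", "?friend"),
    ("?friend", "ex:worksFor", "?company"),
]

def is_var(term):
    return term.startswith("?")

def join_features(bgp):
    occurrences = Counter()
    for tp in bgp:
        for var in {t for t in tp if is_var(t)}:
            occurrences[var] += 1
    join_vertices = {v: n for v, n in occurrences.items() if n >= 2}
    mean_degree = (sum(join_vertices.values()) / len(join_vertices)
                   if join_vertices else 0.0)
    return {
        "triple_patterns": len(bgp),
        "join_vertices": len(join_vertices),
        "mean_join_vertex_degree": mean_degree,
    }

print(join_features(bgp))
```

Here `?person` and `?friend` are join vertices (each in two patterns), while `?company` is not.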
IMPORTANT PERFORMANCE METRICS
(1/2)
 Query Processing Related
 Query execution time
 Query Mix per Hour (QMpH)
 Queries per Second (QpS)
 CPU and memory usage
 Intermediate results
 Number of disk/memory swaps
 Result Set Related
 Result set correctness
 Result set completeness
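QMpH and QpS follow directly from measured per-query runtimes. A minimal sketch with invented timings:

```python
# Sketch: QpS and QMpH from per-query runtimes (seconds).
# The timings below are invented for illustration only.
runtimes = [0.12, 0.30, 0.08, 1.50]  # one query mix of four queries

total = sum(runtimes)        # wall time for one full query mix
qps = len(runtimes) / total  # Queries per Second
qmph = 3600.0 / total        # Query Mixes per Hour

print(qps, qmph)
```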
IMPORTANT PERFORMANCE METRICS
(2/2)
 Data Storage Related
 Data loading time
 Storage space
 Index size
 Parallelism with/without Updates
 Parallel querying agents
 Parallel data update agents
BENCHMARK SELECTION CRITERIA
 Target query runtime performance evaluation of triplestores
 RDF Datasets available
 SPARQL queries available
 No reasoning required to get complete results
SELECTED BENCHMARKS
 Benchmarks with real data and/or queries
 FishMark
 BioBench
 FEASIBLE
 DBpedia SPARQL Benchmark (DBPSB)
 Synthetic benchmarks
 Bowlogna
 TrainBench
 Berlin SPARQL Benchmark (BSBM)
 SP2Bench
 WatDiv
 Social Networking Benchmark (SNB)
 Real-world datasets and queries
 DBpedia 3.5.1
 Semantic Web Dog Food (SWDF)
 NCBIGene
 SIDER
 DrugBank
DATASETS ANALYSIS:
STRUCTUREDNESS
 Duan et al. assumption
 Real datasets are less structured
 Synthetic datasets are highly structured
The dataset structuredness problem
is well covered in recent synthetic
data generators (e.g., WatDiv,
TrainBench)
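For intuition, the structuredness idea of Duan et al. can be sketched for a single type as coverage: the fraction of (instance, property) cells that are actually filled. This is a simplified illustration, not the full weighted formula from the paper; the instance data are invented:

```python
# Sketch: coverage of one type T, in the spirit of Duan et al.
# A perfectly structured type (every instance sets every property)
# scores 1.0. Data below are invented for illustration only.
instances = {
    "ex:alice": {"ex:name", "ex:email"},
    "ex:bob":   {"ex:name"},             # missing ex:email
    "ex:carol": {"ex:name", "ex:email"},
}
properties = {"ex:name", "ex:email"}     # properties used by type T

filled = sum(len(props & properties) for props in instances.values())
coverage = filled / (len(properties) * len(instances))  # 5 of 6 cells

print(coverage)
```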
DATASETS ANALYSIS:
RELATIONSHIP SPECIALTY
 Qiao et al. assumption
 Synthetic datasets have low relationship
specialty
The low relationship specialty
problem in synthetic datasets still
exists in general and needs to be
covered in future synthetic
benchmark generation approaches
QUERIES ANALYSIS: OVERALL
DIVERSITY SCORE
Benchmark query diversity (high to low): FEASIBLE > BioBench >
FishMark > WatDiv > Bowlogna > SP2Bench > BSBM > DBPSB >
SNB-BI > SNB-INT > TrainBench
QUERIES ANALYSIS: DISTRIBUTION OF SPARQL CLAUSES
AND JOIN VERTEX TYPES
Only FEASIBLE and BioBench neither
completely miss nor overuse
important features
Synthetic benchmarks often
fail to contain important
SPARQL clauses
PERFORMANCE METRICS
Among the selected benchmarks,
BSBM reports results for the
largest number of metrics
SPEARMAN’S CORRELATION WITH RUNTIMES
Highest impact on query runtimes:
PV > JV > TP > Result > JVD >
JTPS > TPS > BGPs > LSQ > BTPS
The SPARQL query features we selected
have a weak correlation with query
execution time, suggesting that the query
runtime is a complex measure affected by
multidimensional SPARQL query features
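Spearman's rank correlation, as used here, can be computed from rank differences. A minimal sketch (invented data, and ties are not handled):

```python
# Sketch: Spearman's rank correlation between a query feature
# (e.g., number of projection variables) and runtimes.
# Data are invented; tied values are not handled here.
def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

projection_vars = [1, 2, 3, 4, 5]
runtimes_ms = [12, 15, 14, 40, 90]
print(spearman(projection_vars, runtimes_ms))  # -> 0.9
```

A value near +1 or -1 indicates a strong monotonic relationship; the weak correlations observed in the study correspond to values much closer to 0.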
EFFECT OF DATASETS STRUCTUREDNESS
CONCLUSIONS
 The dataset structuredness problem is well covered in recent synthetic data
generators (e.g., WatDiv, TrainBench)
 The low relationship specialty problem in synthetic datasets still exists in
general and needs to be covered in future synthetic benchmark generation
approaches
 The FEASIBLE framework employed on DBpedia generated the most diverse
benchmark in our evaluation
 The SPARQL query features we selected have a weak correlation with query
execution time, suggesting that the query runtime is a complex measure
affected by multidimensional SPARQL query features
 Still, the number of projection variables, join vertices, triple patterns, the
result sizes, and the join vertex degree are the top five SPARQL features
that most impact the overall query execution time
 Synthetic benchmarks often fail to contain important SPARQL clauses such
as DISTINCT, FILTER, OPTIONAL, LIMIT and UNION
 The dataset structuredness has a direct correlation with the result sizes
and execution times of queries, and an indirect correlation with dataset