SlideShare a Scribd company logo
Building a Graph based RDF Store for Apache Cassandra
Name: Ravindra Ranwala
ID: 138227T
Supervisor: Dr. Amal Shehan Perera
1
Agenda
● Introduction
● Basic Concepts
● The Problem
● Literature Review
● Methodology
● Demo
● Evaluation and Result
● Conclusion 2
Introduction
● RDFs are used to support queries in the semantic web.
● RDF stores contain trillions of triples.
● Today RDF data is everywhere - commercial search
engines proliferate RDF data ex. Google, yahoo, bing
etc.
● SPARQL - used as a query language.
● Different approaches exists to build a triple store.
● Main challenges are system scalability and generality.
3
Basic Concepts - RDF Triple
● RDF dataset consists of statements in the form of
(subject, predicate, object)
● Subject has a predicate property whose value is the
object.
● Examples: <Titanic, has award, Best picture>
● Core of the semantic web is built on top of the RDF data
model.
● These triples can be stored in different ways.
4
The Problem
● Apache Cassandra is a Nosql, multi tenant and multi
data centric database.
● Our objective is to build a scalable RDF store for
Apache Cassandra.
● Cassandra is used by eBay, Twitter, Cisco, etc.
● This will exponentially increase the value of Cassandra.
● The largest known Cassandra cluster has 300 TB of
data over 400 machines.
● This motivates us to build a distributed, scalable RDF
store to answer user queries on them efficiently. 5
Literature Review - Concepts
● A triple store can be built on top of any DBMS or File system.
● RDF dataset consists of statements in the form of <subject, predicate,
object>
● Subject has a predicate property whose value is object.
● Ex. <person1, name, Mike>
● A typical triple store holds a multi millions/billions of such triples.
● Efficient and scalable management of RDF data is a fundamental
challenge.
● SPARQL queries are submitted to the RDF store.
Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang Kai Zeng, "A Distributed Graph
Engine for Web Scale RDF Data,"
6
Apache Cassandra
● Distributed, fault tolerant (i.e. no single point of failures),
post relational, Nosql database system.
● Peer to peer distributed architecture. Supports both strict
and eventual consistency.
● All the nodes are the same. There is no master and slave
nodes.
● Uses read/write anywhere style architecture.
DataStax Corporation. (2011, October) “Welcome to Apache Cassandra 1.0”
7
Triple store –approaches
● There are different approaches the exist to manage
RDF data.
● Each approach has it’s own advantages and
disadvantages.
8
Relational Approach
● Triples are stored using the relational model.
Justin J. Levandoski F. Mokbel, "RDF Data-Centric Storage,"
9
Relational Approach (contd.)
● Triple store - yields costly self joins of a huge RDF store
(trillions of triples)
● N-array - eliminates the need for joins, but leads to
higher number of nulls.
● reduces null storage, but introduces costly join.
10
Graph based approaches
● New approach that greatly improves the performance of
SPARQL query processing
● Graph exploration instead of joins.
● Unnecessary intermediate results can be pruned down.
● Models RDF data in it’s native graph form.
● Examples: Trinity, TripleRush etc.
11
Trinity RDF
● Graph based implementation. Models RDF as a DAG.
● Subjects and objects are represented as a node.
● Predicate is represented as a directed labelled edge.
● Graph is stored in memory for fast access.
H. Wang, and Y. Li B. Shao, "The Trinity graph engine. Technical Report 161291, Microsoft Research," 12
Trinity Architecture
● Distributed in memory key value store.
● Partitions RDF graph across multiple machines by hashing on the nodes.
● Each machine holds a disjoint part of the graph.
● Final result is assembled at the proxy.
Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang Kai Zeng, "A Distributed Graph Engine for Web
Scale RDF Data,"
13
Methodology
● Use case Scenarios
○ Populating data into Cassandra Cluster
○ Building the RDF Graph
○ Querying the RDF Graph
○ Dropping the RDF Store
● Technologies used.
○ Apache Jena RDF API
○ Struts 2
○ Java/JSP/XSLT/XML/XPath
14
System Architecture
15
Demo
16
Evaluation and Result
● DBPedia benchmarking was used to compare.
● DBPedia geo-coordinates and homepages dataset was
used. Accounts for 0.7 million triples
● 4Store, Bigdata RDF stores were compared with our
implementation
● Queries used
○ Query One: Finds the homepage of the Metropolitan museum of Art
○ Query Two: Finds the Homepage of Kevin_Bacon
○ Query Three: Finds all the resources and their homepages which
reside near the area of Berlin.
○ Query Four: Finds all the resources and their homepages which reside
near the area of New York. 17
Benchmark Results
● Query complexity increases from Q1 through Q4.
● The execution time taken by different RDF stores, to execute above four queries.
● Query execution time is measured in ms.
Q1 Q2 Q3 Q4
Our
implementation
216ms 7ms 336ms 279ms
4Store 16ms 18ms 455ms 416ms
Bigdata 41ms 30ms 2sec, 355ms 1sec, 600ms
DBpedia. (2008, Jan 10.) RDF Store Benchmarks with DBpedia [Online]. Available:
http://wifo5-03.informatik.uni-mannheim.de/benchmarks-200801/ 18
Benchmarking Results
19
Benchmarking Results
20
Benchmarking Analysis
● Graph based approach yields more performance boosts
when query becomes more and more complex
● Complexity increases from Query 1 to 4 gradually.
● This implementation outperforms 4store and bigdata
especially when the complexity of the query increases.
● First query takes time, because it builds the index
structure.
21
Future Work
● Main limitation of the approach is Scalability.
● Larger datasets lead to OutOfMemory error while building the graph model.
● Solution: Distributed implementation
22
Conclusion
● Approaches used to model and retrieve RDF data.
● New approaches to manage RDF data efficiently.
● Graph based approach.
● New Implementation
○ Use case scenarios
○ Evaluation and result using DBPedia dataset
○ Benchmark Analysis
23

More Related Content

What's hot (20)

Eclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in Java
Jeen Broekstra
 
Tracking data lineage at Stitch Fix
Tracking data lineage at Stitch FixTracking data lineage at Stitch Fix
Tracking data lineage at Stitch Fix
Stitch Fix Algorithms
 
RDF Seminar Presentation
RDF Seminar PresentationRDF Seminar Presentation
RDF Seminar Presentation
Muntazir Mehdi
 
RDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsRDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL Platforms
Graph-TA
 
Anatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIAnatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source API
datamantra
 
Scaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays SingaporeScaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays Singapore
Angad Singh
 
A compute infrastructure for data scientists
A compute infrastructure for data scientistsA compute infrastructure for data scientists
A compute infrastructure for data scientists
Stitch Fix Algorithms
 
Adventures in Linked Data Land (presentation by Richard Light)
Adventures in Linked Data Land (presentation by Richard Light)Adventures in Linked Data Land (presentation by Richard Light)
Adventures in Linked Data Land (presentation by Richard Light)
jottevanger
 
Superset druid realtime
Superset druid realtimeSuperset druid realtime
Superset druid realtime
arupmalakar
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF Data
Graph-TA
 
Enabling access to Linked Media with SPARQL-MM
Enabling access to Linked Media with SPARQL-MMEnabling access to Linked Media with SPARQL-MM
Enabling access to Linked Media with SPARQL-MM
Thomas Kurz
 
Publishing Linked Data 3/5 Semtech2011
Publishing Linked Data 3/5 Semtech2011Publishing Linked Data 3/5 Semtech2011
Publishing Linked Data 3/5 Semtech2011
Juan Sequeda
 
Scalable Web Data Management using RDF
Scalable Web Data Management using RDF  Scalable Web Data Management using RDF
Scalable Web Data Management using RDF
Navid Sedighpour
 
Data structures
Data structuresData structures
Data structures
BALUJAINSTITUTE
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
Wes McKinney
 
Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Are Linked Datasets fit for Open-domain Question Answering? A Quality AssessmentAre Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Harsh Thakkar
 
Drupal and the Semantic Web
Drupal and the Semantic WebDrupal and the Semantic Web
Drupal and the Semantic Web
Kristof Van Tomme
 
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike LimcacoExtending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Data Con LA
 
Extending Analytic Reach
Extending Analytic ReachExtending Analytic Reach
Extending Analytic Reach
Agilisium Consulting
 
Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & Marquez
Julien Le Dem
 
Eclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in JavaEclipse RDF4J - Working with RDF in Java
Eclipse RDF4J - Working with RDF in Java
Jeen Broekstra
 
RDF Seminar Presentation
RDF Seminar PresentationRDF Seminar Presentation
RDF Seminar Presentation
Muntazir Mehdi
 
RDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsRDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL Platforms
Graph-TA
 
Anatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIAnatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source API
datamantra
 
Scaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays SingaporeScaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays Singapore
Angad Singh
 
A compute infrastructure for data scientists
A compute infrastructure for data scientistsA compute infrastructure for data scientists
A compute infrastructure for data scientists
Stitch Fix Algorithms
 
Adventures in Linked Data Land (presentation by Richard Light)
Adventures in Linked Data Land (presentation by Richard Light)Adventures in Linked Data Land (presentation by Richard Light)
Adventures in Linked Data Land (presentation by Richard Light)
jottevanger
 
Superset druid realtime
Superset druid realtimeSuperset druid realtime
Superset druid realtime
arupmalakar
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF Data
Graph-TA
 
Enabling access to Linked Media with SPARQL-MM
Enabling access to Linked Media with SPARQL-MMEnabling access to Linked Media with SPARQL-MM
Enabling access to Linked Media with SPARQL-MM
Thomas Kurz
 
Publishing Linked Data 3/5 Semtech2011
Publishing Linked Data 3/5 Semtech2011Publishing Linked Data 3/5 Semtech2011
Publishing Linked Data 3/5 Semtech2011
Juan Sequeda
 
Scalable Web Data Management using RDF
Scalable Web Data Management using RDF  Scalable Web Data Management using RDF
Scalable Web Data Management using RDF
Navid Sedighpour
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
Wes McKinney
 
Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Are Linked Datasets fit for Open-domain Question Answering? A Quality AssessmentAre Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Harsh Thakkar
 
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike LimcacoExtending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Data Con LA
 
Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & Marquez
Julien Le Dem
 

Viewers also liked (9)

WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2
 
Deploying WSO2 Middleware on Mesos
Deploying WSO2 Middleware on MesosDeploying WSO2 Middleware on Mesos
Deploying WSO2 Middleware on Mesos
Imesh Gunaratne
 
WSO2Con EU 2016: Integrate APIM to Third-party Tools: Creating an Agent for ELK
WSO2Con EU 2016: Integrate APIM to Third-party Tools:  Creating an Agent for ELKWSO2Con EU 2016: Integrate APIM to Third-party Tools:  Creating an Agent for ELK
WSO2Con EU 2016: Integrate APIM to Third-party Tools: Creating an Agent for ELK
WSO2
 
Deploying WSO2 Middleware on Kubernetes
Deploying WSO2 Middleware on KubernetesDeploying WSO2 Middleware on Kubernetes
Deploying WSO2 Middleware on Kubernetes
Imesh Gunaratne
 
WSO2 Identity Server - Product Overview
WSO2 Identity Server - Product OverviewWSO2 Identity Server - Product Overview
WSO2 Identity Server - Product Overview
WSO2
 
Enhanced Developer Experience with WSO2 Enterprise Service Bus Tooling
Enhanced Developer Experience with WSO2 Enterprise Service Bus ToolingEnhanced Developer Experience with WSO2 Enterprise Service Bus Tooling
Enhanced Developer Experience with WSO2 Enterprise Service Bus Tooling
WSO2
 
Resilient Enterprise Messaging with WSO2 ESB
Resilient Enterprise Messaging with WSO2 ESBResilient Enterprise Messaging with WSO2 ESB
Resilient Enterprise Messaging with WSO2 ESB
Ravindra Ranwala
 
Solution Architecture Patterns for Digital Transformation
Solution Architecture Patterns for Digital TransformationSolution Architecture Patterns for Digital Transformation
Solution Architecture Patterns for Digital Transformation
WSO2
 
2016 Year End Webinar - Are You Ready for Digital Transformation?
2016 Year End Webinar - Are You Ready for Digital Transformation?2016 Year End Webinar - Are You Ready for Digital Transformation?
2016 Year End Webinar - Are You Ready for Digital Transformation?
WSO2
 
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2
 
Deploying WSO2 Middleware on Mesos
Deploying WSO2 Middleware on MesosDeploying WSO2 Middleware on Mesos
Deploying WSO2 Middleware on Mesos
Imesh Gunaratne
 
WSO2Con EU 2016: Integrate APIM to Third-party Tools: Creating an Agent for ELK
WSO2Con EU 2016: Integrate APIM to Third-party Tools:  Creating an Agent for ELKWSO2Con EU 2016: Integrate APIM to Third-party Tools:  Creating an Agent for ELK
WSO2Con EU 2016: Integrate APIM to Third-party Tools: Creating an Agent for ELK
WSO2
 
Deploying WSO2 Middleware on Kubernetes
Deploying WSO2 Middleware on KubernetesDeploying WSO2 Middleware on Kubernetes
Deploying WSO2 Middleware on Kubernetes
Imesh Gunaratne
 
WSO2 Identity Server - Product Overview
WSO2 Identity Server - Product OverviewWSO2 Identity Server - Product Overview
WSO2 Identity Server - Product Overview
WSO2
 
Enhanced Developer Experience with WSO2 Enterprise Service Bus Tooling
Enhanced Developer Experience with WSO2 Enterprise Service Bus ToolingEnhanced Developer Experience with WSO2 Enterprise Service Bus Tooling
Enhanced Developer Experience with WSO2 Enterprise Service Bus Tooling
WSO2
 
Resilient Enterprise Messaging with WSO2 ESB
Resilient Enterprise Messaging with WSO2 ESBResilient Enterprise Messaging with WSO2 ESB
Resilient Enterprise Messaging with WSO2 ESB
Ravindra Ranwala
 
Solution Architecture Patterns for Digital Transformation
Solution Architecture Patterns for Digital TransformationSolution Architecture Patterns for Digital Transformation
Solution Architecture Patterns for Digital Transformation
WSO2
 
2016 Year End Webinar - Are You Ready for Digital Transformation?
2016 Year End Webinar - Are You Ready for Digital Transformation?2016 Year End Webinar - Are You Ready for Digital Transformation?
2016 Year End Webinar - Are You Ready for Digital Transformation?
WSO2
 

Similar to Graph basedrdf storeforapachecassandra (20)

Big_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionBig_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_Session
RUHULAMINHAZARIKA
 
A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases
A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL DatabasesA Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases
A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases
Luiz Henrique Zambom Santana
 
Scala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadiusScala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadius
BoldRadius Solutions
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
Demi Ben-Ari
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Fabrizio Orlandi
 
HPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL EcosystemHPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL Ecosystem
Adam Marcus
 
The NoSQL Ecosystem
The NoSQL Ecosystem The NoSQL Ecosystem
The NoSQL Ecosystem
yarapavan
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
Ontotext
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
datamantra
 
Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012
scorlosquet
 
Data processing with spark in r &amp; python
Data processing with spark in r &amp; pythonData processing with spark in r &amp; python
Data processing with spark in r &amp; python
Maloy Manna, PMP®
 
Python Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopPython Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB Workshop
Joe Drumgoole
 
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
Alexey Zinoviev
 
The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013
scorlosquet
 
Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014
mahchiev
 
Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...
Michele Pasin
 
Pyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsf
Pyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsfPyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsf
Pyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsf
sasuke20y4sh
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
Robbie Strickland
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
Synergetics Learning and Cloud Consulting
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
Tuan Luong
 
Big_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionBig_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_Session
RUHULAMINHAZARIKA
 
A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases
A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL DatabasesA Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases
A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases
Luiz Henrique Zambom Santana
 
Scala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadiusScala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadius
BoldRadius Solutions
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
Demi Ben-Ari
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Fabrizio Orlandi
 
HPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL EcosystemHPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL Ecosystem
Adam Marcus
 
The NoSQL Ecosystem
The NoSQL Ecosystem The NoSQL Ecosystem
The NoSQL Ecosystem
yarapavan
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
Ontotext
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
datamantra
 
Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012
scorlosquet
 
Data processing with spark in r &amp; python
Data processing with spark in r &amp; pythonData processing with spark in r &amp; python
Data processing with spark in r &amp; python
Maloy Manna, PMP®
 
Python Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopPython Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB Workshop
Joe Drumgoole
 
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
Alexey Zinoviev
 
The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013
scorlosquet
 
Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014
mahchiev
 
Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...
Michele Pasin
 
Pyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsf
Pyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsfPyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsf
Pyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsf
sasuke20y4sh
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
Robbie Strickland
 

Recently uploaded (20)

Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
Lidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptx
Lidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptxLidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptx
Lidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptx
RishavKumar530754
 
QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)
rccbatchplant
 
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptxExplainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
MahaveerVPandit
 
The Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLabThe Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLab
Journal of Soft Computing in Civil Engineering
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
theory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptxtheory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptx
sanchezvanessa7896
 
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Journal of Soft Computing in Civil Engineering
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
fluke dealers in bangalore..............
fluke dealers in bangalore..............fluke dealers in bangalore..............
fluke dealers in bangalore..............
Haresh Vaswani
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
DSP and MV the Color image processing.ppt
DSP and MV the  Color image processing.pptDSP and MV the  Color image processing.ppt
DSP and MV the Color image processing.ppt
HafizAhamed8
 
Raish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdfRaish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdf
RaishKhanji
 
DT REPORT by Tech titan GROUP to introduce the subject design Thinking
DT REPORT by Tech titan GROUP to introduce the subject design ThinkingDT REPORT by Tech titan GROUP to introduce the subject design Thinking
DT REPORT by Tech titan GROUP to introduce the subject design Thinking
DhruvChotaliya2
 
Machine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptxMachine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptx
rajeswari89780
 
Reagent dosing (Bredel) presentation.pptx
Reagent dosing (Bredel) presentation.pptxReagent dosing (Bredel) presentation.pptx
Reagent dosing (Bredel) presentation.pptx
AlejandroOdio
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
Lidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptx
Lidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptxLidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptx
Lidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptx
RishavKumar530754
 
QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)
rccbatchplant
 
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptxExplainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
MahaveerVPandit
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
theory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptxtheory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptx
sanchezvanessa7896
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
fluke dealers in bangalore..............
fluke dealers in bangalore..............fluke dealers in bangalore..............
fluke dealers in bangalore..............
Haresh Vaswani
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
DSP and MV the Color image processing.ppt
DSP and MV the  Color image processing.pptDSP and MV the  Color image processing.ppt
DSP and MV the Color image processing.ppt
HafizAhamed8
 
Raish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdfRaish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdf
RaishKhanji
 
DT REPORT by Tech titan GROUP to introduce the subject design Thinking
DT REPORT by Tech titan GROUP to introduce the subject design ThinkingDT REPORT by Tech titan GROUP to introduce the subject design Thinking
DT REPORT by Tech titan GROUP to introduce the subject design Thinking
DhruvChotaliya2
 
Machine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptxMachine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptx
rajeswari89780
 
Reagent dosing (Bredel) presentation.pptx
Reagent dosing (Bredel) presentation.pptxReagent dosing (Bredel) presentation.pptx
Reagent dosing (Bredel) presentation.pptx
AlejandroOdio
 

Graph basedrdf storeforapachecassandra

  • 1. Building a Graph based RDF Store for Apache Cassandra Name: Ravindra Ranwala ID: 138227T Supervisor: Dr. Amal Shehan Perera 1
  • 2. Agenda ● Introduction ● Basic Concepts ● The Problem ● Literature Review ● Methodology ● Demo ● Evaluation and Result ● Conclusion 2
  • 3. Introduction ● RDFs are used to support queries in the semantic web. ● RDF stores contain trillions of triples. ● Today RDF data is everywhere - commercial search engines proliferate RDF data ex. Google, yahoo, bing etc. ● SPARQL - used as a query language. ● Different approaches exists to build a triple store. ● Main challenges are system scalability and generality. 3
  • 4. Basic Concepts - RDF Triple ● RDF dataset consists of statements in the form of (subject, predicate, object) ● Subject has a predicate property whose value is the object. ● Examples: <Titanic, has award, Best picture> ● Core of the semantic web is built on top of the RDF data model. ● These triples can be stored in different ways. 4
  • 5. The Problem ● Apache Cassandra is a Nosql, multi tenant and multi data centric database. ● Our objective is to build a scalable RDF store for Apache Cassandra. ● Cassandra is used by eBay, Twitter, Cisco, etc. ● This will exponentially increase the value of Cassandra. ● The largest known Cassandra cluster has 300 TB of data over 400 machines. ● This motivates us to build a distributed, scalable RDF store to answer user queries on them efficiently. 5
  • 6. Literature Review - Concepts ● A triple store can be built on top of any DBMS or File system. ● RDF dataset consists of statements in the form of <subject, predicate, object> ● Subject has a predicate property whose value is object. ● Ex. <person1, name, Mike> ● A typical triple store holds a multi millions/billions of such triples. ● Efficient and scalable management of RDF data is a fundamental challenge. ● SPARQL queries are submitted to the RDF store. Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang Kai Zeng, "A Distributed Graph Engine for Web Scale RDF Data," 6
  • 7. Apache Cassandra ● Distributed, fault tolerant (i.e. no single point of failures), post relational, Nosql database system. ● Peer to peer distributed architecture. Supports both strict and eventual consistency. ● All the nodes are the same. There is no master and slave nodes. ● Uses read/write anywhere style architecture. DataStax Corporation. (2011, October) “Welcome to Apache Cassandra 1.0” 7
  • 8. Triple store –approaches ● There are different approaches the exist to manage RDF data. ● Each approach has it’s own advantages and disadvantages. 8
  • 9. Relational Approach ● Triples are stored using the relational model. Justin J. Levandoski F. Mokbel, "RDF Data-Centric Storage," 9
  • 10. Relational Approach (contd.) ● Triple store - yields costly self joins of a huge RDF store (trillions of triples) ● N-array - eliminates the need for joins, but leads to higher number of nulls. ● reduces null storage, but introduces costly join. 10
  • 11. Graph based approaches ● New approach that greatly improves the performance of SPARQL query processing ● Graph exploration instead of joins. ● Unnecessary intermediate results can be pruned down. ● Models RDF data in it’s native graph form. ● Examples: Trinity, TripleRush etc. 11
  • 12. Trinity RDF ● Graph based implementation. Models RDF as a DAG. ● Subjects and objects are represented as a node. ● Predicate is represented as a directed labelled edge. ● Graph is stored in memory for fast access. H. Wang, and Y. Li B. Shao, "The Trinity graph engine. Technical Report 161291, Microsoft Research," 12
  • 13. Trinity Architecture ● Distributed in memory key value store. ● Partitions RDF graph across multiple machines by hashing on the nodes. ● Each machine holds a disjoint part of the graph. ● Final result is assembled at the proxy. Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang Kai Zeng, "A Distributed Graph Engine for Web Scale RDF Data," 13
  • 14. Methodology ● Use case Scenarios ○ Populating data into Cassandra Cluster ○ Building the RDF Graph ○ Querying the RDF Graph ○ Dropping the RDF Store ● Technologies used. ○ Apache Jena RDF API ○ Struts 2 ○ Java/JSP/XSLT/XML/XPath 14
  • 17. Evaluation and Result ● DBPedia benchmarking was used to compare. ● DBPedia geo-coordinates and homepages dataset was used. Accounts for 0.7 million triples ● 4Store, Bigdata RDF stores were compared with our implementation ● Queries used ○ Query One: Finds the homepage of the Metropolitan museum of Art ○ Query Two: Finds the Homepage of Kevin_Bacon ○ Query Three: Finds all the resources and their homepages which reside near the area of Berlin. ○ Query Four: Finds all the resources and their homepages which reside near the area of New York. 17
  • 18. Benchmark Results ● Query complexity increases from Q1 through Q4. ● The execution time taken by different RDF stores, to execute above four queries. ● Query execution time is measured in ms. Q1 Q2 Q3 Q4 Our implementation 216ms 7ms 336ms 279ms 4Store 16ms 18ms 455ms 416ms Bigdata 41ms 30ms 2sec, 355ms 1sec, 600ms DBpedia. (2008, Jan 10.) RDF Store Benchmarks with DBpedia [Online]. Available: http://wifo5-03.informatik.uni-mannheim.de/benchmarks-200801/ 18
  • 21. Benchmarking Analysis ● Graph based approach yields more performance boosts when query becomes more and more complex ● Complexity increases from Query 1 to 4 gradually. ● This implementation outperforms 4store and bigdata especially when the complexity of the query increases. ● First query takes time, because it builds the index structure. 21
  • 22. Future Work ● Main limitation of the approach is Scalability. ● Larger datasets lead to OutOfMemory error while building the graph model. ● Solution: Distributed implementation 22
  • 23. Conclusion ● Approaches used to model and retrieve RDF data. ● New approaches to manage RDF data efficiently. ● Graph based approach. ● New Implementation ○ Use case scenarios ○ Evaluation and result using DBPedia dataset ○ Benchmark Analysis 23