Jerven Bolleman
Swiss-Prot Group
Semantic Variation Graphs: the case for RDF & SPARQL
Resource Description Framework
Subject → Predicate → Object
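An RDF statement is a three-part sentence, and a graph is nothing more than a set of such statements. The Python sketch below is a toy model of that idea, not an RDF library; the example.org IRIs and the `match` helper are hypothetical, and `None` plays the role of a SPARQL variable.

```python
from typing import NamedTuple, Optional, Set

class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str

# Hypothetical example IRIs; literals are kept in their Turtle form, e.g. '"P53"'.
EX = "http://example.org/"
graph: Set[Triple] = {
    Triple(EX + "P53_HUMAN", EX + "name", '"P53"'),
    Triple(EX + "P53_HUMAN", EX + "organism", EX + "Human"),
    Triple(EX + "Human", EX + "label", '"Homo sapiens"'),
}

def match(g: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Set[Triple]:
    """Triples matching a pattern; None acts as a wildcard, like a ?variable in SPARQL."""
    return {t for t in g
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.object == o)}

# Everything said about P53_HUMAN:
for t in sorted(match(graph, s=EX + "P53_HUMAN")):
    print(t.predicate, t.object)
```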
Lots of SPARQL databases, e.g. Virtuoso Universal Server
Resource Description Framework
One model, many serialisations:
• RDF Turtle ✔︎ (used in the examples that follow)
• RDFa inside HTML
• N-Triples
• RDF/Thrift
• JSON-LD
• RDF/XML
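To make "one model, many serialisations" concrete, here is a minimal Python sketch that writes one small graph both as N-Triples and as Turtle. It handles only IRIs and plain string literals; the `Lit` marker class, the helper names and the example.org IRIs are assumptions for illustration, and a real project would use an RDF library instead.

```python
from collections import defaultdict

class Lit(str):
    """Marks a plain string literal; any other str is treated as an IRI."""

def term(t: str) -> str:
    if isinstance(t, Lit):
        escaped = t.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{t}>"

def to_ntriples(triples) -> str:
    # N-Triples: one complete statement per line, no abbreviations.
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)

def to_turtle(triples) -> str:
    # Turtle: statements sharing a subject are grouped with ';'.
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append(f"{term(p)} {term(o)}")
    return "".join(f"{term(s)} " + " ;\n    ".join(pos) + " .\n"
                   for s, pos in by_subject.items())

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
triples = [
    ("http://example.org/node/1", RDF + "type", "http://example.org/vg/Node"),
    ("http://example.org/node/1", RDF + "value", Lit("ACTG")),
]
print(to_ntriples(triples))
print(to_turtle(triples))
```

N-Triples repeats the subject on every line, which makes it easy to stream and split; Turtle groups statements per subject, which is why it reads closer to SPARQL.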
Resource Description Framework
Nodes and Edges are Resources
• Resource → Identified by a URI
– http://purl.uniprot.org/core/
– urn:guid:21EC2020-3AEA-4069-A2DD-08002B30309D
– mailto:help@uniprot.org
– urn:isbn:978-3-16-148410-0
• Nice if public (dereferenceable in a browser), but not a requirement
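All four identifiers above are equally valid URIs; what differs is whether a browser could fetch a description of the resource. A minimal sketch classifying them by URI scheme with Python's standard `urllib.parse.urlsplit`; treating only http(s) as dereferenceable is a simplifying assumption, and `is_web_dereferenceable` is a hypothetical helper.

```python
from urllib.parse import urlsplit

identifiers = [
    "http://purl.uniprot.org/core/",
    "urn:guid:21EC2020-3AEA-4069-A2DD-08002B30309D",
    "mailto:help@uniprot.org",
    "urn:isbn:978-3-16-148410-0",
]

def is_web_dereferenceable(uri: str) -> bool:
    """Rough heuristic: only http(s) URIs can be pasted into a browser to get more information."""
    return urlsplit(uri).scheme in ("http", "https")

for uri in identifiers:
    # Every entry is a usable RDF identifier, whatever its scheme.
    print(f"{urlsplit(uri).scheme:7} dereferenceable={is_web_dereferenceable(uri)}  {uri}")
```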
Resource Description Framework
Terminal nodes are literals
• String (xsd:string)
"P53"
• Date (xsd:date & xsd:dateTime)
"1987-08-13"^^xsd:date
• Numbers (xsd:int & xsd:decimal & …)
1 or "1"^^xsd:integer or -1.1 or "-1.1"^^xsd:decimal
• Language-tagged string
"Switzerland"@en
"Suisse"@fr
"Schweiz"@de
"Svizzera"@it
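A minimal sketch of how the literal kinds above map from Python values onto Turtle syntax. The `literal` helper is hypothetical, assumes the `xsd:` prefix is declared in the surrounding Turtle, and does not escape quotes inside strings.

```python
import datetime
from decimal import Decimal
from typing import Optional, Union

def literal(value: Union[str, int, Decimal, datetime.date], lang: Optional[str] = None) -> str:
    """Render a Python value as a Turtle literal (only the kinds listed on the slide)."""
    if lang is not None:
        return f'"{value}"@{lang}'                    # language-tagged string
    if isinstance(value, int):
        return f'"{value}"^^xsd:integer'              # Turtle also accepts a bare 1
    if isinstance(value, Decimal):
        return f'"{value}"^^xsd:decimal'              # Turtle also accepts a bare -1.1
    if isinstance(value, datetime.datetime):          # checked first: datetime subclasses date
        return f'"{value.isoformat()}"^^xsd:dateTime'
    if isinstance(value, datetime.date):
        return f'"{value.isoformat()}"^^xsd:date'
    return f'"{value}"'                               # plain string, i.e. xsd:string

for v in ["P53", datetime.date(1987, 8, 13), 1, Decimal("-1.1")]:
    print(literal(v))
for text, tag in [("Switzerland", "en"), ("Suisse", "fr"), ("Schweiz", "de"), ("Svizzera", "it")]:
    print(literal(text, lang=tag))
```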
Resource Description Framework
Others use it too, and their data can be queried together
• Protocol Buffers (Google's data interchange format), GFF: one party evolves the data format
• RDF: everyone evolves the data format
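The compatibility argument can be illustrated with plain sets: when a second party adds statements using a property the first party never defined, merging is a set union and existing readers keep getting the same answers. This is a toy Python sketch; the example.org IRIs and the `overlapsGene` property are hypothetical, and a real RDF merge must also keep blank nodes apart.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/"   # hypothetical namespace

# Party A publishes node sequences.
graph_a: Set[Triple] = {(EX + "node/1", EX + "vg/value", "ACTG")}
# Party B later adds a property A has never heard of (hypothetical gene annotation).
graph_b: Set[Triple] = {(EX + "node/1", EX + "annotation/overlapsGene", EX + "gene/TP53")}

# Merging RDF graphs without blank nodes is plain set union: nothing is rewritten.
merged = graph_a | graph_b

def values(g: Set[Triple], s: str, p: str) -> Set[str]:
    """All objects for a subject/predicate pair."""
    return {o for (s2, p2, o) in g if s2 == s and p2 == p}

# A reader written for graph_a gives the same answer on the merged data.
print(values(graph_a, EX + "node/1", EX + "vg/value"))
print(values(merged, EX + "node/1", EX + "vg/value"))
```

With a schema-controlled format, party B would first need the schema owner to add the new field; here the new statements simply sit next to the old ones.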
4 nodes: ACTG, AC, T and GA
Variation Graph as RDF
4 nodes: 1 (ACTG), 2 (AC), 3 (T), 4 (GA)
base <uri of vg schema>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix node: <uri of vg graph>
node:1 a <Node> ;
    rdf:value "ACTG" .
node:2 a <Node> ;
    rdf:value "AC" .
node:3 a <Node> ;
    rdf:value "T" .
node:4 a <Node> ;
    rdf:value "GA" .
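Node descriptions like these are regular enough to generate from a node table. A minimal Python sketch; it assumes the hypothetical namespaces `http://example.org/vg/` for the schema (the one the query later in this deck uses) and `http://example.org/node/` for the graph in place of the placeholder URIs, and writes `vg:Node` for `<Node>`.

```python
from typing import Dict

# Node id -> sequence fragment, as in the four-node example graph.
nodes: Dict[int, str] = {1: "ACTG", 2: "AC", 3: "T", 4: "GA"}

def nodes_to_turtle(nodes: Dict[int, str]) -> str:
    lines = [
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
        "PREFIX vg: <http://example.org/vg/>",      # assumed schema namespace
        "PREFIX node: <http://example.org/node/>",  # hypothetical graph namespace
    ]
    for node_id, seq in sorted(nodes.items()):
        # One subject per node: its type and the sequence fragment it carries.
        lines.append(f'node:{node_id} a vg:Node ;\n    rdf:value "{seq}" .')
    return "\n".join(lines) + "\n"

print(nodes_to_turtle(nodes))
```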
Variation Graph as RDF
4 nodes, linked 1 → 2, 1 → 3, 2 → 4, 3 → 4
base <uri of vg schema>
prefix node: <uri of vg graph>
node:1 <linksForwardToForward> node:2 , node:3 .
node:2 <linksForwardToForward> node:4 .
node:3 <linksForwardToForward> node:4 .
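Because the links are plain triples, following them is ordinary graph traversal. A Python sketch that collects the `linksForwardToForward` pairs into an adjacency list and spells out every walk from node 1 to a node without successors. It models only forward-to-forward links, assumes the graph is acyclic, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

nodes: Dict[int, str] = {1: "ACTG", 2: "AC", 3: "T", 4: "GA"}
# (from, to) pairs taken from the linksForwardToForward triples
edges: List[Tuple[int, int]] = [(1, 2), (1, 3), (2, 4), (3, 4)]

def successors(edges: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Adjacency list: node id -> ids it links forward to."""
    out: Dict[int, List[int]] = {}
    for a, b in edges:
        out.setdefault(a, []).append(b)
    return out

def walks(start: int, succ: Dict[int, List[int]]) -> List[List[int]]:
    """Every walk from start to a node with no successors (assumes no cycles)."""
    nxt = succ.get(start, [])
    if not nxt:
        return [[start]]
    return [[start] + rest for n in nxt for rest in walks(n, succ)]

succ = successors(edges)
for w in walks(1, succ):
    print(w, "".join(nodes[n] for n in w))
```

The two walks spell ACTGACGA and ACTGTGA, the two sequences this small graph encodes.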
Variation Graph as RDF
4 nodes → 1 Path
base <uri of vg schema>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix n: <uri of vg graph>
path:1 a <Path> ;
    rdfs:label "Genome of patient a" ;
    rdfs:comment "Paths through VG make linear sequences, e.g. a reference genome or a patient assembly" .
Variation Graph as RDF
4 nodes → 1 Path → 3 Steps
base <uri of vg schema>
prefix n: <uri of vg graph>
step:1 a <Step> ;
    <node> node:1 ;
    <rank> 1 ;
    <path> path:1 .
step:2 a <Step> ;
    <node> node:2 ;
    <rank> 2 ;
    <path> path:1 .
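Since every step records its own path and rank, steps can be stored or exported in any order, or as a subset, and the path is still recoverable by sorting on rank. A Python sketch of that, plus a check that consecutive steps follow existing links. The third step (rank 3 on node 4) is an assumption: only two of the three steps are listed, and node 2 links only to node 4. The helper names are hypothetical.

```python
from typing import List, Set, Tuple

# (path, rank, node) for each step, deliberately stored out of order.
# The third step on node 4 is an assumption (node 2 only links forward to node 4).
steps: List[Tuple[str, int, int]] = [("path:1", 2, 2), ("path:1", 1, 1), ("path:1", 3, 4)]
edges: Set[Tuple[int, int]] = {(1, 2), (1, 3), (2, 4), (3, 4)}  # linksForwardToForward

def path_nodes(steps: List[Tuple[str, int, int]], path: str) -> List[int]:
    """Node ids visited by one path, in rank order; steps may be stored in any order."""
    mine = [s for s in steps if s[0] == path]
    return [node for _, _, node in sorted(mine, key=lambda s: s[1])]

def is_valid_walk(node_ids: List[int], edges: Set[Tuple[int, int]]) -> bool:
    """True if each consecutive pair of nodes is joined by a forward edge."""
    return all((a, b) in edges for a, b in zip(node_ids, node_ids[1:]))

walk = path_nodes(steps, "path:1")
print(walk, is_valid_walk(walk, edges))
```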
Variation Graph as RDF
Build a “FASTA” from a VG
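PREFIX vg: <http://example.org/vg/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?path
       (GROUP_CONCAT(?sequence; separator="") AS ?pathSeq)
WHERE {
  # Sort the steps by rank inside a subquery: an ORDER BY after GROUP BY
  # only sorts the result rows, not the strings fed to GROUP_CONCAT.
  # (SPARQL 1.1 does not strictly guarantee this order is kept,
  # but many engines keep it in practice.)
  {
    SELECT ?path ?sequence
    WHERE {
      [] vg:path ?path ;
         vg:node ?node ;
         vg:rank ?rank .
      ?node rdf:value ?sequence .
    }
    ORDER BY ?path ?rank
  }
}
GROUP BY ?path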
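In plain Python, the result the query aims for amounts to: group the bound rows by path, sort each group by rank, and concatenate the sequences. The sketch below emulates that on hand-written rows; `path:2` is a hypothetical second path through node 3, and the FASTA-style `>` header is an illustrative choice.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

nodes: Dict[int, str] = {1: "ACTG", 2: "AC", 3: "T", 4: "GA"}
# (path, node, rank) rows as the WHERE clause would bind them, in no particular order.
# path:2 is a hypothetical second path, through node 3 instead of node 2.
rows: List[Tuple[str, int, int]] = [
    ("path:1", 4, 3), ("path:2", 3, 2), ("path:1", 1, 1),
    ("path:2", 4, 3), ("path:1", 2, 2), ("path:2", 1, 1),
]

def fasta(rows: List[Tuple[str, int, int]], nodes: Dict[int, str]) -> str:
    by_path = defaultdict(list)                          # GROUP BY ?path
    for path, node, rank in rows:
        by_path[path].append((rank, nodes[node]))        # ?node rdf:value ?sequence
    records = []
    for path in sorted(by_path):
        ordered = sorted(by_path[path])                  # order the steps by rank
        seq = "".join(s for _, s in ordered)             # GROUP_CONCAT with separator=""
        records.append(f">{path}\n{seq}")
    return "\n".join(records) + "\n"

print(fasta(rows, nodes), end="")
```

Sorting each group by rank before joining is the step that makes the output correct, since rows may come back from a store in any order.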
Querying a Variation Graph
• SPARQL: a standard query language
• See the VG wiki for more examples
• VG of 1000 Genomes → 50 GB on disk in the DB (napkin-math estimate)
• VG of 100,000 Genomes → ±2 TB on disk in the DB (napkin-math estimate)
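Taking the two estimates at face value (decimal units, ±2 TB read as 2 TB), a quick calculation shows storage growing more slowly than the number of genomes: 100 times more genomes for roughly 40 times more disk, i.e. from about 50 MB to about 20 MB per genome. That would be consistent with shared sequence being stored only once in the graph, though the inputs are napkin math, not measurements.

```python
# Slide figures (napkin-math estimates), read as decimal units with ±2 TB taken as 2 TB.
estimates_gb = {1_000: 50, 100_000: 2_000}   # genomes -> GB on disk in the DB

for genomes, gb in estimates_gb.items():
    per_genome_mb = gb * 1_000 / genomes
    print(f"{genomes:>7,} genomes: {gb:>5,} GB total, ~{per_genome_mb:.0f} MB per genome")

genome_growth = 100_000 / 1_000                               # 100x more genomes
disk_growth = estimates_gb[100_000] / estimates_gb[1_000]     # 40x more disk
print(f"{genome_growth:.0f}x genomes -> {disk_growth:.0f}x disk")
```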
Summary
• RDF
– simple data model
– consistent identifiers
– anyone can say anything about anything
• SPARQL
– graph query language
– wide-scale commercial deployment
– HTTP/REST built in
– in clinical use
– federated queries on demand
– can be used for variation graphs
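HTTP access is built into SPARQL through the SPARQL 1.1 Protocol: a query can be sent to an endpoint as the `query` parameter of a GET request, and the result format is chosen with the Accept header. A Python sketch using only the standard library that builds, but does not send, such a request; the endpoint URL and the example query are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "https://example.org/sparql"   # hypothetical SPARQL endpoint

QUERY = """PREFIX vg: <http://example.org/vg/>
SELECT (COUNT(?step) AS ?steps) WHERE { ?step vg:rank ?rank }"""

def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = sparql_get_request(ENDPOINT, QUERY)
print(req.get_method(), req.full_url)

# Round-trip check: the endpoint would decode exactly the original query text.
decoded = parse_qs(urlsplit(req.full_url).query)["query"][0]
# Sending it would be urllib.request.urlopen(req); the response body is JSON.
```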
Questions?
Editor's Notes

  • #3: I come from a strange land over the mountains where we deal with proteins ;) We even have our own 20-letter alphabet instead of that 4-letter one.
  • #4: It’s a lot of content for 20 minutes, so be sure to look at the VG wiki and ask questions on the mailing list afterwards.
  • #5: Models your information in simple sentences. The key is the relationship between entities/resources/things.
  • #6: RDF is a W3C family of standards
  • #7: You can query RDF data with SPARQL, an SQL-like query language for graphs.
  • #8: There is wide commercial support for SPARQL. We use Virtuoso 7, a column-store DB, for UniProt.org. You see 3 Oracle logos because they have 3 implementations, with different use cases. Franz has a clinical deployment with 4 trillion (4 × 10^12) edges. These are some of the interesting ones with real commercial success.
  • #9: RDF is a conceptual model. You can serialise the same information in different ways; each has its own use cases.
  • #10: The examples I show later use Turtle, because it closely aligns with the SPARQL syntax.
  • #11: A key part of RDF graphs: all resources are identified using URIs. Ideally one can paste these into a browser and get some extra information (dereference), but this is not an obligation.
  • #12: At the edge of the graph are literals: numbers, strings, dates etc.
  • #13: Some of you are using OWL; see your own ontology working group. OWL and RDF are inseparable, like two hands on one belly ;)
  • #14: With XML, Avro, Protobuf or Thrift, one party controls the XSD or schema. With RDF, as long as information is added rather than changed, it stays compatible.
  • #16: I assume you have seen a VG railroad diagram. Here there are 4 nodes.
  • #17: Using the Turtle syntax, we describe the 4 nodes and the sequence fragments they represent. We won’t be talking about node identifiers; there is a proposal on the VG wiki.
  • #18: They are simply linked by forward-to-forward strand connections.
  • #19: One path representing a linear sequence.
  • #20: Steps say which path they are in and which rank they have. This allows arbitrary subgraphs to be serialised while maintaining a correct link to the whole graph.
  • #22: This is a simple query that builds a “FASTA” from the VG graph, showing we can get the input data out again.
  • #23: We want to see a ?path and its associated sequence ?pathSeq, which is the concatenation of all its original ?sequence values.
  • #24: We need to get all the sequences from the nodes
  • #25: We need to group by path because there will be many. Ordering by rank is important to get the concatenated sequence in the correct order.
  • #26: See the GA4GH issue for the napkin math on disk usage.
  • #28: While the SIB pays my salary, this work would not have happened without the NBDC/DBCLS BioHackathon series, nor without the welcome from Erik Garrison and the rest of the VG team.