This document discusses approaches for mapping relational databases to RDF, including direct mapping and R2RML. Direct mapping defines an RDF representation of the data and schema in a relational database. R2RML allows for customized mappings from relational databases to RDF datasets using RDF graphs and Turtle syntax. Examples are provided to illustrate mapping relational data and schemas to RDF using both approaches. Mappings can then be used to access the resulting RDF data in different ways.
RDF is a general method to decompose knowledge into small pieces, with some rules about the semantics or meaning of those pieces. The point is to have a method so simple that it can express any fact, and yet so structured that computer applications can do useful things with knowledge expressed in RDF.
"SPARQL Cheat Sheet" is a short collection of slides intended to act as a guide to SPARQL developers. It includes the syntax and structure of SPARQL queries, common SPARQL prefixes and functions, and help with RDF datasets.
The "SPARQL Cheat Sheet" is intended to accompany the SPARQL By Example slides available at http://www.cambridgesemantics.com/2008/09/sparql-by-example/ .
This document provides an overview of the RDF data model. It discusses the history and development of RDF standards from 1997 to 2014. It explains that an RDF graph is made up of triples consisting of a subject, predicate, and object. It provides examples of RDF triples and their N-triples representation. It also describes RDF syntaxes like Turtle and features of RDF like literals, blank nodes, and language-tagged strings.
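To make those features concrete, here is a minimal Turtle sketch (the resource and property IRIs are illustrative, not taken from the document):

  @prefix ex:   <http://example.org/> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

  # One subject with four predicate-object pairs; each pair is one triple.
  ex:alice foaf:name   "Alice" ;                 # plain string literal
           foaf:age    "25"^^xsd:integer ;      # typed literal
           ex:greeting "Hello"@en ;             # language-tagged string
           foaf:knows  [ foaf:name "Bob" ] .    # blank node as object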
LinkML Intro July 2022.pptx, by Chris Mungall. Please view this on Zenodo.
Note: I have moved away from SlideShare to Zenodo. The identical presentation is now here:
https://doi.org/10.5281/zenodo.7778641
General introduction to LinkML, The Linked Data Modeling Language.
Adapted from a presentation given to NIH in May 2022.
https://linkml.io/linkml
The document discusses the RDF data model. The key points are:
1. RDF represents data as a graph of triples consisting of a subject, predicate, and object. Triples can be combined to form an RDF graph.
2. The RDF data model has three types of nodes - URIs to identify resources, blank nodes to represent anonymous resources, and literals for values like text strings.
3. RDF graphs can be merged to integrate data from multiple sources in an automatic way due to RDF's compositional nature.
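As a small illustration of point 3 (the IRIs are made up): merging RDF graphs is simply taking the set union of their triples, since IRIs are global identifiers.

  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix ex:   <http://example.org/> .

  # Graph from source 1:
  ex:alice foaf:name "Alice" .

  # Graph from source 2:
  ex:alice foaf:mbox <mailto:alice@example.org> .

  # Merged graph: the union of the triples; both sources now describe ex:alice.
  ex:alice foaf:name "Alice" ;
           foaf:mbox <mailto:alice@example.org> .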
Knowledge graphs: they are what all businesses are now on the lookout for. But what exactly is a knowledge graph and, more importantly, how do you get one? Do you get it as an out-of-the-box solution, or do you have to build it (or have someone else build it for you)? With the help of our knowledge graph technology experts, we have created a step-by-step list of how to build a knowledge graph. Built this way, a knowledge graph will properly expose and enforce the semantics of the semantic data model via inference, consistency checking and validation, and thus offer organizations many more opportunities to transform and interlink data into coherent knowledge.
A tutorial on how to create mappings using ontop, how inference (OWL 2 QL and RDFS) plays a role answering SPARQL queries in ontop, and how ontop's support for on-the-fly SQL query translation enables scenarios of semantic data access and data integration.
RDA: Basics, concepts and challenges facing the Arabic cataloging, by Library Experts
The book Tahdhib al-Tahdhib is a work on the science of hadith narrators, authored by Ibn Hajar al-Asqalani. It summarizes al-Mizzi's Tahdhib al-Kamal, itself a reworking of the large book Al-Kamal fi Asma' al-Rijal by al-Maqdisi. Ibn Hajar further summarized and added to al-Mizzi's work, calling his summary Tahdhib al-Tahdhib. The works of al-Maqdisi, al-Mizzi and Ibn Hajar are thus related and form a hierarchy, with Tahdhib al-Tahdhib being a summary of the earlier works on the subject.
The Semantic Web #9 - Web Ontology Language (OWL), by Myungjin Lee
This is a lecture note #9 for my class of Graduate School of Yonsei University, Korea.
It describes Web Ontology Language (OWL) for authoring ontologies.
Semantic Web technologies (such as RDF and SPARQL) excel at bringing together diverse data in a world of independent data publishers and consumers. Common ontologies help to arrive at a shared understanding of the intended meaning of data.
However, they don’t address one critically important issue: What does it mean for data to be complete and/or valid? Semantic knowledge graphs without a shared notion of completeness and validity quickly turn into a Big Ball of Data Mud.
The Shapes Constraint Language (SHACL), an upcoming W3C standard, promises to help solve this problem. By keeping semantics separate from validity, SHACL makes it possible to resolve a slew of data quality and data exchange issues.
Presented at the Lotico Berlin Semantic Web Meetup.
SPARQL introduction and training (130+ slides with exercises), by Thomas Francart
Full SPARQL training
Covers all of SPARQL: basic graph patterns, FILTERs, functions, property paths, optional, negation, assignment, aggregation, subqueries, federated queries.
Does not cover SPARQL updates.
Includes exercises on DBpedia.
CC BY license
The presentation discusses the components of FHIR, its distribution and use. Scenarios for introducing FHIR in a country currently using HL7 V3 are also offered.
Although RDF is a cornerstone of the semantic web and knowledge graphs, it has not been embraced by everyday programmers and software architects who need to safely create and access well-structured data. There is a lack of the common tools and methodologies that are available in more conventional settings to improve data quality by defining schemas that can later be validated. Two technologies have recently been proposed for RDF validation: Shape Expressions (ShEx) and Shapes Constraint Language (SHACL). In the talk, we will review the history and motivation of both technologies. We will also enumerate some challenges and future work with regard to RDF validation.
This document provides an introduction to linked data and open data. It discusses the evolution of the web from documents to interconnected data. The four principles of linked data are explained: using URIs to identify things, making URIs accessible, providing useful information about the URI, and including links to other URIs. The differences between open data and linked data are outlined. Key milestones in linked government data are presented. Formats for publishing linked data like RDF and SPARQL are introduced. Finally, the 5 star scheme for publishing open data as linked data is described.
This document provides an introduction and examples for SHACL (Shapes Constraint Language), a W3C recommendation for validating RDF graphs. It defines key SHACL concepts like shapes, targets, and constraint components. An example shape validates nodes with a schema:name and schema:email property. Constraints like minCount, maxCount, datatype, nodeKind, and logical operators like and/or are demonstrated. The document is an informative tutorial for learning SHACL through examples.
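Along the lines of the example the document describes, a minimal SHACL shape might look as follows (the target class and prefixes are assumptions, not quoted from the document):

  @prefix sh:     <http://www.w3.org/ns/shacl#> .
  @prefix schema: <http://schema.org/> .
  @prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
  @prefix ex:     <http://example.org/> .

  ex:PersonShape
      a sh:NodeShape ;
      sh:targetClass schema:Person ;      # validate every schema:Person node
      sh:property [
          sh:path schema:name ;
          sh:minCount 1 ;                 # name is required ...
          sh:maxCount 1 ;                 # ... and single-valued
          sh:datatype xsd:string ;
      ] ;
      sh:property [
          sh:path schema:email ;
          sh:minCount 1 ;
          sh:nodeKind sh:Literal ;        # must be a literal, not an IRI
      ] .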
The document provides an overview of knowledge graphs and the metaphactory knowledge graph platform. It defines knowledge graphs as semantic descriptions of entities and relationships using formal knowledge representation languages like RDF, RDFS and OWL. It discusses how knowledge graphs can power intelligent applications and gives examples like Google Knowledge Graph, Wikidata, and knowledge graphs in cultural heritage and life sciences. It also provides an introduction to key standards like SKOS, SPARQL, and Linked Data principles. Finally, it describes the main features and architecture of the metaphactory platform for creating and utilizing enterprise knowledge graphs.
This document discusses mapping data from relational databases to RDF. It provides an overview of the direct mapping approach and the R2RML standard for customizable mapping. Direct mapping generates URIs and RDF triples automatically based on the relational schema. R2RML allows customizing the mapping through a mapping language. The document also covers ETL systems for extracting relational data and loading it into triplestores as RDF, as well as use cases involving mapping biological and music databases to Linked Data.
Complex hierarchical relationships between entities are difficult to map in a relational database, and demanding queries are usually quite slow.
Graph databases are optimized for exactly these kinds of relationships and can deliver high-performance results even with huge amounts of data. Moreover, not only the entities stored in the database have attributes, but also their relationships, and queries can address entities as well as their relationships.
Get to know the basics of graph databases, using Neo4j as an example, and see how it is used in C# projects.
- SPARQL is a query language for retrieving and manipulating data stored in RDF format. It is similar to SQL but for RDF data.
- SPARQL queries contain prefix declarations, specify a dataset using FROM, and include a graph pattern in the WHERE clause to match triples.
- The main types of SPARQL queries are SELECT, ASK, DESCRIBE, and CONSTRUCT. SELECT returns variable bindings, ASK returns a boolean, DESCRIBE returns a description of a resource, and CONSTRUCT generates an RDF graph.
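As a sketch of those query forms (the vocabulary and dataset IRI are illustrative):

  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

  # SELECT returns a table of variable bindings.
  SELECT ?name ?age
  FROM <http://example.org/dataset>       # names the dataset to query
  WHERE {
    ?person foaf:name ?name .             # triple pattern to match
    OPTIONAL { ?person foaf:age ?age }
  }
  ORDER BY ?name
  LIMIT 10

  # ASK returns a boolean:
  #   ASK { ?p foaf:name "Alice" }
  # CONSTRUCT returns a new RDF graph built from the matches:
  #   CONSTRUCT { ?p a foaf:Person } WHERE { ?p foaf:name ?n }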
Introduction to DBpedia, the most popular and interconnected source of Linked Open Data. Part of EXPLORING WIKIDATA AND THE SEMANTIC WEB FOR LIBRARIES at METRO: http://metro.org/events/598/
RDFS provides primitives for defining lightweight schemas for RDF triples. It allows defining classes of resources and relations between resources, and organizing their hierarchies. RDFS defines domains and ranges for relations, and provides semantics and inference rules for reasoning about subclasses, subproperties, and types of resources.
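A minimal sketch of those RDFS primitives in Turtle (class and property names are made up for illustration):

  @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix ex:   <http://example.org/> .

  ex:Person  a rdfs:Class .
  ex:Student a rdfs:Class ;
             rdfs:subClassOf ex:Person .   # class hierarchy

  ex:enrolledIn a rdf:Property ;
             rdfs:domain ex:Student ;      # subjects are inferred to be Students
             rdfs:range  ex:Course .       # objects are inferred to be Courses

  # From the triple  ex:alice ex:enrolledIn ex:cs101 .  an RDFS reasoner
  # derives  ex:alice a ex:Student  and, via the hierarchy, ex:alice a ex:Person .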
Designing Apache Hudi for Incremental Processing, with Vinoth Chandar and Ethan Guo | Current 2022, hosted by Confluent
Back in 2016, Apache Hudi brought transactions and change capture on top of data lakes, in what is today referred to as the Lakehouse architecture. In this session, we first introduce Apache Hudi and the key technology gaps it fills in the modern data architecture. Bridging traditional data lakes and warehouses, Hudi helps realize the Lakehouse vision by bringing transactions and optimized table metadata to data lakes, along with powerful storage layout optimizations, moving them closer to the cloud warehouses of today. Viewed from a data engineering lens, Hudi also plays a key unifying role between the batch and stream processing worlds by acting as a columnar, serverless "state store" for batch jobs, ushering in what we call the incremental processing model, where batch jobs can consume new data and update/delete intermediate results in a Hudi table, instead of re-computing/re-writing the entire output like old-school big batch jobs.
The rest of the talk focuses on a deep dive into some of the time-tested design choices and tradeoffs in Hudi that help power some of the largest transactional data lakes on the planet today. We will start with a tour of the storage format design, including data and metadata layouts and, of course, Hudi's timeline, an event log that is central to implementing ACID transactions and concurrency control. We will delve deeper into the practical concurrency control pitfalls in data lakes, and show how Hudi's hybrid approach, combining MVCC with optimistic concurrency control, lowers contention and unlocks minute-level near-real-time commits to Hudi tables. We will conclude with code examples that showcase Hudi's rich set of table services that perform vital table management, such as cleaning older file versions, compaction of delta logs into base files, dynamic re-clustering for faster query performance, and the more recently introduced indexing service that maintains Hudi's multi-modal indexing capabilities.
Publishing Linked Data 3/5, Semtech 2011, by Juan Sequeda
This document summarizes techniques for publishing linked data on the web. It discusses publishing static RDF files, embedding RDF in HTML using RDFa, linking to other URIs, generating linked data from relational databases using RDB2RDF tools, publishing linked data from triplestores and APIs, hosting linked data in the cloud, and testing linked data quality.
PayPal has seen tremendous growth in recent years, processing over 7.8 billion payments transactions annually for over 227 million active customer accounts across 200+ markets and currencies. To support this scale, PayPal's data infrastructure includes over 2,000 database instances, 116 billion database calls per day, and over 74 petabytes of total storage. PayPal continues enhancing its data infrastructure to meet growing analytics and machine learning needs through technologies like Kafka, Hadoop, graph databases and real-time OLAP engines.
Presentation given* at the 13th International Semantic Web Conference (ISWC), in which we present a compressed format to represent RDF data streams. See the original article at: http://dataweb.infor.uva.es/wp-content/uploads/2014/07/iswc14.pdf
* Presented by Alejandro Llaves (http://www.slideshare.net/allaves)
This presentation looks in detail at SPARQL (SPARQL Protocol and RDF Query Language) and introduces approaches for querying and updating semantic data. It covers the SPARQL algebra, the SPARQL protocol, and provides examples for reasoning over Linked Data. We use examples from the music domain, which can be directly tried out and run over the MusicBrainz dataset. This includes gaining some familiarity with the RDFS and OWL languages, which allow developers to formulate generic and conceptual knowledge that can be exploited by automatic reasoning services in order to enhance the power of querying.
A Hands On Overview Of The Semantic Web, by Shamod Lacoul
The document provides an overview of the Semantic Web and introduces key concepts such as RDF, RDFS, SPARQL, OWL, and Linked Open Data. It begins with defining what the Semantic Web is, why it is useful, and how it differs from the traditional web by linking data rather than documents. It then covers RDF for representing data, RDFS for defining schemas, and SPARQL for querying RDF data. The document also discusses OWL for building ontologies and Linked Open Data initiatives that have published billions of RDF triples on the web.
Incremental Export of Relational Database Contents into RDF Graphs, by Nikolaos Konstantinou
In addition to tools offering RDF views over databases, a variety of tools exist that allow exporting database contents into RDF graphs; tools that in many cases have proven to demonstrate better performance than the former. However, when database contents are exported into RDF, it is not always optimal or even necessary to dump the whole database contents every time. In this paper, the problem of incremental generation and storage of the resulting RDF graph is investigated. An implementation of the R2RML standard is used to express mappings that associate tuples from the source database with triples in the resulting RDF graph. Next, a methodology is proposed that enables incremental generation and storage of an RDF graph based on a source relational database, and it is evaluated through a set of performance measurements. Finally, a discussion is presented regarding the authors' most important findings and conclusions.
SPARQL is a standard query language for RDF that has undergone two iterations (1.0 and 1.1) through the W3C process. SPARQL 1.1 includes updates to RDF stores, subqueries, aggregation, property paths, negation, and remote querying. It also defines separate specifications for querying, updating, protocols, graph store protocols, and federated querying. Apache Jena provides implementations of SPARQL 1.1 and tools like Fuseki for deploying SPARQL servers.
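A brief sketch of two of the SPARQL 1.1 additions named above, property paths and aggregation (the vocabulary is illustrative):

  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

  # Count people reachable in one or more foaf:knows hops (property path "+").
  SELECT ?person (COUNT(DISTINCT ?reachable) AS ?networkSize)
  WHERE {
    ?person foaf:knows+ ?reachable .
  }
  GROUP BY ?person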
The document discusses data discovery, conversion, integration and visualization using RDF. It covers topics like ontologies, vocabularies, data catalogs, converting different data formats to RDF including CSV, XML and relational databases. It also discusses federated SPARQL queries to integrate data from multiple sources and different techniques for visualizing linked data including analyzing relationships, events, and multidimensional data.
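For the federated-query part, a hedged sketch using the SPARQL 1.1 SERVICE keyword (the endpoint URL and vocabulary are illustrative):

  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

  SELECT ?name ?homepage
  WHERE {
    ?person foaf:name ?name .                      # matched against local data
    SERVICE <http://example.org/other/sparql> {    # sub-pattern sent to a remote endpoint
      ?person foaf:homepage ?homepage .
    }
  }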
MariaDB Platform for hybrid transactional/analytical workloads, by MariaDB plc
OpenWorks 2019 Session
In order to provide data-driven customers with more historical data and real-time analytics, MariaDB Platform can be configured for hybrid transactional/analytical workloads by leveraging row storage for current data transactions and columnar storage for historical data and analytics. In this session Shane Johnson, Senior Director of Product Marketing at MariaDB, shows how change-data-capture and query routing, both available out of the box, can be used to bring scalable analytics to customer-facing applications without changing their code – and without depending on a separate data warehouse.
This document provides an overview of querying linked data using SPARQL. It begins with an introduction and motivation for querying linked data. It then covers the basics of SPARQL including its components like prefixes, query forms, and solution modifiers. Several examples are provided demonstrating how to construct ASK, SELECT, and other types of SPARQL queries. The document also discusses SPARQL algebra and updating linked data with SPARQL 1.1.
This document discusses Redis, MongoDB, and Amazon DynamoDB. It begins with an overview of NoSQL databases and the differences between SQL and NoSQL databases. It then covers Redis data types like strings, hashes, lists, sets, sorted sets, and streams. Examples use cases for Redis are also provided like leaderboards, geospatial queries, and message queues. The document also discusses MongoDB design patterns like embedding data, embracing duplication, and relationships. Finally, it provides a high-level overview of DynamoDB concepts like tables, items, attributes, and primary keys.
A hands on overview of the semantic web, by Marakana Inc.
This document provides an overview of the Semantic Web. It defines the Semantic Web as linking data to data using technologies like RDF, RDFS, OWL and SPARQL. It explains that RDF represents information as subject-predicate-object statements that can be queried using SPARQL. RDFS allows defining schemas and classes for RDF data, while OWL adds more expressiveness for defining complex ontologies. The document outlines popular Semantic Web tools, public ontologies, and companies working in this domain. It positions the Semantic Web as a way to represent and share data universally on the web.
A presentation on Application Architecture for Semantic Web Applications based on chapter 4 of the book Semantic Web for the Working Ontologist by Dean Allemang and Jim Hendler. It focusses on RDF parsing and serialising and RDF stores.
Geo Replicated Databases For Disaster Recovery Using CRDT, by Redis Labs
This document summarizes Fiserv's use of Redis and conflict-free replicated databases (CRDBs). Fiserv uses Redis for caching, as a primary database, and in cloud infrastructure. They implement CRDBs across data centers in different regions to provide high availability and disaster recovery. The document outlines Fiserv's CRDB infrastructure with clusters in Georgia and Texas, how data flows between the clusters using syncer processes, and methods for testing the CRDB syncs such as timestamps and network monitoring.
Mapping Relational Databases to Linked Data, by the EUCLID project
The document discusses mapping relational databases to linked data using RDB2RDF standards. It describes a direct mapping that automatically maps relational data to RDF and R2RML, a customizable mapping language. R2RML allows specifying subject maps, predicate-object maps, and logical tables to control how relational data is represented in RDF. The document provides examples of mapping tables, columns, joins, and custom SQL queries to linked data using R2RML.
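A minimal R2RML triples map along the lines described above (the table name, IRI template and vocabulary are assumptions for illustration):

  @prefix rr:   <http://www.w3.org/ns/r2rml#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix ex:   <http://example.org/> .

  ex:PersonMap a rr:TriplesMap ;
      rr:logicalTable [ rr:tableName "PERSON" ] ;         # a table, view, or rr:sqlQuery
      rr:subjectMap [
          rr:template "http://example.org/Person/{ID}" ;  # row IRI built from the key column
          rr:class foaf:Person ;
      ] ;
      rr:predicateObjectMap [
          rr:predicate foaf:name ;
          rr:objectMap [ rr:column "NAME" ] ;             # literal taken from the NAME column
      ] .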
Sustainable queryable access to Linked Data, by Ruben Verborgh
This document discusses sustainable queryable access to Linked Data through the use of Triple Pattern Fragments (TPF). TPFs provide a low-cost interface that allows clients to query datasets through triple patterns. Intelligent clients can execute SPARQL queries over TPFs by breaking queries into triple patterns and aggregating the results. TPFs also enable federated querying across multiple datasets by treating them uniformly as fragments that can be retrieved. The document demonstrates federated querying over DBpedia, VIAF, and Harvard Library datasets using TPF interfaces.
Practical Cross-Dataset Queries with SPARQL (Introduction), by Richard Cyganiak
This document provides an overview of using SPARQL as a query language for querying data across the web of data. It discusses how data from different sources like relational databases, Excel files, XML, JSON, microdata, etc. can be converted to RDF and queried using SPARQL. The tutorial will cover topics like federated querying across local and remote SPARQL endpoints, using SPARQL CONSTRUCT to map schemas, instance matching with Silk, and visualizing SPARQL results. Hands-on sessions will have participants install Jena tools and run queries on sample RDF data and endpoints.
GDG Meets U event - Big data & Wikidata - no lies codelab, by Camelia Boban
This document discusses using SPARQL to query RDF data from DBPedia. It provides an overview of key concepts like RDF triples, SPARQL, and Apache Jena framework. It also includes a sample SPARQL query to retrieve cities in Abruzzo, Italy with a population over 50,000. Resources and prefixes for working with DBPedia, Wikidata, and other linked data sets are listed.
This document discusses how AGROVOC, AGRIS, and the CIARD RING leverage RDF vocabularies and technologies to enable data interoperability. It provides examples of how SPARQL queries can be used to retrieve and link related data across these systems, such as querying AGRIS for center descriptions using their RING URIs, or retrieving bibliographic records for a specific AGRIS center from the AGRIS endpoint. The RING is presented as a public SPARQL endpoint containing linked dataset metadata that uses standards like DCAT and SKOS to describe resources and concepts to facilitate machine-to-machine interactions between systems.
Presentation at the EMBL-EBI Industry RDF meeting, by Johannes Keizer
The document discusses how AGROVOC, AGRIS, and the CIARD RING leverage RDF vocabularies and technologies to improve data interoperability. It provides examples of how AGRIS retrieves information on its centers through SPARQL queries of the RING, and how data in AGRIS is associated with RING URIs for centers to allow retrieving records by center. The RING is an openly accessible RDF store of datasets described using DCAT, accessible via its SPARQL endpoint.
Integrating Semantic Web with the Real World - A Journey between Two Cities, by Juan Sequeda
(The original version of this talk was a Keynote at KCAP2017. This is the final version of the slides after giving this talk 14 times in 2018)
An early vision in Computer Science has been to create intelligent systems capable of reasoning on large amounts of data. Today, this vision can be delivered by integrating Relational Databases with the Semantic Web using the W3C standards: a graph data model (RDF), ontology language (OWL), mapping language (R2RML) and query language (SPARQL). The research community has successfully been showing how intelligent systems can be created with Semantic Web technologies, dubbed now as Knowledge Graphs.
However, where is the mainstream industry adoption? What are the barriers to adoption? Are these engineering and social barriers or are they open scientific problems that need to be addressed?
This talk will chronicle our journey of deploying Semantic Web technologies with real world users to address Business Intelligence and Data Integration needs, describe technical and social obstacles that are present in large organizations, and scientific and engineering challenges that require attention.
Integrating Semantic Web in the Real World: A Journey between Two Cities, by Juan Sequeda
Keynote at The 9th International Conference on Knowledge Capture (KCAP2017), Austin, Texas, Dec 2017
An early vision in Computer Science has been to create intelligent systems capable of reasoning on large amounts of data. Today, this vision can be delivered by integrating Relational Databases with the Semantic Web using the W3C standards: a graph data model (RDF), ontology language (OWL), mapping language (R2RML) and query language (SPARQL). The research community has successfully been showing how intelligent systems can be created with Semantic Web technologies, dubbed now as Knowledge Graphs.
However, where is the mainstream industry adoption? What are the barriers to adoption? Are these engineering and social barriers or are they open scientific problems that need to be addressed?
This talk will chronicle our journey of deploying Semantic Web technologies with real world users to address Business Intelligence and Data Integration needs, describe technical and social obstacles that are present in large organizations, and scientific challenges that require attention.
Integrating Relational Databases with the Semantic Web: A Reflection, by Juan Sequeda
This is a lecture given at the 2017 Reasoning Web Summer School
It has been clear from the beginning that the success of the Semantic Web hinges on integrating the vast amount of data stored in Relational Databases. In 2007, the W3C organized a workshop on RDF Access to Relational Databases. In 2012, two standards were ratified that map relational data to RDF: Direct Mapping and R2RML.
In this lecture, I will reflect on the last 10 years of research results and systems to integrate Relational Databases with the Semantic web. I will provide an answer to the following question: how and to what extent can Relational Databases be integrated with the Semantic Web? I will review how these standards and systems are being used in practice for data integration and discuss open challenges.
Graph Query Languages: update from LDBC, by Juan Sequeda
The Linked Data Benchmark Council (LDBC) is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software. The Graph Query Language task force of LDBC is studying query languages for graph data management systems, and specifically those systems storing so-called Property Graph data. The goals of the GraphQL task force are to:
- Devise a list of desired features and functionalities of a graph query language.
- Evaluate a number of existing languages (i.e. Cypher, Gremlin, PGQL, SPARQL, SQL), and identify possible issues.
- Provide a better understanding of the design space and state-of-the-art.
- Develop proposals for changes to existing query languages, or even a new graph query language; this query language should cover the needs of the most important use-cases for such systems, such as social network and Business Intelligence workloads.
This talk will present an update of the work accomplished by the LDBC GraphQL task force. We also look for input from the graph community.
Virtualizing Relational Databases as Graphs: a multi-model approach, by Juan Sequeda
Talk given at Smart Data 2017
Relational databases are inflexible due to the rigid constraints of the relational data model. If you have new data that doesn't fit your schema, you will need to alter your schema (add a column or a new table). This is a task that is not always possible: IT departments don't have time, or they won't allow it, and altering the schema just adds more nulls that can lead to query performance degradation, etc.
A goal of graph databases is to address this problem with their schema-less graph data model. However, many businesses have large investments in commercial RDBMSs and their associated applications and can't expect to move all of their data to a graph database.
In this talk, I will present a multi-model graph/relational architecture solution. Keep your relational data where it is, virtualize it as a graph, and then connect it with additional data stored in a graph database. This way, both graph and relational technologies can seamlessly interact together.
Presentation at Data/Graph Day Texas Conference.
Austin, Texas
January 14, 2017
This talk grew out of Juan Sequeda's office hours following the Seattle Graph Meetup. Some of the questions posed were: How do I recognize a problem best solved with a graph solution? How do I determine the best type of graph to solve the problem? How do I manage the data where both graph and relational operations will be performed? Juan did such a great job of explaining the options that we asked him to develop his responses into a formal talk.
My Linked Data tutorial presentation that I presented at Semtech 2012.
http://semtechbizsf2012.semanticweb.com/sessionPop.cfm?confid=65&proposalid=4724
WTF is the Semantic Web and Linked Data, by Juan Sequeda
This document provides an overview of the Semantic Web and Linked Data. It begins by explaining some of the limitations of the current web, which treats all content as unstructured documents rather than structured data. It then introduces the Semantic Web and its data model, RDF, which allows publishing structured data on the web in a standardized way using graph-based representations. This enables linking different data sources on the web, addressing the problem of data silos. The document provides examples of representing bibliographic data about books in RDF and linking it to other datasets, demonstrating how the Semantic Web enables integrating and finding related information on the web.
The document discusses the Semantic Web and linked data. It defines the current web as consisting of documents linked by hyperlinks that are readable by humans but difficult for computers to understand. The Semantic Web aims to publish structured data on the web using common standards like RDF so that data can be linked, queried, and integrated across sources. Key points include:
- The Semantic Web uses RDF to represent data as a graph so that data from different sources can be linked together.
- Linked data follows principles like using URIs to identify things and including links to other related data.
- Query languages like SPARQL allow searching and integrating linked data from multiple sources.
- There are now
Drupal 7 and Semantic Web Hands-on Tutorial, by Juan Sequeda
This document outlines the schedule and details for a seminar on using Drupal 7 for the Semantic Web. The day-long event includes sessions on rich snippets, an introduction to the Semantic Web, and hands-on advanced topics using Semantic Web technologies with Drupal. The schedule also lists times for registration, breaks, lunch, and a happy hour reception. Background is provided on one of the speakers, Stéphane Corlosquet, who has significantly contributed RDF and Semantic Web capabilities to Drupal.
This document provides information and advice about applying for the National Science Foundation Graduate Research Fellowship. It discusses key details of the fellowship such as eligibility requirements, funding amounts, and required application materials. The fellowship is highly competitive, so applicants are advised to spend 20 hours per week preparing their application, which must demonstrate both intellectual merit of the proposed research and its potential broader impacts. Strong letters of recommendation, personal and research statements, and proposing a feasible research plan are essential. Overall, the document offers guidance on crafting a competitive application by being specific, tying different parts together, and focusing on uniqueness.
Linked Data is a set of best practices for publishing data on the Web using standardized data models (RDF) and access methods (HTTP), enabling easier integration of data from different sources compared to proprietary APIs. The Linked Data architecture is open and allows discovery of new data sources at runtime, allowing applications to take advantage of new available data. When publishing Linked Data, considerations include linking to other datasets, and providing provenance, licensing, and access metadata using common vocabularies. Linked Data principles can also be applied within intranets for data integration.
This document discusses various approaches for building applications that consume linked data from multiple datasets on the web. It describes characteristics of linked data applications and generic applications like linked data browsers and search engines. It also covers domain-specific applications, faceted browsers, SPARQL endpoints, and techniques for accessing and querying linked data including follow-up queries, querying local caches, crawling data, federated query processing, and on-the-fly dereferencing of URIs. The advantages and disadvantages of each technique are discussed.
This document provides an introduction to linked data and the semantic web. It discusses how the current web contains documents that are difficult for computers to understand, but linked data publishes structured data on the web using common standards like RDF and URIs. This allows data to be interlinked and queried using SPARQL. Publishing data as linked data makes the web appear as one huge global database. There are now many incentives for organizations to publish their data as linked data, as it enables data sharing and integration in addition to potential benefits like semantic search engine optimization. Linked data is a growing trend with many large organizations and governments now publishing data.
Welcome to Linked Data 0/5, Semtech 2011, by Juan Sequeda
This document discusses creating, publishing and consuming linked data. It introduces key concepts related to linked data including HTML, CSS, HTTP, XML, JSON, API, URL, URI, RDF, RDFa, RDFS, OWL, RIF and SPARQL. The document includes a schedule but provides no further details.
This document discusses Linked Data and the best practices for publishing and interlinking data on the web. It covers four main principles:
1) Use URIs as names for things and identify real-world objects with HTTP URIs.
2) Use HTTP URIs so that people can look up those names by dereferencing the URIs.
3) Provide useful RDF information when URIs are dereferenced, using formats like RDF/XML, RDFa, N3, or Turtle.
4) Include links to other URIs to discover more related things and connect isolated data silos. This allows data to be interlinked on the Web.
The document discusses the Semantic Web. It explains that the Semantic Web publishes structured data using RDF so that the data can be linked and integrated with one another. It also describes how large companies such as Google and Facebook, as well as governments, are using RDF, and how the Semantic Web will make it possible to search for and find information more effectively in the future.
The document provides an overview of the Semantic Web and linked data. It defines the Semantic Web as publishing structured data on the web in a format that computers can understand, rather than just documents. Linked data follows principles like using URIs to identify things and linking data across sources to integrate information. Query languages like SPARQL can then be used to search across linked data. Examples show how data can be published as RDF and linked to create a global database. Applications that consume and combine linked data from multiple sources are discussed.
"Rebranding for Growth", Anna VelykoivanenkoFwdays
Since there is no single formula for rebranding, this presentation will explore best practices for aligning business strategy and communication to achieve business goals.
Semantic Cultivators : The Critical Future Role to Enable AIartmondano
By 2026, AI agents will consume 10x more enterprise data than humans, but with none of the contextual understanding that prevents catastrophic misinterpretations.
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfSoftware Company
Explore the benefits and features of advanced logistics management software for businesses in Riyadh. This guide delves into the latest technologies, from real-time tracking and route optimization to warehouse management and inventory control, helping businesses streamline their logistics operations and reduce costs. Learn how implementing the right software solution can enhance efficiency, improve customer satisfaction, and provide a competitive edge in the growing logistics sector of Riyadh.
This is the keynote of the Into the Box conference, highlighting the release of the BoxLang JVM language, its key enhancements, and its vision for the future.
Rock, Paper, Scissors: An Apex Map Learning JourneyLynda Kane
Slide Deck from Presentations to WITDevs (April 2021) and Cleveland Developer Group (6/28/2023) on using Rock, Paper, Scissors to learn the Map construct in Salesforce Apex development.
Automation Dreamin' 2022: Sharing Some Gratitude with Your UsersLynda Kane
Slide Deck from Automation Dreamin'2022 presentation Sharing Some Gratitude with Your Users on creating a Flow to present a random statement of Gratitude to a User in Salesforce.
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...Alan Dix
Talk at the final event of Data Fusion Dynamics: A Collaborative UK-Saudi Initiative in Cybersecurity and Artificial Intelligence funded by the British Council UK-Saudi Challenge Fund 2024, Cardiff Metropolitan University, 29th April 2025
https://alandix.com/academic/talks/CMet2025-AI-Changes-Everything/
Is AI just another technology, or does it fundamentally change the way we live and think?
Every technology has a direct impact with micro-ethical consequences, some good, some bad. However more profound are the ways in which some technologies reshape the very fabric of society with macro-ethical impacts. The invention of the stirrup revolutionised mounted combat, but as a side effect gave rise to the feudal system, which still shapes politics today. The internal combustion engine offers personal freedom and creates pollution, but has also transformed the nature of urban planning and international trade. When we look at AI the micro-ethical issues, such as bias, are most obvious, but the macro-ethical challenges may be greater.
At a micro-ethical level AI has the potential to deepen social, ethnic and gender bias, issues I have warned about since the early 1990s! It is also being used increasingly on the battlefield. However, it also offers amazing opportunities in health and educations, as the recent Nobel prizes for the developers of AlphaFold illustrate. More radically, the need to encode ethics acts as a mirror to surface essential ethical problems and conflicts.
At the macro-ethical level, by the early 2000s digital technology had already begun to undermine sovereignty (e.g. gambling), market economics (through network effects and emergent monopolies), and the very meaning of money. Modern AI is the child of big data, big computation and ultimately big business, intensifying the inherent tendency of digital technology to concentrate power. AI is already unravelling the fundamentals of the social, political and economic world around us, but this is a world that needs radical reimagining to overcome the global environmental and human challenges that confront us. Our challenge is whether to let the threads fall as they may, or to use them to weave a better future.
How Can I use the AI Hype in my Business Context?Daniel Lehner
𝙄𝙨 𝘼𝙄 𝙟𝙪𝙨𝙩 𝙝𝙮𝙥𝙚? 𝙊𝙧 𝙞𝙨 𝙞𝙩 𝙩𝙝𝙚 𝙜𝙖𝙢𝙚 𝙘𝙝𝙖𝙣𝙜𝙚𝙧 𝙮𝙤𝙪𝙧 𝙗𝙪𝙨𝙞𝙣𝙚𝙨𝙨 𝙣𝙚𝙚𝙙𝙨?
Everyone’s talking about AI but is anyone really using it to create real value?
Most companies want to leverage AI. Few know 𝗵𝗼𝘄.
✅ What exactly should you ask to find real AI opportunities?
✅ Which AI techniques actually fit your business?
✅ Is your data even ready for AI?
If you’re not sure, you’re not alone. This is a condensed version of the slides I presented at a Linkedin webinar for Tecnovy on 28.04.2025.
Big Data Analytics Quick Research Guide by Arthur MorganArthur Morgan
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
Dev Dives: Automate and orchestrate your processes with UiPath MaestroUiPathCommunity
This session is designed to equip developers with the skills needed to build mission-critical, end-to-end processes that seamlessly orchestrate agents, people, and robots.
📕 Here's what you can expect:
- Modeling: Build end-to-end processes using BPMN.
- Implementing: Integrate agentic tasks, RPA, APIs, and advanced decisioning into processes.
- Operating: Control process instances with rewind, replay, pause, and stop functions.
- Monitoring: Use dashboards and embedded analytics for real-time insights into process instances.
This webinar is a must-attend for developers looking to enhance their agentic automation skills and orchestrate robust, mission-critical processes.
👨🏫 Speaker:
Andrei Vintila, Principal Product Manager @UiPath
This session streamed live on April 29, 2025, 16:00 CET.
Check out all our upcoming Dev Dives sessions at https://community.uipath.com/dev-dives-automation-developer-2025/.
What is Model Context Protocol(MCP) - The new technology for communication bw...Vishnu Singh Chundawat
The MCP (Model Context Protocol) is a framework designed to manage context and interaction within complex systems. This SlideShare presentation will provide a detailed overview of the MCP Model, its applications, and how it plays a crucial role in improving communication and decision-making in distributed systems. We will explore the key concepts behind the protocol, including the importance of context, data management, and how this model enhances system adaptability and responsiveness. Ideal for software developers, system architects, and IT professionals, this presentation will offer valuable insights into how the MCP Model can streamline workflows, improve efficiency, and create more intuitive systems for a wide range of use cases.
3. What is RDB2RDF?

Person:
ID | NAME  | AGE  | CID
 1 | Alice | 25   | 100
 2 | Bob   | NULL | 100

City:
CID | NAME
100 | Austin
200 | Madrid

[Diagram: the tables rendered as an RDF graph. <Person/1> has foaf:name "Alice" and foaf:age 25; <Person/2> has foaf:name "Bob"; both have foaf:based_near edges to <City/100>; <City/100> and <City/200> have foaf:name "Austin" and "Madrid".]

(www.rdb2rdf.org - ISWC2013)
4. Context

[Diagram: taxonomy of RDF data management approaches]
- RDF Data Management
  - Relational Database to RDF (RDB2RDF)
    - Wrapper Systems
    - Extract-Transform-Load (ETL)
  - Triplestores
    - Native Triplestores
    - RDBMS-backed Triplestores
    - NoSQL Triplestores
9. F2F Meeting - ISWC 2008

[Timeline: March 2008 to February 2009, including an F2F meeting at ISWC 2008 (October 2008). Outputs:
1. A recommendation to standardize a mapping language (1)
2. An RDB2RDF survey (2)]

(1) http://www.w3.org/2005/Incubator/rdb2rdf/XGR-rdb2rdf-20090126/
(2) http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf
14. How to include relational data in a semantic application?

- Many architectural design choices.
- Technology development is fluid.
- No established "best-of-breed" solution.
15. Feature Space of Design Choices

- Scope of the application
  - Mash-up topic page
  - Heterogeneous enterprise data application
- Size of the (native) database
  - Data model
  - Contents
- Size of the useful (in-application) database
  - Data model
  - Contents
- When to translate the data?
  - Wrapper
  - ETL
17. Scenario 1: Direct Mapping

Suppose:
- Database of Chinese Herbal Medicine and Applicable Conditions
  - Database is static.
  - Herbs and conditions do not have representation in western medical ontologies.
18. Scenario 1: Direct Mapping

Suppose:
- Database of Chinese Herbal Medicine and Applicable Conditions
  - Database is static.
  - Herbs and conditions do not have representation in western medical ontologies.

[Diagram: ETL pipeline: the Relational Database is Extracted, Transformed by a Direct Mapping Engine, and Loaded into a Triplestore, which is queried via SPARQL.]
19. Scenario 1: Direct Mapping

Suppose:
- Database of Chinese Herbal Medicine and Applicable Conditions

[Diagram: ETL pipeline: the Relational Database is Extracted, Transformed by a Direct Mapping Engine, and Loaded into a Triplestore, which is queried via SPARQL.]

Then:
- Existing table and column names are encoded into URIs.
- Data is translated into RDF and loaded into an existing, Internet-accessible triplestore.
20. Scenario 2: R2RML

Suppose:
- Database of Chinese Herbal Medicine and Applicable Conditions + Clinical Records
  - Database is static.
  - Also have: patient names, demographics, outcomes.
21. Scenario 2: R2RML

Suppose:
- Database of Chinese Herbal Medicine and Applicable Conditions + Clinical Records

[Diagram: ETL pipeline: the Relational Database is Extracted, Transformed by an R2RML Mapping Engine driven by an R2RML file and domain ontologies (e.g. FOAF), and Loaded into a Triplestore, which is queried via SPARQL.]
22. Scenario 2: R2RML

- Database of Chinese Herbal Medicine and Applicable Conditions + Clinical Records

[Diagram: ETL pipeline: the Relational Database is Extracted, Transformed by an R2RML Mapping Engine driven by an R2RML file and domain ontologies (e.g. FOAF), and Loaded into a Triplestore, which is queried via SPARQL.]

Then:
- Developer says, "I know FOAF, I'll write some R2RML and that data will have canonical URIs, and people will be able to use the data."
23. Scenario 4: Automatic Mapping

Suppose:
- Database of Electronic Medical Records
- Application: integration of all of a hospital's IT systems
- Database has 100 tables and a total of 7,000 columns
- Use of existing ontologies as a unifying data model
  - ICD-10 codes (> 12,000 concepts)
  - SNOMED vocabulary (> 40,000 concepts)
24. Scenario 4: Automatic Mapping
Suppose:
• 7,000 columns
• Use of existing ontologies as a unifying data model
– ICD-10 codes (> 12,000 concepts)
– SNOMED vocabulary (> 40,000 concepts)
Then:
• Convert the database schema and data to an ontology.
• Apply an ontology alignment program.
[Diagram: Relational Database → RDB2RDF Wrapper → Direct Mapping as Ontology → Source Putative Ontology, aligned against Domain Ontologies to yield Refined R2RML; the resulting RDF is queried via SPARQL]
25. Scenario 4: Automatic Mapping
Suppose:
• 7,000 columns
• Use of existing ontologies as a unifying data model
– ICD-10 codes (> 12,000 concepts)
– SNOMED vocabulary (> 40,000 concepts)
Then:
• A semantic system implements the solution with no human labor.
[Diagram: as above, with RDB2RDF Wrapper, Direct Mapping as Ontology, Source Putative Ontology, Domain Ontologies, Refined R2RML, and SPARQL over the resulting RDF]
28. W3C RDB2RDF Standards
• Standards to map relational data to RDF
• A Direct Mapping of Relational Data to RDF
– Default automatic mapping of relational data to
RDF
• R2RML: RDB to RDF Mapping Language
– Customizable language to map relational data to
RDF
31. Direct Mapping Result
Person
ID | NAME  | AGE  | CID
1  | Alice | 25   | 100
2  | Bob   | NULL | 100

City
CID | NAME
100 | Austin
200 | Madrid

Resulting graph (informally):
<Person/ID=1> <Person#NAME> "Alice" ; <Person#AGE> 25 ; <Person#ref-CID> <City/CID=100> .
<Person/ID=2> <Person#NAME> "Bob" ; <Person#ref-CID> <City/CID=100> .
<City/CID=100> <City#NAME> "Austin" .
<City/CID=200> <City#NAME> "Madrid" .
42. What do we need to automatically generate?
• Generate identifiers
– IRI
– Blank nodes
• Generate triples
– Table
– Literal
– Reference
43. Generating Identifiers
• Identifiers for rows, tables, columns and foreign keys
• If a table has a primary key, then the row identifier will be an IRI, otherwise a blank node
• The identifiers for tables, columns and foreign keys are IRIs
• IRIs are generated by appending to a given base IRI
• All strings are percent-encoded
44. Row Node
Pattern: Base IRI + “Table Name”/“PK attr”=“PK value”
1) <http://www.ex.com/Person/ID=1>
2) Composite primary key: attr=value pairs separated by “;”
   <http://www.ex.com/Person/ID=1;SID=123>
3) No primary key: fresh blank node
45. More IRI
1) Table IRI: Base IRI + “Table Name” → <http://www.ex.com/Person>
2) Column IRI: Base IRI + “Table Name”#“Attribute” → <http://www.ex.com/Person#NAME>
3) Foreign-key IRI: Base IRI + “Table Name”#ref-“Attribute” → <http://www.ex.com/Person#ref-CID>
49. Direct Mapping Result
Person
ID | NAME  | AGE  | CID
1  | Alice | 25   | 100
2  | Bob   | NULL | 100

City
CID | NAME
100 | Austin
200 | Madrid

Resulting graph (informally):
<Person/ID=1> <Person#NAME> "Alice" ; <Person#AGE> 25 ; <Person#ref-CID> <City/CID=100> .
<Person/ID=2> <Person#NAME> "Bob" ; <Person#ref-CID> <City/CID=100> .
<City/CID=100> <City#NAME> "Austin" .
<City/CID=200> <City#NAME> "Madrid" .
50. Summary: Direct Mapping
• Default and Automatic Mapping
• URIs are automatically generated:
– <table>
– <table#attribute>
– <table#ref-attribute>
– <table/pkAttr=pkValue>
• The RDF mirrors the relational schema
• The RDF can be transformed by SPARQL CONSTRUCT into RDF with the structure and ontology of the mapping author’s choice
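As an illustration of that CONSTRUCT step, a query along the following lines (a sketch: the foaf vocabulary and the direct-mapping IRIs are taken from earlier slides) could rewrite directly mapped triples into FOAF terms:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
  ?p foaf:name ?name .
  ?p foaf:based_near ?city .
}
WHERE {
  ?p <http://www.ex.com/Person#NAME> ?name .
  OPTIONAL { ?p <http://www.ex.com/Person#ref-CID> ?city . }
}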
51. What else is missing?
• Relational Schema to OWL is *not* in the W3C standard
• NULL values
• Many-to-Many relationships (binary tables)
• “Ugly” IRIs
52. NULL
“The direct mapping does not generate triples for NULL values. Note that it is not known how to relate the behavior of the obtained RDF graph with the standard SQL semantics of the NULL values of the source RDB.”
A Direct Mapping of Relational Data to RDF. W3C Recommendation
53. Problem
1. How can a relational database schema and data be automatically mapped to OWL and RDF?
2. How can we assure the correctness of the mapping?
55. NULLs
• What should we do with NULLs?
Producer
prID | title | loc
4    | Foo   | TX
5    | Bar   | NULL

– Generate a blank node:
  ex:Producer5 pr:loc _:a .
– Or don’t generate a triple at all:
  ex:Producer5 pr:title "Bar" .   (no pr:loc triple)
Either way: how do we reconstruct the NULL?
56. Direct Mapping Properties
• Fundamental Properties
– Information Preserving: no information is lost
– Query Preserving: no query is lost
• Desirable Properties
– Monotonicity
– Semantics Preserving
62. The Nugget
• Defined a Direct Mapping DM
• Formally defined semantics using Datalog
• Considered RDBs that may contain NULL values
• Studied DM w.r.t. 4 properties:
– Information Preservation
– Query Preservation
– Monotonicity
– Semantics Preservation
Sequeda, Arenas & Miranker. On Directly Mapping Relational Databases to RDF and OWL. WWW 2012.
Sequeda et al. Survey of Directly Mapping SQL Databases to the Semantic Web. Knowledge Engineering Review 2011.
Tirmizi, Sequeda & Miranker. Translating SQL Applications to the Semantic Web. DEXA 2008.
63. Direct Mapping
Input: a relational schema R, a set Σ of primary keys and foreign keys, and a database instance I of this schema
Output: an RDF graph
Definition: A direct mapping M is a total function from the set of all (R, Σ, I) to the set of all RDF graphs.
64. The Direct Mapping DM
• Relational Schema to OWL
– S.H. Tirmizi, J.F. Sequeda and D.P. Miranker. Translating SQL Applications to the Semantic Web. DEXA 2008
• Relational Data to RDF
– M. Arenas, A. Bertails, E. Prud’hommeaux and J.F. Sequeda. A Direct Mapping of Relational Data to RDF. W3C Recommendation. 27 September 2012
65. Direct Mapping RDB to RDF and OWL
[Diagram: (R, Σ, I) is stored as Datalog predicates; Datalog rules generate an ontology O from (R, Σ); further rules generate OWL from O, and RDF from O and I]
66. Running Example
Consider the following relational schema:
– person(ssn, name, age): ssn is the primary key
– student(id, degree, ssn): id is the primary key, ssn is a foreign key to ssn in person
Consider the following instance:

student                    person
id | degree | ssn          ssn | name    | age
1  | Math   | 789          123 | Juan    | 26
2  | EE     | 456          456 | Marcelo | 27
3  | CS     | 123          789 | Daniel  | NULL
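A minimal SQL sketch of this schema (table and column names as on the slide; the concrete column types are assumptions):

CREATE TABLE person (
  ssn  INTEGER PRIMARY KEY,   -- assumed type
  name VARCHAR(100),          -- assumed type
  age  INTEGER                -- nullable: Daniel's age is NULL
);
CREATE TABLE student (
  id     INTEGER PRIMARY KEY,            -- assumed type
  degree VARCHAR(100),                   -- assumed type
  ssn    INTEGER REFERENCES person(ssn)  -- foreign key to person
);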
70. Mapping to RDF
Table triples: for each relation, store the tuples that belong to it.
Triple(http://ex.org/person#ssn=123, rdf:type, http://ex.org/person)
71. Mapping to RDF
Table triples: for each relation, store the tuples that belong to it.
Triple(http://ex.org/person#ssn=123, rdf:type, http://ex.org/person)
Literal triples: for each tuple, store the values in each of its attributes.
Triple(http://ex.org/person#ssn=123, http://ex.org/person#name, "Juan")
72. Mapping to RDF
Reference triples: store the references generated by the FKs.
Triple(http://ex.org/student#id=3, http://ex.org/student,person#ssn,ssn, http://ex.org/person#ssn=123)
73. Mapping to RDF
Triple(http://ex.org/person#ssn=123, http://ex.org/person#name, "Juan")

Triple(U, V, W) ← DTP(A, R), Value(W, A, T, R), W != NULL,
                  TupleID(T, R, U), DTP_IRI(A, R, V).
DTP_IRI(A, R, X) ← DTP(A, R), Concat4(base, R, "#", A, X).
DTP(A, R) ← Attr(A, R), ¬IsBinRel(R).
TupleID(T, R, X) ← Class(R), PKn(A1, …, An, R),
                   Value(V1, A1, T, R), …, Value(Vn, An, T, R),
                   RowIRIn(V1, …, Vn, A1, …, An, T, R, X).
74. Information Preservation
[Diagram: (R, Σ, I) → M(R, Σ, I) → M⁻(M(R, Σ, I)) = (R, Σ, I)]
Theorem: The Direct Mapping is information preserving.
Proof: Provide a computable inverse mapping M⁻.
75. Relational Algebra tuples vs. SPARQL mappings
person
ssn | name   | age
789 | Daniel | NULL

t.ssn = 789, t.name = Daniel, t.age = NULL
Then tr(t) = μ, where:
• The domain of μ is {?ssn, ?name}
• μ(?ssn) = 789
• μ(?name) = Daniel
76. Query Preservation
[Diagram: tr(eval(Q, I)) = eval(Q*, M(R, Σ, I))]
Theorem: The Direct Mapping is query preserving.
Proof: By induction on the structure of Q, via a bottom-up algorithm for translating Q into Q*.
77. Example of Query Preservation
πname, age( σdegree ≠ EE (student) ⋈ person )

student                    person
id | degree | ssn          ssn | name    | age
1  | CS     | 789          123 | Juan    | 26
2  | EE     | 456          456 | Marcelo | 27
3  | Math   | 123          789 | Daniel  | NULL
78. Example of Query Preservation
πname, age( σdegree ≠ EE (student) ⋈ person )
SELECT ?id ?degree ?ssn
WHERE {
  ?x rdf:type <…/student>.
  OPTIONAL{?x <…/student#id> ?id. }
  OPTIONAL{?x <…/student#degree> ?degree. }
  OPTIONAL{?x <…/student#ssn> ?ssn. }
}

student
id | degree | ssn
1  | CS     | 789
2  | EE     | 456
3  | Math   | 123
79. Example of Query Preservation
πname, age( σdegree ≠ EE (student) ⋈ person )
SELECT ?id ?degree ?ssn
WHERE {
  ?x rdf:type <…/student>.
  OPTIONAL{?x <…/student#id> ?id. }
  OPTIONAL{?x <…/student#degree> ?degree. }
  OPTIONAL{?x <…/student#ssn> ?ssn. }
  FILTER(?degree != "EE" && bound(?degree) )
}

student
id | degree | ssn
1  | CS     | 789
2  | EE     | 456
3  | Math   | 123
80. Example of Query Preservation
πname, age( σdegree ≠ EE (student) ⋈ person )
SELECT ?ssn ?name ?age
WHERE {
  ?x rdf:type <…/person>.
  OPTIONAL{?x <…/person#ssn> ?ssn. }
  OPTIONAL{?x <…/person#name> ?name. }
  OPTIONAL{?x <…/person#age> ?age. }
}

person
ssn | name    | age
123 | Juan    | 26
456 | Marcelo | 27
789 | Daniel  | NULL
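Combining the two translated patterns, the full rewriting of the relational query looks roughly as follows (a simplified sketch: it joins the student and person patterns on ?ssn and projects name and age, omitting the OPTIONAL/bound machinery the formal translation uses to handle NULLs):

SELECT ?name ?age
WHERE {
  ?x rdf:type <…/student> .
  ?x <…/student#degree> ?degree .
  ?x <…/student#ssn> ?ssn .
  ?y rdf:type <…/person> .
  ?y <…/person#ssn> ?ssn .
  ?y <…/person#name> ?name .
  OPTIONAL { ?y <…/person#age> ?age . }
  FILTER(?degree != "EE")
}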
82. Monotonicity
[Diagram: if I1 ⊆ I2 then M(R, Σ, I1) ⊆ M(R, Σ, I2)]
Theorem: The Direct Mapping is monotone.
Proof: All negative atoms in the Datalog rules refer to the schema, and the schema is fixed.
84. DM is not Semantics Preserving
person (ssn is the PK)
ssn | name
123 | Juan
123 | Marcelo

DM(R, Σ, I) maps both rows to the single node <#ssn=123> (via person#ssn 123), carrying both names "Juan" and "Marcelo".
I does not satisfy Σ; however, DM(R, Σ, I) is consistent under OWL semantics.
Theorem: No monotone direct mapping is semantics preserving.
Proof: By contradiction.
85. Extending DM for Semantics Preservation
• Family of Datalog rules to determine violations of
– Primary Keys
– Foreign Keys
• Non-monotone direct mapping
• Information Preserving
• Query Preserving
• Semantics Preserving
86. Summary
• The Direct Mapping DM
– Formally defined semantics using Datalog
– Considers RDBs that may contain NULL values
– Monotone, Information and Query Preserving
• If you migrate your RDB to the Semantic Web using a monotone direct mapping, be prepared to see consistency where one would expect inconsistency.
87. W3C Direct Mapping
• Only maps relational data to RDF
– Does not consider the schema
• Monotone
• Not Information Preserving
– Because it does not directly map the schema
• Not Semantics Preserving
90. DM is not Semantics Preserving
PREFIX ex: <http://ex.org/>
PREFIX person: <http://ex.org/person#>
ex:person rdf:type owl:Class .
person:name rdf:type owl:DatatypeProperty ;
    rdfs:domain ex:person .
person:ssn rdf:type owl:DatatypeProperty ;
    rdfs:domain ex:person .

person (ssn is the PK)
ssn | name
123 | Juan
123 | Marcelo

DM(R, Σ, I) maps both rows to the single node <#ssn=123> (via person#ssn 123), carrying both names "Juan" and "Marcelo".
I does not satisfy Σ; however, DM(R, Σ, I) is consistent under OWL semantics.
91. What about owl:hasKey?
student
id   | degree
NULL | Math

• Student/id=NULL, rdf:type, Student
• Student/id=1, degree, Math
• owl:hasKey cannot force a value to be present
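For reference, the key axiom under discussion would be stated roughly as follows (a sketch; the class and property IRIs follow the running example):

<Student> rdf:type owl:Class ;
          owl:hasKey ( <Student#id> ) .

Under OWL semantics, a key only applies to individuals that actually have a value for the key property, which is why a missing (NULL) key cannot trigger a violation.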
92. owl:hasKey
student
id | degree
1  | Math
1  | EE

• Tuple 1: Student/id=1, student#id, 1 ; Student/id=1, degree, Math
• Tuple 2: Student/id=1, student#id, 1 ; Student/id=1, degree, EE
• DM generates the same IRI Student/id=1 for two different tuples. This does not violate owl:hasKey.
93. owl:hasKey
student
id | degree
1  | Math
1  | EE

• Tuple 1: Student/id=1, student#id, 1 ; Student/id=1, degree, Math
• Tuple 2: Student/id=1, student#id, 1 ; Student/id=1, degree, EE
• However, UNA would be needed: Student/id=1 differentFrom Student/id=1
• With a new DM that generates IRIs based on tuple ids, owl:hasKey would work.
94. Semantics Preserving DMpk
• Find violations of the PK
• Create an artificial triple that will generate a contradiction
100. Direct Mapping as R2RML
Person
ID | NAME  | AGE  | CID
1  | Alice | 25   | 100
2  | Bob   | NULL | 100

City
CID | NAME
100 | Austin
200 | Madrid

Direct-mapped graph (informally):
<Person/ID=1> <Person#NAME> "Alice" ; <Person#AGE> 25 ; <Person#ref-CID> <City/CID=100> .
<Person/ID=2> <Person#NAME> "Bob" ; <Person#ref-CID> <City/CID=100> .
<City/CID=100> <City#NAME> "Austin" .
<City/CID=200> <City#NAME> "Madrid" .

How can this be represented as R2RML?
102. Direct Mapping as R2RML
@prefix rr: <http://www.w3.org/ns/r2rml#> .
<TriplesMap1>
  a rr:TriplesMap;
  # Logical Table: What is being mapped?
  rr:logicalTable [ rr:tableName "Person" ];
  # SubjectMap: How to generate the Subject?
  rr:subjectMap [
    rr:template "http://www.ex.com/Person/ID={ID}";
    rr:class <http://www.ex.com/Person>
  ];
  # PredicateObjectMap: How to generate the Predicate and Object?
  rr:predicateObjectMap [
    rr:predicate <http://www.ex.com/Person#NAME> ;
    rr:objectMap [ rr:column "NAME" ]
  ] .
103. Logical Table
@prefix rr: <http://www.w3.org/ns/r2rml#> .
<TriplesMap1>
  a rr:TriplesMap;
  # What is being mapped?
  rr:logicalTable [ rr:tableName "Person" ];
  rr:subjectMap [
    rr:template "http://www.ex.com/Person/ID={ID}";
    rr:class <http://www.ex.com/Person>
  ];
  rr:predicateObjectMap [
    rr:predicate <http://www.ex.com/Person#NAME> ;
    rr:objectMap [ rr:column "NAME" ]
  ] .
109. What if …
Person
ID | NAME  | GENDER
1  | Alice | F
2  | Bob   | M

Desired triples:
<Person/1> rdf:type <Woman> ; foaf:name "Alice" .

R2RML View:
SELECT ID, NAME
FROM Person
WHERE GENDER = 'F'
110. R2RML View
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<TriplesMap1>
  a rr:TriplesMap;
  # Query instead of table
  rr:logicalTable [ rr:sqlQuery
    """SELECT ID, NAME
       FROM Person WHERE gender = 'F'""" ];
  rr:subjectMap [
    rr:template "http://www.ex.com/Person/{ID}";
    rr:class <http://www.ex.com/Woman>
  ];
  rr:predicateObjectMap [
    rr:predicate foaf:name;
    rr:objectMap [ rr:column "NAME" ]
  ] .
111. Quick Overview of R2RML
• Manual and customizable language
• Learning curve
• Direct Mapping bootstraps R2RML
• RDF represents the structure and ontology of the mapping author’s choice
114. Outline
• Logical Tables: What is being mapped
• Term Maps: How to create RDF terms
• How to create Triples from a table
• How to create Triples between two tables
• Languages
• Datatypes
116. R2RML Mapping
Student                    Professor
sid | name   | pid         pid | name
1   | Juan   | 100         100 | Dan
2   | Martin | 200         200 | Marcelo

R2RML Mapping output:
ex:Student1 rdf:type ex:Student .
ex:Student2 rdf:type ex:Student .
ex:Professor100 rdf:type ex:Professor .
ex:Professor200 rdf:type ex:Professor .
ex:Student1 foaf:name "Juan".
…
117. R2RML Mapping
• An R2RML Mapping M consists of a finite set TM of TriplesMaps.
• Each TM ∈ TM is a tuple (LT, SM, POM):
– LT: LogicalTable
– SM: SubjectMap
– POM: PredicateObjectMap
• Each POM ∈ POM consists of a pair (PM, OM)*
– PM: PredicateMap
– OM: ObjectMap
* For simplicity
118. R2RML Mapping
• An R2RML Mapping is represented as an RDF
Graph itself.
• Associated RDFS schema
– http://www.w3.org/ns/r2rml
• Turtle is the recommended syntax
120. LogicalTable
• The tabular SQL query result that is to be mapped to RDF (rr:logicalTable)
1. SQL base table or view: rr:tableName
2. R2RML View: rr:sqlQuery
122. @prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<TriplesMap1>
  a rr:TriplesMap;
  rr:logicalTable [ rr:sqlQuery
    """SELECT ID, NAME
       FROM Person WHERE gender = 'F'""" ];
  rr:subjectMap [
    rr:template "http://www.ex.com/Person/{ID}";
    rr:class <http://www.ex.com/Woman>
  ];
  rr:predicateObjectMap [
    rr:predicate foaf:name;
    rr:objectMap [ rr:column "NAME" ]
  ] .
124. How to create RDF terms that define S, P and O?
• An RDF term is either an IRI, a blank node, or a literal
• Answer:
1. Constant value
2. Value in the database
a. Raw value in a column
b. Column value(s) applied to a template
125. TermMap
• A TermMap is a function that generates an RDF term from a logical table row.
• An RDF term is either an IRI, a blank node, or a literal.
[Diagram: Logical Table Row → TermMap → RDF Term (IRI, Bnode, or Literal)]
126. TermMap
• A TermMap must be exactly one of the following:
– Constant-valued TermMap
– Column-valued TermMap
– Template-valued TermMap
• Since TermMaps are used to create S, P, O, there are
– 3 ways to create a subject
– 3 ways to create a predicate
– 3 ways to create an object
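A sketch of the three kinds side by side, in the bracketed Turtle style used throughout the deck (the property IRIs are illustrative):

rr:subjectMap   [ rr:template "http://www.ex.com/Person/ID={ID}" ] ;  # template-valued
rr:predicateMap [ rr:constant foaf:name ] ;                             # constant-valued
rr:objectMap    [ rr:column "NAME" ] .                                  # column-valued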
128. Constant-valued TermMap
• A TermMap that ignores the logical table row
and always generates the same RDF term
• rr:constant
• Commonly used to generate constant IRIs as
the predicate
130. Column-valued TermMap
• A TermMap that maps the value of a named column in a logical table row
• rr:column
• Commonly used to generate Literals as the object
132. Template-valued TermMap
• A TermMap that maps the column values of a
set of column names to a string template.
• A string template is a format that can be used
to build strings from multiple components.
• rr:template
• Commonly used to generate IRIs as the
subject or concatenate different attributes
134. Commonly used…
• … but any of these TermMaps can be used to create any RDF term (S, P, O). Recall:
– 3 ways to create a subject
– 3 ways to create a predicate
– 3 ways to create an object
• Template-valued TermMaps are commonly used to create an IRI for a subject, but can also be used to create a Literal for an object.
• How do we specify the term type (IRI or Literal in this case)?
135. TermType
• Specify the type of a term that a TermMap
should generate
• Force what the RDF term should be
• Three types of TermType:
– rr:IRI
– rr:BlankNode
– rr:Literal
138. TermType (cont…)
• Can only be applied to Template- and Column-valued TermMaps
• Applying it to a Constant-valued TermMap has no effect
– i.e. if the constant is an IRI, the term type is automatically an IRI
139. TermType Rules
If the TermMap is for a:
1. Subject → TermType = IRI or Blank Node
2. Predicate → TermType = IRI
3. Object → TermType = IRI, Blank Node, or Literal
140. TermType is Optional
• If a TermType is not specified, then
– Default = IRI
– Unless it is for an object defined by a Column-valued TermMap, or one that has a language tag or a specified datatype: then the TermType is a Literal
• That is why a template in an ObjectMap always generates an IRI, unless rr:termType rr:Literal is specified.
144. Generating SPO
• TermMap that specifies what RDF term should
be for S, P, O
– SubjectMap
– PredicateMap
– ObjectMap
145. SubjectMap
• SubjectMap is a TermMap
• rr:subjectMap
• Specifies what the subject of a triple should be
• 3 ways to create a subject
– Template-valued TermMap
– Column-valued TermMap
– Constant-valued TermMap
• Has to be an IRI or Blank Node
146. SubjectMap
• SubjectMaps are usually Template-valued TermMaps
• Use-case for a Column-valued TermMap
– Use a column value to create a blank node
– A URI exists as a column value
• Use-case for a Constant-valued TermMap
– For all tuples: <CompanyABC> <consistsOf> <Dep{id}>
147. SubjectMap
• Optionally, a SubjectMap may have one or
more Class IRIs associated
– This will generate rdf:type triples
• rr:class
149. PredicateObjectMap
• A function that creates one or more predicate-object pairs for each logical table row
• rr:predicateObjectMap
• Used in conjunction with a SubjectMap to generate RDF triples in a TriplesMap
• A predicate-object pair consists of*
– One or more PredicateMaps
– One or more ObjectMaps or ReferencingObjectMaps
151. PredicateMap
• PredicateMap is a TermMap
• rr:predicateMap
• Specifies what the predicate of a triple should
be
• 3 ways to create a predicate
– Template-valued Term Map
– Column-valued Term Map
– Constant-valued Term Map
• Has to be an IRI
152. PredicateMap
• PredicateMaps are usually Constant-valued
TermMap
• Use-case for Column-valued TermMap
–…
• Use-case for Template-valued TermMap
–…
156. ObjectMap
• ObjectMap is a TermMap
• rr:objectMap
• Specifies what the object of a triple should be
• 3 ways to create an object
– Template-valued TermMap
– Column-valued TermMap
– Constant-valued TermMap
• Has to be an IRI, Literal, or Blank Node
157. ObjectMap
• ObjectMaps are usually Column-valued
TermMap
• Use-case for Template-valued TermMap
– Concatenate values
– Create IRIs
• Use-case for Constant-valued TermMap
– All rows in a table share a role
160. Example 1
• We now have sufficient elements to create a mapping that will generate
– A Subject IRI
– rdf:type triple(s)

Student
sid | name   | pid
1   | Juan   | 100
2   | Martin | 200

TriplesMap output:
@prefix ex: <http://example.com/ns/>.
ex:Student1 rdf:type ex:Student .
ex:Student2 rdf:type ex:Student .
161. Example 1
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix ex: <http://example.com/ns/>.
<#TriplesMap1>
  # Logical Table is a Table Name
  rr:logicalTable [ rr:tableName "Student" ];
  # SubjectMap is a Template-valued TermMap, and it has one Class IRI
  rr:subjectMap [
    rr:template "http://example.com/ns/{sid}";
    rr:class ex:Student;
  ].
162. Example 2
Student
sid | name   | pid
1   | Juan   | 100
2   | Martin | 200

TriplesMap output:
@prefix ex: <http://example.com/ns/>.
ex:Student1 rdf:type ex:Student .
ex:Student1 ex:name "Juan" .
ex:Student2 rdf:type ex:Student .
ex:Student2 ex:name "Martin" .
163. Example 2
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix ex: <http://example.com/ns/>.
<#TriplesMap1>
  # Logical Table is a Table Name
  rr:logicalTable [ rr:tableName "Student" ];
  # SubjectMap is a Template-valued TermMap, and it has one Class IRI
  rr:subjectMap [
    rr:template "http://example.com/ns/{sid}";
    rr:class ex:Student;
  ];
  # PredicateObjectMap: PredicateMap is a Constant-valued TermMap,
  # ObjectMap is a Column-valued TermMap
  rr:predicateObjectMap [
    rr:predicate ex:name;
    rr:objectMap [ rr:column "name" ];
  ].
164. Example 3
Student
sid | name   | pid
1   | Juan   | 100
2   | Martin | 200

TriplesMap output:
@prefix ex: <http://example.com/ns/>.
ex:Student1 rdf:type ex:Student .
ex:Student1 ex:comment "Juan is a Student" .
ex:Student2 rdf:type ex:Student .
ex:Student2 ex:comment "Martin is a Student" .
165. Example 3
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix ex: <http://example.com/ns/>.
<#TriplesMap1>
  # Logical Table is a Table Name
  rr:logicalTable [ rr:tableName "Student" ];
  # SubjectMap is a Template-valued TermMap, and it has one Class IRI
  rr:subjectMap [
    rr:template "http://example.com/ns/{sid}";
    rr:class ex:Student;
  ];
  # PredicateObjectMap: PredicateMap is a Constant-valued TermMap,
  # ObjectMap is a Template-valued TermMap with an explicit TermType
  rr:predicateObjectMap [
    rr:predicate ex:comment;
    rr:objectMap [
      rr:template "{name} is a Student";
      rr:termType rr:Literal;
    ];
  ].
166. Example 4
Student
sid | name   | pid
1   | Juan   | 100
2   | Martin | 200

TriplesMap output:
@prefix ex: <http://example.com/ns/>.
ex:Student1 rdf:type ex:Student .
ex:Student1 ex:webpage <http://ex.com/Juan>.
ex:Student2 rdf:type ex:Student .
ex:Student2 ex:webpage <http://ex.com/Martin>.
167. Example 4
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix ex: <http://example.com/ns/>.
<#TriplesMap1>
  # Logical Table is a Table Name
  rr:logicalTable [ rr:tableName "Student" ];
  # SubjectMap is a Template-valued TermMap, and it has one Class IRI
  rr:subjectMap [
    rr:template "http://example.com/ns/{sid}";
    rr:class ex:Student;
  ];
  # PredicateObjectMap: PredicateMap is a Constant-valued TermMap,
  # ObjectMap is a Template-valued TermMap.
  # Note that there is no TermType: a template in an ObjectMap defaults to IRI.
  rr:predicateObjectMap [
    rr:predicate ex:webpage;
    rr:objectMap [
      rr:template "http://ex.com/{name}";
    ];
  ].
168. Example 5
Student
sid | name   | pid
1   | Juan   | 100
2   | Martin | 200

TriplesMap output:
@prefix ex: <http://example.com/ns/>.
ex:Student1 rdf:type ex:Student .
ex:Student1 ex:studentType ex:GradStudent.
ex:Student2 rdf:type ex:Student .
ex:Student2 ex:studentType ex:GradStudent.
169. Example 6
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix ex: <http://example.com/ns/>.
<#TriplesMap1>
  # Logical Table is a Table Name
  rr:logicalTable [ rr:tableName "Student" ];
  # SubjectMap is a Template-valued TermMap, and it has one Class IRI
  rr:subjectMap [
    rr:template "http://example.com/ns/{sid}";
    rr:class ex:Student;
  ];
  # PredicateObjectMap: PredicateMap is a Constant-valued TermMap,
  # ObjectMap is a Constant-valued TermMap
  rr:predicateObjectMap [
    rr:predicate ex:studentType;
    rr:object ex:GradStudent ;
  ].
170. RefObjectMap
• A RefObjectMap (Referencing ObjectMap) allows using the subject of another TriplesMap as the object generated by an ObjectMap.
• rr:objectMap
• A RefObjectMap is defined by
– Exactly one ParentTriplesMap, which must be a TriplesMap
– Optionally, one or more JoinConditions
175. JoinCondition
• Child Column: must be a column name that exists in the logical table of the TriplesMap that contains the RefObjectMap.
• Parent Column: must be a column name that exists in the logical table of the RefObjectMap’s Parent TriplesMap.
<TriplesMap1>
  a rr:TriplesMap;
  rr:logicalTable [ rr:tableName "Person" ];
  ...
  rr:predicateObjectMap [
    rr:predicate foaf:based_near ;
    rr:objectMap [
      rr:parentTriplesMap <TriplesMap2>;
      rr:joinCondition [
        rr:child "CID";
        rr:parent "CID"; ]
    ]
  ].
<TriplesMap2>
  a rr:TriplesMap;
  rr:logicalTable [ rr:tableName "City" ];
  ...
  .
176. JoinCondition
• Child Query: the Child Query of a RefObjectMap is the LogicalTable of the TriplesMap containing the RefObjectMap.
• Parent Query: the Parent Query of a RefObjectMap is the LogicalTable of the Parent TriplesMap.
• If the Child Query and Parent Query are not identical, then a JoinCondition must exist.
<TriplesMap1>
  a rr:TriplesMap;
  rr:logicalTable [ rr:tableName "Person" ];
  ...
  rr:predicateObjectMap [
    rr:predicate foaf:based_near ;
    rr:objectMap [
      rr:parentTriplesMap <TriplesMap2>;
      rr:joinCondition [
        rr:child "CID";
        rr:parent "CID"; ]
    ]
  ].
<TriplesMap2>
  a rr:TriplesMap;
  rr:logicalTable [ rr:tableName "City" ];
  ...
  .
178. Example 7
Student                    Professor
sid | name   | pid         pid | name
1   | Juan   | 100         100 | Dan
2   | Martin | 200         200 | Marcelo

R2RML Mapping output:
ex:Student1 rdf:type ex:Student .
ex:Student2 rdf:type ex:Student .
ex:Professor100 rdf:type ex:Professor .
ex:Professor200 rdf:type ex:Professor .
ex:Student1 ex:hasAdvisor ex:Professor100 .
ex:Student2 ex:hasAdvisor ex:Professor200 .
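The deck does not show the mapping for this example; a sketch using a ReferencingObjectMap (prefixes as in the earlier examples; the Student{sid}/Professor{pid} templates are assumptions chosen to match the stated output) might look like:

@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix ex: <http://example.com/ns/>.
<#StudentMap>
  rr:logicalTable [ rr:tableName "Student" ];
  rr:subjectMap [ rr:template "http://example.com/ns/Student{sid}"; rr:class ex:Student ];
  rr:predicateObjectMap [
    rr:predicate ex:hasAdvisor;
    rr:objectMap [
      rr:parentTriplesMap <#ProfessorMap>;
      rr:joinCondition [ rr:child "pid"; rr:parent "pid" ]
    ]
  ].
<#ProfessorMap>
  rr:logicalTable [ rr:tableName "Professor" ];
  rr:subjectMap [ rr:template "http://example.com/ns/Professor{pid}"; rr:class ex:Professor ].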
181. Languages
• A TermMap with a TermType of rr:Literal may have a language tag
• rr:language
<#TriplesMap1>
  rr:logicalTable [ rr:tableName "Student" ];
  rr:subjectMap [
    rr:template "http://example.com/ns/{sid}";
    rr:class ex:Student;
  ];
  rr:predicateObjectMap [
    rr:predicate ex:comment;
    rr:objectMap [
      rr:column "comment";
      rr:language "en";
    ];
  ].
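With this mapping, each comment value becomes a language-tagged literal, e.g. (the comment text is an assumed sample value):

ex:Student1 ex:comment "A good student"@en .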
183. Issue with Languages
• What happens if the language value is in the data?
ID | COUNTRY_ID | LABEL          | LANG
1  | 1          | United States  | en
2  | 1          | Estados Unidos | es
3  | 2          | England        | en
4  | 2          | Inglaterra     | es
185. Issue with Languages
• One mapping per language:
<#TripleMap_Countries_EN>
  a rr:TriplesMap;
  rr:logicalTable [ rr:sqlQuery """SELECT COUNTRY_ID, LABEL, LANG FROM
    COUNTRY WHERE LANG = 'en'""" ];
  rr:subjectMap [
    rr:template "http://example.com/country{COUNTRY_ID}"
  ];
  rr:predicateObjectMap [
    rr:predicate rdfs:label;
    rr:objectMap [
      rr:column "LABEL";
      rr:language "en";
    ];
  ].
186. Language Extension
• Single mapping for all languages, using the column value as the language:
<#TripleMap_Countries>
  a rr:TriplesMap;
  rr:logicalTable [ rr:tableName "COUNTRY" ];
  rr:subjectMap [
    rr:template "http://example.com/country{COUNTRY_ID}"
  ];
  rr:predicateObjectMap [
    rr:predicate rdfs:label;
    rr:objectMap [
      rr:column "LABEL";
      rrx:languageColumn "LANG";
    ];
  ].
187. Datatypes
• A TermMap with a TermType of rr:Literal that does not have rr:language may carry a datatype (rr:datatype)
<#TriplesMap1>
  rr:logicalTable [ rr:tableName "Student" ];
  rr:subjectMap [
    rr:template "http://example.com/ns/{sid}";
    rr:class ex:Student;
  ];
  rr:predicateObjectMap [
    rr:predicate ex:startDate;
    rr:objectMap [
      rr:column "start_date";
      rr:datatype xsd:date;
    ];
  ].
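With this mapping, start_date values become typed literals, e.g. (the date is an assumed sample value):

ex:Student1 ex:startDate "2013-09-01"^^xsd:date .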
194. W3C RDB2RDF
• Task: Integrate data from a relational DBMS with Linked Data
• Approach: map from the relational schema to a semantic vocabulary with R2RML
• Publishing: two alternatives
– Translate SPARQL into SQL on the fly
– Batch transform data into RDF, index it, and provide SPARQL access in a triplestore
[Diagram: Linked Data lifecycle: data acquisition, vocabulary mapping (R2RML Engine over the relational DBMS), cleansing, interlinking, publishing; integrated data in a triplestore, accessed via a SPARQL endpoint over the LD dataset]
195. MusicBrainz Next Gen Schema
• artist: as pre-NGS, but further attributes
• artist_credit: allows joint credit
• release_group: cf. ‘album’
versus:
• work • release • track • medium • tracklist • recording
https://wiki.musicbrainz.org/Next_Generation_Schema
196. Music Ontology
• MusicArtist
– ArtistEvent, member_of
• SignalGroup: ‘Album’ as per Release_Group
• Release
– ReleaseEvent
• Record • Track • Work • Composition
http://musicontology.com/
197. Scale
• MusicBrainz RDF derived via R2RML: 300M triples
lb:artist_member a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT a1.gid, a2.gid AS band
       FROM artist a1
       INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
       INNER JOIN link ON l_artist_artist.link = link.id
       INNER JOIN link_type ON link.link_type = link_type.id
       INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
       WHERE link_type.gid='5be4c609-9afa-4ea0-910b-12ffb71e3821'"""] ;
  rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap
    [rr:predicate mo:member_of ;
     rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ;
                   rr:termType rr:IRI]] .
198. Musicbrainz
• Musicbrainz dumps:
– http://mbsandbox.org/~barry/
• Musicbrainz R2RML mappings:
– https://github.com/LinkedBrainz/MusicBrainz-R2RML
• 30 mins to generate 150M triples with Ultrawrap
– 8 Xeon cores, 16 GB RAM (2 GB are usually free)
– Should be less, but the server was overloaded
– It used to be 8+ hours using D2RQ on a dedicated machine
199. Musicbrainz Dump Statistics
(Lead) Table  | Triples     | Time (s)
area          | 59,798      | 2
artist        | 36,868,228  | 423
dbpedia       | 172,017     | 13
label         | 201,832     | 3
medium        | 18,069,143  | 163
recording     | 11,400,354  | 209
release_group | 3,050,818   | 31
release       | 9,764,887   | 151
track         | 75,506,495  | 794
work          | 1,728,955   | 20
Total         | 156,822,527 | 1,809
200. R2RML Class Mapping
• Mapping tables to classes is ‘easy’:
lb:Artist a rr:TriplesMap ;
  rr:logicalTable [rr:tableName "artist"] ;
  rr:subjectMap
    [rr:class mo:MusicArtist ;
     rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap
    [rr:predicate mo:musicbrainz_guid ;
     rr:objectMap [rr:column "gid" ;
                   rr:datatype xsd:string]] .
201. R2RML Property Mapping
• Mapping columns to properties can be easy:
lb:artist_name a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT artist.gid, artist_name.name
       FROM artist
       INNER JOIN artist_name ON artist.name = artist_name.id"""] ;
  rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap
    [rr:predicate foaf:name ;
     rr:objectMap [rr:column "name"]] .
202. NGS Advanced Relations
• Major entities (Artist, Release Group, Track, etc.) plus URL are paired (l_artist_artist)
• Each pairing of instances refers to a Link
• Links have types (cf. RDF properties) and attributes
http://wiki.musicbrainz.org/Advanced_Relationship
203. Advanced Relations Mapping
• Mapping advanced relationships (SQL joins):
lb:artist_member a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT a1.gid, a2.gid AS band
       FROM artist a1
       INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
       INNER JOIN link ON l_artist_artist.link = link.id
       INNER JOIN link_type ON link.link_type = link_type.id
       INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
       WHERE link_type.gid='5be4c609-9afa-4ea0-910b-12ffb71e3821'"""] ;
  rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap
    [rr:predicate mo:member_of ;
     rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ;
                   rr:termType rr:IRI]] .
204. Advanced Relations Mapping
• Mapping advanced relationships (SQL joins):
lb:artist_dbpedia a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT artist.gid,
         REPLACE(REPLACE(url, 'wikipedia.org/wiki',
                              'dbpedia.org/resource'),
                 'http://en.',
                 'http://') AS url
       FROM artist
       INNER JOIN l_artist_url ON artist.id = l_artist_url.entity0
       INNER JOIN link ON l_artist_url.link = link.id
       INNER JOIN link_type ON link.link_type = link_type.id
       INNER JOIN url ON l_artist_url.entity1 = url.id
       WHERE link_type.gid='29651736-fa6d-48e4-aadc-a557c6add1cb'
       AND url SIMILAR TO
         'http://(de|el|en|es|ko|pl|pt).wikipedia.org/wiki/%'"""] ;
  rr:subjectMap lb:sm_artist ;
  rr:predicateObjectMap
    [rr:predicate owl:sameAs ;
     rr:objectMap [rr:column "url"; rr:termType rr:IRI]] .
205. SPARQL Example
• SPARQL versus SQL
ASK {dbp:Paul_McCartney mo:member dbp:The_Beatles}
versus a SQL SELECT over a dozen INNER JOINs with a long chain of AND conditions (elided on the slide).
206. For exercises, quiz and further material visit our website: http://www.euclid-project.eu
Course eBook. Other channels: @euclid_project, EUCLID project, EUCLIDproject
211. “Comparing the overall performance […] of
the fastest rewriter with the fastest
relational database shows an overhead for
query rewriting of 106%. This is an indicator
that there is still room for improving the
rewriting algorithms”
[Bizer and Schultz 2009]
212. Results of BSBM 2009
Larger numbers are better
http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/index.html
213. Results of BSBM 2009
100M Triple Dataset
Larger numbers are better
After March 2009, RDB2RDF systems have not
been compared to RDBMS
http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/index.html
214. Current rdb2rdf systems are not capable of
providing the query execution performance
required [...] it is likely that with more work
on query translation, suitable mechanisms
for translating queries could be developed.
These mechanisms should focus on
exploiting the underlying database system’s
capabilities to optimize queries and process
large quantities of structure data
[Gray et al. 2009]
218. “SPARQL is equivalent, from an expressive point of view, to relational algebra”
Angles & Gutierrez 2008
219. Problem
• How can SPARQL queries be efficiently evaluated on an RDBMS?
• Hypothesis: Existing commercial relational databases already subsume the optimizations needed for effective SPARQL execution on relationally stored data
220. Nugget
1. Defined an architecture based on SQL views which allows the RDBMS to do the optimization.
2. Identified two important optimizations that already exist in commercial RDBMSs.
Sequeda & Miranker. Ultrawrap: SPARQL Execution on Relational Data. Journal of Web Semantics 2013
221. Ultrawrap
Compile time:
1. Translate the SQL schema to OWL and a mapping
2. Define the RDF triples as a SQL view
Run time:
3. SPARQL to SQL translation
4. The SQL optimizer creates the relational query plan
222. Creating Tripleview
• For every ontology element (Class, Object Property and Datatype Property), create a SQL SELECT query that outputs triples:
SELECT 'Product'+ptID as s, 'label' as p, label as o
FROM Product WHERE label IS NOT NULL

Product                       Result
ptID | label    | prID        S        | P     | O
1    | ACME Inc | 4           Product1 | label | ACME Inc
2    | Foo Bars | 5           Product2 | label | Foo Bars
223. Creating Tripleview
SELECT 'Product'+ptID as s, ptID as s_id, 'label' as p, label as o, NULL as o_id
FROM Product WHERE label IS NOT NULL

Product                       Result
ptID | label    | prID        S        | S_id | P     | O        | O_id
1    | ACME Inc | 4           Product1 | 1    | label | ACME Inc | NULL
2    | Foo Bars | 5           Product2 | 2    | label | Foo Bars | NULL
224. Class RDF Triples
SELECT 'Product'+ptID as s, ptID as s_id, 'rdf:type' as p, 'Product' as o, NULL as o_id
FROM Product

S        | S_id | P        | O       | O_id
Product1 | 1    | rdf:type | Product | NULL
Product2 | 2    | rdf:type | Product | NULL

Object Property RDF Triples
SELECT 'Product'+ptID as s, ptID as s_id, 'Product#Producer' as p, 'Producer'+prID as o, prID as o_id
FROM Product

S        | S_id | P                | O         | O_id
Product1 | 1    | Product#Producer | Producer4 | 4
Product2 | 2    | Product#Producer | Producer5 | 5
225. Creating Tripleview (…)
• Create TripleViews (SQL views) as unions of the SELECT queries whose objects have the same datatype:
CREATE VIEW Tripleview_varchar AS
SELECT 'Product'+ptID as s, ptID as s_id, 'label' as p, label as o, NULL as o_id FROM Product
UNION ALL
SELECT 'Producer'+prID as s, prID as s_id, 'title' as p, title as o, NULL as o_id FROM Producer
UNION ALL …

S         | S_id | P     | O        | O_id
Product1  | 1    | label | ACME Inc | NULL
Product2  | 2    | label | Foo Bars | NULL
Producer4 | 4    | title | Foo      | NULL
Producer5 | 5    | title | Bars     | NULL
226. CREATE VIEW Tripleview_int AS
SELECT 'Product'+ptID as s, ptID as s_id, 'pnum1' as p, pnum1 as o, NULL as o_id
FROM Product
UNION ALL
SELECT 'Product'+ptID as s, ptID as s_id, 'pnum2' as p, pnum2 as o, NULL as o_id
FROM Product

S        | S_id | P     | O | O_id
Product1 | 1    | pnum1 | 1 | NULL
Product2 | 2    | pnum1 | 3 | NULL
Product1 | 1    | pnum2 | 2 | NULL
Product2 | 2    | pnum2 | 3 | NULL
227. SPARQL and SQL
• Translating a SPARQL query to a semantically equivalent SQL query:
SELECT ?label ?pnum1
WHERE{
  ?x label ?label.
  ?x pnum1 ?pnum1.
}

SQL on Tripleview:
SELECT t1.o AS label, t2.o AS pnum1
FROM tripleview_varchar t1, tripleview_int t2
WHERE t1.p = 'label' AND
      t2.p = 'pnum1' AND
      t1.s_id = t2.s_id

Target SQL on the base table:
SELECT label, pnum1
FROM product

What is the query plan?
228. [Query plan diagram: π t1.o AS label, t2.o AS pnum1 over the join of σ p='label' (Tripleview_varchar t1) with σ p='pnum1' (Tripleview_int t2); each view expands into a union of projections over Product and Producer with ≠ NULL selections, and every union branch whose constant p cannot satisfy the selection is a CONTRADICTION]
229. Detection of Unsatisfiable Conditions
• Determine that the query result will be empty if the existence of another answer would violate some integrity constraint in the database.
• This implies that the answer to that branch of the query is empty, so the database does not need to be accessed for it.
Chakravarthy, Grant and Minker (1990). Logic-Based Approach to Semantic Query Optimization.
230. [Query plan diagram after pruning: π t1.o AS label, t2.o AS pnum1 over the join of π … 'label' … (σ label ≠ NULL (Product)) with π … 'pnum1' … (σ pnum1 ≠ NULL (Product))]
Join on the same table? REDUNDANT
231. Self Join Elimination
• If attributes from the same table are projected separately and then joined, the join can be dropped.
Self Join Elimination of Projection:
SELECT p1.label, p2.pnum1
FROM product p1, product p2
WHERE p1.id = 1 AND p1.id = p2.id
⇒
SELECT label, pnum1
FROM product
WHERE id = 1

Self Join Elimination of Selection:
SELECT p1.id
FROM product p1, product p2
WHERE p1.pnum1 > 100 AND p2.pnum2 < 500 AND p1.id = p2.id
⇒
SELECT id
FROM product
WHERE pnum1 > 100 AND pnum2 < 500
233. Evaluation
• Use benchmarks that store data in relational databases and provide SPARQL queries together with their semantically equivalent SQL queries:
• BSBM: 100 million triples
• Barton: 45 million triples
#54: Goal of slide: what is the problem and my contribution. IP: no information is lost; ability to reconstruct the original database from the result of the direct mapping. QP: no query is lost; every relational query over an RDB can be translated to an equivalent SPARQL query over the directly mapped RDF.
#55: Goal of slide: example of mapping. It seems easy… however, there are special issues.
#56: Goal of slide: NULLs are an issue where this is not straightforward.
#58: Why is this hard and important: because of NULLs. Need to be able to reconstruct the original database instance, with nulls. The inverse direct mapping N : G -> I must be computable; a mapping is computable if there exists an algorithm that, given G ∈ G, computes N(G).
#59: Why is this hard and important: because of NULLs.
#60: Why is this hard and important: adding new data won’t make you rerun the complete mapping.
#66: Goal of slide: how does the Direct Mapping work? 5 predicates for the RDB; 12 rules for RDB -> Ontology; 3 predicates for the Ontology; 10 rules for Ontology -> OWL; 10 rules for Ontology + Instances -> RDF. The W3C standard only has the 10 rules for Ontology + Instances -> RDF.
#70: R is a binary relation between two relations S and T if: both S and T are different from R; R has exactly two attributes A and B, which form a primary key of R; A is the attribute of a foreign key in R that points to S; B is the attribute of a foreign key in R that points to T; A is not the attribute of two distinct foreign keys in R; B is not the attribute of two distinct foreign keys in R; A and B are not the attributes of a composite foreign key in R; and relation R does not have incoming foreign keys.
#75: Goal of slide: what is Information Preservation. Ability to reconstruct the original database from the result of the direct mapping; the mapping is lossless and no information is lost. Why is this hard and important: because of NULLs; we need to be able to reconstruct the original database instance with nulls.
#77: Goal of slide: what is Query Preservation. Every relational query over an RDB can be translated to an equivalent SPARQL query over the directly mapped RDF. What about SPARQL -> SQL? An open issue is to prove that for any SPARQL query there exists a relational algebra query. My future work aims at proving a more general result (for any mapping between any DB and any ontology), of which this would be a corollary.
#83: Goal of slide: what is Monotonicity. A desired property: it assures that a re-computation of the entire mapping is not needed after updates to the DB.
#84: Goal of slide: what is Semantics Preservation. Satisfaction of a set of integrity constraints is encoded in the mapping result.
#85: Goal of slide: a monotone mapping can’t be Semantics Preserving.
#91: Does this mean that our direct mapping is incorrect? What could we do to create a direct mapping that is semantics preserving?
#153: Getty has a use case for a Column-valued TermMap.
#213: These numbers are Query Mixes per Hour. Each query mix consists of 25 queries that represent e-commerce navigation. The reduced query mix takes out two types of queries (queries 5 and 6). You can see the SPARQL and the semantically equivalent SQL here: http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/ExploreUseCase/index.html. They were taken out because Q5 has “complex” filters and Q6 has free-text search.
#220: Goal of slide: what is the problem and my contribution (assuming a wrapper).
#223: A first implementation naively represented the relational data as RDF using only three columns. We observed that the preconditions for applying optimizations were not being satisfied; indexes were not being exploited.
#224: With this refinement, the preconditions for applying optimizations were satisfied; indexes were exploited.
#226: Another precondition for applying the optimization is that the objects in the views need to have the same datatype.
#228: Goal of the slide: what is the SPARQL to SQL translation.
TV(X, 'label', Y) <- Product(X, Y, _)
TV(X, 'pnum1', Z) <- Product(X, _, Z)
Q(S,T) <- TV(X, 'label', S), TV(X, 'pnum1', T)
Q(S,T) <- Product(X, S, _), Product(X, _, T)