RDF Data Model and Query Languages: Sergio Tessaris
FOAF example
Introduction
Building Blocks
RDF Abstract Syntax
RDF Vocabulary
RDF Semantics
RDF model theory
Entailment
Casting RDF into FOL
Querying RDF
Introduction
Graph Patterns
Query languages
SPARQL
Basic Concepts
RDF: a language for representing information about resources (e.g. metadata), that is, information about things identified on the Web
identifiable doesn't mean retrievable, e.g. goods from an eShop, prices, availability, etc.
Design Goals
having a simple data model
having formal semantics and provable inference
using an extensible URI-based vocabulary
using an XML-based syntax
supporting use of XML schema datatypes
allowing anyone to make statements about any resource
RDF Statements
RDF is about making statements about resources
E.g. Sergio Tessaris is the author of the web page http://www.inf.unibz.it/tessaris/index.html
This can be stated as a property of the web page: http://www.inf.unibz.it/tessaris/index.html has an author whose value is "Sergio Tessaris"
RDF statements have three parts
subject: e.g. the URL http://www.inf.unibz.it/tessaris/index.html
predicate: e.g. the property author
object: e.g. the string "Sergio Tessaris"
Identifying Resources
RDF identifiers: Uniform Resource Identifiers (URIs)
URIs, URLs, and URNs
URL: identifies resources via a representation of their primary access mechanism
URN: URIs that are required to remain globally unique and persistent
Example
ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.math.uio.no/faq/compression-faq/part1.html
news:comp.infosystems.www.servers.unix
telnet://melvyl.ucop.edu/
mailto:[email protected]
Literals
RDF allows the use of values: strings, numbers, booleans
Literals are basically Unicode strings
plain: just strings (with an optional language tag)
typed: have an associated datatype URI
RDF literals and typing
literals are not URIs, e.g. the literal "http://www.unicode.org" and the URI <http://www.unicode.org> are different
triple: two nodes (subject, object) connected by a labelled edge (predicate)
set of triples: a labelled directed graph
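The graph view of a set of triples can be sketched in a few lines of Python. This is only an illustration: the URIs are placeholders under example.org, and the helper names `nodes` and `out_edges` are hypothetical, not part of any RDF library.

```python
# A graph is modelled as a set of (subject, predicate, object) tuples.
# The URIs below are illustrative placeholders, not taken from the slides.
PAGE = "http://example.org/index.html"
AUTHOR = "http://example.org/terms/author"
LANG = "http://example.org/terms/language"

graph = {
    (PAGE, AUTHOR, '"Sergio Tessaris"'),
    (PAGE, LANG, '"en"'),
}

def nodes(g):
    """Nodes are all subjects and objects; predicates only label edges."""
    return {s for s, _, _ in g} | {o for _, _, o in g}

def out_edges(g, node):
    """Labelled edges leaving a node, as (predicate, object) pairs."""
    return {(p, o) for s, p, o in g if s == node}
```

Because the graph is a set, stating the same triple twice does not change it.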
Blank Nodes
RDF graphs may contain additional nodes
arbitrary set of blank nodes (bnodes): infinite, disjoint from URIs and literals
given two blank nodes it is possible to determine whether or not they are the same
intuition: a bnode represents the existence of something
URIs and literals have a global scope
two equal URIs (literals) always represent the same object
equal means that the two Unicode strings are the same
they are not contextual to the graph
bnodes are contextual to the graph in which they appear
. . . more to come on the role of bnodes
RDF Graphs
RDF triple
subject: RDF URI reference or a bnode
predicate: RDF URI reference
object: literal, RDF URI reference or a bnode
no literals as subject
predicates are just URIs
URIs are used for both resources (nodes) and predicates (edges)
typed literals may be ill-formed with respect to their datatype
no complete information about any resource
On Bnodes
bnodes are different from other RDF terms, starting from the syntax
Graph equivalence
Definition: $G_1$, $G_2$ are equivalent if there is a bijection $M$ between the nodes of the two graphs such that:
1. $M$ maps bnodes to bnodes.
2. $M(lit) = lit$ for all literals $lit$ in $G_1$.
3. $M(uri) = uri$ for all URIs $uri$ in $G_1$.
4. $\langle s, p, o \rangle$ is in $G_1$ iff the triple $\langle M(s), p, M(o) \rangle$ is in $G_2$.
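The definition of graph equivalence can be checked by brute force on small graphs. The sketch below assumes that triples are Python tuples of strings and that bnodes are written with a leading "_:"; the function names are hypothetical, and trying every permutation of bnodes is exponential, so this is meant only to make the definition concrete.

```python
from itertools import permutations

def is_bnode(term):
    # Assumed convention: blank nodes are strings starting with "_:".
    return term.startswith("_:")

def bnodes(g):
    """Sorted list of the blank nodes occurring as subject or object."""
    return sorted({t for s, _, o in g for t in (s, o) if is_bnode(t)})

def equivalent(g1, g2):
    """Search for a bijection M on bnodes that fixes URIs and literals."""
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for image in permutations(b2):
        m = dict(zip(b1, image))
        # URIs and literals are not in m, so m.get leaves them unchanged.
        mapped = {(m.get(s, s), p, m.get(o, o)) for s, p, o in g1}
        if mapped == g2:
            return True
    return False

g1 = {("_:a", "ex:knows", "_:b"), ("_:b", "ex:name", '"Bob"')}
g2 = {("_:y", "ex:knows", "_:x"), ("_:x", "ex:name", '"Bob"')}
g3 = {("_:x", "ex:knows", "_:x"), ("_:x", "ex:name", '"Bob"')}
```

Here `g1` and `g2` differ only in the names of their bnodes, while `g3` merges the two bnodes into one and so cannot be in bijection with `g1`.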
still talking about (abstract) syntax
nothing has been said on the actual semantics of RDF: it is introduced with an RDF vocabulary and given by means of a Model Theory
how do you write/exchange RDF? in particular bnodes and literals
several possibilities:
N-Triples
RDF/XML (normative)
Turtle notation (subset of N3)
N-Triples
documents contain a set of assertions: subject predicate object .
URI references written out completely: <http://example.org/resource30>
Literals as strings
plain: "chat"@fr
typed: "<a></a>"^^<http://www.w3.org/2000/01/rdf-schema#XMLLiteral>
Bnodes: _:anon
N-Triples Example
<http://www.inf.unibz.it/~tessaris/myfoaf.xml#me> <http://xmlns.com/foaf/0.1/title> "Dr" .
<http://www.inf.unibz.it/~tessaris/myfoaf.xml#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://www.inf.unibz.it/~tessaris/myfoaf.xml#me> <http://xmlns.com/foaf/0.1/workplaceHomepage> <http://www.unibz.it/inf> .
<http://www.inf.unibz.it/~tessaris/myfoaf.xml#me> <http://xmlns.com/foaf/0.1/homepage> <http://www.inf.unibz.it/~tessaris> .
<http://www.inf.unibz.it/~tessaris/myfoaf.xml#me> <http://xmlns.com/foaf/0.1/mbox_sha1sum> "758128cdae69fd0fd9e880921d9f4b25259edbf5" .
<http://www.inf.unibz.it/~tessaris/myfoaf.xml#me> <http://xmlns.com/foaf/0.1/name> "Sergio Tessaris" .
<http://www.inf.unibz.it/~tessaris/myfoaf.xml#me> <http://xmlns.com/foaf/0.1/family_name> "Tessaris" .
<http://www.inf.unibz.it/~tessaris/myfoaf.xml#me> <http://xmlns.com/foaf/0.1/givenname> "Sergio" .
<http://www.inf.unibz.it/~tessaris/myfoaf.xml#me> <http://xmlns.com/foaf/0.1/phone> <tel:+39-0471-016-125> .
RDF/XML
Normative XML serialisation for RDF documents
Use namespace abbreviations, e.g.
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:myfoaf="http://www.inf.unibz.it/~tessaris/myfoaf.xml">
Encode paths of the RDF graph
There are several ways of encoding the same graph!
document root is rdf:RDF
rdf:Description to represent nodes
predicates use the corresponding URI
<rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar">
  <dc:title>RDF/XML Syntax Specification (Revised)</dc:title>
</rdf:Description>
RDF/XML Example
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:ID="me">
    <foaf:name>Sergio Tessaris</foaf:name>
    <foaf:title>Dr</foaf:title>
    <foaf:givenname>Sergio</foaf:givenname>
    <foaf:family_name>Tessaris</foaf:family_name>
    <foaf:mbox_sha1sum>758128cdae69fd0fd9e880921d9f4b25259edbf5</foaf:mbox_sha1sum>
    <foaf:homepage rdf:resource="http://www.inf.unibz.it/~tessaris"/>
    <foaf:phone rdf:resource="tel:+39-0471-016-125"/>
    <foaf:workplaceHomepage rdf:resource="http://www.unibz.it/inf"/>
  </foaf:Person>
</rdf:RDF>
Turtle Notation
Extension of N-Triples
Compact representation of graphs
I will use this notation, explaining it on the way
Turtle Example
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://www.inf.unibz.it/~tessaris/myfoaf.xml#> .
:me rdf:type foaf:Person ;
    foaf:family_name "Tessaris" ;
    foaf:givenname "Sergio" ;
    foaf:homepage <http://www.inf.unibz.it/~tessaris> ;
    foaf:mbox_sha1sum "758128cdae69fd0fd9e880921d9f4b25259edbf5" ;
    foaf:name "Sergio Tessaris" ;
    foaf:phone <tel:+39-0471-016-125> ;
    foaf:title "Dr" ;
    foaf:workplaceHomepage <http://www.unibz.it/inf> .
where's the semantics?
how to express rich semantic constructs?
everything is going to be in RDF itself
URIs are global: RDF prescribes the meaning of some URIs
RDF Vocabulary
Classes: rdf:Property, rdf:XMLLiteral
Properties: rdf:type
Reification: rdf:Statement, rdf:subject, rdf:predicate, rdf:object
RDFS Vocabulary
Classes
rdfs:Resource rdfs:Class rdfs:Literal rdfs:Datatype
Properties
rdfs:range, rdfs:domain rdfs:subClassOf rdfs:subPropertyOf rdfs:label, rdfs:comment
monotonic
no closed-world assumptions no defaults
RDF Interpretations
Consider a set of terms $T$ (URIs and literals in a graph)
Definition (Simple Interpretation): a simple interpretation $I$ over $T$ is composed of
a non-empty set of resources, the domain of $I$ (written $\Delta$ here)
a set $P$ (the properties of $I$)
a set $PL \subseteq \Delta$, which contains all the plain literals in $T$
a mapping $V$ from URI references in $T$ into $\Delta \cup P$
a mapping $L$ from typed literals in $T$ into $\Delta$
a mapping $E$ from $P$ into the powerset of $\Delta \times \Delta$
RDF Interpretations
double level interpretation: the key is in E
literals are in the domain
no bnodes in T
Triples Satisfiability
a triple s p o . is satisfied by $I$ iff $\langle I(s), I(o) \rangle \in E(I(p))$
Bnodes
$B(G)$ is the set of bnodes in $G$
$A$ is a mapping from $B(G)$ to the domain of $I$
$I_A$ is the extension of $I$ with $A$ (i.e. $I_A(b) = A(b)$)
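Satisfaction with bnodes can be made concrete on a finite toy interpretation. The sketch below is a simplification under stated assumptions: the domain is a small Python set, a single dictionary `V` gives the denotation of every URI, `E` maps each property to a set of pairs, and bnodes are strings starting with "_:". The names `is_model`, `domain`, `V` and `E` are illustrative only.

```python
from itertools import product

def is_bnode(term):
    # Assumed convention: blank nodes are written "_:name".
    return term.startswith("_:")

def is_model(domain, V, E, graph):
    """True if some assignment A of bnodes to domain elements satisfies every triple."""
    bnodes = sorted({t for s, _, o in graph for t in (s, o) if is_bnode(t)})
    # Enumerate every mapping A from the bnodes of the graph to the domain.
    for values in product(sorted(domain), repeat=len(bnodes)):
        A = dict(zip(bnodes, values))

        def denote(term):
            # I_A: bnodes are interpreted by A, everything else by V.
            return A[term] if term in A else V[term]

        if all((denote(s), denote(o)) in E.get(V[p], set()) for s, p, o in graph):
            return True
    return False

# Hypothetical toy interpretation with two resources and one property.
domain = {"sergio", "homepage"}
V = {":me": "sergio", ":page": "homepage", "foaf:homepage": "hp"}
E = {"hp": {("sergio", "homepage")}}
```

With this interpretation, a graph stating that `:me` has some homepage is satisfied, because the bnode can be assigned to the element `homepage`.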
Simple Entailment
Entailment is defined in terms of models
$G_1$ entails $G_2$ iff every model of $G_1$ is a model of $G_2$
bnodes are existential variables
$I$ is a model of $G_1$ if there exists $A_1$ such that $I_{A_1}$ satisfies all the triples in $G_1$
$I$ is a model of $G_2$ if there exists $A_2$ such that $I_{A_2}$ satisfies all the triples in $G_2$
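Checking all models is not feasible directly, but for simple entailment the interpolation lemma of the W3C RDF Semantics gives a syntactic test: a graph simply entails another iff some instance of the second is a subgraph of the first. The following brute-force sketch assumes tuples of strings and the "_:" convention for bnodes; `simply_entails` is a hypothetical name.

```python
from itertools import product

def simply_entails(g, e):
    """Search for a mapping of e's bnodes to terms of g that turns e into a subgraph of g."""
    terms = sorted({t for s, _, o in g for t in (s, o)})
    bnodes = sorted({t for s, _, o in e for t in (s, o) if t.startswith("_:")})
    for values in product(terms, repeat=len(bnodes)):
        m = dict(zip(bnodes, values))
        # Bnodes act as existential variables; URIs and literals must match exactly.
        if all((m.get(s, s), p, m.get(o, o)) in g for s, p, o in e):
            return True
    return False

g = {(":me", "foaf:name", '"Sergio"')}
```

For instance, `g` entails that something has some name, but not that something is its own name.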
RDF(S) Vocabulary
Semantic conditions, e.g. $x \in P$ iff $\langle x, V(\text{rdf:Property}) \rangle \in E(V(\text{rdf:type}))$
Axiomatic triples, e.g. rdf:type rdf:type rdf:Property .
RDF-MT doesn't cover reification, containers and collections
:p1 rdfs:range :c1 .
:p2 rdfs:range :c2 .
:p1 rdfs:subPropertyOf :p2 .
implies?
:c1 rdfs:subClassOf :c2 .
only with extensional semantic conditions (not normative)
if $x \in C(V(\text{rdfs:Class}))$ then ...
if $x \in C(V(\text{rdfs:Datatype}))$ then ...
RDF(S) Literals
Plain literals (RDFS): $PL = C(V(\text{rdfs:Literal}))$
affects logical implication:
:myself foaf:name "Sergio Tessaris" .
entails
:myself foaf:name _:b1 . _:b2 rdf:type rdfs:Literal .
Note: "Sergio Tessaris" rdf:type rdfs:Literal . is not a valid RDF(S) triple!
Entailment Rules
a set of inference rules to capture RDF(S)-entailment
rules complete an RDF graph: add a triple if there is some pattern
rules application terminates in a polynomial number of steps
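How rules complete a graph can be sketched with a naive fixpoint computation. The code below covers only a subset of the RDFS rules (domain, range, subPropertyOf inheritance, subClassOf inheritance and transitivity), ignores axiomatic triples and the treatment of literals, and uses prefixed names as plain strings; it is an illustration, not a complete RDFS reasoner.

```python
TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"
SUBCLASS, SUBPROP = "rdfs:subClassOf", "rdfs:subPropertyOf"

def step(g):
    """One round of rule application over all pairs of triples."""
    new = set()
    for s, p, o in g:
        for s2, p2, o2 in g:
            if p2 == DOMAIN and s2 == p:            # domain rule
                new.add((s, TYPE, o2))
            if p2 == RANGE and s2 == p and not o.startswith('"'):
                new.add((o, TYPE, o2))              # range rule, literals skipped
            if p2 == SUBPROP and s2 == p:           # property inheritance
                new.add((s, o2, o))
            if p == TYPE and p2 == SUBCLASS and s2 == o:
                new.add((s, TYPE, o2))              # class membership inheritance
            if p == SUBCLASS and p2 == SUBCLASS and s2 == o:
                new.add((s, SUBCLASS, o2))          # subClassOf transitivity
    return new

def closure(g):
    """Apply the rules until no new triple is produced."""
    g = set(g)
    while True:
        new = step(g) - g
        if not new:
            return g
        g |= new

g = {(":p1", RANGE, ":c1"), (":p2", RANGE, ":c2"),
     (":p1", SUBPROP, ":p2"), (":a", ":p1", ":b")}
completed = closure(g)
```

On this input the closure types `:b` with both `:c1` and `:c2`, but it does not produce `:c1 rdfs:subClassOf :c2`, in line with the remark that this conclusion needs the extensional semantic conditions.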
Lemma (Entailment Lemma): $G$ rdf(s)-entails $E$ iff there exists $G'$, derived from $G$ with axiomatic triples using the entailment rules, such that $G'$ simply entails $E$.
simple entailment is enough
effective procedure (needs simple entailment)
Grounding a graph G
Def completed: added axiomatic triples and applied entailment rules
Def Herbrand model: each bnode replaced by a URI or literal
Def Canonical model of G: bnodes replaced with fresh URIs
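Replacing bnodes with fresh URIs is a simple renaming. In the sketch below the namespace `urn:skolem:` is a hypothetical choice and is assumed not to occur anywhere in the graph; bnodes follow the "_:" convention.

```python
def canonical(g, base="urn:skolem:"):
    """Replace each bnode by a fresh URI; `base` is assumed unused in g."""
    fresh = {}

    def rename(term):
        if term.startswith("_:"):
            # The same bnode always gets the same fresh URI.
            return fresh.setdefault(term, f"{base}{len(fresh)}")
        return term

    return {(rename(s), p, rename(o)) for s, p, o in g}

g = {("_:a", "ex:p", "_:b"), ("_:a", "ex:q", ":c")}
ground = canonical(g)
```

The result is a ground graph of the same size in which both triples still share their subject.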
Graph entailment
Theorem (RDF graph entailment): G entails E iff some Herbrand model of E is a subgraph of the canonical model of G
Connects our definition to the W3C normative semantics
Complexity of entailment
1. NP-complete in the size of the RDF graphs
2. PTIME in the size of the entailing graph G
3. PTIME if E is acyclic or ground
Def non-high order graph: no blank nodes as objects of rdf:type
Def The classical logic translation $FO(G)$ of a non-high order graph $G$:
URIs and literals are constants
blank nodes are existentially quantified variables
binary atomic formulas are in correspondence with triples in $G$
triples $\langle u_1, \text{rdf:type}, u_2 \rangle$ introduce atomic formulae $u_2(u_1)$
Theorem: Given an RDF graph $G$ and a non-high order graph $E$, $G$ entails $E$ iff $FO(G) \models_C FO(E)$
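The translation can be illustrated by printing the formula for a small graph. This sketch assumes prefixed names as strings, bnodes written "_:name", and an ASCII rendering of the quantifier and conjunction; `fo` is a hypothetical helper and the output format is my own choice.

```python
def fo(graph):
    """Render FO(G) as a string; bnodes become existentially quantified variables."""
    def term(t):
        return "x_" + t[2:] if t.startswith("_:") else t

    variables = sorted({term(t) for s, _, o in graph
                        for t in (s, o) if t.startswith("_:")})
    atoms = []
    for s, p, o in sorted(graph):
        if p == "rdf:type":
            if o.startswith("_:"):
                raise ValueError("bnode as object of rdf:type: not a non-high order graph")
            atoms.append(f"{o}({term(s)})")             # unary atom for typing
        else:
            atoms.append(f"{p}({term(s)}, {term(o)})")  # binary atom otherwise
    body = "(" + " & ".join(atoms) + ")"
    return "".join(f"exists {v}. " for v in variables) + body

g = {(":me", "rdf:type", "foaf:Person"), (":me", "foaf:knows", "_:b")}
```

For the graph `g` this yields `exists x_b. (foaf:knows(:me, x_b) & foaf:Person(:me))`.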
entailment and query answering are strictly related
an answer to a query is a set of entailed facts:
tuples representing variable bindings
complex structures like RDF graphs or XML documents
it depends on the query language
Querying RDF
Graph Patterns
Dataset: graph to be queried
define a new kind of graphs, RDF graph patterns: RDF graphs with variables, e.g.
?x foaf:nick "Alice" .
Query Answering: find all the assignments for the variables that make the pattern a logical consequence of the Dataset
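Answering a graph pattern can be sketched as a search for variable bindings. The code below matches the pattern against the explicit triples of the dataset only (no RDFS reasoning), assumes variables are strings starting with "?", and uses made-up data; `match` is a hypothetical name.

```python
def match(pattern, dataset):
    """Return all bindings (dicts) of ?variables that map every pattern triple into the dataset."""
    solutions = [{}]
    for triple in pattern:
        extended = []
        for binding in solutions:
            for fact in dataset:
                candidate = dict(binding)
                ok = True
                for p_term, f_term in zip(triple, fact):
                    if p_term.startswith("?"):
                        # Bind the variable, or check it agrees with an earlier binding.
                        if candidate.setdefault(p_term, f_term) != f_term:
                            ok = False
                            break
                    elif p_term != f_term:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions

dataset = {
    (":a", "foaf:nick", '"Alice"'),
    (":b", "foaf:nick", '"Bob"'),
    (":a", "foaf:knows", ":b"),
}
```

Calling `match([("?x", "foaf:nick", '"Alice"')], dataset)` binds `?x` to `:a`; patterns with several triples share variables across triples, which is what makes them conjunctive queries.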
Subgraph Matching
can we query in an efficient way?
entailment lemma: simple entailment is subgraph matching
subgraph matching is conjunctive query answering
Dataset can be stored in a relational database
the answer is yes!
nothing more than conjunctive queries using a single ternary predicate (e.g. triple(x, foaf:nick, "Alice"))
e.g. Oracle supports RDF and an extension to SQL
most SPARQL implementations rely on a database back-end
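The single ternary predicate can be tried out with an in-memory SQLite database from the Python standard library. The table name `triple`, its column names and the data are assumptions made for this sketch; a two-triple pattern becomes a self-join on the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        (":a", "foaf:nick", '"Alice"'),
        (":b", "foaf:nick", '"Bob"'),
        (":a", "foaf:knows", ":b"),
    ],
)

# Pattern: ?x foaf:nick "Alice" . ?x foaf:knows ?y .
# Each pattern triple is one copy of the table; shared variables become join conditions.
rows = conn.execute(
    """
    SELECT t1.s AS x, t2.o AS y
    FROM triple t1 JOIN triple t2 ON t1.s = t2.s
    WHERE t1.p = 'foaf:nick' AND t1.o = '"Alice"'
      AND t2.p = 'foaf:knows'
    """
).fetchall()
```

Each row of the result is one variable binding, here the pair of values for `?x` and `?y`.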
SELECT t.r reviewer, e.emailid emailid
FROM TABLE(RDF_MATCH(
       (?r ReviewerOf ?c) (?r rdf:type Faculty),
       RDFModels(reviewers),
       NULL, NULL)) t,
     employees e
WHERE t.r = e.name;
SPARQL
query language for RDF
becoming a W3C recommendation
based on Graph Patterns: patterns extract a set of variable bindings (tuples)
SPARQL Example
BASE <http://example.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <properties/1.0#>
SELECT DISTINCT ?person ?name ?age
FROM <http://rdf.example.org/people.rdf>
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
  OPTIONAL { ?person ex:age ?age } .
  FILTER (!REGEX(?name, "Bob"))
}
ORDER BY ASC(?name)
LIMIT 3
SPARQL Semantics
The answer set is a set of mappings from variables in Q to terms occurring in G
Given an entailment relation $\models_E$, a query pattern Q and an RDF graph G, Q E-matches G with answer S if:
1 2 3
Where's the problem?
why not $G \models_E S(Q)$?
bnodes in answer sets
graph: :myself foaf:homepage "http://www.inf.unibz.it/"
query: :b foaf:homepage ?x
answers: [ ?x/"http://www.inf.unibz.it/" ], [ ?x/ :myself ]
which bnodes in answers?
graph: :myself foaf:name "Sergio Tessaris" .
query: ?x rdf:type rdfs:Literal
answer
querying precompleted graph
graph: :myself foaf:name "Sergio Tessaris" . _:b rdf:type rdfs:Literal
answer: {[ ?x/_:b ]}
in both cases _:w rdf:type rdfs:Literal is a logical consequence
Conclusions
RDF has a precise (model theoretic) semantics
Several drawbacks, but it's a standard: we have to deal with it!
Strange semantics, but relatively easy to deal with (up to RDFS)
Extensions (e.g. OWL) require a more careful rethinking of the whole framework:
Keeping backward compatibility
Interoperability via SPARQL
Higher order style semantics (cf. HiLog)