SPARQL
1. Use-Cases
a. Data Integration
Unifies data from disparate sources using shared vocabularies (e.g., FOAF, schema.org).
Used in Linked Open Data projects (e.g., DBpedia, Wikidata).
Aggregates structured data across domains such as healthcare, finance, and the public sector.
Example:
Integrating patient data from hospitals and genomic data using common ontologies (e.g., SNOMED CT,
Gene Ontology).
b. Knowledge Graphs
Example:
Query all subsidiaries of a company and their CEO names in a corporate knowledge graph.
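To make this example concrete, the sketch below shows one way such a query might look and mimics its evaluation over a toy in-memory graph. The ex: vocabulary (ex:subsidiaryOf, ex:ceo, ex:name) and all company and person names are invented for illustration; a real corporate knowledge graph would use its own schema, and a real engine would plan the joins rather than nest loops.

```python
# Toy knowledge graph as (subject, predicate, object) triples.
# Every identifier here is hypothetical and exists only for this sketch.
TRIPLES = {
    ("ex:AcmeRobotics", "ex:subsidiaryOf", "ex:AcmeCorp"),
    ("ex:AcmeFoods", "ex:subsidiaryOf", "ex:AcmeCorp"),
    ("ex:AcmeRobotics", "ex:ceo", "ex:alice"),
    ("ex:AcmeFoods", "ex:ceo", "ex:bob"),
    ("ex:alice", "ex:name", "Alice Example"),
    ("ex:bob", "ex:name", "Bob Example"),
    ("ex:OtherLabs", "ex:subsidiaryOf", "ex:OtherCorp"),
}

# The SPARQL query that the function below imitates by hand.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?subsidiary ?ceoName WHERE {
  ?subsidiary ex:subsidiaryOf ex:AcmeCorp ;
              ex:ceo ?ceo .
  ?ceo ex:name ?ceoName .
}
"""


def subsidiaries_with_ceo(triples, parent):
    """Join three triple patterns, as the WHERE clause above would."""
    rows = []
    for sub, p1, o1 in triples:
        if p1 != "ex:subsidiaryOf" or o1 != parent:
            continue  # pattern 1: ?subsidiary ex:subsidiaryOf <parent>
        for s2, p2, ceo in triples:
            if s2 != sub or p2 != "ex:ceo":
                continue  # pattern 2: ?subsidiary ex:ceo ?ceo
            for s3, p3, name in triples:
                if s3 == ceo and p3 == "ex:name":
                    rows.append((sub, name))  # pattern 3: ?ceo ex:name ?ceoName
    return sorted(rows)


if __name__ == "__main__":
    for subsidiary, ceo_name in subsidiaries_with_ceo(TRIPLES, "ex:AcmeCorp"):
        print(subsidiary, ceo_name)
```

The shared variables (?subsidiary, ?ceo) are what turn three independent patterns into a join, which is also why complex patterns can become expensive on large graphs.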
While not a replacement for SQL or big data engines, SPARQL supports analytical tasks:
Example:
a. Python
Libraries:
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery("""
SELECT ?person ?personLabel WHERE {
  ?person wdt:P31 wd:Q5;          # instance of: human
          wdt:P106 wd:Q937857.    # occupation: Q937857 (association football player on Wikidata)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
# Each binding maps a variable name to a dict with "type" and "value" keys
for result in results["results"]["bindings"]:
    print(result["personLabel"]["value"])
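The results["results"]["bindings"] structure used above follows the SPARQL 1.1 Query Results JSON Format. As a rough sketch of how it can be flattened into plain rows without touching the network, the snippet below parses a small hand-written sample; the example.org URIs and labels are made up, and bindings_to_rows is a hypothetical helper, not part of SPARQLWrapper.

```python
import json

# Hand-written sample in the SPARQL JSON results layout (made-up data).
SAMPLE = json.loads("""
{
  "head": {"vars": ["person", "personLabel"]},
  "results": {"bindings": [
    {"person": {"type": "uri", "value": "http://example.org/person/1"},
     "personLabel": {"type": "literal", "xml:lang": "en", "value": "Example One"}},
    {"person": {"type": "uri", "value": "http://example.org/person/2"}}
  ]}
}
""")


def bindings_to_rows(doc):
    """Turn each binding into {variable: value}, using None for unbound variables."""
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        row = {}
        for var in variables:
            # A variable is simply absent from a binding when it is unbound.
            row[var] = binding[var]["value"] if var in binding else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in bindings_to_rows(SAMPLE):
        print(row)
```

Handling the unbound case matters in practice because OPTIONAL patterns and missing labels leave variables out of individual bindings.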
b. Java
Libraries:
Apache Jena (most common): Offers model, query execution, and endpoint management
SPARQL engines struggle with large-scale graph analytics compared to SQL engines.
Joins and complex patterns can be computationally expensive.
Caching and indexing are critical for performance but not uniformly supported.
d. Standardization Gaps
SPARQL 1.1 closed many gaps, but support for full-text search, geospatial functions, and updates still varies
between engines.
Some SPARQL stores lack transaction management or bulk updates.
SPARQL 1.1 is the current official specification by the W3C (published in 2013), which extends SPARQL 1.0 and
provides much-needed features for practical querying and data manipulation.
Feature            Description
Aggregates         COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING
Subqueries         Nest SELECT queries inside other queries
Property Paths     Query paths of arbitrary length: ?x :knows+ ?y
Updates            INSERT, DELETE, DELETE/INSERT, LOAD, CLEAR, CREATE, DROP
Federated Queries  SERVICE keyword to query remote SPARQL endpoints
Bindings           BIND, VALUES for temporary variables and inline data
Negation           MINUS, NOT EXISTS
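To illustrate what the property path ?x :knows+ ?y asks for, the following sketch computes "one or more :knows steps" over a tiny adjacency map with a breadth-first search. The people and edges are invented, and a real engine may evaluate paths quite differently; this only shows the semantics.

```python
from collections import deque

# Hypothetical :knows edges: each key knows every person in its list.
KNOWS = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": [],
}


def knows_plus(edges, start):
    """Return every node reachable from start via one or more edges."""
    reached = set()
    queue = deque(edges.get(start, []))  # begin with the direct neighbours
    while queue:
        node = queue.popleft()
        if node in reached:
            continue  # already visited, which also stops cycles from looping
        reached.add(node)
        queue.extend(edges.get(node, []))
    return reached


if __name__ == "__main__":
    print(sorted(knows_plus(KNOWS, "alice")))
```

Because the path requires at least one step, the start node appears in its own result only when a cycle leads back to it, as it does for alice here.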
🧪 Example: Aggregation
SELECT ?person (COUNT(?book) AS ?bookCount)
WHERE {
  ?person :hasWritten ?book .
}
GROUP BY ?person
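For readers more at home in Python, the grouping above corresponds roughly to counting matches per subject. The sketch below does this over made-up triples; the ex: names are assumptions, and only the :hasWritten predicate comes from the query.

```python
from collections import Counter

# Made-up triples; only the :hasWritten predicate is taken from the query above.
TRIPLES = [
    ("ex:ann", ":hasWritten", "ex:book1"),
    ("ex:ann", ":hasWritten", "ex:book2"),
    ("ex:ben", ":hasWritten", "ex:book3"),
    ("ex:ben", ":likes", "ex:book1"),
]


def book_counts(triples):
    """Mimic GROUP BY ?person with COUNT(?book): one count per matching subject."""
    counts = Counter(s for s, p, _ in triples if p == ":hasWritten")
    return dict(counts)


if __name__ == "__main__":
    print(book_counts(TRIPLES))
```

Like COUNT(?book) without DISTINCT, this counts every matching solution, so the same book listed twice for one author would be counted twice.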
2. 🧭 SPARQL Extensions
While SPARQL 1.1 is powerful, some use cases (like geospatial or annotated data) require community or vendor-
specific extensions. Two major ones:
📍 a. GeoSPARQL
Developed by the Open Geospatial Consortium (OGC), GeoSPARQL extends SPARQL to support geospatial data
types and queries.
Use Cases:
Core Functions:
Example:
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom: <http://www.opengis.net/def/uom/OGC/1.0/>
SELECT ?place
WHERE {
  ?place geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  # WKT under the default CRS84 reference system lists longitude first, then latitude
  FILTER(geof:distance(?wkt, "POINT(-74.0060 40.7128)"^^geo:wktLiteral, uom:metre) < 10000)
}
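As a rough illustration of what a 10 km distance filter does, the sketch below parses WKT points and applies a haversine great-circle distance. This is an approximation chosen for the example, not how any particular GeoSPARQL engine computes geof:distance. Note that WKT points under the default CRS84 reference system list longitude first. The two candidate points and their names are invented; only the New York City coordinates come from the query.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical approximation)


def parse_wkt_point(wkt):
    """Parse 'POINT(lon lat)' into a (lon, lat) tuple of floats."""
    inner = wkt.strip()[len("POINT("):-1]
    lon, lat = (float(part) for part in inner.split())
    return lon, lat


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical places: one about 2 km from the centre, one roughly 90 km north.
PLACES = {
    "near": "POINT(-74.0000 40.7300)",
    "far": "POINT(-74.0000 41.5000)",
}
CENTER = "POINT(-74.0060 40.7128)"  # New York City, longitude first


def within_radius(places, center_wkt, radius_m):
    """Return the names of places closer than radius_m to the centre."""
    clon, clat = parse_wkt_point(center_wkt)
    hits = []
    for name, wkt in places.items():
        lon, lat = parse_wkt_point(wkt)
        if haversine_m(clon, clat, lon, lat) < radius_m:
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    print(within_radius(PLACES, CENTER, 10_000))
```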
⭐ b. SPARQL-star / RDF-star
SPARQL-star extends SPARQL to allow querying statements about statements (i.e., reification made practical).
Use Cases:
Syntax:
Querying:
SELECT ?who
WHERE {
  << ?s ?p ?o >> :source ?who .
}
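One way to picture the pattern << ?s ?p ?o >> :source ?who is to treat a whole triple as the subject of another statement. The sketch below models that with nested tuples; all data and the ex: names are invented, and this is only a mental model, not how RDF-star stores represent quoted triples.

```python
# Annotations whose subject is itself a triple (made-up data).
ANNOTATIONS = {
    (("ex:alice", "ex:worksFor", "ex:AcmeCorp"), ":source", "ex:hrDatabase"),
    (("ex:bob", "ex:age", "42"), ":source", "ex:survey2020"),
    (("ex:bob", "ex:age", "42"), ":confidence", "0.9"),
}


def sources(annotations):
    """Collect ?who for every statement matching << ?s ?p ?o >> :source ?who."""
    found = set()
    for subject, predicate, obj in annotations:
        is_quoted_triple = isinstance(subject, tuple) and len(subject) == 3
        if is_quoted_triple and predicate == ":source":
            found.add(obj)
    return sorted(found)


if __name__ == "__main__":
    print(sources(ANNOTATIONS))
```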
Support:
Not a W3C standard yet, but implemented in RDF4J, Blazegraph, and Jena (experimental).
1. Wikidata
SELECT ?person ?personLabel ?awardYear
WHERE {
  ?person p:P166 ?awardStatement ;        # award received (full statement node)
          wdt:P569 ?birthDate .
  ?awardStatement ps:P166 wd:Q38104 ;     # Nobel Prize in Physics
                  pq:P585 ?awardYear .    # point in time of this award
  FILTER(YEAR(?awardYear) > 2000)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?awardYear
LIMIT 10
SELECT ?country ?countryLabel ?population
WHERE {
  ?country wdt:P31 wd:Q6256;       # instance of country
           wdt:P1082 ?population.  # population
  FILTER(?population > 100000000)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
2. DBpedia
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population
WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Germany ;
        dbo:populationTotal ?population .
  FILTER(?population > 1000000)
}
ORDER BY DESC(?population)
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film
WHERE {
  ?film dbo:director dbr:Christopher_Nolan .
}
LIMIT 20
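Queries like these are typically sent over HTTP using the SPARQL Protocol, where a GET request carries the query text in a query parameter. The sketch below only builds such a URL and makes no network call; https://dbpedia.org/sparql is assumed to be the public DBpedia endpoint, and build_get_url is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint

QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film
WHERE {
  ?film dbo:director dbr:Christopher_Nolan .
}
LIMIT 20"""


def build_get_url(endpoint, query):
    """Return a SPARQL Protocol GET URL with the query percent-encoded."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    url = build_get_url(DBPEDIA_ENDPOINT, QUERY)
    print(url)
    # Decoding the URL again should give back the original query text.
    print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

When actually sending the request, the result format is normally negotiated with an Accept header such as application/sparql-results+json.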
Example 5: Cities near New York City (geospatial filter with WKT)
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?protein ?diseaseLabel
WHERE {
  ?protein a up:Protein ;
           up:organism taxon:9606 ;   # Human
           up:annotation ?ann .
  ?ann a up:Disease_Annotation ;
       up:disease ?disease .
  ?disease rdfs:label ?diseaseLabel .   # label of the disease, not of the protein
  FILTER(CONTAINS(LCASE(?diseaseLabel), "cancer"))
}
LIMIT 10