Tutorial at DCMI conference in Seoul, 2019-09-25, by Tom Baker, Joachim Neubert and Andra Waagmeester
Rendered HTML version: https://jneubert.github.io/wd-dcmi2019/#/
Providing Tools for Author Evaluation - A case study (inscit2006)
The document discusses tools in Scopus for author evaluation. It outlines challenges in author evaluation including author disambiguation and data limitations. Scopus addresses this through the Author Identifier which uses publication data to group documents by author, improving disambiguation. The Citation Tracker and H-index provide visual citation analysis tools for author evaluation. PatentCites and WebCites additionally track citations in patents and web sources. Quality author evaluation depends on underlying source data quality, and Scopus aims to make author information objective, quantitative, and globally comparative.
Vital AI MetaQL: Queries Across NoSQL, SQL, SPARQL, and Spark (Vital.AI)
This document provides an overview of MetaQL, which allows composing queries across NoSQL, SQL, SPARQL, and Spark databases using a domain model. Key points include:
- MetaQL uses a domain model to define concepts and compose typed queries in code that can execute across different databases.
- This separates concerns and improves developer efficiency over managing schemas and databases separately.
- Examples demonstrate MetaQL queries in graph, path, select, and aggregation formats across SQL, NoSQL, and RDF implementations.
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN (andrea huang)
Our team is in Madrid (#CKANCon) to introduce our #LODLAM implementation. The http://data.odw.tw just out. (Slides at https://goo.gl/KJApV8 ) If you are at #IODC16, you are also welcome to discuss with our team in person. #opendata
More introduction about data.odw.tw can be accessed at https://goo.gl/YUSI74 (chinese) and https://goo.gl/2u07Ap (english).
Slides used to introduce the technical aspects of DSpace-CRIS to the technical staff of the Hamburg University of Technology.
Main topics:
The DSpace-CRIS data model: additional entities, interactions with the DSpace data model (authority framework), enhanced metadata, inverse relationship
ORCID integration & technical details: available features & use cases (authentication, authorization, profile claiming, profile synchronization push & pull, registry lookup), configuration, API-KEY, use of the sandbox, metadata mapping
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P... (andrea huang)
Will the rich domain knowledge from research publications and the implicit cross-domain metadata of cultural objects be compliant with each other? A contextual framework is proposed as dynamic and relational, supporting three different contexts — Reusing, Publication and Curation — which are individually constructed but overlap in their major conceptual elements. A Relations for Reusing (R4R) ontology has been devised for modeling these overlapping conceptual components (Article, Data, Code, Provenance, and License) and for interlinking research outputs and cultural heritage data. In particular, packaging and citation relations are key to building up interpretations for dynamic contexts. Examples illustrate how the linking mechanism can be constructed and represented to reveal the data linked in different contexts.
Metadata Provenance Tutorial at SWIB 13, Part 1 (Kai Eckert)
The slides of part one of the Metadata Provenance Tutorial (Linked Data Provenance). Part 2 is here: http://de.slideshare.net/MagnusPfeffer/metadata-provenance-tutorial-part-2-modelling-provenance-in-rdf
The document discusses creating Linked Open Data (LOD) microthesauri from the Art & Architecture Thesaurus (AAT). It defines a microthesaurus as a designated subset of a thesaurus that can function independently. The document provides an overview of creating an AAT-based LOD dataset for a digital art and architecture collection. It also demonstrates how to extract concept URIs and labels from the AAT thesaurus structure using SPARQL queries to build microthesauri.
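As a hedged illustration of the kind of SPARQL query referred to above, the following sketch collects concept URIs and English preferred labels below one AAT node as candidate members of a microthesaurus. The use of plain SKOS properties on the Getty endpoint and the root concept URI (a placeholder) are assumptions; the actual queries behind the microthesauri are not reproduced here.

```sparql
# Sketch: gather candidate members of a microthesaurus from the AAT
# by walking skos:broader links up to a chosen root concept.
# The root URI below is a placeholder; the Getty endpoint also offers
# its own gvp: extensions (e.g. gvp:broaderExtended) as alternatives.
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?concept ?label WHERE {
  ?concept skos:broader+ <http://vocab.getty.edu/aat/300000000> ;  # placeholder root
           skos:prefLabel ?label .
  FILTER(LANG(?label) = "en")
}
ORDER BY ?label
```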
The document discusses creating intelligent, data-driven applications using the Vital.AI platform. The platform combines semantics and big data techniques to allow applications to learn from experience and dynamically adjust behaviors. It provides components for data collection, analysis, predictive modeling, and dynamically generating user interfaces and logic based on an application ontology. This allows for more efficient and rapid development of intelligent apps that can adapt over time.
Putting Historical Data in Context: how to use DSpace-GLAM (4Science)
This document discusses using DSpace and DSpace-GLAM to manage digital cultural heritage data. It provides an overview of DSpace's data model and functionality for ingesting, describing, and sharing digital objects. It then introduces DSpace-GLAM, an extension of DSpace developed for cultural heritage institutions. DSpace-GLAM adds additional entity types, relationships, and metadata to better represent cultural concepts. It also provides tools for visualizing and analyzing datasets.
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan (andrea huang)
The linked data paradigm provides the potential for any data to link, or be linked, with structural information, internally and externally. To improve the current cultural service of the Union Catalog of Digital Archives Taiwan (catalog.digitalarchives.tw), a linked data prototype has been developed that benefits from extending the Art & Architecture Thesaurus (AAT) towards a machine-understandable catalog service.
However, knowledge engineering is time- and labor-consuming, especially for an archive that is non-western in culture and multidisciplinary in nature. This makes mapping the data semantics of the UCdaT to international standards and vocabularies extremely challenging.
At this stage, the triple store is an experimental addition to the existing Union Catalog of Digital Archives Taiwan architecture and provides semantic links to target collections for related suggestions. This will guide us in creating a future technical architecture that scales to the whole archive, follows learning-by-doing guidelines, and preserves data that is difficult to understand fully at present but can at least be linked by others, who may contribute third-party understandings for their own reuse.
Semantics for Big Data Integration and Analysis (Craig Knoblock)
Much of the focus on big data has been on the problem of processing very large sources. There is an equally hard problem of how to normalize, integrate, and transform the data from many sources into the format required to run large-scale analysis and visualization tools. We have previously developed an approach to semi-automatically mapping diverse sources into a shared domain ontology so that they can be quickly combined. In this paper we describe our approach to building and executing integration and restructuring plans to support analysis and visualization tools on very large and diverse datasets.
20160818 Semantics and Linkage of Archived Catalogs (andrea huang)
1. The document discusses representing archive catalog data as linked data using semantic web technologies. It involves mapping catalog metadata from XML and CSV formats to RDF and linking to external vocabularies.
2. A system is presented that converts archive catalogs to linked data, stores it using CKAN and provides SPARQL querying. It allows browsing catalog records, performing spatial and temporal queries.
3. An ontology called voc4odw is introduced for organizing open data. It is based on the R4R ontology and aims to semantically enrich catalog records by linking objects, events, places and times using common vocabularies.
Leverage DSpace for an enterprise, mission critical platform (Andrea Bollini)
Conference: Open Repository, Indianapolis, 8-12 June 2015
Presenters: Andrea Bollini, Michele Mennielli
Cineca, Italy
We would like to share with the DSpace Community some useful tips, starting from how to embed DSpace into a larger IT ecosystem that can provide additional value to the information managed. We will then show how publication data in DSpace - enriched with a proper use of the authority framework - can be combined with information coming from the HR system. Thanks to this, the system can provide rich and detailed reports and analysis through a business intelligence solution based on Pentaho's Mondrian OLAP and open source data integration tools.
We will also present other use cases related to the management of publication information for reporting purpose: publication record has an extended lifecycle compared to the one in a basic IR; system load is much bigger, especially in writing, since the researchers need to be able to make changes to enrich data when new requirements come from the government or the university researcher office; data quality requires the ability to make distributed changes to the publication also after the conclusion of a validation workflow.
Finally we intend to present our direct experience and the challenges we faced to make DSpace easily and rapidly deployable to more than 60 sites.
CLARIN CMDI use case and flexible metadata schemes (vty)
Presentation for the CLARIAH IG Linked Open Data on the latest developments for the Dataverse FAIR data repository: building a SEMAF workflow with external controlled vocabularies support and a Semantic API, and using the theory of inventive problem solving (TRIZ) for further innovation in Linked Data.
WWW2014 Overview of W3C Linked Data Platform 20140410 (Arnaud Le Hors)
The document summarizes the Linked Data Platform (LDP) being developed by the W3C Linked Data Platform Working Group. It describes the challenges of using Linked Data for application integration today and how the LDP specification aims to address these by defining HTTP-based patterns for creating, reading, updating and deleting Linked Data resources and containers in a standardized, RESTful way. The LDP models resources as HTTP entities that can be manipulated via standard methods and represent their state using RDF, addressing questions around resource management that the original Linked Data principles did not.
Open Archives Initiative Object Reuse and Exchange (lagoze)
This document discusses infrastructure to support new models of scholarly publication by enabling interoperability across repositories through common data modeling and services. It proposes building blocks like repositories, digital objects, a common data model, serialization formats, and core services. This would allow components like publications and data to move across repositories and workflows, facilitating reuse and new value-added services that expose the scholarly communication process.
The document discusses recent developments at the W3C related to semantic technologies. It highlights several technologies that have been under development including RDFa, Linked Open Data, OWL 2, and SKOS. It provides examples of how the Linked Open Data project has led to billions of triples and millions of links between open datasets. Applications using this linked data are beginning to emerge for activities like bookmarking, exploring social graphs, and financial reporting.
Nelson Piedra, Janneth Chicaiza and Jorge López (Universidad Técnica Particular de Loja), Edmundo Tovar (Universidad Politécnica de Madrid), and Oscar Martínez (Universitas Miguel Hernández)
Explore the advantages of using linked data with OERs.
This presentation has been used to start the pilot phase of the OpenAIRE Advance-funded implementation project in DSpace-CRIS.
DSpace-CRIS now provides support for the OpenAIRE Guidelines for CRIS Managers, in addition to the already supported guidelines for Literature Repositories and Data Archives.
How to describe a dataset. Interoperability issues (Valeria Pesce)
Presented by Valeria Pesce during the pre-meeting of the Agricultural Data Interoperability Interest Group (IGAD) of the Research Data Alliance (RDA), held on 21 and 22 September 2015 in Paris at INRA.
Data Enthusiasts London: Scalable and Interoperable data services. Applied to... (Andy Petrella)
Data science requires many skills, many people and much time before the results can be accessed. Moreover, these results can no longer be static. And finally, Big Data comes to the plate and the whole tool chain needs to change.
In this talk Data Fellas introduces Shar3, a toolkit aiming to bridge the gaps in building an interactive distributed data processing pipeline, or loop!
The talk then covers today's problems in genomics, including data types, processing and discovery, by introducing the GA4GH initiative and its implementation using Shar3.
This document discusses approaches for creating linked data from relational databases. It begins by outlining the motivations and benefits, such as bootstrapping the semantic web with large datasets and facilitating database integration. It then discusses existing classifications of approaches and proposes a new classification based on whether an existing ontology is used, the domain of any generated ontology, and whether database reverse engineering is applied. The document also describes several descriptive features of different approaches, such as the level of automation, how data is accessed, and whether mappings are static or dynamic. Overall, the document provides an overview of converting relational database content into linked data.
A Generic Scientific Data Model and Ontology for Representation of Chemical Data (Stuart Chalk)
The current movement toward openness and sharing of data is likely to have a profound effect on the speed of scientific research and the complexity of questions we can answer. However, a fundamental problem with currently available datasets (and their metadata) is heterogeneity in terms of implementation, organization, and representation.
To address this issue we have developed a generic scientific data model (SDM) to organize and annotate raw and processed data, and the associated metadata. This paper will present the current status of the SDM, implementation of the SDM in JSON-LD, and the associated scientific data model ontology (SDMO). Example usage of the SDM to store data from a variety of sources will be discussed along with future plans for the work.
Structured Dynamics provides 'ontology-driven applications'. Our product stack is geared to enable the semantic enterprise. The products are premised on preserving and leveraging existing information assets in an incremental, low-risk way. SD's products span from converters to authoring environments to Web services middleware and to eventual ontologies and user interfaces and applications.
Storing and Querying Semantic Data in the Cloud (Steffen Staab)
Daniel Janke and Steffen Staab. Tutorial at Reasoning Web
With proliferation of semantic data, there is a need to cope with trillions of triples by horizontally scaling data management in the cloud. To this end one needs to advance (i) strategies for data placement over compute and storage nodes, (ii) strategies for distributed query processing, and (iii) strategies for handling failure of compute and storage nodes. In this tutorial, we want to review challenges and how they have been addressed by research and development in the last 15 years.
Linking Knowledge Organization Systems via Wikidata (DCMI conference 2018) (Joachim Neubert)
Wikidata has been used successfully as a linking hub for authority files. Knowledge organization systems like thesauri or classifications are more complex and pose additional challenges.
Semantic web technologies applied to bioinformatics and laboratory data manag... (Toni Hermoso Pulido)
This document discusses using semantic web technologies and semantic wikis for bioinformatics and laboratory data management. It provides examples of semantic wikis like SNPedia and Gene Wiki that organize and semantically tag genetic and bioinformatics data. It also describes a proposed "Protein-Wiki" system built with Semantic MediaWiki to manage protein production workflows and experiments at a bioinformatics core facility in a customizable, semantic and open-source way.
This tutorial explains the Data Web vision, some preliminary standards and technologies, as well as some tools and technological building blocks developed by the AKSW research group at Universität Leipzig.
RO-Crate: A framework for packaging research products into FAIR Research Objects (Carole Goble)
RO-Crate: A framework for packaging research products into FAIR Research Objects presented to Research Data Alliance RDA Data Fabric/GEDE FAIR Digital Object meeting. 2021-02-25
The document discusses architecting applications for the cloud using Microsoft technologies. It provides an overview of Microsoft's Azure platform, including hosting applications on Azure infrastructure as a service (IaaS) or platform as a service (PaaS). It also discusses using Azure storage services like tables, queues and blobs to build scalable cloud applications.
Knowledge Graph Conference 2021
Semantic MediaWiki (SMW), which was introduced as early as 2006, has since established a vital community and is currently one of the few semantic wiki solutions still in existence. SMW is an extension of MediaWiki, the software used for Wikipedia and many other projects, resulting in a largely sustainable codebase and ecosystem. There are many reasons why SMW should not be overlooked by the knowledge graph community:
SMW is capable of directly connecting to several triple stores (Blazegraph, Virtuoso, Jena), which is why it can be considered an interface for entering data into knowledge graphs.
SMW can use its internal relational database (or ElasticSearch), enabling users to build simple knowledge graphs without in-depth knowledge about triple stores.
SMW has the built-in capability of exporting to RDF including building complete RDF data dumps that can be imported into existing knowledge graphs.
SMW has the capability to reuse existing ontologies by importing vocabularies and providing unique identifiers.
The explicit semantic content of Semantic MediaWiki is formally interpreted in the OWL DL ontology language and is made available in XML/RDF format.
A simple internal query language is available to query the internal knowledge graph from within SMW, without the requirement of having a SPARQL endpoint. However, extensions for implementing SPARQL in SMW are available as well.
SMW has the capability to enable data curation for experienced users responsible for the ontology as well as simple form-based input for regular users that can easily populate the KG with data.
There are several approaches to visualizing data in SMW, thus making the knowledge graph visible and interactive.
Implementing custom ontologies in SMW is quite easy; everything is built in wiki pages (e.g. definitions of properties and datatypes, forms and templates).
SMW has low barriers to implementation as it is a clean extension to MediaWiki, which is PHP software running on regular web hosts.
In the talk, I will give an overview of the mentioned aspects and highlight some main differences to Wikibase – which is an alternative approach for managing structured data in MediaWiki – as well as the current limitations of SMW.
PoolParty Thesaurus Management - ISKO UK, London 2010 (Andreas Blumauer)
Building and maintaining thesauri are complex and laborious tasks. PoolParty is a Thesaurus Management Tool (TMT) for the Semantic Web, which aims to support the creation and maintenance of thesauri by utilizing Linked Open Data (LOD), text-analysis and easy-to-use GUIs, so thesauri can be managed and utilized by domain experts without needing knowledge about the semantic web. Some aspects of thesaurus management, like the editing of labels, can be done via a wiki-style interface, allowing for lowest possible access barriers to contribution.
Vinod Chachra discussed improving discovery systems through post-processing harvested data. He outlined key players like data providers, service providers, and users. The harvesting, enrichment, and indexing processes were described. Facets, knowledge bases, and branding were discussed as ways to enhance discovery. Chachra concluded that progress has been made but more work is needed, and data and service providers should collaborate on standards.
For our next ArcReady, we will explore a topic on everyone’s mind: cloud computing. Several industry companies have announced cloud computing services. In October 2008 at the Professional Developers Conference, Microsoft announced the next phase of our Software + Services vision: the Azure Services Platform. The Azure Services Platform provides a wide range of internet services that can be consumed from both on-premises environments and the internet.
Session 1: Cloud Services
In our first session we will explore the current state of cloud services. We will then look at how applications should be architected for the cloud and explore a reference application deployed on Windows Azure. We will also look at the services that can be built for on premise application, using .NET Services. We will also address some of the concerns that enterprises have about cloud services, such as regulatory and compliance issues.
Session 2: The Azure Platform
In our second session we will take a slightly different look at cloud based services by exploring Live Mesh and Live Services. Live Mesh is a data synchronization client that has a rich API to build applications on. Live services are a collection of APIs that can be used to create rich applications for your customers. Live Services are based on internet standard protocols and data formats.
This document summarizes lessons learned from developing semantic wikis. It discusses how semantic wikis differ from traditional wikis by embedding structured metadata and propagating that metadata via semantic queries. It then outlines key features for different user groups, including improved data generation and propagation tools for end users, and light-weight data modeling and fast prototyping for developers. Remaining issues are also discussed, such as managing public and personal data, improving scalability, and data portability and protection across multiple wikis.
DBpedia Spotlight is a system that automatically annotates text documents with DBpedia URIs. It identifies mentions of entities in text and links them to the appropriate DBpedia resources, addressing the challenge of ambiguity. The system is highly configurable, allowing users to specify which types of entities to annotate and the desired balance of coverage and accuracy. An evaluation found DBpedia Spotlight performed competitively compared to other annotation systems.
The document introduces the concept of Linked Data and discusses how it can be used to publish structured data on the web by connecting data from different sources. It explains the principles of Linked Data, including using HTTP URIs to identify things, providing useful information when URIs are dereferenced, and including links to other URIs to enable discovery of related data. Examples of existing Linked Data datasets and applications that consume Linked Data are also presented.
The document discusses technical issues and opportunities for improving the Global Biodiversity Information Facility's (GBIF) registry and portals for discovering biodiversity resources. It analyzes GBIF's past use of UDDI registry and data portal, and outlines challenges in developing a new graph-based registry model to better represent the network of institutions, collections, and relationships. The new registry aims to improve discoverability through associating automated and human-generated metadata, uniquely identifying resources, and defining services and vocabularies.
This talk introduces the concepts of web 3.0 technology and how they relate to related technologies such as Internet of Things (IoT), Grid Computing and the Semantic Web:
• A short history of web technologies:
o Web 1.0: Publishing static information with links for human consumption.
o Web 2.0: Publishing dynamic information created by users, for human consumption.
o Web 3.0: Publishing all kinds of information with links between data items, for machine consumption.
• Standardization of protocols for description of any type of data (RDF, N3, Turtle).
• Standardization of protocols for the consumption of data in “the grid” (SPARQL).
• Standardization of protocols for rules (RIF).
• Comparison with the evolution of technologies related to data bases.
• Comparison of IoT solutions based on web 2.0 and web 3.0 technologies.
• Distributed solutions vs. centralized solutions.
• Security
• Extensions of Peer-to-peer protocols (XMPP).
• Advantages of solutions based on web 3.0 and standards (IETF, XSF).
Duration of talk: 1-2 hours with questions.
It is important to present architecture in a way that makes it accessible to all target groups as far as possible. At the same time, architectural languages such as ArchiMate are not fully focused on this. In addition, they are actually too abstract: you cannot express exactly enough what you are trying to model, and the resulting model is open to several interpretations. Linked Data fits in well with these objectives and makes it easier to define and unlock more accessible and more targeted information.
In this presentation Danny Greefhorst will tell more about the motivation behind the idea, and also show a further elaboration. For example, he has made a mapping of ArchiMate to commonly used Linked Data vocabularies. He has also made a demonstrator, in which you can see how you can define, enrich and publish ArchiMate models as Linked Data. He will also discuss how the European reference architecture EIRA is available as Linked Data.
The document introduces the Scholarly Works Application Profile (SWAP), which is a Dublin Core application profile for describing scholarly works held in institutional repositories. SWAP defines a model for scholarly works and their relationships using entities like ScholarlyWork, Expression, Manifestation, and Copy. It also specifies a set of metadata properties and an XML format for encoding and sharing metadata records between systems according to this model. The document provides an example of using SWAP to describe a scholarly work with multiple expressions, manifestations, and copies.
Enterprise guide to building a Data Mesh (Sion Smith)
Making Data Mesh simple, open source and available to all: without vendor lock-in, without complex tooling, using an approach centered around ‘specifications’, existing tools and a baked-in ‘domain’ model.
Big Data Analytics from Azure Cloud to Power BI Mobile (Roy Kim)
This document discusses using Azure services for big data analytics and data insights. It provides an overview of Azure services like Azure Batch, Azure Data Lake, Azure HDInsight and Power BI. It then describes a demo solution that uses these Azure services to analyze job posting data, including collecting data using a .NET application, storing in Azure Data Lake Store, processing with Azure Data Lake Analytics and Azure HDInsight, and visualizing results in Power BI. The presentation includes architecture diagrams and discusses implementation details.
Exploring and mapping the category system of the world‘s largest public press... (Joachim Neubert)
ZBW, a member of the Leibniz Association, has explored and mapped the category system of the world's largest public press archives. The archives contain over 25,000 thematic dossiers from 1908 to 1949 about persons, general subjects, events, and companies, with over 2 million scanned pages available online. ZBW aims to map the historic system used to organize knowledge about the world in the press archives to Wikidata to make the information more accessible.
Donating data to Wikidata: First experiences from the „20th Century Press Arc... (Joachim Neubert)
This document summarizes ZBW's experiences donating data from its 20th Century Press Archives (PM20) collection to Wikidata. PM20 contains over 2 million digitized newspaper clippings organized into 25,000 thematic dossiers on general topics, persons, companies, and products from 1908-1949. ZBW aims to link all PM20 dossier folders to Wikidata items to provide open access to these historical sources. So far, ZBW has added over 5,000 links and 6,000 statements from PM20 dossiers to Wikidata items on persons. Their next challenge is mapping PM20's hierarchical organization of dossiers on countries and topics to Wikidata.
Wikidata as opportunity for special collections: the 20th Century Press Archi... (Joachim Neubert)
This document discusses transferring metadata from the 20th Century Press Archives to Wikidata. It begins by describing the press archives collection. It then explains why Wikidata is a good platform, being sustainable, editable, and with linked open data capabilities. The document outlines the process of linking the archive's metadata to existing Wikidata items, creating new items, and adding metadata to items. It provides an example of using the linked data to create a map of economists in the collection. Future plans include linking more archive folders to items and creating pages for each folder on the archive's website.
ZBW is a member of the Leibniz Association and maintains the 20th Century Press Archives. The archives contain over 1 million digitized newspaper clippings from 1500+ newspapers covering persons, companies, products, and events. To ensure long term sustainability, ZBW is making the folder metadata from the archives openly available on Wikidata to allow for improved discovery, access, and maintenance of the metadata. Over 90% of person folders from the archives have already been linked on Wikidata. This integration with Wikidata will provide new interfaces and APIs for working with the press archive metadata.
1) The 20th Century Press Archives housed at ZBW, a member of the Leibniz Association, contains digitized press clippings dating back to 1826 that were organized into dossiers on persons, companies, products and events.
2) The metadata for the dossiers and 1.8 million digitized pages are being linked to Wikidata to make them more accessible and connect them to related information on Wikidata.
3) A new Wikidata property was created to link the press archive dossier IDs to relevant Wikidata items and the press archive data was released on Wikidata and via APIs to encourage participation in the Coding da Vinci cultural hackathon.
Making Wikidata fit as a Linking Hub for Knowledge Organization Systems (Joachim Neubert)
Wikidata can serve as a linking hub for knowledge organization systems by using its new "mapping relation type" qualifier (P4390) to optionally qualify external-identifier statements with a mapping type like exactMatch, closeMatch, or narrowMatch. Usage of the qualifier can be tracked by relation type and by external-ID property. The qualifier aims to improve Wikidata's role as a hub for linking different knowledge organization systems.
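For illustration, that usage tracking can be done with a query sketch like the following against the public Wikidata Query Service (an illustrative sketch, not taken from the presentation):

```sparql
# Count statements that carry a "mapping relation type" (P4390) qualifier,
# grouped by the SKOS mapping relation they declare.
# Uses the standard WDQS prefixes (pq:, wikibase:, bd:).
SELECT ?mappingRelation ?mappingRelationLabel (COUNT(*) AS ?statements) WHERE {
  ?statement pq:P4390 ?mappingRelation .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?mappingRelation ?mappingRelationLabel
ORDER BY DESC(?statements)
```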
Joachim Neubert presented methods for linking authorities like economists from the Research Papers in Economics (RePEc) database and the Integrated Authority File (GND) to Wikidata items. This included developing tools to match Wikidata items to external IDs, identify items missing properties, look up external IDs, and trigger property inserts into Wikidata via QuickStatements. Over 10% of economists were added to Wikidata by synthesizing existing mappings and inserting new items from RePEc.
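The "items missing properties" step can be sketched as a query like the one below; the property and item IDs used (P106 occupation, Q188094 economist, P227 GND ID, P2428 RePEc Short-ID) are assumptions for illustration and are not taken from the presentation:

```sparql
# Economists that already have a GND ID (P227) but no RePEc Short-ID (P2428):
# candidates for adding the missing identifier.
SELECT ?person ?personLabel ?gndId WHERE {
  ?person wdt:P106 wd:Q188094 ;          # occupation: economist (assumed QID)
          wdt:P227 ?gndId .               # GND identifier
  FILTER NOT EXISTS { ?person wdt:P2428 [] }   # no RePEc Short-ID yet (assumed PID)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
```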
Wikidata as a linking hub for knowledge organization systems? Integrating an ... (Joachim Neubert)
Wikidata has been created in order to support all of the roughly 300 Wikipedia projects. Besides interlinking all Wikipedia pages about a specific item – e.g., a person - in different languages, it also connects to more than 1900 different sources of authority information.
We will present lessons learned from using Wikidata as a linking hub for two personal name authorities in economics (GND and RePEc author identifiers) and demonstrate the benefits of moving a mapping from a closed environment to Wikidata as a public and community-curated linking hub. We will further ask to what extent these experiences can be transferred to knowledge organization systems and how the limitation to simple 1:1 relationships (as for authorities) can be overcome. Using STW Thesaurus for Economics as an example, we will investigate how we can make use of existing cross-concordances to "seed" Wikidata with external identifiers, and how transitive mappings to yet un-mapped vocabularies can be earned.
ELAG 2017 Abstract: Authority files and identifiers are used by libraries to consistently refer to entities such as subjects and authors. This principle of referencing “things, not strings” is also applied successfully in Linked Open Data and knowledge bases.
The open knowledge base Wikidata connects Wikipedia articles in any language and provides background information on an article's subject, such as images and affiliations. In addition to internal references, Wikidata contains identifiers from more than a thousand authority files: well known library identifiers such as VIAF and GND, researcher IDs such as ORCID and Google Scholar, and identifiers for many more items such as TED speakers and Find A Grave. Wikidata thus emerges as a giant curated linking hub based on authority identifiers. In contrast to closed authority files that often impose tedious procedures, Wikidata can be enhanced and corrected by anybody, just like Wikipedia.
The management of links between authority files in Wikidata is particularly suitable for automation: mappings extracted from Wikidata can be used in other systems, and mapping data already available in libraries is ready to be added to Wikidata. This presentation will illustrate the use of Wikidata as an authority linking hub with a case study considering links between author identifiers from RePEc (Research Papers in Economics) and GND. Workflows include both automatic and semi-automatic mapping approaches. We will address both technical solutions and organizational policies ruling the operation of Wikidata bots for automated updates. The tools presented in this talk include the “wdmapper” command line application for extracting and complementing mappings from Wikidata with external mapping files, and the “Mix’n’Match” tool to support crowdsourced mapping of authority files to Wikidata.
Online version: https://hackmd.io/s/S1YmXWC0e#
Using Wikidata as an Authority for the SowiDataNet Research Data Repository (Joachim Neubert)
Wikidata provides a comprehensive authority for geographical entities that can be used by the SowiDataNet Research Data Repository. Wikidata contains countries, German states, the European Union, and geographical regions without needing to create a custom authority. A custom query can access just the required geographical data items from Wikidata, gaining identifiers, multilingual labels, links to other identifiers, and abundant information from Wikipedia with minimal cost and effort compared to maintaining a custom authority.
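A minimal sketch of such a query (illustrative only, not the actual SowiDataNet query) could pull countries with bilingual labels and GND identifiers from the Wikidata Query Service:

```sparql
# Countries with English and German labels plus their GND ID (P227), if any.
SELECT ?country ?labelEn ?labelDe ?gndId WHERE {
  ?country wdt:P31 wd:Q6256 .                       # instance of: country
  ?country rdfs:label ?labelEn . FILTER(LANG(?labelEn) = "en")
  ?country rdfs:label ?labelDe . FILTER(LANG(?labelDe) = "de")
  OPTIONAL { ?country wdt:P227 ?gndId }             # GND identifier
}
ORDER BY ?labelEn
```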
Leveraging SKOS to trace the overhaul of the STW Thesaurus for Economics (Joachim Neubert)
ZBW maintains the STW Thesaurus for Economics. They have overhauled it since 2010 using SKOS, releasing a new version roughly yearly. The latest version 9.0 added 777 new concepts and deprecated 1,052 concepts. To track these complex changes, ZBW developed a dataset versioning and SKOS-history approach. It extracts insertions and deletions between versions and makes them queryable. This provides comprehensive information on concept changes to support users and applications that use the thesaurus.
SWIB14 presentation
Over time, Knowledge Organization Systems such as thesauri and classifications undergo lots of changes, as the knowledge domains evolve. Most SKOS publishers therefore put a version tag on their vocabularies. With the vocabularies interwoven in the open web of data, however, different versions may be the base for references in other datasets. So, updates by "third parties" are required, in indexing data as well as in mappings from or to other vocabularies. Yet answers to simple user questions such as "What's new?" or "What has changed?" are not easily obtainable. Best practices and shared standards for communicating changes precisely and making them (machine-) actionable still have to emerge. STW Thesaurus for Economics currently is subject to a series of major revisions. In a case study we review the amount and the types of changes in this process, and demonstrate how versioning in general and difficult types of changes such as the abandonment of descriptors in particular are handled. Furthermore, a method to get a tight grip on the changes, based on SPARQL queries over named graphs, is presented. And finally, the skos-history activity is introduced, which aims at the development of an ontology/application profile and best practices to describe SKOS versions and changes.
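The general shape of such a SPARQL query over named version graphs is sketched below; the graph URIs are placeholders, since a real skos-history setup derives them from the published version identifiers:

```sparql
# Concepts present in the newer version graph but absent from the older
# one, i.e. concepts added between the two versions.
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?concept ?prefLabel WHERE {
  GRAPH <http://example.org/stw/version/9.0> {            # placeholder graph URI
    ?concept a skos:Concept ;
             skos:prefLabel ?prefLabel .
    FILTER(LANG(?prefLabel) = "en")
  }
  FILTER NOT EXISTS {
    GRAPH <http://example.org/stw/version/8.14> {          # placeholder graph URI
      ?concept a skos:Concept .
    }
  }
}
ORDER BY ?prefLabel
```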
Wikidata as a hub for the linked data cloud
1. Wikidata as a hub for the linked data cloud
TUTORIAL AT DCMI CONFERENCE, SEOUL, 2019-09-25
Tom Baker, Joachim Neubert, Andra Waagmeester
Slides (partially) at https://jneubert.github.io/wd-dcmi2019/#/
2. Overview
Part 1: Using and querying Wikidata
Part 2: Wikidata as a linking hub
Part 3: Applications based on Wikidata
Part 4: Wikidata usage scenarios
Scenarios
Intro and details
Hands-on: Mix-n-match
Quality control tools and procedures
Wikidata community
4. The idea of linking hubs
Connect concepts via identifiers/URLs
Existing hubs: VIAF, sameAs.org, ...
Image by Jakob Voss
5. Different linking properties
1. exact match (datatype URL)
generic link to URL in the meaning of skos:exactMatch
2. Pxxxx: more than 4000 specialized properties (datatype external identifier)
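To make the difference concrete, a minimal SPARQL sketch against the Wikidata Query Service (https://query.wikidata.org); the item Q9061 (Karl Marx) and the two properties shown are only illustrative choices:

# Sketch: the two linking styles for one item
SELECT ?exactMatchUrl ?gndId WHERE {
  OPTIONAL { wd:Q9061 wdt:P2888 ?exactMatchUrl . }  # 1. exact match (P2888), datatype URL
  OPTIONAL { wd:Q9061 wdt:P227 ?gndId . }           # 2. specialized property GND ID (P227), datatype external identifier
}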
6. Examples for external identifiers
GND / VIAF identifiers
geographical entities
proteins
Swedish cultural heritage objects
African plants
baseball players
TED conference speakers
8. Property definitions
subject item for the property
examples
constraints on values, cardinality, etc.
formatter URL: creates a clickable link for the ID
start at the property page, e.g., for the ISSN: https://www.wikidata.org/wiki/Property:P236
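A small sketch of how the formatter URL can be read programmatically and turned into a clickable link; P1630 (formatter URL) is used here, and the sample ISSN value is only for illustration:

# Sketch: fetch the formatter URL of the ISSN property (P236) and fill in a sample value
SELECT ?formatter ?link WHERE {
  wd:P236 wdt:P1630 ?formatter .                                      # formatter URL string, with $1 as placeholder
  BIND(IRI(REPLACE(STR(?formatter), "\\$1", "0028-0836")) AS ?link)   # 0028-0836 is just a sample ISSN
}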
11. Beyond sameness - mapping relations
Wikidata external ids imply "sameness" of linked concepts
even with geographic names, other mapping relations are required in some cases
examples:
close matches, e.g., "Yugoslavia" (1918-1992) (Wikidata) ≅ "Yugoslavia (until 1990)" (STW)
related matches, e.g., a company and its founder
12. Mapping relation type (P4390)
introduced after a community discussion in October 2017
to be used as qualifier for external id entries
fixed value set - SKOS mapping relations (exact, close, broad, narrow, related match)
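A sketch of how the qualifier can be queried on the Wikidata Query Service; GND ID (P227) is chosen only as an example external-id property:

# Sketch: external-id statements together with their mapping relation type qualifier, where present
SELECT ?item ?gndId ?mappingRelation WHERE {
  ?item p:P227 ?statement .                             # full statement for the external id
  ?statement ps:P227 ?gndId .
  OPTIONAL { ?statement pq:P4390 ?mappingRelation . }   # SKOS mapping relation, if stated
}
LIMIT 100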
14. How does that relate to the Linked Data model?
Internal data model and storage (Wikibase) is transformed to RDF for:
RDF dumps
Query Service
15. RDF linking from Wikidata
formatter URI for RDF resource: linked data URI
e.g., http://sws.geonames.org/$1/ (vs. formatter URL https://www.geonames.org/$1)
List of 130+ relationships to external RDF datasets: 26+ million linked external RDF resources
plus ~950,000 exact match relations to individual URIs
16. Links in the RDF dumps
Output has full URLs to external resources, however with Wikidata-specific properties:
wd:Q123 wdt:P234 "External-ID" ;
wdtn:P234 <http://example.com/reference/External-ID> .
This creates a hurdle for generic Linked Data browsers and tools - not even exact match is translated to skos:exactMatch
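One possible workaround on the consumer side (a sketch, not part of the tutorial's tooling): a CONSTRUCT query on the Wikidata Query Service that re-expresses the normalized external-id triples of one property - here GND ID (P227), chosen only as an example - as skos:exactMatch triples that generic Linked Data tools understand.

# Sketch: expose normalized GND links as skos:exactMatch
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT { ?item skos:exactMatch ?gndUri . }
WHERE { ?item wdtn:P227 ?gndUri . }
LIMIT 1000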
17. Federated SPARQL queries
Example use case: the GND authority has information about the professions/occupations of people which is not known in Wikidata.
So get that information dynamically from a GND SPARQL endpoint.
Here, we are interested in economists, in particular.
18. From Wikidata to a remote endpoint (query to WDQS)
From a remote endpoint to Wikidata (query to GND endpoint) <== not working currently
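A hedged sketch of such a federated query, as it could be submitted to the Wikidata Query Service; the SERVICE URL is a placeholder and the gndo: property names are assumptions about the GND vocabulary, to be checked against the real endpoint before use:

# Sketch: economists in Wikidata, enriched with occupation data from a remote GND endpoint
PREFIX gndo: <https://d-nb.info/standards/elementset/gnd#>
SELECT ?item ?itemLabel ?gndId ?occupation WHERE {
  ?item wdt:P106 wd:Q188094 ;                  # occupation: economist
        wdt:P227 ?gndId .                      # GND ID
  FILTER(!isBLANK(?gndId))                     # exclude "unknown value" statements (blank nodes)
  SERVICE <https://example.org/gnd/sparql> {   # placeholder endpoint URL
    ?person gndo:gndIdentifier ?gndId ;
            gndo:professionOrOccupation ?occupation .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 50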
19. Several points for attention
Direction and sequence of statements often matter for performance
To reach out from Wikidata, endpoints have to be approved (full list)
In the other direction, access is normally not restricted
Some federated queries get extremely slow when large sets of bindings exist before the remote service is invoked
be sure to exclude variables bound to blank nodes ('unknown value' in Wikidata)
20. Further reading on Wikidata/RDF
RDF dump format (documentation)
Waagmeester: Integrating Wikidata and other linked data sources - Federated SPARQL queries (more examples)
Malyshev et al.: Semantic Technology Usage in Wikipedia's Knowledge Graph
Critical comments/suggestions:
Freire/Isaac: Technical usability of Wikidata's linked data
21. Application process for a new property
Double-check that the property does not already exist
Prepare a property proposal in the appropriate section, e.g., Wikidata:Property proposal/Authority control
23. Hints for getting it approved smoothly
Clearly lay out the motivation and planned use for the property
Provide working examples (with the formatter URI you are suggesting)
Be responsive to comments
24. Wikidata as a universal linking hub
easy extensibility with new properties for external identifiers
immense pool of existing items, with the full set of SKOS mapping relations for more or less exact mappings to these
immediate extensibility with new items
26. Mix-n-match is a widely used tool (by Magnus Manske) to link external databases, catalogs, etc. to existing Wikidata items (or to create new ones).
35. Item creation "on the go"
With Mix-n-match "New item": rudimentary, no references
Custom list of prepared QuickStatements insert blocks (example from STW Thesaurus for Economics - please don't mess with it, this is work in progress)
Workflow-wise, use the same sequence for M-n-m input and prepared insert blocks
36. Recommendations for item creation
Pay attention to Wikidata's notability criteria (much more relaxed than Wikipedia's)
Explain your plan and ask for feedback in the Wikidata project chat
Apply for a bot account to make mass edits (example)
Source every statement (hints)
Create input in QuickStatements text format (see the sketch below)
Check with a few statements, verify the result
Run as batch, document input and batch URL
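A minimal sketch of such a QuickStatements (v1, tab-separated) input block, assuming the v1 command syntax: CREATE opens a new item, Len/Den set the English label and description, P31 types the item (Q31855, research institute, is only an example class), and the external-id statement carries a "stated in" reference (S248) to the GND (Q36578). All values below, including the identifier, are invented for illustration.

CREATE
LAST	Len	"Example Institute for Economic Research"
LAST	Den	"fictitious research institute used as an example"
LAST	P31	Q31855
LAST	P227	"123456789X"	S248	Q36578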
37. Matching from WD to the external database entries
39. Normally requires an endpoint for the external source, where you can search for the labels, aliases or other data of Wikidata items
Insert statements for the external id into Wikidata can be prepared for cut & paste or even semi-automatic execution in QuickStatements
Some hints and linked code here
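For the Wikidata side of such a matching workflow, a query sketch that collects English labels and aliases of items still lacking the external id; GND ID (P227) and the restriction to economists (P106 = Q188094) are only example choices:

# Sketch: candidate search terms for matching against the external source
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?item ?label ?alias WHERE {
  ?item wdt:P106 wd:Q188094 .                      # occupation: economist (example subset)
  FILTER NOT EXISTS { ?item wdt:P227 ?gndId . }    # no GND ID yet
  ?item rdfs:label ?label . FILTER(LANG(?label) = "en")
  OPTIONAL { ?item skos:altLabel ?alias . FILTER(LANG(?alias) = "en") }
}
LIMIT 100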
41. Prepare data
... as tab-separated table (one line per record) with three columns
1. identifier
2. name
3. description
Input file for the example used earlier
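A minimal sketch of what such an input could look like (one tab-separated line per record; both records are invented):

eco0815	Example Institute for Economic Research	research institute in Hamburg, founded 1950
eco0816	Jane Doe	fictitious economist (1901-1980), monetary policy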
42. Pay attention to
description column: include everything useful for intellectual identification
order: the sequence may help structure your workflow (e.g., most used entries first)
43. Load data via web interface
... at https://tools.wmflabs.org/mix-n-match/import.php
52. Quality control tools and procedures
Perception: anybody can edit anything - so Wikidata is not a reliable source of knowledge
Seen as a threat for information systems based on Wikidata
particularly by some large Wikipedias (e.g., the English one)
Basic policy to address this: statements should be referenced
53. QA support for editors
Constraint definitions for properties
raise warnings during data input when, e.g.,
a format definition (ISBN, DOI etc.) is violated
a supposedly unique identifier is added to more than one item
generated lists of constraint violations (e.g., ZDB ID format)
Constraints can be very helpful, but do not cover complex cases
54. More QA support for editors
Additional reports can be created via SPARQL queries (see the sketch below)
Shape Expressions (ShEx) allow defining complex constraints and conformance checks
ShEx Primer
How to get started with Shape Expressions in Wikidata?
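As an illustration of such a SPARQL-based report (a sketch only; GND ID, P227, stands in for any identifier that should be unique, and on large properties the query may need a narrower scope to stay within timeouts): a query listing identifier values attached to more than one item.

# Sketch: identifier values occurring on more than one item - candidates for a uniqueness violation report
SELECT ?gndId (COUNT(DISTINCT ?item) AS ?items) WHERE {
  ?item wdt:P227 ?gndId .
}
GROUP BY ?gndId
HAVING (COUNT(DISTINCT ?item) > 1)
ORDER BY DESC(?items)
LIMIT 100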
55. Revision control and patrolling
Versioned edits and version control
Manual and tool-supported vandalism prevention
Watchlists
Automated flagging of suspect edits (e.g., "new editor deleting statements")
Technically very easy to revert edits
Semi-protection or protection of often-vandalized items
Patrolling
57. Automated tools for vandalism detection
Fighting to keep up with the rate of human edits in Wikidata (multiple per second)
... requires reducing the manual workload, e.g., via
Objective Revision Evaluation Service (ORES)
Wikidata Abuse Filter and other rule-based and machine-learning tools
59. The Wikidata community
Everybody can participate
No central "committee" or decision structure
Decisions are made via discussion and community consensus
60. Project chat
Main entry point for all kinds of discussions
Resolved discussions archived after 7 days - searchable
Beginner's questions welcome (but please try to find the answer online first, particularly in the FAQ, which has a search link to the help pages)
Compared to Wikipedia, the overall atmosphere is constructive (though exceptions sometimes exist in some sub-communities)
English is the lingua franca, but a few questions show up in other languages, and receive comments, too
61. User page and user talk page
Introduce yourself - especially if you work in a professional context with Wikidata (example)
Activate notifications to your email address
Be responsive to comments on your talk page
You can address other users on their talk page, too
62. Talk pages of properties
Questions on the use of a certain property
Suggestions for changes or enhancements of the property definition (example)
Consider adding properties you are interested in to your watchlist
65. Current WikiProjects on Wikidata
Often a great source to find documentation about the community consensus in certain fields
Many WikiProject pages contain data structuring recommendations - see, e.g., the one for periodicals