03/06/2018 Preserving a Web of Linked Data - Lessons and challenges from a fading Web
https://mielvds.github.io/MEPDaW2018/#1 1/60
Preserving a Web of Linked Data
Lessons and challenges from a fading Web
Miel Vander Sande
Ghent University – imec
There are many sides to preservation.
Web of Linked Data?
We are losing thousands of Alexandria libraries each day
“We have lost so much of the early Web history, just as we have lost so much of early Human history.”
—Kalev H. Leetaru - University of Illinois
The forces of decay
Link Rot
Content Drift
Digital Preservation Business Case Toolkit http://wiki.dpconline.org/
Link Rot
Illustration by the Project Twins
Content Drift
Significant change in content
within a 3-Month Period
Preserving a Web of Linked Data
1. Yesterday: Web archiving strategies
2. Today: Tools for a Web of Linked Data
3. Tomorrow: Things to keep in mind
1. Yesterday: Web archiving strategies
Strategies
Observational: perceived as discrete
  Snapshot
  Web archive
Historical: perceived as continuous
  Versioning systems
  Transactional
  Notification-based
Snapshot
Web archive
See: OpenWayback
Versioning systems
See: MediaWiki
Transactional
See: SiteStory Apache module
If a representation changes and nobody is around to see it, should it be archived?
Notification-based
Memento: travelling to the Web of the Past
https://tools.ietf.org/html/rfc7089
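Memento (RFC 7089) lets a client request a past state of a resource. The client sends an `Accept-Datetime` header to a TimeGate, and the TimeGate redirects it to a memento near that datetime. The sketch below assumes Python 3.9+ and uses only the standard library. It builds the header value as an HTTP-date, then picks the closest memento from a hypothetical in-memory list of `(datetime, URI)` pairs. Closest-in-time is only one possible selection policy. The helper names and the `archive.example.org` URIs are illustrative and are not part of the protocol.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict[str, str]:
    """Build the Accept-Datetime request header used by Memento.

    The value is an HTTP-date (RFC 1123 style, always expressed in GMT).
    """
    return {"Accept-Datetime": format_datetime(when.astimezone(timezone.utc), usegmt=True)}


def closest_memento(mementos: list[tuple[datetime, str]], when: datetime) -> str:
    """Return the URI of the memento whose datetime is closest to `when`.

    Assumption: "closest in time" is just one illustrative TimeGate policy;
    `mementos` is a hypothetical list of (Memento-Datetime, URI) pairs.
    """
    if not mementos:
        raise ValueError("no mementos available")
    return min(mementos, key=lambda m: abs(m[0] - when))[1]


if __name__ == "__main__":
    target = datetime(2018, 6, 3, 12, 0, tzinfo=timezone.utc)
    print(accept_datetime_header(target))  # {'Accept-Datetime': 'Sun, 03 Jun 2018 12:00:00 GMT'}

    # Hypothetical archive holding three versions of one resource.
    archive = [
        (datetime(2018, 1, 1, tzinfo=timezone.utc), "http://archive.example.org/v1"),
        (datetime(2018, 6, 1, tzinfo=timezone.utc), "http://archive.example.org/v2"),
        (datetime(2018, 9, 1, tzinfo=timezone.utc), "http://archive.example.org/v3"),
    ]
    print(closest_memento(archive, target))  # http://archive.example.org/v2
```

In practice the client only sends the header and follows the TimeGate's redirect. The selection step happens on the server.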
2. Today: Tools for a Web of Linked Data
Archive or Archiving?
Linked Data archiving as the product
RDF indexes for versioning: Dydra, Virtuoso, x-RDF-3X, ...
Representations of versions, provenance & time: PROV, LD Patch, LODE, ...
Technical
(Increasingly) popular research tracks.
Linked Data archiving as the process
Some technological building blocks: Linked Data interfaces, change detection, publishing, crawling & querying
Technical, as well as infrastructural & societal.
Rather unknown territory (though technologies do exist).
What assumptions are there about data evolution?
Historical Data
  Provenance is a timeline.
  Only one truth can exist at a time.
  Time-series databases, Wikipedia
Versioned Data
  Provenance is a directed acyclic graph.
  Multiple truths can exist at the same time.
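The difference between a provenance timeline and a provenance DAG can be made concrete. The minimal sketch below assumes Python 3.9+ and represents provenance as a hypothetical map from each version to the versions it derives from. Historical data forms a single chain with one current head. Versioned data can fork, and it can have several heads ("truths") at once. The version names and helper functions are illustrative only.

```python
def heads(parents: dict[str, list[str]]) -> set[str]:
    """Versions that no other version derives from: the current 'truths'."""
    derived_from = {p for ps in parents.values() for p in ps}
    return set(parents) - derived_from


def is_timeline(parents: dict[str, list[str]]) -> bool:
    """True when provenance is one linear chain, as assumed for historical data.

    Assumes an acyclic graph in which every parent also appears as a key.
    """
    return all(len(ps) <= 1 for ps in parents.values()) and len(heads(parents)) == 1


if __name__ == "__main__":
    # Historical data: each version replaces the previous one.
    history = {"v1": [], "v2": ["v1"], "v3": ["v2"]}
    # Versioned data: two branches fork from v1 and coexist.
    branches = {"v1": [], "a": ["v1"], "b": ["v1"]}
    print(is_timeline(history), heads(history))  # True {'v3'}
    print(is_timeline(branches), sorted(heads(branches)))  # False ['a', 'b']
```

A merge version with two parents collapses the branches back into one head, but the provenance is still a DAG rather than a timeline.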
Decay becomes more complex
Link Rot
Content Drift
Concept Drift
  "Please don't change your vocabulary"
  (Check out the DRIFT-A-LOD workshop)
  Also a problem in other domains (machine learning)
Study these issues within Linked Data
Link Rot
  Subject or Object cannot be dereferenced
  Dataset/Interface is gone
Content Drift
  Context graph of Subject or Object has changed
Concept Drift
  Predicate or Object changes meaning
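Link rot requires an HTTP dereference check, and concept drift is hard to detect from syntax alone. Content drift, however, translates directly into a set comparison. The minimal sketch below assumes Python 3.9+ and represents triples as plain string tuples. It uses hypothetical `ex:` identifiers and two hypothetical snapshots of the same dataset. It extracts the context graph of a resource (every triple where the resource is subject or object) and reports what was added or removed between the snapshots.

```python
Triple = tuple[str, str, str]


def context_graph(triples: set[Triple], resource: str) -> set[Triple]:
    """All triples in which `resource` appears as subject or object."""
    return {t for t in triples if t[0] == resource or t[2] == resource}


def content_drift(old: set[Triple], new: set[Triple], resource: str) -> tuple[set[Triple], set[Triple]]:
    """Return (added, removed) triples in the context graph of `resource`.

    Two empty sets mean no content drift for this resource between snapshots.
    """
    before = context_graph(old, resource)
    after = context_graph(new, resource)
    return after - before, before - after


if __name__ == "__main__":
    # Hypothetical snapshots of the same dataset at two points in time.
    old = {("ex:Ghent", "ex:mayor", "ex:PersonA"), ("ex:Ghent", "ex:country", "ex:Belgium")}
    new = {("ex:Ghent", "ex:mayor", "ex:PersonB"), ("ex:Ghent", "ex:country", "ex:Belgium")}
    added, removed = content_drift(old, new, "ex:Ghent")
    print("added:", added)
    print("removed:", removed)
```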
Archiving for the reproducibility of query results
Sustaining the validity of claims
Backwards compatibility of applications
Federated querying is highly affected
How to shape a decentralized Quality of Service?
The hyperlink is the simplest form of decentralization, and we are already failing to preserve it.
Persistent Identification
Figure by Herbert Van de Sompel
Persistent Identification
Dependency on the publisher registering the PIDs
Possible loss of connection between PIDs and the original
Dependency on the PID provider
Possibly replacing one potential link rot problem with another
Who are you to tell me my URI is not persistent?
ISWC Resources track:
Consensus on and trust in persistence in a decentralized Web:
community-driven? standardization? blockchain, ...?
Robust Links
<a href="B"
   data-versionurl="URL of snapshot of B"
   data-versiondate="datetime of snapshot of B">...</a>
http://robustlinks.mementoweb.org/spec/
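When a page links to a resource that has already been archived, the Robust Links attributes can be generated mechanically. The sketch below assumes Python's standard library. It also assumes an ISO 8601 date for `data-versiondate`, which is the form used in the spec's examples as far as we know. The archive URL is hypothetical, and `robust_link` is an illustrative helper, not something defined by the specification. All attribute values are HTML-escaped so that URLs containing `&` stay valid.

```python
from datetime import date
from html import escape


def robust_link(href: str, version_url: str, version_date: date, text: str) -> str:
    """Render an <a> element decorated with Robust Links attributes.

    Assumption: data-versiondate holds an ISO 8601 date (YYYY-MM-DD).
    """
    return (
        f'<a href="{escape(href)}" '
        f'data-versionurl="{escape(version_url)}" '
        f'data-versiondate="{version_date.isoformat()}">'
        f"{escape(text)}</a>"
    )


if __name__ == "__main__":
    # Hypothetical original resource and archived snapshot of it.
    print(robust_link(
        "http://example.org/B",
        "http://archive.example.org/20180603/http://example.org/B",
        date(2018, 6, 3),
        "B",
    ))
```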
Robust Links
Open Annotation & Memento vocabulary
Can be linked to PROV
Figure by Herbert Van de Sompel
Open challenges with Memento
Real-time data
  The HTTP datetime format has a resolution of one second
Parallel truths
  No solution for accessing Versioned Data
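The one-second limit matters for real-time data. Memento carries datetimes as HTTP-dates, which stop at whole seconds. The sketch below assumes Python 3.9+ and a hypothetical set of version timestamps. It shows how several updates made within one second collapse to the same HTTP-date. An `Accept-Datetime` request therefore cannot tell those updates apart.

```python
from collections import defaultdict
from datetime import datetime, timezone
from email.utils import format_datetime


def http_date(dt: datetime) -> str:
    """Format `dt` as an HTTP-date; sub-second precision is dropped."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def indistinguishable(versions: dict[str, datetime]) -> list[list[str]]:
    """Group version names whose timestamps collapse to the same HTTP-date."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for name, ts in versions.items():
        buckets[http_date(ts)].append(name)
    return [names for names in buckets.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical real-time updates: v1 and v2 happen within the same second.
    versions = {
        "v1": datetime(2018, 6, 3, 12, 0, 0, 100_000, tzinfo=timezone.utc),
        "v2": datetime(2018, 6, 3, 12, 0, 0, 900_000, tzinfo=timezone.utc),
        "v3": datetime(2018, 6, 3, 12, 0, 1, tzinfo=timezone.utc),
    }
    print(indistinguishable(versions))  # [['v1', 'v2']]
```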
Who will be responsible for archiving?
Publisher
  Snapshot
  Versioning systems
3rd party
  Traditional
Hybrid: Publisher and/or 3rd party
  Transactional
  Notification-based
Snapshot
Often an "end of term" archive (e.g., DBpedia versions)
Exchangeable archives, e.g., file-based HDT
Versioning systems
Web: MediaWiki
RDF storage: Dydra, Virtuoso, ...
Memento-supported publishing: DBpedia
Wayback Machine, Linked Data Fragments Server
Significant progress in the RDF domain
Memento support can improve; it depends on query expressivity
Hybrid: Snapshot + Versioning
Discrete snapshots + index for continuous versions
  Linked Data pages: Tailr, ...
  Triple Patterns: Ostrich (offset-enabled), ...
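A hybrid store keeps one full snapshot plus a compact record of changes. The minimal sketch below assumes Python 3.9+ and sequential deltas of added and deleted triples replayed on top of a single snapshot. It illustrates the general idea only and does not reproduce the storage layout or indexes of Tailr or Ostrich. The triples and helper names are hypothetical.

```python
Triple = tuple[str, str, str]
# A delta is (additions, deletions) relative to the previous version.
Delta = tuple[set[Triple], set[Triple]]


def materialize(snapshot: set[Triple], deltas: list[Delta], version: int) -> set[Triple]:
    """Rebuild `version` (0 = the snapshot itself) by replaying deltas in order."""
    if not 0 <= version <= len(deltas):
        raise IndexError(f"unknown version {version}")
    graph = set(snapshot)  # copy, so the stored snapshot is never mutated
    for additions, deletions in deltas[:version]:
        graph -= deletions
        graph |= additions
    return graph


if __name__ == "__main__":
    snapshot = {("ex:s", "ex:p", "ex:o1")}
    deltas = [
        ({("ex:s", "ex:p", "ex:o2")}, set()),                        # version 1
        ({("ex:s", "ex:q", "ex:o3")}, {("ex:s", "ex:p", "ex:o1")}),  # version 2
    ]
    for v in range(len(deltas) + 1):
        print(v, sorted(materialize(snapshot, deltas, v)))
```

Replaying sequential deltas costs time proportional to the version number. Storing deltas cumulatively against the snapshot, or taking periodic new snapshots, are ways to bound that cost at the price of extra storage.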
Web archive
Not much in place yet
Indexes, but no notion of time: Sindice, LODCache, LOD Laundromat
Many technologies: targeted crawling, Sindice, LOD Laundromat, Linked Data crawling, ...
No guarantees on completeness
Transactional
Decentralized, sustainable solution
A challenge for completeness
Dependence on resource granularity
  e.g., SPARQL results or Linked Data pages?
Interested to see how far we would get...
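A transactional archive records each representation at the moment the server sends it, so no change goes unobserved. This is the idea behind SiteStory. As a rough sketch, assuming a Python WSGI application rather than Apache, a middleware can copy every response into a hypothetical in-memory archive. `TransactionalArchive` is illustrative and does not reflect SiteStory's actual design. The granularity question becomes concrete in the choice of key. Here the key is the path plus the query string, so every distinct SPARQL query URL becomes its own archived resource.

```python
from datetime import datetime, timezone


class TransactionalArchive:
    """WSGI middleware that stores a copy of every response it serves.

    Hypothetical sketch: copies live in memory, keyed by path plus query string.
    """

    def __init__(self, app):
        self.app = app
        self.archive = {}  # request key -> list of (served_at, status, body)

    def __call__(self, environ, start_response):
        captured = {}

        def recording_start_response(status, headers, exc_info=None):
            captured["status"] = status  # remember the status for the archive
            return start_response(status, headers, exc_info)

        result = self.app(environ, recording_start_response)
        try:
            body = b"".join(result)  # buffers the whole response (no streaming)
        finally:
            if hasattr(result, "close"):  # WSGI requires calling close() if present
                result.close()

        key = environ.get("PATH_INFO", "/")
        if environ.get("QUERY_STRING"):
            key += "?" + environ["QUERY_STRING"]
        served_at = datetime.now(timezone.utc)
        self.archive.setdefault(key, []).append((served_at, captured.get("status"), body))
        return [body]


if __name__ == "__main__":
    def turtle_app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/turtle")])
        return [b"<#a> <#p> <#b> ."]

    archived = TransactionalArchive(turtle_app)
    archived({"PATH_INFO": "/resource"}, lambda status, headers, exc_info=None: None)
    print(archived.archive["/resource"][0][1:])
```

Buffering the full body is the simplest option but gives up streaming. A production version would also need to decide which statuses and content types are worth keeping.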
Notification-based
3. Tomorrow: Things to keep in mind
Data archiving interests more than curators & activists
For instance, data-driven journalism:
  Product: transparency of the editorial process
  Process: interaction with users, the public
Scholarly communication, cultural heritage, legal publications, community databases (Wikipedia & Wikidata)
Archivability of Linked Data
Linked Data is in essence easier to archive.
Raw, self-contained data
Already machine processable/understandable
No obfuscation by client-side scripting
Accessibility of content to stimulate archiving.
“The content in HTML+RDFa that dokieli produces is accessible (readable) without requiring any CSS or JavaScript, i.e. text-browser safe. Breaking this "rule" in future development should be considered an anti-pattern (or a bug) in dokieli.”
—dokieli documentation, Sarven Capadisli
Choices in the Linked Data interface offered by the server make archiving easier or harder.
[Figure: spectrum of interfaces from intelligent client to intelligent server: data dump, Triple Pattern Fragments, SPARQL endpoint]
Towards the intelligent server:
  High resource granularity
  Data not as accessible
  Need to participate in archiving process
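A Triple Pattern Fragments server answers only single triple patterns and returns the results in pages with count metadata. The count is exact in this sketch and an estimate in the TPF specification. Because every request is one simple pattern over the dataset's terms, such responses are arguably easier to crawl and archive than arbitrary SPARQL results. The in-memory sketch below assumes Python 3.9+ and uses `None` as a variable. It shows the selector and the paging. The function name and parameters are hypothetical and are not the API of any Linked Data Fragments server.

```python
from typing import Optional

Triple = tuple[str, str, str]


def triple_pattern_fragment(
    triples: set[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
    page_size: int = 100,
    page: int = 0,
) -> tuple[list[Triple], int]:
    """Return one page of triples matching a single pattern, plus the total count.

    `None` acts as a variable; results are sorted so that pages are stable.
    """
    matches = sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )
    start = page * page_size
    return matches[start:start + page_size], len(matches)


if __name__ == "__main__":
    data = {("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:q", "ex:c"), ("ex:d", "ex:p", "ex:b")}
    print(triple_pattern_fragment(data, predicate="ex:p", page_size=1, page=1))  # ([('ex:d', 'ex:p', 'ex:b')], 2)
```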
Prevent mistakes from the past in standardization
Query interfaces: what can be archived?
Protocols: is it accessible?
Domain Modeling: can the semantics be preserved?
How to select the subgraph?
Preserving a Web of Linked Data
1. Yesterday: Web archiving strategies
2. Today: Tools for a Web of Linked Data
3. Tomorrow: Things to keep in mind
There are many sides
to preservation.
We don't start from scratch: many technologies already exist.
Start covering the uncovered sides.
Add archiving to the discussion.

More Related Content

Similar to Preserving a Web of Linked Data: Lessons and challenges from a fading web (20)

Oggcamp Fast and Beautiful Images
Oggcamp Fast and Beautiful ImagesOggcamp Fast and Beautiful Images
Oggcamp Fast and Beautiful Images
Doug Sillars
 
Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time Analytics
Arcadia Data
 
Readying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web BundlesReadying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web Bundles
Sawood Alam
 
Milano ux
Milano uxMilano ux
Milano ux
Doug Sillars
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Edureka!
 
Automated Time Series Analysis using Deep Learning, Ray and Analytics Zoo
Automated Time Series Analysis using Deep Learning, Ray and Analytics ZooAutomated Time Series Analysis using Deep Learning, Ray and Analytics Zoo
Automated Time Series Analysis using Deep Learning, Ray and Analytics Zoo
Jason Dai
 
How it works- Data Science
How it works- Data ScienceHow it works- Data Science
How it works- Data Science
Edureka!
 
Interacting with Linked Data to Facilitate its Sustainability
Interacting with Linked Data to Facilitate its SustainabilityInteracting with Linked Data to Facilitate its Sustainability
Interacting with Linked Data to Facilitate its Sustainability
Roberto García
 
Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)
Trieu Nguyen
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)
Chris Dagdigian
 
Turin webperf meetup
Turin webperf meetupTurin webperf meetup
Turin webperf meetup
Doug Sillars
 
Introduction to Big Data Technologies
Introduction to Big Data TechnologiesIntroduction to Big Data Technologies
Introduction to Big Data Technologies
Big Data Engineering, Faculty of Engineering, Dhurakij Pundit University
 
May 2023 CIAOPS Need to Know Webinar
May 2023 CIAOPS Need to Know WebinarMay 2023 CIAOPS Need to Know Webinar
May 2023 CIAOPS Need to Know Webinar
Robert Crane
 
Shareable Metadata for Visual Resources
Shareable Metadata for Visual ResourcesShareable Metadata for Visual Resources
Shareable Metadata for Visual Resources
Jenn Riley
 
Hackference
HackferenceHackference
Hackference
Doug Sillars
 
Reading gdg images
Reading gdg imagesReading gdg images
Reading gdg images
Doug Sillars
 
Mobile App Performance, Firenze
Mobile App Performance, FirenzeMobile App Performance, Firenze
Mobile App Performance, Firenze
Doug Sillars
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
DATAVERSITY
 
Big Data analytics
Big Data analyticsBig Data analytics
Big Data analytics
ArunKumar5524
 
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Chris Bizer
 
Oggcamp Fast and Beautiful Images
Oggcamp Fast and Beautiful ImagesOggcamp Fast and Beautiful Images
Oggcamp Fast and Beautiful Images
Doug Sillars
 
Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time Analytics
Arcadia Data
 
Readying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web BundlesReadying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web Bundles
Sawood Alam
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Edureka!
 
Automated Time Series Analysis using Deep Learning, Ray and Analytics Zoo
Automated Time Series Analysis using Deep Learning, Ray and Analytics ZooAutomated Time Series Analysis using Deep Learning, Ray and Analytics Zoo
Automated Time Series Analysis using Deep Learning, Ray and Analytics Zoo
Jason Dai
 
How it works- Data Science
How it works- Data ScienceHow it works- Data Science
How it works- Data Science
Edureka!
 
Interacting with Linked Data to Facilitate its Sustainability
Interacting with Linked Data to Facilitate its SustainabilityInteracting with Linked Data to Facilitate its Sustainability
Interacting with Linked Data to Facilitate its Sustainability
Roberto García
 
Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)
Trieu Nguyen
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)
Chris Dagdigian
 
Turin webperf meetup
Turin webperf meetupTurin webperf meetup
Turin webperf meetup
Doug Sillars
 
May 2023 CIAOPS Need to Know Webinar
May 2023 CIAOPS Need to Know WebinarMay 2023 CIAOPS Need to Know Webinar
May 2023 CIAOPS Need to Know Webinar
Robert Crane
 
Shareable Metadata for Visual Resources
Shareable Metadata for Visual ResourcesShareable Metadata for Visual Resources
Shareable Metadata for Visual Resources
Jenn Riley
 
Reading gdg images
Reading gdg imagesReading gdg images
Reading gdg images
Doug Sillars
 
Mobile App Performance, Firenze
Mobile App Performance, FirenzeMobile App Performance, Firenze
Mobile App Performance, Firenze
Doug Sillars
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
DATAVERSITY
 
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Chris Bizer
 

More from Miel Vander Sande (18)

20230525_mmc_seminar.pdf
20230525_mmc_seminar.pdf20230525_mmc_seminar.pdf
20230525_mmc_seminar.pdf
Miel Vander Sande
 
The Memento protocol
The Memento protocolThe Memento protocol
The Memento protocol
Miel Vander Sande
 
Slight change of plans!
Slight change of plans!Slight change of plans!
Slight change of plans!
Miel Vander Sande
 
PhD Defense: Metadata and Control Features for Low-Cost Linked Data Publishin...
PhD Defense: Metadata and Control Features for Low-Cost Linked Data Publishin...PhD Defense: Metadata and Control Features for Low-Cost Linked Data Publishin...
PhD Defense: Metadata and Control Features for Low-Cost Linked Data Publishin...
Miel Vander Sande
 
Reproducibility with 
the 99 cents Linked Data archive
Reproducibility with 
the 99 cents Linked Data archiveReproducibility with 
the 99 cents Linked Data archive
Reproducibility with 
the 99 cents Linked Data archive
Miel Vander Sande
 
Innovatiemarkt 2017: Machines are the new digital natives
Innovatiemarkt 2017: Machines are the new digital nativesInnovatiemarkt 2017: Machines are the new digital natives
Innovatiemarkt 2017: Machines are the new digital natives
Miel Vander Sande
 
A sweet affordable combo for Linked Data Archives
A sweet affordable combo for Linked Data ArchivesA sweet affordable combo for Linked Data Archives
A sweet affordable combo for Linked Data Archives
Miel Vander Sande
 
Machines are the new Digital Natives
Machines are the new Digital NativesMachines are the new Digital Natives
Machines are the new Digital Natives
Miel Vander Sande
 
Time travelling through DBpedia
Time travelling through DBpediaTime travelling through DBpedia
Time travelling through DBpedia
Miel Vander Sande
 
Opportunistic Linked Data Querying through Approximate Membership Metadata
Opportunistic Linked Data Querying through Approximate Membership MetadataOpportunistic Linked Data Querying through Approximate Membership Metadata
Opportunistic Linked Data Querying through Approximate Membership Metadata
Miel Vander Sande
 
Publish data as Time Consistent Web API based on Provenance (WS-REST 2014)
Publish data as Time Consistent Web API based on Provenance (WS-REST 2014)Publish data as Time Consistent Web API based on Provenance (WS-REST 2014)
Publish data as Time Consistent Web API based on Provenance (WS-REST 2014)
Miel Vander Sande
 
The Story behind Everything Is Connected: Multimedia narration of automatical...
The Story behind Everything Is Connected: Multimedia narration of automatical...The Story behind Everything Is Connected: Multimedia narration of automatical...
The Story behind Everything Is Connected: Multimedia narration of automatical...
Miel Vander Sande
 
LDOW2013 r&wbase: git for triples
LDOW2013 r&wbase: git for triplesLDOW2013 r&wbase: git for triples
LDOW2013 r&wbase: git for triples
Miel Vander Sande
 
The Terminator's origins or how the Semantic Web could endanger Humanity.
The Terminator's origins or how the Semantic Web could endanger Humanity.The Terminator's origins or how the Semantic Web could endanger Humanity.
The Terminator's origins or how the Semantic Web could endanger Humanity.
Miel Vander Sande
 
PMOD Challenges for Open Data Usage: Open derivatives and challenges
PMOD Challenges for Open Data Usage: Open derivatives and challengesPMOD Challenges for Open Data Usage: Open derivatives and challenges
PMOD Challenges for Open Data Usage: Open derivatives and challenges
Miel Vander Sande
 
Aan de slag met Linked Open Data
Aan de slag met Linked Open DataAan de slag met Linked Open Data
Aan de slag met Linked Open Data
Miel Vander Sande
 
The DataTank: an Open Data adapter with semantic output
The DataTank: an Open Data adapter with semantic outputThe DataTank: an Open Data adapter with semantic output
The DataTank: an Open Data adapter with semantic output
Miel Vander Sande
 
Follow the stars 25/11/2011
Follow the stars 25/11/2011Follow the stars 25/11/2011
Follow the stars 25/11/2011
Miel Vander Sande
 
PhD Defense: Metadata and Control Features for Low-Cost Linked Data Publishin...
PhD Defense: Metadata and Control Features for Low-Cost Linked Data Publishin...PhD Defense: Metadata and Control Features for Low-Cost Linked Data Publishin...
PhD Defense: Metadata and Control Features for Low-Cost Linked Data Publishin...
Miel Vander Sande
 
Reproducibility with 
the 99 cents Linked Data archive
Reproducibility with 
the 99 cents Linked Data archiveReproducibility with 
the 99 cents Linked Data archive
Reproducibility with 
the 99 cents Linked Data archive
Miel Vander Sande
 
Innovatiemarkt 2017: Machines are the new digital natives
Innovatiemarkt 2017: Machines are the new digital nativesInnovatiemarkt 2017: Machines are the new digital natives
Innovatiemarkt 2017: Machines are the new digital natives
Miel Vander Sande
 
A sweet affordable combo for Linked Data Archives
A sweet affordable combo for Linked Data ArchivesA sweet affordable combo for Linked Data Archives
A sweet affordable combo for Linked Data Archives
Miel Vander Sande
 
Machines are the new Digital Natives
Machines are the new Digital NativesMachines are the new Digital Natives
Machines are the new Digital Natives
Miel Vander Sande
 
Time travelling through DBpedia
Time travelling through DBpediaTime travelling through DBpedia
Time travelling through DBpedia
Miel Vander Sande
 
Opportunistic Linked Data Querying through Approximate Membership Metadata
Opportunistic Linked Data Querying through Approximate Membership MetadataOpportunistic Linked Data Querying through Approximate Membership Metadata
Opportunistic Linked Data Querying through Approximate Membership Metadata
Miel Vander Sande
 
Publish data as Time Consistent Web API based on Provenance (WS-REST 2014)
Publish data as Time Consistent Web API based on Provenance (WS-REST 2014)Publish data as Time Consistent Web API based on Provenance (WS-REST 2014)
Publish data as Time Consistent Web API based on Provenance (WS-REST 2014)
Miel Vander Sande
 
The Story behind Everything Is Connected: Multimedia narration of automatical...
The Story behind Everything Is Connected: Multimedia narration of automatical...The Story behind Everything Is Connected: Multimedia narration of automatical...
The Story behind Everything Is Connected: Multimedia narration of automatical...
Miel Vander Sande
 
LDOW2013 r&wbase: git for triples
LDOW2013 r&wbase: git for triplesLDOW2013 r&wbase: git for triples
LDOW2013 r&wbase: git for triples
Miel Vander Sande
 
The Terminator's origins or how the Semantic Web could endanger Humanity.
The Terminator's origins or how the Semantic Web could endanger Humanity.The Terminator's origins or how the Semantic Web could endanger Humanity.
The Terminator's origins or how the Semantic Web could endanger Humanity.
Miel Vander Sande
 
PMOD Challenges for Open Data Usage: Open derivatives and challenges
PMOD Challenges for Open Data Usage: Open derivatives and challengesPMOD Challenges for Open Data Usage: Open derivatives and challenges
PMOD Challenges for Open Data Usage: Open derivatives and challenges
Miel Vander Sande
 
Aan de slag met Linked Open Data
Aan de slag met Linked Open DataAan de slag met Linked Open Data
Aan de slag met Linked Open Data
Miel Vander Sande
 
The DataTank: an Open Data adapter with semantic output
The DataTank: an Open Data adapter with semantic outputThe DataTank: an Open Data adapter with semantic output
The DataTank: an Open Data adapter with semantic output
Miel Vander Sande
 

Preserving a Web of Linked Data: Lessons and challenges from a fading web

  • 1. Preserving a Web of Linked Data: Lessons and challenges from a fading Web. Miel Vander Sande, Ghent University – imec (03/06/2018, https://mielvds.github.io/MEPDaW2018/#1)
  • 2. There are many sides to preservation.
  • 3. Web of Linked Data?
  • 8. “We are losing thousands of Alexandria libraries each day. We have lost so much of the early Web history, just as we have lost so much of early Human history.” (Kalev H. Leetaru, University of Illinois)
  • 9. The forces of decay: Link Rot and Content Drift (source: Digital Preservation Business Case Toolkit, http://wiki.dpconline.org/)
  • 10. Link Rot (illustration by the Project Twins)
  • 12. Content Drift: significant change in content within a 3-month period
  • 16. Preserving a Web of Linked Data: 1. Yesterday: Web archiving strategies; 2. Today: tools for a Web of Linked Data; 3. Tomorrow: things to keep in mind
  • 17. Part 1. Yesterday: Web archiving strategies
  • 18. Strategies. Observational (perceived as discrete): snapshot, Web archive. Historical (perceived as continuous): versioning systems, transactional, notification-based.
  • 19. Snapshot
  • 20. Web archive (see: OpenWayback)
  • 21. Versioning systems (see: MediaWiki)
  • 22. Transactional (see: SiteStory Apache plugin)
  • 23. If a representation changes and nobody is around to see it, should it be archived?
  • 24. Notification-based
  • 25. Memento: travelling to the Web of the past (https://tools.ietf.org/html/rfc7089)
  • 28. Part 2. Today: tools for a Web of Linked Data
  • 29. Archive or archiving?
  • 30. Linked Data archiving as the product. RDF indexes for versioning: Dydra, Virtuoso, x-RDF-3X, ... Representations of versions, provenance and time: PROV, LD Patch, LODE, ... Technical; (increasingly) popular research tracks.
  • 31. Linked Data archiving as the process. Some technological building blocks: Linked Data interfaces, change detection, publishing, crawling and querying. Technical, as well as infrastructural and societal. Rather unknown territory (but the technologies exist).
  • 32. What assumptions are there about data evolution? Historical data: provenance is a timeline, and only one truth can exist at a time (e.g. time-series databases, Wikipedia). Versioned data: provenance is a directed acyclic graph, and multiple truths can exist at the same time.
  • 33. Decay becomes more complex: Link Rot, Content Drift, and Concept Drift. "Please don't change your vocabulary" (check out the DRIFT-A-LOD workshop). A problem in other domains as well (machine learning).
  • 34. Study these issues within Linked Data. Link Rot: the subject or object cannot be dereferenced, or the dataset/interface is gone. Content Drift: the context graph of the subject or object has changed. Concept Drift: the predicate or object changes meaning.
  • 35. Archiving for the reproducibility of query results: sustain the validity of claims and the backwards compatibility of applications. Federated querying is highly affected. How do we shape a decentralized quality of service? The hyperlink is the simplest form of decentralization, and we are already failing to preserve it.
  • 36. Persistent identification (figure by Herbert Van de Sompel)
  • 37. Persistent identification: dependency on the publisher registering the PIDs; possible loss of the connection between PIDs and the original; dependency on the PID provider; possibly replacing one potential link rot problem with another.
  • 38. Who are you to tell me my URI is not persistent? ISWC Resources track. Consensus on and trust in persistence in a decentralized Web: community-driven? Standardization? Blockchain, ...?
  • 39. Robust links: <a href="B" data-versionurl="URL of snapshot of B" data-versiondate="datetime of snapshot of B"> (http://robustlinks.mementoweb.org/spec/)
  • 40. Robust Links: Open Annotation and Memento vocabulary; can be linked to PROV (figure by Herbert Van de Sompel)
  • 41. Open challenges with Memento. Real-time data: the HTTP datetime format only has one-second granularity. Parallel truths: no solution for accessing versioned data.
  • 42. Who will be responsible for archiving? Publisher: snapshot, versioning systems. Third party: traditional. Hybrid (publisher and/or third party): transactional, notification-based.
  • 43. Snapshot: often an "end of term" archive (e.g. DBpedia versions); exchangeable archives, e.g. file-based HDT.
  • 46. Versioning systems on the Web vs. in RDF. Web: MediaWiki. RDF storage: Dydra, Virtuoso, ... Memento-supported publishing: DBpedia Wayback Machine, Linked Data Fragments Server. Significant progress in the RDF domain; Memento support can improve, and depends on query expressivity.
  • 47. Hybrid: snapshot + versioning. Discrete snapshots plus an index for continuous versions. Linked Data pages: Tailr, ... Triple patterns: Ostrich (offset-enabled), ...
  • 48. Web archive: not much in place yet. Indexes, but no notion of time: Sindice, LODCache, LOD Laundromat. Many technologies for targeted crawling: Sindice, LOD Laundromat, Linked Data crawling, ... No guarantees on completeness.
  • 49. Transactional: a decentralized, sustainable solution, but a challenge for completeness. It depends on resource granularity, e.g. SPARQL results or Linked Data pages? Interested to see how far we would get...
  • 50. Notification-based
  • 51. Part 3. Tomorrow: things to keep in mind
  • 52. Data archiving interests more than curators and activists. For instance, data-driven journalism. Product: transparency of the editorial process. Process: interaction with users and the public. Also scholarly communication, cultural heritage, legal publications, and community databases (Wikipedia and Wikidata).
  • 53. Archivability of Linked Data
  • 54. Linked Data is in essence easier to archive: raw, self-contained data; already machine-processable/understandable; no obfuscation by client-side scripting.
  • 55. Accessibility of content to stimulate archiving: “The content in HTML+RDFa that dokieli produces is accessible (readable) without requiring any CSS or JavaScript, i.e. text-browser safe. Breaking this "rule" in future development should be considered an anti-pattern (or a bug) in dokieli.” (dokieli documentation, Sarven Capadisli)
  • 56. Choices in the Linked Data interface offered by the server increase or decrease archivability. Interfaces range from intelligent client to intelligent server: data dump, Triple Pattern Fragments, SPARQL endpoint. Considerations: high resource granularity; data not as accessible; the need to participate in the archiving process.
  • 57. Prevent mistakes from the past in standardization. Query interfaces: what can be archived? Protocols: is it accessible? Domain modeling: can the semantics be preserved? How do we select the subgraph?
  • 58. Recap: 1. Yesterday: Web archiving strategies; 2. Today: tools for a Web of Linked Data; 3. Tomorrow: things to keep in mind
  • 59. There are many sides to preservation. We don't start from scratch: many technologies are already there. Start covering the uncovered sides. Add archiving to the discussion.
  • 60. Preserving a Web of Linked Data: Lessons and challenges from a fading Web. Miel Vander Sande, Ghent University – imec
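Slides 22 and 23 describe transactional archiving (see the SiteStory Apache plugin): the server archives representations as it serves them, so a change that nobody requests is never captured. The Python sketch below illustrates that trade-off under stated assumptions. It uses an in-memory store, a hypothetical `TransactionalArchive` class and a hypothetical `serve` handler (neither is SiteStory's API), and byte-level change detection via a SHA-256 hash.

```python
import hashlib
from datetime import datetime, timezone


class TransactionalArchive:
    """Hypothetical in-memory store; not SiteStory's actual data model."""

    def __init__(self):
        # URI -> list of (served_at, sha256 hex digest, body), oldest first
        self.mementos = {}

    def record(self, uri, body, served_at=None):
        """Called for every response; keeps a memento only when the body changed."""
        served_at = served_at or datetime.now(timezone.utc)
        digest = hashlib.sha256(body).hexdigest()
        history = self.mementos.setdefault(uri, [])
        if history and history[-1][1] == digest:
            return False  # identical to the last served representation
        history.append((served_at, digest, body))
        return True


def serve(archive, uri, current_body):
    """Hypothetical request handler: archiving happens only as a side effect of serving."""
    archive.record(uri, current_body)
    return current_body


# The resource goes v1 -> v2 -> v3, but v2 is never requested, so it is never archived.
archive = TransactionalArchive()
serve(archive, "/people/alice", b"v1")
serve(archive, "/people/alice", b"v1")
serve(archive, "/people/alice", b"v3")
print([body for _, _, body in archive.mementos["/people/alice"]])  # [b'v1', b'v3']
```

Skipping unchanged representations keeps storage small, but completeness then depends entirely on request traffic, which is the challenge slide 49 raises.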
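Slide 25 refers to Memento (RFC 7089). In Memento, a client sends an `Accept-Datetime` header to a TimeGate, which redirects it to a suitable memento. The minimal Python sketch below covers both sides using in-memory data. The header is rendered in the HTTP-date format. `select_memento` is a hypothetical TimeGate policy that picks the closest memento; as we read RFC 7089, the exact selection rule is left to the TimeGate. The archive URIs are made up.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime_header(dt):
    """Render an aware datetime as an HTTP-date, as used for Accept-Datetime."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def select_memento(mementos, accept_datetime):
    """Hypothetical TimeGate policy: pick the memento closest to the requested datetime.

    `mementos` maps memento URI -> its Memento-Datetime as an aware datetime.
    """
    target = parsedate_to_datetime(accept_datetime)
    return min(mementos, key=lambda uri: abs(mementos[uri] - target))


# Hypothetical archive holding three mementos of one resource.
mementos = {
    "https://archive.example.org/20180101/http://example.org/r": datetime(2018, 1, 1, tzinfo=timezone.utc),
    "https://archive.example.org/20180601/http://example.org/r": datetime(2018, 6, 1, tzinfo=timezone.utc),
    "https://archive.example.org/20181201/http://example.org/r": datetime(2018, 12, 1, tzinfo=timezone.utc),
}
header = accept_datetime_header(datetime(2018, 6, 3, 12, 0, tzinfo=timezone.utc))
print(header)  # Sun, 03 Jun 2018 12:00:00 GMT
print(select_memento(mementos, header))  # https://archive.example.org/20180601/http://example.org/r
```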
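Slide 32 distinguishes two kinds of data. For historical data, provenance is a timeline with one truth at a time. For versioned data, provenance is a directed acyclic graph in which several truths coexist. The Python sketch below makes the difference testable, assuming each version simply lists the versions it was derived from. The function names are illustrative. A timeline has exactly one head, while a branching DAG can have several.

```python
def heads(parents):
    """Versions that no other version was derived from, i.e. the current 'truths'.

    `parents` maps every version id to the list of version ids it was derived from.
    """
    derived_from = {p for ps in parents.values() for p in ps}
    return sorted(v for v in parents if v not in derived_from)


def is_timeline(parents):
    """Historical data: each version has at most one parent and exactly one head exists."""
    return all(len(ps) <= 1 for ps in parents.values()) and len(heads(parents)) == 1


historical = {"v1": [], "v2": ["v1"], "v3": ["v2"]}
versioned = {"v1": [], "draft-a": ["v1"], "draft-b": ["v1"]}
print(heads(historical), is_timeline(historical))  # ['v3'] True
print(heads(versioned), is_timeline(versioned))    # ['draft-a', 'draft-b'] False
```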
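Slide 34 frames content drift in Linked Data as a change in the context graph of a subject or object. One simple way to detect it is a set difference between two snapshots, sketched here in Python. The sketch assumes that triples are plain string tuples and that the context graph is every triple mentioning the resource as subject or object. It also assumes there are no blank nodes, since their labels are not stable across snapshots.

```python
def context_graph(triples, resource):
    """Triples in which `resource` appears as subject or object (one simple notion of context)."""
    return {t for t in triples if t[0] == resource or t[2] == resource}


def content_drift(old_snapshot, new_snapshot, resource):
    """Return (added, removed) triples in the resource's context graph between two snapshots."""
    before = context_graph(old_snapshot, resource)
    after = context_graph(new_snapshot, resource)
    return after - before, before - after


old = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:carol", "foaf:knows", "ex:dave"),  # outside alice's context graph
}
new = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice B."'),
}
added, removed = content_drift(old, new, "ex:alice")
print(added, removed)
```

Concept drift is harder to catch. When a predicate changes meaning, the triples can stay byte-for-byte identical, so a diff like this one will not flag it.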
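Slide 39 shows the Robust Links decoration of an anchor. A Python helper that emits such an anchor might look as follows. The function name and example URLs are hypothetical. The ISO 8601 date for `data-versiondate` reflects our reading of the specification rather than a verified requirement. Attribute values are escaped with `html.escape`.

```python
from html import escape


def robust_link(href, version_url, version_date, text):
    """Build an anchor carrying the Robust Links attributes (hypothetical helper)."""
    return (
        f'<a href="{escape(href)}" '
        f'data-versionurl="{escape(version_url)}" '
        f'data-versiondate="{escape(version_date)}">{escape(text)}</a>'
    )


# Hypothetical original URL and snapshot URL.
print(robust_link(
    "http://example.org/B",
    "https://archive.example.org/20180603/http://example.org/B",
    "2018-06-03",
    "B",
))
```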
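Slide 41 notes that the HTTP datetime format used by Memento only resolves whole seconds. The Python sketch below uses two hypothetical versions of a real-time resource created 300 ms apart. Both map to the same header value, so datetime negotiation alone cannot tell them apart.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_datetime(dt):
    """HTTP-date rendering as used in Memento-Datetime/Accept-Datetime: whole seconds only."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


# Two hypothetical versions of a real-time resource, 300 ms apart.
v1 = datetime(2018, 6, 3, 12, 0, 0, 100000, tzinfo=timezone.utc)
v2 = datetime(2018, 6, 3, 12, 0, 0, 400000, tzinfo=timezone.utc)
# Both collapse to the same HTTP date, so Accept-Datetime cannot address them separately.
print(memento_datetime(v1) == memento_datetime(v2))  # True
```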
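Slide 47 describes hybrid storage, which combines discrete snapshots with an index over the changes between them. The Python sketch below shows how any version can be materialized under these assumptions: a single base snapshot, and a chain of (additions, deletions) deltas over string triples. This is not the actual storage layout of Tailr or Ostrich.

```python
def materialize(snapshot, deltas, version):
    """Rebuild a version from a base snapshot plus per-version (additions, deletions) deltas.

    Version 0 is the snapshot itself; deltas[i - 1] turns version i-1 into version i.
    Within one delta, deletions are applied before additions.
    """
    triples = set(snapshot)  # copy, so the stored snapshot is never modified
    for additions, deletions in deltas[:version]:
        triples -= deletions
        triples |= additions
    return triples


snapshot = {("ex:s", "ex:p", "ex:o1")}
deltas = [
    ({("ex:s", "ex:p", "ex:o2")}, set()),                        # version 1: add o2
    ({("ex:s", "ex:p", "ex:o3")}, {("ex:s", "ex:p", "ex:o1")}),  # version 2: add o3, drop o1
]
print(sorted(materialize(snapshot, deltas, 2)))  # [('ex:s', 'ex:p', 'ex:o2'), ('ex:s', 'ex:p', 'ex:o3')]
```

Replaying a chain makes later versions slower to rebuild. As far as we know, Ostrich avoids this by storing deltas aggregated relative to the snapshot, so each version needs only one delta.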
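Slide 56 places Triple Pattern Fragments between a data dump and a SPARQL endpoint. The Python sketch below works over an in-memory set of string triples and shows the core idea. The server answers only single triple patterns (`None` acts as a variable), and the client performs joins itself. The sketch omits the paging, count metadata and hypermedia controls of the actual interface, and all names are hypothetical.

```python
def triple_pattern_fragment(triples, subject=None, predicate=None, obj=None):
    """Server side: all triples matching one triple pattern; None acts as a variable."""
    return sorted(
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )


def names_of_friends(triples, person):
    """Client side: a join built from two kinds of pattern requests instead of one SPARQL query."""
    friends = [o for (_, _, o) in triple_pattern_fragment(triples, person, "foaf:knows")]
    return sorted(
        name
        for friend in friends
        for (_, _, name) in triple_pattern_fragment(triples, friend, "foaf:name")
    )


data = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:carol", "foaf:name", '"Carol"'),
}
print(names_of_friends(data, "ex:alice"))
```

Because each response is fully determined by one pattern, fragments are simpler to cache and archive than arbitrary SPARQL results. That is one reading of the granularity trade-off on the slide.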