Stockholm, November 11, 2018
KTH Royal Institute of Technology
From Linked Data to Cognitive Data
--- CONFIDENTIAL ---
Zuse Z3: the beginning of Computing – close to the hardware
Photo: Konrad Zuse Internet Archiv/Deutsches Museum/DFG
© Fraunhofer
We can make things more intuitive
Picture: The illustrated recipes of Lucy Eldridge
http://thefoxisblack.com/2013/07/18/the-illustrated-recipes-of-lucy-eldridge/
Computing more intuitive: procedural programming
Computing more intuitive: OO programming
Computing even more intuitive: with cognitive data?!
Machine Learning and Big Data
http://www.spacemachine.net/views/2016/3/datasets-over-algorithms
• AI is not just the next hype after Big Data; Big Data is the reason why we have AI!
The Three “V”s of Big Data: Variety Is Often Neglected
Source: Gesellschaft für Informatik
Linked Data Principles
Addressing the neglected third V (Variety)
1. Use URIs to identify the “things” in your data
2. Use http:// URIs so people (and machines) can look them up on the web
3. When a URI is looked up, return a description of the thing in the W3C Resource Description Framework (RDF)
4. Include links to related things
http://www.w3.org/DesignIssues/LinkedData.html
[1] Auer, Lehmann, Ngomo, Zaveri: Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web 2013
RDF & Linked Data in a Nutshell
1. Graph-based RDF data model consisting of subject-predicate-object (S-P-O) statements (facts):
KTH → conf:organizes → OSLCFest
OSLCFest → conf:starts → 05.11.2018
OSLCFest → conf:takesPlaceIn → dbpedia:Stockholm
2. Serialised as RDF triples (subject predicate object):
KTH conf:organizes OSLCFest .
OSLCFest conf:starts "2018-11-05"^^xsd:date .
OSLCFest conf:takesPlaceIn dbpedia:Stockholm .
3. Publication under a URL on the Web, intranet, or extranet
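The S-P-O model above can be sketched in a few lines of Python. This is only an illustration, not part of the talk: the triples are held as plain tuples in a set, and the helper `find` (a hypothetical name) answers simple pattern queries in which `None` acts as a wildcard. The predicate spelling `conf:takesPlaceIn` follows the graph labels on the slide.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# The three statements from the slide, kept as (subject, predicate, object).
TRIPLES = {
    ("KTH", "conf:organizes", "OSLCFest"),
    ("OSLCFest", "conf:starts", '"2018-11-05"^^xsd:date'),
    ("OSLCFest", "conf:takesPlaceIn", "dbpedia:Stockholm"),
}


def find(s: Optional[str] = None,
         p: Optional[str] = None,
         o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None matches anything."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def to_statement(triple: Triple) -> str:
    """Render a triple in the 'subject predicate object .' form used on the slide."""
    return f"{triple[0]} {triple[1]} {triple[2]} ."


if __name__ == "__main__":
    # Everything known about OSLCFest as a subject.
    for t in find(s="OSLCFest"):
        print(to_statement(t))
```

A set of tuples is enough here because an RDF graph is, conceptually, a set of statements; a real triple store adds indexing and full IRIs on top of the same idea.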
Creating Knowledge Graphs with RDF Linked Data
[Figure: knowledge graph around DHL. Edge labels: located in, label, industry, headquarters, full name, height. Nodes and values: DHL, Post Tower, 162.5 m, Bonn, Logistics, Logistik, DHL International GmbH, 物流]
RDF Data Model (a bit more technical)
Graph consists of:
• Resources (identified via URIs)
• Literals: data values with a data type (URI) or language tag (multilinguality integrated)
• Attributes of resources are also URI-identified (from vocabularies)
Various data sources and vocabularies can be arbitrarily mixed and meshed
URIs can be shortened with namespace prefixes; e.g. dbp: → http://dbpedia.org/resource/
[Figure: the DHL graph with URI-identified terms. Properties: gn:locatedIn, rdfs:label, dbo:industry, ex:headquarters, foaf:name, ex:height, rdf:value, ex:unit. Resources: dbp:DHL_International_GmbH, dbp:Post_Tower, dbp:Bonn, dbp:Logistics, unit:Meter. Literals: "162.5"^^xsd:decimal, "Logistik"@de, "DHL International GmbH"^^xsd:string, "物流"@zh]
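Prefix expansion and the two kinds of literal (typed and language-tagged) can be illustrated with a small sketch. It is an assumption-laden example rather than a real RDF library: `expand` and `Literal` are hypothetical helpers, only the `dbp:` mapping comes from the slide, and the other namespace IRIs are the commonly published ones for RDFS, XSD, and FOAF. String escaping inside literals is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

# Prefix table; "dbp" is the mapping given on the slide, the rest are the
# standard namespace IRIs for these vocabularies.
PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dbp:Bonn' into a full URI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


@dataclass(frozen=True)
class Literal:
    """A data value with either a datatype (prefixed name) or a language tag."""
    value: str
    datatype: Optional[str] = None
    lang: Optional[str] = None

    def n3(self) -> str:
        # Note: no escaping of quotes or backslashes; illustration only.
        if self.lang:
            return f'"{self.value}"@{self.lang}'
        if self.datatype:
            return f'"{self.value}"^^<{expand(self.datatype)}>'
        return f'"{self.value}"'


if __name__ == "__main__":
    print(expand("dbp:Post_Tower"))
    print(Literal("Logistik", lang="de").n3())
    print(Literal("162.5", datatype="xsd:decimal").n3())
```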
Vocabularies – Breaking the mold!
• Semantic data virtualization allows for continuous expansion and enhancement of data and metadata across data sources without losing the overall perspective
Relational data models: 1:1 relation between data model and application
Graph-based data model: 1:n relation between data model and application
[Figure: chain of Subject → Predicate → Object / Subject → Predicate → Object / Subject]
RDF mediates between different Data Models & bridges between Conceptual and Operational Layers
Tabular/Relational Data
Electronics
Id    Title    Screen
5624  SmartTV  104cm
5627  Tablet   21cm
Prod:5624 rdf:type Electronics
Prod:5624 rdfs:label "SmartTV"
Prod:5624 hasScreenSize "104"^^unit:cm
...
Taxonomic/Tree Data
Vehicle
Car Bus Truck
Vehicle rdf:type owl:Thing
Car rdfs:subClassOf Vehicle
Bus rdfs:subClassOf Vehicle
...
Logical Axioms / Schema
Male rdfs:subClassOf Human
Female rdfs:subClassOf Human
Male owl:disjointWith Female
...
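The step from the product table to triples ("lifting") might look like the following sketch. It assumes that every row of the table is an instance of `Electronics` and that the `Screen` column always carries a `cm` suffix; the slide only spells out the triples for product 5624, so the treatment of the second row is an extrapolation. The function name `lift` is hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# The two rows of the table on the slide.
ROWS: List[Dict[str, str]] = [
    {"Id": "5624", "Title": "SmartTV", "Screen": "104cm"},
    {"Id": "5627", "Title": "Tablet", "Screen": "21cm"},
]


def lift(row: Dict[str, str]) -> List[Triple]:
    """Turn one table row into S-P-O statements about a product resource."""
    subject = f"Prod:{row['Id']}"
    screen = row["Screen"]
    # Separate the numeric value from the unit so the unit can be the datatype.
    size = screen[:-2] if screen.endswith("cm") else screen
    return [
        (subject, "rdf:type", "Electronics"),  # assumed class for every row
        (subject, "rdfs:label", '"' + row["Title"] + '"'),
        (subject, "hasScreenSize", '"' + size + '"^^unit:cm'),
    ]


if __name__ == "__main__":
    for row in ROWS:
        for s, p, o in lift(row):
            print(s, p, o)
```

The row identifier becomes the subject and each column becomes a predicate, which is why new columns or new sources can be added without changing a fixed table schema.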
Perspectives on data turn into silos
Engineering, Manufacturing, Logistics, Marketing, ...
Parts of the data are curated, duplicated, annotated, and simply changed over time, making reconciliation and interpretation a challenge
Integrate Using RDF & Vocabularies
Engineering, Manufacturing, Logistics, Marketing
The Trinity of Semantic Integration
Knowledge Graphs
• Complex fabric of concepts & relationships
• Focus on heterogeneous, multi-domain knowledge representation
Data Spaces
• Community of organizations agreeing on standards for data access / security / semantics / governance / licenses
• Focus on data sharing & exchange
Semantic Data Lakes
• Storage facility for enterprise/research data
• Use Big Data (HDFS) management
• Focus on scalable data access
Use in a single organization
Inter-organizational use
Knowledge Graphs – A definition
• Fabric of concept, class, property, relationships, entity descriptions
• Uses a knowledge representation formalism (typically RDF, RDF-Schema, OWL)
• Holistic knowledge (multi-domain, source, granularity):
• instance data (ground truth),
• open (e.g. DBpedia, WikiData), private (e.g. supply chain data), closed data (product models),
• derived, aggregated data,
• schema data (vocabularies, ontologies)
• meta-data (e.g. provenance, versioning, documentation, licensing)
• comprehensive taxonomies to categorize entities
• links between internal and external data
• mappings to data stored in other systems and databases
Smart Data for Machine Learning
Emerging Knowledge Graphs & Data Spaces
Search Engine Optimization & Web-Commerce
• Schema.org used by >20% of Web sites
• Major search engines exploit semantic descriptions
Pharma, Life Sciences
• Mature, comprehensive vocabularies and ontologies
• Billions of disease, drug, clinical trial descriptions
Digital Libraries
• Many established vocabularies (Dublin Core, FRBR, EDM)
• Millions of metadata records aggregated from thousands of memory institutions in Europeana, German Digital Library
Example: ENTERPRISE DATA INTEGRATION WITH A SEMANTIC DATA LAKE
© eccenca GmbH 2016
The future of data management is semantic!
The Problem Today (App. 1, App. 2, App. 3)
• Data access limited to connected source
• Exploding cost of ETL
Solution Tomorrow (App. 1, App. 2, App. 3)
• Full access to all data
• Lean architecture
• Great synergies in data lifting
High Level Architecture Corporate Memory
[Figure: architecture diagram. Labels: Management Accounting, Risk Management, Regulatory Reporting, Treasury, Marketing, Accounting; Corporate Memory; Inbound Data Sources; Outbound and Consumption; Big Data DWH-Infrastructure]
• Inbound Raw Data Store
• Knowledge Graph for Meta Data, KPI Definition and Data Models
• Frontend to Access Relationship and KPI Definition / Documentation
• Frontend to Access (ad hoc) Reports
• Outbound Data Delivery to Target Systems
Integration via Knowledge Graph and Semantic Data Models
[Figure: Knowledge Graph (RDF) between OEM and Supplier; formats listed on each side: XML, EDI, CSV, iDoc, RDF, JSON]
Supplier onboarding cost/time reduction due to rich and flexible pivot format
Turning Strings into Things for Graph Synchronization
[Figure: CMEM-SYNC between OEM and Supplier, each with ERP and CMEM; step labels: lift, sync, tabulate, subscribe]
Ingestion / Cataloging
• Cataloging of datasets and vocabularies
• Rich metadata model
• Automatic profiling of datasets
• DataLake (HDFS) integration
• Extraction of metadata
• Continuous monitoring for new versions and structural changes
Pipeline stages: Ingestion / Cataloging, Mapping, Discovery, Linking, Selection, Analytics / Experiments
Manage Datasets
Profiling Data
Mapping
• Sophisticated mapping management
• Mapping towards semantic vocabularies (lifting)
• Self documentation of data (data dictionary)
• Normalization of data
• Mapping suggestions
• Mapping reuse based on data profiling
• Advanced mapping suggestions
• machine learning
• data fingerprinting
Discovery
• Calculation of dataset relatedness / similarity
• Visual exploration of data neighborhood
• Similarity measure based on profiling and mapping
• Similarity measure based on data fingerprinting
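The slides do not say how dataset similarity is computed. As one plausible reading of "data fingerprinting", the sketch below fingerprints a column as the set of character 3-grams of its values and compares two fingerprints with the Jaccard coefficient. Both the fingerprint definition and the function names are assumptions made for illustration.

```python
from typing import Iterable, Set


def fingerprint(values: Iterable[object], k: int = 3) -> Set[str]:
    """Fingerprint a column as the set of lowercase character k-grams of its values."""
    grams: Set[str] = set()
    for value in values:
        text = str(value).lower()
        # Values shorter than k contribute themselves as a single gram.
        for i in range(max(len(text) - k + 1, 1)):
            grams.add(text[i:i + k])
    return grams


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    cities_a = fingerprint(["Bonn", "Berlin", "Hannover"])
    cities_b = fingerprint(["bonn", "berlin", "Leipzig"])
    sizes = fingerprint(["104cm", "21cm"])
    print(jaccard(cities_a, cities_b))  # columns with shared values score higher
    print(jaccard(cities_a, sizes))     # unrelated columns score lower
```

Set-based fingerprints are cheap to store per column, which is what makes a pairwise relatedness calculation across a whole catalog feasible.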
Linking
• Linking based on expressive rule trees
• Interactive machine learning of linkage rules
• Continuous integration of gold standard for quality assurance
• Data fusion support
Create Declarative Matching Rules
Create Context-aware deterministic rules to match pairs of records, supported by machine learning.
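A declarative matching rule can be pictured as a small tree: leaves compare one field of two records, inner nodes aggregate the scores of their children, and the root carries a threshold. The sketch below is a generic illustration of that idea and not eccenca's rule format; the rule layout, the comparator names, and the use of `difflib.SequenceMatcher` as the string measure are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Any, Callable, Dict, List

Record = Dict[str, str]


def string_sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact(a: str, b: str) -> float:
    """1.0 if both values are identical, else 0.0."""
    return 1.0 if a == b else 0.0


COMPARATORS: Dict[str, Callable[[str, str], float]] = {
    "string_sim": string_sim,
    "exact": exact,
}
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "min": min,
    "max": max,
    "avg": lambda scores: sum(scores) / len(scores),
}

# Declarative rule: names must be similar AND cities must be equal.
RULE: Dict[str, Any] = {
    "aggregate": "min",
    "threshold": 0.8,
    "children": [
        {"compare": "string_sim", "field": "name"},
        {"compare": "exact", "field": "city"},
    ],
}


def score(rule: Dict[str, Any], a: Record, b: Record) -> float:
    """Evaluate a rule tree bottom-up for one pair of records."""
    if "compare" in rule:  # leaf: compare a single field
        return COMPARATORS[rule["compare"]](a[rule["field"]], b[rule["field"]])
    child_scores = [score(child, a, b) for child in rule["children"]]
    return AGGREGATORS[rule["aggregate"]](child_scores)


def is_match(rule: Dict[str, Any], a: Record, b: Record) -> bool:
    """A pair matches when the root score reaches the root threshold."""
    return score(rule, a, b) >= rule["threshold"]


if __name__ == "__main__":
    left = {"name": "DHL International GmbH", "city": "Bonn"}
    right = {"name": "dhl international gmbh", "city": "Bonn"}
    print(is_match(RULE, left, right))
```

Because the rule is plain data, it stays deterministic and inspectable, while a learning component could propose or tune such trees from labeled example pairs.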
Industrial/International Data Space
Establishing Data Value Chains
Digitisation of Industry
Digitisation Enables Data Driven Business Models … for Example Precision Farming
“Precision Farming” Value Creation in the “Ecosystem”
“Digital Farming Ecosystem”: Machine Producer, Seed Provider, Farmers, Wholesale, Technology Provider, Weather Service
Image sources: wiwo, traction-magazin.de. Source: Beecham Research Ltd. (2014).
Goal and Architecture of the Industrial Data Space
The Industrial Data Space aims at blueprinting a “Network of Trusted Data”.
• Secure data exchange
• Trustworthiness: certified members
• Decentralisation: federated architecture
• Sovereignty over data and services
• Governance: common rules of the game
• Scalability: network effects
• Openness: neutral and user-driven
• Ecosystem: platform and services
Goal and Architecture of the Industrial Data Space
Component Reference Architecture
www.industrialdataspace.org
LOCATION IN THE CONTEXT OF “INDUSTRY 4.0”
FOCUS ON DATA
Industrie 4.0 (focus on manufacturing industry), Retail 4.0, Bank 4.0, Insurance 4.0, …
Smart Services
Transfer and Networks
Real time systems
Industrial Data Space (focus on data)
Data
…
Goal and Architecture of the Industrial Data Space
The Industrial Data Space Connects the Internet of Things and Smart Services.
A Cultural Heritage Data Space
Integrating Millions of Metadata Records from >2000 Memory Institutions for the German Digital Library
Dataspace with
• 2000 memory institutions in Germany alone
• Common semantic data model: EDM
• Common data governance: CC0
• Common access scheme: OAI-PMH
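To make the common access scheme concrete, here is a sketch of how a harvester could build an OAI-PMH `ListRecords` request. The verb and the `metadataPrefix` and `set` arguments are defined by the OAI-PMH protocol, and `oai_dc` is the format every repository must support. The base URL `https://example.org/oai`, the set name, and the `edm` prefix are placeholders: which prefix a given repository uses for EDM has to be checked with its `ListMetadataFormats` response.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (no network access here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional selective harvesting
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder endpoint and values; a real harvester would also follow
    # resumption tokens returned by the repository.
    print(list_records_url("https://example.org/oai"))
    print(list_records_url("https://example.org/oai", "edm", "museums"))
```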
Conclusion
Cognitive Data challenges where we can make a difference
Hybrid AI – combination of smart data (knowledge graphs) and smart analytics
Distributed semantic technologies – knowledge representation using vocabularies, ontologies
Question Answering
• Open Question Answering architecture – flexible, knowledge-based integration architecture for QA components and pipelines
• Dialogue Systems – combination of language models and goal-driven question answering
Integration with Crowdsourcing
Knowledge Graphs, Semantic Data Lakes
Robotics – usage of semantics for actuation
Agile Interoperability – leveraging community driven vocabulary development
• Systematic Enterprise Linked Data Framework (GDPR is a driver)
https://de.linkedin.com/in/soerenauer
https://twitter.com/soerenauer
https://www.xing.com/profile/Soeren_Auer
http://www.researchgate.net/profile/Soeren_Auer
TIB & Leibniz University of Hannover
auer@tib.eu
Prof. Dr. Sören Auer


Editor's Notes

  • #3: The Z3 was the world's first functional digital computer, built in 1941 in Berlin by Konrad Zuse in cooperation with Helmut Schreyer. It was implemented in electromagnetic relay technology, with 600 relays for the arithmetic unit and 1,400 relays for the memory unit.
  • #7: Longquan stoneware incense burner, China, 12th-13th century AD. Part of the Percival David Collection of Chinese Ceramics.
  • #11: Breakthroughs in AI come after data is available, not after algorithmic discoveries. If you think about AI, think about the data, not the algorithms. Fun fact: most major AI companies share their internal deep learning toolkits.
  • #19: Map the silos to their domain-appropriate schemas. Link the nodes (Linked Data). The schema can be virtual: multiple schemas/views may be appropriate.
  • #20: Map the silos to their domain-appropriate schemas. Link the nodes (Linked Data). The schema can be virtual: multiple schemas/views may be appropriate.
  • #26: You could argue that MDM & BI hub-and-spoke systems have had the objective of the “Solution Tomorrow” but were never able to fulfill this promise, because their reliance on the relational paradigm prevents them from having the flexibility to truly provide an unlimited number of perspectives on the same data. On the contrary, MDM & BI hubs have required all perspectives to be aligned with the one single truth that was physically incorporated in the backbone and paradigm of these approaches.
  • #30: Black: current features. Gray: future / planned features.
  • #33: Black: current features. Gray: future / planned features.
  • #34: Black: current features. Gray: future / planned features.
  • #35: Black: current features. Gray: future / planned features.
  • #41: Plattform Industrie 4.0: a joint project of the industry associations BITKOM (ICT), VDMA (machinery and plant engineering), and ZVEI (electrical engineering and electronics). A platform of the same name also exists in Austria.