CV support in the next
Dataverse release
Slava Tykhonov
lead software engineer
DANS-KNAW R&D
CESSDA Tools Open Hour: Dataverse, 18.11.2021
DANS Data Stations - Future DANS Data Services
Dataverse is an API-based data platform and a key framework for Open Innovation!
FAIR and Dataverse
Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
Out of the box CV support in Dataverse (1)
Source: Dataverse Metadata Schema
Out of the box CV support in Dataverse (2)
Internal vocabularies are stored in Dataverse itself; we need more CVs!
Semantic interoperability on the infrastructure level
Dataverse Semantic API in release 5.6: https://github.com/IQSS/dataverse/releases/tag/v5.6
“Dataset metadata can be retrieved, set, and updated using a new, flatter JSON-LD format -
following the format of an OAI-ORE export (RDA-conformant Bags), allowing for easier transfer of
metadata to/from other systems (i.e. without needing to know Dataverse's metadata block and field
storage architecture). This new API also allows for the update of terms metadata”.
External controlled vocabulary support is being developed by DANS in the SSHOC project and
is already integrated into the Dataverse core in release 5.7.
Proposal: https://docs.google.com/document/d/1txdcFuxskRx_tLsDQ7KKLFTMR_r9IBhorDu3V_r445w/
Interfaces: http://github.com/gdcc/dataverse-external-vocab-support
Integrations: Wikidata, ORCID, MeSH, Skosmos vocabularies
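The JSON-LD retrieval described in the release note can be sketched roughly as follows. This is a minimal illustration, not the definitive client: the endpoint path `/api/datasets/{id}/metadata`, the `Accept: application/ld+json` header and the `X-Dataverse-key` token header are assumptions based on the Dataverse API guides and should be checked against the documentation of the installed version. The function names and the server URL are hypothetical.

```python
import json
import urllib.request


def build_jsonld_metadata_request(server_url, dataset_id, api_token):
    """Build (but do not send) a GET request for a dataset's JSON-LD metadata.

    Assumption: the Semantic API is exposed at /api/datasets/{id}/metadata
    and selects JSON-LD through the Accept header.
    """
    url = f"{server_url.rstrip('/')}/api/datasets/{dataset_id}/metadata"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/ld+json",
            "X-Dataverse-key": api_token,
        },
        method="GET",
    )


def fetch_jsonld_metadata(server_url, dataset_id, api_token):
    """Send the request and return the decoded JSON-LD document (needs network)."""
    request = build_jsonld_metadata_request(server_url, dataset_id, api_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call makes it possible to inspect the URL and headers without contacting a server, for example `build_jsonld_metadata_request("https://dataverse.example.org", 42, "token").full_url`.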
Building block: Skosmos to host ontologies
● Skosmos is developed in Europe by the National Library of Finland (NLF)
● active global user community
● search and browsing interface for SKOS concepts
● support for multilingual vocabularies
● used for different use cases (publishing vocabularies, building discovery systems, vocabulary visualization)
Skosmos API with a Python module
pip install skosmos-client
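As a rough illustration of what such a client does under the hood, the sketch below calls the Skosmos REST search endpoint using only the Python standard library instead of `skosmos-client`. The `/rest/v1/search` path and its `query`, `vocab` and `lang` parameters follow the public Skosmos REST API; the function names and the base URL are hypothetical, and the response handling assumes each hit carries `prefLabel` and `uri` keys.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url, query, vocab, lang):
    """Return the Skosmos REST search URL for one vocabulary and language."""
    params = urllib.parse.urlencode({"query": query, "vocab": vocab, "lang": lang})
    return f"{base_url.rstrip('/')}/rest/v1/search?{params}"


def extract_suggestions(payload):
    """Reduce a search response to (label, concept URI) pairs.

    Hits without a label or a URI are skipped rather than guessed.
    """
    return [
        (item["prefLabel"], item["uri"])
        for item in payload.get("results", [])
        if "prefLabel" in item and "uri" in item
    ]


def search_concepts(base_url, query, vocab, lang):
    """Query a Skosmos instance and return suggestions (needs network)."""
    url = build_search_url(base_url, query, vocab, lang)
    with urllib.request.urlopen(url, timeout=30) as response:
        return extract_suggestions(json.load(response))
```

Passing a different `lang` value is what allows a deposit form to show the same concepts with labels in another language while keeping the concept URIs unchanged.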
SKOSMOS API for GRID ontology
Dataverse deposit form with connection to ontologies
Every field can be linked to the appropriate controlled vocabularies in a FAIR way!
One metadata field can be linked to many ontologies
Switching the language in Dataverse changes the language of the suggested terms!
Configuration to add external controlled vocabularies
Pull Request to Dataverse core: https://github.com/IQSS/dataverse/pull/7712
JavaScript interface
The CV interface is implemented in JavaScript and placed outside of the
Dataverse application.
Internal:
"js-url": "/resources/js/cvoc-interface.js"
External:
"js-url": "https://raw.githubusercontent.com/Dans-labs/semantic-gateway/main/static/js/interface.js"
Example of the CV configuration in Dataverse
Configuration in pluggable JavaScript:
● Field cvocDemo is connected to the “unesco” controlled vocabulary hosted by Skosmos
● 4 languages are available (en, fr, es, ru)
● js-url points to the JavaScript gateway that reads and transforms the output of the external API endpoint
● every Skosmos concept is cached internally in Dataverse to improve sustainability
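The configuration summarised in the bullets above might look roughly like the entry built below. Only the `js-url` key, the field name `cvocDemo`, the `unesco` vocabulary and the four languages come from the slides; the other key names (`field-name`, `cvoc-url`, `protocol`, `vocabs`, `languages`) are assumptions modelled on the dataverse-external-vocab-support examples, and the Skosmos URL is a placeholder. The authoritative schema is the one in PR 7712 and that repository.

```python
import json


def build_cvoc_entry(field_name, vocab, languages, js_url, cvoc_url):
    """Assemble one external-vocabulary entry for a single metadata field.

    Key names other than "js-url" are assumptions, not a confirmed schema.
    """
    return {
        "field-name": field_name,            # metadata field to enhance
        "cvoc-url": cvoc_url,                # vocabulary service (placeholder)
        "js-url": js_url,                    # script that renders suggestions
        "protocol": "skosmos",               # assumed protocol label
        "vocabs": [vocab],                   # vocabularies offered for the field
        "languages": ", ".join(languages),   # languages offered for labels
    }


entry = build_cvoc_entry(
    field_name="cvocDemo",
    vocab="unesco",
    languages=["en", "fr", "es", "ru"],
    js_url="/resources/js/cvoc-interface.js",
    cvoc_url="https://skosmos.example.org/",  # hypothetical Skosmos instance
)

# The configuration is a JSON array with one object per configured field.
config_json = json.dumps([entry], indent=2)
print(config_json)
```

Because one metadata field can be linked to many ontologies, additional vocabularies would be appended to the `vocabs` list under the same assumptions.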
We created the Semantic Gateway as a plugin app
Source: Dataverse gateway
Semantic Gateway for Skosmos and NDE
Suggestions for the usage of FAIR CVs
● Dutch Digital Heritage Network https://netwerkdigitaalerfgoed.nl
● Skosmos instances, for example, https://bartoc-skosmos.unibas.ch/en/
Skosmos client to access vocabularies https://pypi.org/project/skosmos-client/
● ORCID API to link CMDI records to identifiers of researchers
https://info.orcid.org
● CESSDA CV Service https://vocabularies.cessda.eu
More are coming! https://github.com/CLARIAH/awesome-humanities-ontologies
Known issues with support of external CVs
● how CV support could be applied to any field
● support and ownership of available vocabularies
● backward compatibility with fields from the old metadata schema
● clean UI experience (one selection can fill 1, 2 or 4 child fields)
● can we use non-managed vocabularies or free-text values in the same field
● concept drift (the change of meaning of concepts)
● interoperability across all Dataverse instances
● how to ensure CVs are coming from authoritative services
Future plans
● Dataverse will be offered as an easy to install and maintain “archive in the
box” solution available for all data providers
● External controlled vocabularies will be available out-of-the-box and will be
included within the CESSDA Metadata Schema (CMM) and CLARIN CMDI
● Dataverse administrators should be able to turn on external CV support for
any specific metadata field
● The same functionality will be implemented at the data file level to get
variables linked to external CVs
Future plans: linking data (files) to external CVs
Source: Scholars Portal's Data Curation Tool (Canada)
Questions?
Slava Tykhonov (DANS-KNAW)
vyacheslav.tykhonov@dans.knaw.nl
References:
Dataverse 5.7: https://github.com/IQSS/dataverse/releases/tag/v5.7
Semantic Gateway: https://github.com/Dans-labs/semantic-gateway
SSHOC task 5.2: http://github.com/SSHOC

External CV support in Dataverse 5.7
