NISO/DCMI Webinar:
Semantic Mashups Across
Large, Heterogeneous Institutions:
Experiences from the VIVO Service
May 22, 2013
Speaker:
John Fereira,
Senior Programmer/Analyst and
Technology Strategist at Cornell University
http://www.niso.org/news/events/2013/dcmi/vivo
Semantic mashups across
large, heterogeneous
institutions: experiences
from the VIVO service
John Fereira
Cornell University
Overview
• What is VIVO?
• History of VIVO
• High level Overview
• Ingesting Data into VIVO
• Exposing Data in VIVO
What is VIVO?
• VIVO is not an acronym
• A semantic web application that enables the discovery of
research and scholarship across disciplines in an
institution.
• VIVO enables collaboration and understanding across an
institution and among institutions – and not just for
scientists.
• A powerful search/browse functionality for locating people
and information within or across institutions.
What is VIVO?
• An ontology editor. VIVO includes a “vivo” ontology
which can be modified and extended
• An instance editor. Instances of classes such as a
Person, Organization, Event, etc. can be created,
modified, and deleted
• Content can also be brought into VIVO in automated
ways from local systems of record, such as HR,
grants, course, and faculty activity databases, or
from database providers such as publication
aggregators and funding agencies.
What is VIVO?
• VIVO is a content disseminator
• Views of People, Organizations, etc. can be highly
customized
• VIVO provides visualizations such as topic maps and
co-authorship networks
• Open data means other applications can use it
A brief History of VIVO
• 2003 – VIVO created for local use at Cornell University
for life sciences collaboration
• 2007 - Reimplemented using RDF, OWL, Jena and
SPARQL
• 2007 – Implemented at Cornell and University of
Florida as “production” systems
A brief History of VIVO
• 2009 - seven institutions received $12.2 million in
funding from the National Center for Research
Resources of the NIH to enable a national network of
scientists
• 2010 – Version 1.0 released as open source
• 2013 – Now at version 1.5.1
• 2013 – Transitioning from funded project to a
sustainable community open source project
A high level Overview
• Core ideas
• Searching/browsing
• Self editing
Core ideas
• Research and researchers should be discoverable
independently of administrative hierarchies
• Relationships are as interesting as the facts
• It’s the network, not just the nodes
• Static data models are too confining
• Granular data management allows multiple views and
re-purposing
• Discovery is improved by linking pages to surrounding
context
VIVO and Linked Open Data
• VIVO enables authoritative data about researchers to become
part of the Linked Open Data (LOD) cloud
Tim Berners-Lee, http://www.w3.org/2009/Talks/0204-ted-tbl
Linked Data principles
Tim Berners-Lee:
▫ Use URIs as names for things
▫ Use HTTP URIs so that people can look up those names
▫ When someone looks up a URI, provide useful
information, using the standards (RDF, SPARQL)
▫ Include links to other URIs so that people can discover
more things
http://linkeddata.org
VIVO in the LOD cloud
Searching and Browsing
• Triple store indexed into a SOLR instance
• Searches are against SOLR
• Instance data comes from triplestore
• An example…
Food security
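A search such as the "food security" example goes to the Solr index rather than to the triple store. As a rough sketch only: the snippet below builds a Solr select URL using Solr's standard q, rows and wt parameters. The host, port and path in SOLR_SELECT_URL are assumptions for illustration, not values from the talk; a real VIVO installation configures its own.

```python
from urllib.parse import urlencode

# Hypothetical Solr location (assumption, not from the slides).
SOLR_SELECT_URL = "http://localhost:8080/vivosolr/select"


def build_search_url(terms, rows=10):
    """Return a Solr select URL that runs a phrase search for `terms`."""
    params = {
        "q": '"%s"' % terms,  # quoted so Solr treats it as a phrase
        "rows": rows,         # number of hits to return
        "wt": "json",         # ask Solr for a JSON response
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_search_url("food security")
print(url)
```

The hits returned by Solr would carry the URIs of matching individuals; the page for each individual is then rendered from the triple store.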
Self Editing
• Users can edit their own profile
• System can delegate editing to “proxy” editors
• Some data can be locked
• An example
Editable and non-editable fields
Most text fields support “rich text”
External Concepts for “terms”
Data Ingest (harvesting)
VIVO harvests much of its data automatically from
verified sources
•Reduces the need for manual input of data
•Provides an integrated and flexible source of publicly
visible data at an institutional level
Data, data, data
Individuals may also edit and customize their profiles to
suit their professional needs
External data
sources
Internal data
sources
Ingesting data with the VIVO Harvester
• A pipeline of tools
• Tools are written in Java, using Jena APIs
• Can fetch data from a variety of data formats
• Data can be sanitized and disambiguated
• Data is ingested directly to the triple store…does not
require VIVO web app to be running
Harvesting Pipeline
• Fetcher/Parser
• Translate: maps the intermediate RDF to “vivo” RDF
• Transfer to local triple store (Jena TDB)
• Disambiguate using Scoring/Matching
• Changenamespace (mint unique URIs)
• Diff with previous model to create subtractions
• Transfer to VIVO triple store
Fetching and Parsing
• Fetches data from a URL, Database, local file
• Many different types of fetchers
▫ CSV fetcher
▫ JDBC fetcher
▫ SimpleXMLFetcher
▫ JSONFetcher
• Output is intermediate RDF Format, one file per
record
• “Fake” namespace used
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:node-person="http://vivo.example.com/harvest/aims_users/fields/person/"
         xml:base="http://vivo.example.com/harvest/aims_users/person">
  <rdf:Description rdf:ID="node_-_0">
    <rdf:type rdf:resource="http://vivo.example.com/harvest/aims_users/types#person"/>
    <node-person:Picture>http://aims.fao.org/sites/default/files/profiles/profile_image_108074.jpg</node-person:Picture>
    <node-person:Website>http://www.valeriapesce.name</node-person:Website>
    <node-person:Nid>108074</node-person:Nid>
    <node-person:Profile>In the last six years at the Global Forum on Agricultural research (GFAR) I have worked extensively on metadata standards and protocols for managing and exchanging information between systems, in strict collaboration with the OEKCS group in FAO.</node-person:Profile>
    <node-person:Organization>Food and Agriculture Organization of the United Nations (FAO)</node-person:Organization>
    <node-person:Expertise>Information management tools, information systems, information architectures</node-person:Expertise>
    <node-person:LastName>Pesce</node-person:LastName>
    <node-person:Country>Italy</node-person:Country>
    <node-person:Email>valeria.pesce@fao.org</node-person:Email>
    <node-person:geolocation>http://aims.fao.org/aos/geopolitical.owl#Italy</node-person:geolocation>
    <node-person:Profile_URL>http://aims.fao.org/node/108074</node-person:Profile_URL>
    <node-person:Username>valeria.pesce</node-person:Username>
    <node-person:FirstName>Valeria</node-person:FirstName>
    <node-person:Role>Information Management Specialist</node-person:Role>
    <node-person:Interests>agINFRA, AgriDrupal, AgriFeeds, AgriVIVO, authority control, automatic indexing, CIARD Content Management Task Force, CIARD RING, cloud services, CMS - Content Management Systems, data exchange, Drupal, IAALD - International Association of Agricultural Information Specialists, information management, institutional repository software, interoperability, Linked Open Data - LOD, RDF - Resource Description Framework, Semantic Web</node-person:Interests>
  </rdf:Description>
</rdf:RDF>
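To make the fetch step more concrete, here is a minimal sketch of how a CSV fetcher might turn each row into one intermediate RDF/XML document in the "fake" harvest namespace, in the shape of the record above. This is not the Harvester's Java code: the function names (record_to_rdf, fetch_csv) and the sample CSV are made up for illustration, and the sketch assumes column names are valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# "Fake" harvest namespaces, modelled on the sample record in the slides.
BASE = "http://vivo.example.com/harvest/aims_users/"
FIELD_NS = BASE + "fields/person/"

# Use readable prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("node-person", FIELD_NS)


def record_to_rdf(row, index):
    """Serialize one CSV row as a standalone intermediate RDF/XML document."""
    root = ET.Element("{%s}RDF" % RDF_NS)
    desc = ET.SubElement(root, "{%s}Description" % RDF_NS,
                         {"{%s}ID" % RDF_NS: "node_-_%d" % index})
    ET.SubElement(desc, "{%s}type" % RDF_NS,
                  {"{%s}resource" % RDF_NS: BASE + "types#person"})
    for column, value in row.items():
        if value:  # skip empty cells
            ET.SubElement(desc, "{%s}%s" % (FIELD_NS, column)).text = value
    return ET.tostring(root, encoding="unicode")


def fetch_csv(text):
    """Return one RDF/XML string per CSV record (one 'file' per record)."""
    reader = csv.DictReader(io.StringIO(text))
    return [record_to_rdf(row, i) for i, row in enumerate(reader)]


# Hypothetical input, reusing values from the sample record.
SAMPLE = "Nid,FirstName,LastName\n108074,Valeria,Pesce\n"
documents = fetch_csv(SAMPLE)
print(documents[0])
```

Keeping the output in a throwaway namespace means the fetcher needs no knowledge of the VIVO ontology; that mapping is left to the translate step.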
Translate
• Map “fake” namespace to VIVO classes and
properties
• Uses XSLT transform
• Unique ID for each record
• node-person:Organization becomes
foaf:Organization
• Relationships created
Translated RDF
<rdf:Description rdf:about="http://vivo.example.com/harvest/aims_users/person/uid-108074">
  <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  <rdfs:label>Pesce, Valeria</rdfs:label>
  <core:currentMemberOf rdf:resource="http://vivo.example.com/harvest/aims_users/org/aims"/>
  <foaf:firstName>Valeria</foaf:firstName>
  <foaf:lastName>Pesce</foaf:lastName>
  <core:primaryEmail>valeria.pesce@fao.org</core:primaryEmail>
  <core:positionInOrganization rdf:resource="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)"/>
</rdf:Description>
<rdf:Description rdf:about="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)">
  <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Organization"/>
  <rdfs:label>Food and Agriculture Organization of the United Nations (FAO)</rdfs:label>
  <core:organizationForPosition rdf:resource="http://vivo.example.com/harvest/aims_users/position/positionFor108074inFood%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)"/>
  <core:hasGeographicLocation rdf:resource="http://aims.fao.org/aos/geopolitical.owl#Italy"/>
</rdf:Description>
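The Harvester performs this translation with an XSLT transform. The sketch below is not that transform: it swaps in a plain Python dictionary mapping to show the same idea, turning harvested fields into triples that use FOAF and VIVO core properties. FIELD_MAP and translate are hypothetical names, the core namespace URI is assumed to be the VIVO core ontology namespace, and the person-to-organization link simply follows the property shown in the translated RDF above.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF = "http://xmlns.com/foaf/0.1/"
CORE = "http://vivoweb.org/ontology/core#"  # assumed VIVO core namespace
HARVEST = "http://vivo.example.com/harvest/aims_users/"

# Harvested field name -> target property (illustrative subset).
FIELD_MAP = {
    "FirstName": FOAF + "firstName",
    "LastName": FOAF + "lastName",
    "Email": CORE + "primaryEmail",
}


def translate(record):
    """Map one harvested record (a dict) to (subject, predicate, object) triples."""
    person = HARVEST + "person/uid-" + record["Nid"]  # unique ID per record
    triples = [
        (person, RDF_TYPE, FOAF + "Person"),
        (person, RDFS_LABEL, "%s, %s" % (record["LastName"], record["FirstName"])),
    ]
    for field, predicate in FIELD_MAP.items():
        if record.get(field):
            triples.append((person, predicate, record[field]))
    if record.get("Organization"):
        # The organization name becomes its own individual with a URI.
        org = HARVEST + "org/" + quote(record["Organization"], safe="()")
        triples.append((org, RDF_TYPE, FOAF + "Organization"))
        triples.append((org, RDFS_LABEL, record["Organization"]))
        # Relationship between person and organization, as in the slide.
        triples.append((person, CORE + "positionInOrganization", org))
    return triples


record = {
    "Nid": "108074",
    "FirstName": "Valeria",
    "LastName": "Pesce",
    "Organization": "Food and Agriculture Organization of the United Nations (FAO)",
}
triples = translate(record)
```

The point of the step is that a literal value in the source (an organization name) becomes a typed resource that other records can link to.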
Transfer
• Load RDF into TDB triplestore
• Duplicate URIs are not loaded
• Further operations are made in the triple store
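One possible reading of "duplicate URIs are not loaded" is sketched below with an in-memory dictionary standing in for the TDB store: a subject URI that is already present is skipped rather than loaded a second time. Both the interpretation and the load function are assumptions for illustration; the real transfer tool works against Jena TDB.

```python
def load(store, triples):
    """Add triples to `store` (subject -> set of (predicate, object)).

    Subjects already in the store before this call are treated as
    duplicates and skipped. Returns the number of triples loaded.
    """
    existing = set(store)  # subjects present before this load
    loaded = 0
    for subject, predicate, obj in triples:
        if subject in existing:
            continue
        store.setdefault(subject, set()).add((predicate, obj))
        loaded += 1
    return loaded


store = {}
first = load(store, [("urn:a", "label", "A"), ("urn:a", "type", "Person")])
second = load(store, [("urn:a", "label", "A again"), ("urn:b", "label", "B")])
```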
Scoring/Match
• Disambiguates People, Organizations, etc. based
upon property values
• Supports Equality, NameCompare,
NormalizedLevenshteinDifference, Soundex
algorithms
• Each property is weighted
▫ firstName: 0.5
▫ lastName: 0.5
▫ Email: 1.0
• MatchThreshHold: 1.0
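The weighting scheme above can be illustrated with a small scoring function. This is a sketch under stated assumptions, not the Harvester's implementation: it uses only an equality comparison and a normalized Levenshtein similarity, sums weight times similarity over the properties both records have, and declares a match when the total reaches the threshold. The names WEIGHTS, score and is_match are hypothetical.

```python
def levenshtein(a, b):
    """Classic edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def name_similarity(a, b):
    """1.0 for identical strings, falling toward 0.0 as they diverge."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def equality(a, b):
    """Case-insensitive exact match: 1.0 or 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


# Property -> (comparison, weight), using the weights from the slide.
WEIGHTS = {
    "firstName": (name_similarity, 0.5),
    "lastName": (name_similarity, 0.5),
    "email": (equality, 1.0),
}
MATCH_THRESHOLD = 1.0


def score(incoming, existing):
    """Weighted sum of similarities over properties present in both records."""
    total = 0.0
    for prop, (compare, weight) in WEIGHTS.items():
        if incoming.get(prop) and existing.get(prop):
            total += weight * compare(incoming[prop], existing[prop])
    return total


def is_match(incoming, existing):
    return score(incoming, existing) >= MATCH_THRESHOLD
```

With these weights, a matching email is enough on its own to reach the threshold, while names alone must agree exactly on both first and last name.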
Matching
• Determines what should be done with a record
which matches another record based upon its
“score”
▫ Replace old record
▫ Merge records
▫ Ignore record
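The three outcomes can be sketched as a single function over two records held as dictionaries. The resolve function and its merge rule (non-empty incoming values overwrite existing ones) are assumptions made for illustration; the slides do not say how the Harvester merges properties.

```python
def resolve(existing, incoming, action):
    """Return the record to keep after a match, according to `action`."""
    if action == "replace":
        return dict(incoming)          # the old record is discarded
    if action == "merge":
        merged = dict(existing)
        for key, value in incoming.items():
            if value:                  # assumed rule: non-empty values win
                merged[key] = value
        return merged
    if action == "ignore":
        return dict(existing)          # the incoming record is dropped
    raise ValueError("unknown action: %r" % action)


old = {"lastName": "Pesce", "email": "valeria.pesce@fao.org"}
new = {"lastName": "Pesce", "firstName": "Valeria", "email": ""}
merged = resolve(old, new, "merge")
```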
ChangeNameSpace
• Match old namespace pattern in configuration file
http://vivo.example.com/harvest/aims_users/person/
• Specify namespace in VIVO
http://agrivivodev.mannlib.cornell.edu/vivo/individual/
• Mint a new URI in the vivo namespace
http://agrivivodev.mannlib.cornell.edu/vivo/individual/n123456
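A minimal sketch of the renaming step, assuming triples are held as (subject, predicate, object) tuples of strings: every URI in the old harvest namespace is replaced by a newly minted URI of the form n plus a random number in the VIVO namespace, with a check against URIs already in use. The function names and the number range are illustrative assumptions, and the sketch does not distinguish URIs from literals that happen to start with the old namespace.

```python
import random

OLD_NS = "http://vivo.example.com/harvest/aims_users/person/"
NEW_NS = "http://agrivivodev.mannlib.cornell.edu/vivo/individual/"


def mint_uri(used, rng):
    """Return a URI in NEW_NS that is not in `used`, and record it there."""
    while True:
        candidate = NEW_NS + "n%d" % rng.randrange(1, 1000000)
        if candidate not in used:
            used.add(candidate)
            return candidate


def change_namespace(triples, used, rng=None):
    """Rewrite OLD_NS URIs to minted NEW_NS URIs.

    Returns the rewritten triples and the old -> new mapping, so the
    same harvested URI always maps to the same minted URI.
    """
    if rng is None:
        rng = random.Random()
    renamed = {}

    def rename(term):
        if term.startswith(OLD_NS):
            if term not in renamed:
                renamed[term] = mint_uri(used, rng)
            return renamed[term]
        return term

    return [(rename(s), p, rename(o)) for s, p, o in triples], renamed
```

In practice the `used` set would be filled from the URIs already present in the VIVO store, so that a minted URI never collides with an existing individual.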
Diff of previous harvest
• Compare TDB model with previous harvest
• Generate vivo-additions.rdf
• Generate vivo-subtractions.rdf
Final Transfer
• Load vivo-subtractions.rdf file into SDB
• Load vivo-additions.rdf file into SDB
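The diff and final transfer steps above amount to set arithmetic on triples. As a sketch, with triples modelled as Python tuples rather than Jena models (the function names are hypothetical): additions are triples only in the current harvest, subtractions are triples only in the previous one, and applying subtractions followed by additions brings the target store in line with the current harvest.

```python
def diff(previous, current):
    """Return (additions, subtractions) between two harvests."""
    previous, current = set(previous), set(current)
    additions = current - previous      # new since the last harvest
    subtractions = previous - current   # no longer present in the source
    return additions, subtractions


def apply_diff(store, additions, subtractions):
    """Remove the subtractions from the store, then add the additions."""
    return (set(store) - subtractions) | additions


previous = {("urn:p1", "email", "old@example.org"), ("urn:p1", "name", "Pesce")}
current = {("urn:p1", "email", "new@example.org"), ("urn:p1", "name", "Pesce")}
additions, subtractions = diff(previous, current)
updated = apply_diff(previous, additions, subtractions)
```

Only the changed triples are written, so triples in the store that came from other sources or from manual editing are left untouched.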
Data Ingest alternatives
• Karma: an information integration tool which
provides a GUI for modeling data into an ontology
• Google Refine: Good for one time ingests and has a
VIVO RDF plugin
• VIVO admin tools can load RDF
Exposing Data in VIVO
• Vivo web pages
• View data as RDF
• Query a Sparql Endpoint and transform results
• Drupal front end
Default VIVO theme
Cornell VIVO
Griffith University
Melbourne Find an Expert
Visualization
• Completed Work
▫ Co-Author visualization
▫ Sparklines
▫ VIVO world activity map
VIVO 1.0 source code was publicly released on April 14, 2010
87 downloads by June 11, 2010. 917 downloads on July 16, 2010.
The more institutions adopt VIVO, the more high quality data will be available to understand, navigate,
manage, utilize, and communicate progress in science and technology.
06/2010
View RDF from profile page
Requesting RDF using an Accept Header
• curl -H "Accept: application/rdf+xml" -X GET
http://vivo.ufl.edu/display/n25562
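The same content negotiation can be done from a script. The sketch below uses Python's standard urllib to build a GET request with the Accept header from the curl example; build_rdf_request and fetch_rdf are hypothetical helper names, and whether the profile URL still resolves today is not something the slides can confirm.

```python
import urllib.request

PROFILE_URL = "http://vivo.ufl.edu/display/n25562"  # URL from the curl example


def build_rdf_request(url):
    """Build a GET request that asks the server for RDF/XML."""
    return urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(url, timeout=10):
    """Send the request and return the response body as text (network call)."""
    with urllib.request.urlopen(build_rdf_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_rdf_request(PROFILE_URL)
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling fetch_rdf(PROFILE_URL) would perform the actual request; without the Accept header the same URL would normally return the HTML profile page.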
Retrieving data with SPARQL
• Fuseki SPARQL endpoint installed (not included)
• Callable with a SPARQL client
• Semantic Services
▫ Manages custom SPARQL queries
▫ Exposes URL for external sites
▫ Can ask for output as html, xml, json
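As an illustration of calling such an endpoint from a SPARQL client, the sketch below builds a SPARQL protocol GET URL and flattens a response in the standard SPARQL JSON results format. The endpoint address is a hypothetical local Fuseki dataset, the query is an example rather than one used by the Semantic Services application, and SAMPLE is made-up data shaped like a real response.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://localhost:3030/vivo/query"  # hypothetical Fuseki dataset

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?label WHERE {
  ?person a foaf:Person ;
          rdfs:label ?label .
} LIMIT 10
"""


def build_query_url(endpoint, query):
    """SPARQL protocol: the query is passed in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def rows(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    names = data["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in data["results"]["bindings"]
    ]


# Made-up response, for illustration only.
SAMPLE = json.dumps({
    "head": {"vars": ["person", "label"]},
    "results": {"bindings": [{
        "person": {"type": "uri",
                   "value": "http://agrivivodev.mannlib.cornell.edu/vivo/individual/n123456"},
        "label": {"type": "literal", "value": "Pesce, Valeria"},
    }]},
})

url = build_query_url(ENDPOINT, QUERY)
people = rows(SAMPLE)
```

A client would request the URL with an Accept header of application/sparql-results+json and pass the response body to rows; the resulting list is easy to render as HTML on an external site.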
Semantic Services application
Hector Abruna in VIVO
Hector Abruna on Chemistry Site
Viewing VIVO data with Drupal
• Import data with Feeds module and Linked Data
Importer
• Examples
Cals Impact Statements
Agrivivo Home Page
Agrivivo map page
AgriVivo
VivoSearch: search across multiple
vivo sites
Vivo SearchLight bookmarklet
Vivo Searchlight
Some Links
• Vivoweb
▫ http://vivoweb.org
• Vivoweb on Sourceforge
▫ http://www.sourceforge.net/projects/vivo
• VivoSearch
▫ http://vivosearch.org
• Vivo Wiki on Duraspace
▫ https://wiki.duraspace.org/display/VIVO
• Mailing Lists
▫ http://sourceforge.net/p/vivo/sfx-list/
Thank you
NISO/DCMI Webinar
Semantic Mashups Across Large, Heterogeneous
Institutions: Experiences from the VIVO Service
NISO/DCMI Webinar • May 22, 2013
Questions?
All questions will be posted with presenter answers on
the NISO website following the webinar:
http://www.niso.org/news/events/2013/dcmi/vivo
Thank you for joining us today.
Please take a moment to fill out the brief online survey.
We look forward to hearing from you!
THANK YOU

NISO/DCMI May 22 Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service

  • 1. NISO/DCMI Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service May 22, 2013 Speaker: John Fereira, Senior Programmer/Analyst and Technology Strategist at Cornell University http://www.niso.org/news/events/2013/dcmi/vivo
  • 2. Semantic mashups across large, heterogeneous institutions: experiences from the VIVO service John Fereira Cornell University
  • 3. Overview • What is VIVO? • History of VIVO • High level Overview • Ingesting Data into VIVO • Exposing Data in Vivo
  • 4. What is VIVO? • VIVO is not an acronym • A semantic web application that enables the discovery of research and scholarship across disciplines in an institution. • VIVO enables collaboration and understanding across an institution and among institutions – and not just for scientists. • A powerful search/browse functionality for locating people and information within or across institutions.
  • 5. What is VIVO? • An ontology editor. Vivo includes a “vivo” ontology with can be modified and extended • An instance editor. Instances of classes such as a Person, Organization, Event, etc. can be created, modified, and deleted • Content can also be brought into VIVO in automated ways from local systems of record, such as HR, grants, course, and faculty activity databases, or from database providers such as publication aggregators and funding agencies.
  • 6. What is VIVO? • VIVO is a content disseminator • Views of People, Organizations, etc. can be highly customized • VIVO provides visualizations such as topic maps, co- authorship networks • Open data means other applications can use it
  • 7. A brief History of VIVO • 2003 – Vivo created for local use at Cornell University for life sciences collaboration • 2007 - Reimplemented using RDF, OWL, Jena and SPARQL • 2007 – Implemented at Cornell and University of Florida as “production” systems
  • 8. A brief History of VIVO • 2009 - seven institutions received $12.2 million in funding from the National Center for Research Resources of the NIH to enable a national network of scientists • 2010 – Version 1.0 released as open source • 2013 – Now at version 1.5.1 • 2013 – Transitioning from funded project to a sustainable community open source project
  • 9. A high level Overview • Core ideas • Searching/browsing • Self editing
  • 10. Core ideas • Research and researchers should be discoverable independently of administrative hierarchies • Relationships are as interesting as the facts • It’s the network, not just the nodes • Static data models are too confining • Granular data management allows multiple views and re-purposing • Discovery is improved by linking pages to surrounding context
  • 11. VIVO and Linked Open Data • VIVO enables authoritative data about researchers to become part of the Linked Open Data (LOD) cloud Tim Berners-Lee, http://www.w3.org/2009/Talks/0204-ted-tbl
  • 12. Linked Data principles Tim Berners-Lee: ▫ Use URIs as names for things ▫ Use HTTP URIs so that people can look up those names ▫ When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) ▫ Include links to other URIs so that people can discover more things http://linkeddata.org
  • 13. VIVO in the LOD cloud
  • 14. Searching and Browsing • Triple store indexed into a SOLR instance • Searches are against SOLR • Instance data comes from triplestore • An example…
  • 20. Self Editing • Users can edit their own profile • System can delegate editing to “proxy” editors • Some data can be locked • An example
  • 22. Most text fields support “rich text”
  • 23. External Concepts for “terms”
  • 25. VIVO harvests much of its data automatically from verified sources •Reduces the need for manual input of data •Provides an integrated and flexible source of publicly visible data at an institutional level Data, data, data Individuals may also edit and customize their profiles to suit their professional needs External data sources Internal data sources
  • 26. Ingesting data with the Vivo Harvester • A pipeline of tools • Tools are written java, using Jena APIs • Can fetch data from a variety of data formats • Data can be sanitized and disambiguated • Data is ingested directly to the triple store…does not require VIVO web app to be running
  • 27. Harvesting Pipeline • Fetcher/Parser • Translate: maps rdf to “vivo” RDF • Transfer to local triple store (Jena TDB) • Disambiguate using Scoring/Matching • Changenamespace (mint unique URIs) • Diff with previous model to create subtractions • Transfer to VIVO triple store
  • 28. Fetching and Parsing • Fetches data from a URL, Database, local file • Many different types of fetchers ▫ CSV fetcher ▫ JDBC fetcher ▫ SimpleXMLFetcher ▫ JSONFetcher • Output is intermediate RDF Format, one file per record • “Fake” namespace used
  • 29. <?xml version="1.0"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:node-person="http://vivo.example.com/harvest/aims_users/fields/person/" xml:base="http://vivo.example.com/harvest/aims_users/person"> <rdf:Description rdf:ID="node_-_0"> <rdf:type rdf:resource="http://vivo.example.com/harvest/aims_users/types#person"/> <node-person:Picture>http://aims.fao.org/sites/default/files/profiles/profile_image_108074.jpg</node-person:Picture> <node-person:Website>http://www.valeriapesce.name</node-person:Website> <node-person:Nid>108074</node-person:Nid> <node-person:Profile>In the last six years at the Global Forum on Agricultural research (GFAR) I have worked extensively on metad ata standards and protocols for managing and exchanging information between systems, in strict collaboration with the OEKCS group in FAO.</node-person:Profile> <node-person:Organization>Food and Agriculture Organization of the United Nations (FAO)</node-person:Organization> <node-person:Expertise>Information management tools, information systems, information architectures</node-person:Expertise> <node-person:LastName>Pesce</node-person:LastName> <node-person:Country>Italy</node-person:Country> <node-person:Email>[email protected]</node-person:Email> <node-person:geolocation>http://aims.fao.org/aos/geopolitical.owl#Italy</node-person:geolocation> <node-person:Profile_URL>http://aims.fao.org/node/108074</node-person:Profile_URL> <node-person:Username>valeria.pesce</node-person:Username> <node-person:FirstName>Valeria</node-person:FirstName> <node-person:Role>Information Management Specialist</node-person:Role> <node-person:Interests>agINFRA, AgriDrupal, AgriFeeds, AgriVIVO, authority control, automatic indexing, CIARD Content Management Task Force, CIARD 
RING, cloud services, CMS - Content Management Systems, data exchange, Drupal, IAALD - International Association of Agricultural Information Specialists, information management, institutional repository software, interoperability, Linked Open Data - LOD, RDF - Resource Description Framework, Semantic Web</node-person:Interests> </rdf:Description> </rdf:RDF>
  • 30. Translate • Map “fake” namespace to VIVO classes and properties • Uses XSLT transform • Unique ID for each record • node-person:Organization becomes foaf:Organization • Relationships created
  • 31. Translated RDF <rdf:Description rdf:about="http://vivo.example.com/harvest/aims_users/person/uid-108074"> <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/> <rdfs:label>Pesce, Valeria</rdfs:label> <core:currentMemberOf rdf:resource="http://vivo.example.com/harvest/aims_users/org/aims"/> <foaf:firstName>Valeria</foaf:firstName> <foaf:lastName>Pesce</foaf:lastName> <core:primaryEmail>[email protected]</core:primaryEmail> <core:positionInOrganization rdf:resource="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20 United%20Nations%20(FAO)"/> </rdf:Description> <rdf:Description rdf:about="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20Uni ted%20Nations%20(FAO)"> <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Organization"/> <rdfs:label>Food and Agriculture Organization of the United Nations (FAO)</rdfs:label> <core:organizationForPosition rdf:resource="http://vivo.example.com/harvest/aims_users/position/positionFor108074inFood%20and%20Agriculture%20Organ ization%20of%20the%20United%20Nations%20(FAO)"/> <core:hasGeographicLocation rdf:resource="http://aims.fao.org/aos/geopolitical.owl#Italy"/> </rdf:Description>
  • 32. Transfer • Load RDF into TDB triplestore • Duplicate URIs are not loaded • Further operations are made in the triple store
  • 33. Scoring/Match • Disambiguates People, Organizations, etc. based upon property values • Supports Equality, NameCompare, NormalizedLevenshteinDifference, Soundex algorithms • Each property is weighted ▫ firstName: 0.5 ▫ lastName: 0.5 ▫ Email: 1.0 • MatchThreshHold: 1.0
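The weighting on the Scoring/Match slide can be illustrated with a small sketch. It assumes the simplest of the listed algorithms (Equality) for every property and reuses the weights and threshold from the slide; the function names and the dictionary record layout are hypothetical and are not the VIVO Harvester's own API.

```python
# Weights and threshold as listed on the slide. The names below are
# illustrative assumptions, not identifiers from the VIVO Harvester.
WEIGHTS = {"firstName": 0.5, "lastName": 0.5, "email": 1.0}
MATCH_THRESHOLD = 1.0


def score(incoming: dict, existing: dict, weights: dict = WEIGHTS) -> float:
    """Sum the weights of the properties whose values are exactly equal."""
    total = 0.0
    for prop, weight in weights.items():
        value = incoming.get(prop)
        # A property missing from the incoming record never contributes.
        if value is not None and value == existing.get(prop):
            total += weight
    return total


def is_match(incoming: dict, existing: dict,
             threshold: float = MATCH_THRESHOLD) -> bool:
    """Treat two records as the same entity once the score reaches the threshold."""
    return score(incoming, existing) >= threshold
```

With these numbers, an identical email address is enough for a match on its own, while the names only reach the threshold when first and last name agree together.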
  • 34. Matching • Determines what should be done with a record which matches another record based upon its “score” ▫ Replace old record ▫ Merge records ▫ Ignore record
  • 35. ChangeNameSpace • Match old namespace pattern in configuration file http://vivo.example.com/harvest/aims_users/person/ • Specify namespace in VIVO http://agrivivodev.mannlib.cornell.edu/vivo/individual/ • Mint a new URI in the vivo namespace http://agrivivodev.mannlib.cornell.edu/vivo/individual/n123456
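The ChangeNameSpace step can be sketched as a mapping from harvest URIs to newly minted URIs. The two namespaces come from the slide; the function name, the sequential counter, and the `used` set are assumptions made for illustration, since the slide does not say how the number after `n` is chosen.

```python
import itertools

# Namespaces from the slide.
OLD_NS = "http://vivo.example.com/harvest/aims_users/person/"
NEW_NS = "http://agrivivodev.mannlib.cornell.edu/vivo/individual/"


def change_namespace(uris, old_ns, new_ns, used, start=1):
    """Map each URI under old_ns to a fresh 'n<number>' URI under new_ns.

    `used` is the set of URIs already present in the target namespace;
    it is updated in place so a minted URI is never handed out twice.
    The sequential numbering is an assumption, not the Harvester's scheme.
    """
    counter = itertools.count(start)
    mapping = {}
    for uri in uris:
        if not uri.startswith(old_ns) or uri in mapping:
            continue  # leave foreign or already-mapped URIs alone
        candidate = f"{new_ns}n{next(counter)}"
        while candidate in used:
            candidate = f"{new_ns}n{next(counter)}"
        used.add(candidate)
        mapping[uri] = candidate
    return mapping
```

URIs outside the old namespace are deliberately skipped, so references to external vocabularies stay untouched.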
  • 36. Diff of previous harvest • Compare TDB model with previous harvest • Generate vivo-additions.rdf • Generate vivo-subtractions.rdf
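Conceptually, the diff step is a pair of set differences over triples. The sketch below models each harvest as a Python set of (subject, predicate, object) tuples; this is a simplification for illustration, not how the TDB models are actually compared, and the function name is hypothetical.

```python
def diff_harvest(previous: set, current: set):
    """Return (additions, subtractions) between two sets of triples.

    additions:    triples in the current harvest but not the previous one
                  (the content of vivo-additions.rdf)
    subtractions: triples in the previous harvest that have disappeared
                  (the content of vivo-subtractions.rdf)
    """
    additions = current - previous
    subtractions = previous - current
    return additions, subtractions
```

Triples present in both harvests appear in neither result, so an unchanged record produces no work for the final transfer.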
  • 37. Final Transfer • Load vivo-subtractions.rdf file into SDB • Load vivo-additions.rdf file into SDB
  • 38. Data Ingest alternatives • Karma: an information integration tool which provides a GUI for modeling data into an ontology • Google Refine: Good for one-time ingests and has a VIVO RDF plugin • VIVO admin tools can load RDF
  • 39. Exposing Data in VIVO • VIVO web pages • View data as RDF • Query a SPARQL endpoint and transform results • Drupal front end
  • 44. Visualization • Completed Work ▫ Co-Author visualization ▫ Sparklines ▫ VIVO world activity map
  • 46. VIVO 1.0 source code was publicly released on April 14, 2010. 87 downloads by June 11, 2010. 917 downloads on July 16, 2010. The more institutions adopt VIVO, the more high-quality data will be available to understand, navigate, manage, utilize, and communicate progress in science and technology. 06/2010
  • 47. View RDF from profile page
  • 48. Requesting RDF using an Accept Header • curl -H "Accept: application/rdf+xml" -X GET http://vivo.ufl.edu/display/n25562
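The same content negotiation can be done from a script. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the profile URL is the one from the slide, which may no longer resolve.

```python
from urllib.request import Request, urlopen


def build_rdf_request(profile_url: str) -> Request:
    """Build a GET request that asks the server for RDF/XML instead of HTML."""
    return Request(
        profile_url,
        headers={"Accept": "application/rdf+xml"},
        method="GET",
    )


def fetch_rdf(profile_url: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text."""
    with urlopen(build_rdf_request(profile_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_rdf("http://vivo.ufl.edu/display/n25562")` would issue the equivalent of the curl command above.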
  • 49. Retrieving data with SPARQL • Fuseki SPARQL endpoint installed (not included) • Callable with a SPARQL client • Semantic Services ▫ Manages custom SPARQL queries ▫ Exposes URL for external sites ▫ Can ask for output as HTML, XML, JSON
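A SPARQL client call against such an endpoint can be sketched with the standard library, following the SPARQL protocol convention of passing the query text in a `query` parameter. The endpoint URL, dataset name, query, and function name below are all assumptions for illustration; the slides do not give the actual endpoint address.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: a local Fuseki server with a dataset named "vivo".
ENDPOINT = "http://localhost:3030/vivo/query"

# Illustrative query: list up to ten people and their labels.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?label WHERE {
  ?person a foaf:Person ;
          rdfs:label ?label .
} LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```

Passing the returned request to `urllib.request.urlopen` would send the query; the JSON response could then be transformed for display on an external site.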
  • 52. Hector Abruna on Chemistry Site
  • 53. Viewing VIVO data with Drupal • Import data with Feeds module and Linked Data Importer • Examples
  • 58. VivoSearch: search across multiple vivo sites
  • 61. Some Links • Vivoweb ▫ http://vivoweb.org • Vivoweb on Sourceforge ▫ http://www.sourceforge.net/projects/vivo • VivoSearch ▫ http://vivosearch.org • Vivo Wiki on Duraspace ▫ https://wiki.duraspace.org/display/VIVO • Mailing Lists ▫ http://sourceforge.net/p/vivo/sfx-list/
  • 63. NISO/DCMI Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service • May 22, 2013 • Questions? All questions will be posted with presenter answers on the NISO website following the webinar: http://www.niso.org/news/events/2013/dcmi/vivo
  • 64. Thank you for joining us today. Please take a moment to fill out the brief online survey. We look forward to hearing from you! THANK YOU

Editor's Notes

  • #26: Authoritative data, diverse formats, filter out private information. Talk about verified data. Talking points: Much of the data in VIVO profiles is ingested from authoritative sources so it is accurate and current, reducing the need for manual input. Private or sensitive information is never imported into VIVO. Only public information will be stored and displayed. Data is housed and maintained at the local institutions. There it can be updated on a regular basis. There are three ways to get data: internal, external, individuals. Internal is authoritative! The rich information in VIVO profiles can be repurposed and shared with other institutional web pages and consumers, reducing cost and increasing efficiencies across the institution.
  • #46: Co-author vis. An at-a-glance view of an individual's collaboration space. Who do they collaborate with most often? Do they always work with the same people, or do they work with multiple separate communities? Links increase in size and color with more frequent collaboration. Co-authors are clustered into communities. Users can explore the social network by traveling to co-authors' pages.
  • #55: Since VIVO stores profile information drawn from a variety of sources in a single, flexible format, it can be easily “re-skinned” or “re-purposed” to present specialized views into the institution.
  • #58: Since VIVO stores profile information drawn from a variety of sources in a single, flexible format, it can be easily “re-skinned” or “re-purposed” to present specialized views into the institution.