RDF Journal Compilation
Search Results
Showing 30 articles matching "rdf"
The Semantics of Metadata: Avalon Media System and the Move to RDF
Juliet L. Hardesty and Jennifer B. Young
The Avalon Media System (Avalon) provides access and management for digital audio and video collections in libraries and archives. The
open source project is led by the libraries of Indiana University Bloomington and Northwestern University and is funded in part by grants
from The Andrew W. Mellon Foundation and Institute of Museum and Library Services.
Avalon is based on the Samvera Community (formerly Hydra Project) software stack and uses Fedora as the digital repository back end.
The Avalon project team is in the process of migrating digital repositories from Fedora 3 to Fedora 4 and incorporating metadata statements
using the Resource Description Framework (RDF) instead of XML files accompanying the digital objects in the repository. The Avalon team
has worked on the migration path for technical metadata and is now working on the migration paths for structural metadata (PCDM) and
descriptive metadata (from MODS XML to RDF). This paper covers the decisions made to begin using RDF for software development and
offers a window into how Semantic Web technology functions in the real world.
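As a rough illustration of the shift the paper describes — descriptive statements asserted directly about the repository object rather than stored in an accompanying XML file — the sketch below expresses a single title statement as RDF with Python's rdflib. The object URI and the choice of dcterms:title are illustrative assumptions, not Avalon's actual migration mapping.

```python
# Sketch: a descriptive statement asserted as RDF about a repository object,
# in contrast to a MODS XML file stored alongside it. The object URI and the
# use of dcterms:title here are illustrative, not Avalon's mapping.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
obj = URIRef("http://example.org/fedora/rest/av/object-1")  # hypothetical Fedora 4 resource
g.add((obj, DCTERMS.title, Literal("Oral history interview, 1968", lang="en")))

print(g.serialize(format="turtle"))
```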
The strengths and best practices of each tool are discussed in the context of metadata munging use cases for an institution’s metadata
migration workflow. There is a focus on Linked Open Data modeling and transformation applications of each tool, in particular how
metadata specialists, catalogers, and programmers can create metadata quality reports, enhance existing data with LOD sets, and transform that data to an RDF model. Integration of these tools with other systems and projects, the use of domain-specific transformation languages, and
the expansion of vocabulary reconciliation services are mentioned.
Using Semantic Web Technologies to Collaboratively Collect and Share User-Generated Content in Order to
Enrich the Presentation of Bibliographic Records–Development of a Prototype Based on RDF, D2RQ, Jena,
SPARQL and WorldCat’s FRBRization Web Service
Ragnhild Holgersen, Michael Preminger, David Massey
In this article we present a prototype of a semantic web-based framework for collecting and sharing user-generated content (reviews,
ratings, tags, etc.) across different libraries in order to enrich the presentation of bibliographic records. The user-generated data is
remodeled into RDF, utilizing established linked data ontologies. This is done in a semi-automatic manner utilizing the Jena and the D2RQ-
toolkits. For the remodeling, a SPARQL-construct statement is tailored for each data source. In the data source used in our prototype, user-
generated content is linked to the relevant books via their ISBN. By remodeling the data according to the FRBR model, and expanding the
RDF graph with data returned by WorldCat’s FRBRization web service, we are able to greatly increase the number of entry points to each
book. We make the social content available through a RESTful web service with ISBN as a parameter. The web service returns a graph of
all user-generated data registered to any edition of the book in question in the RDF/XML format. Libraries using our framework would thus
be able to present relevant social content in association with bibliographic records, even if they hold a different version of a book than the
one that was originally accessed by users. Finally, we connect our RDF graph to the linked open data cloud through the use of Talis’
openlibrary.org SPARQL endpoint.
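The remodeling step described above — a SPARQL CONSTRUCT statement tailored to each data source — can be illustrated with a hedged sketch using Python's rdflib. The sample review data, property names, and target classes below are illustrative assumptions, not the prototype's actual mapping.

```python
# Sketch: remodel user-generated review data into FRBR-flavoured RDF with a
# SPARQL CONSTRUCT query, keyed on ISBN. Vocabularies here are placeholders.
from rdflib import Graph

source = Graph()
source.parse(data="""
@prefix ex: <http://example.org/source#> .
ex:review1 ex:isbn   "9780141439518" ;
           ex:rating "5" ;
           ex:text   "A wonderful edition." .
""", format="turtle")

CONSTRUCT = """
PREFIX ex:   <http://example.org/source#>
PREFIX rev:  <http://purl.org/stuff/rev#>
PREFIX bibo: <http://purl.org/ontology/bibo/>
CONSTRUCT {
    ?book a bibo:Book ;
          bibo:isbn13 ?isbn ;
          rev:hasReview ?review .
    ?review a rev:Review ;
            rev:rating ?rating ;
            rev:text   ?text .
}
WHERE {
    ?review ex:isbn ?isbn ; ex:rating ?rating ; ex:text ?text .
    BIND(IRI(CONCAT("http://example.org/book/", ?isbn)) AS ?book)
}
"""

remodeled = source.query(CONSTRUCT).graph
print(remodeled.serialize(format="xml"))  # RDF/XML, the format the prototype's web service returns
```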
Content Dissemination from Small-scale Museum and Archival Collections: Community Reusable Semantic
A total of 143 institutions in 23 countries responded to one or more of the surveys. This analysis covers the 104 linked data projects or
services described by the 81 institutions which responded to the 2018 survey—those that publish linked data, consume linked data, or both.
This article provides an overview of the linked data projects or services institutions have implemented or are implementing; what data they
publish and consume; the reasons given for implementing linked data and the barriers encountered; and some advice given by respondents
to those considering implementing a linked data project or service. Differences with previous survey responses are noted, but as the
majority of linked data projects and services described are either not yet in production or were implemented within the last two years, these differences
may reflect new trends rather than changes in implementations.
FAIR Principles for Library, Archive and Museum Collections: A proposal for standards for reusable collections
Lukas Koster, Saskia Woutersen-Windhouwer
Many heritage institutions would like their collections to be open and reusable but fail to achieve this because of organizational,
legal and technological barriers. A set of guidelines and best practices is proposed to facilitate the process of making heritage collections
reusable. These guidelines are based on the FAIR Principles for scholarly output (FAIR data principles [2014]), taking into account a
number of other recent initiatives for making data findable, accessible, interoperable and reusable. The resulting FAIR Principles for
Heritage Library, Archive and Museum Collections focus on three levels: objects, metadata and metadata records. Clarifications and
examples of these proposed principles are presented, as well as recommendations for the assessment of current situations and
implementations of the principles.
Microdata in the IR: A Low-Barrier Approach to Enhancing Discovery of Institutional Repository Materials in
Google
Shayna Pekala
Georgetown University Library curates a multitude of open access resources in its institutional repository and digital collections portal,
DigitalGeorgetown. Over the last several years, the Library has experimented with methods for making these items increasingly visible in
search engine search results. This article describes the Library’s low-barrier approach to applying Schema.org vocabulary to its DSpace
institutional repository using microdata, as well as the challenges with and strategies used for assessing this work. The effects of the
application of Schema.org microdata to DigitalGeorgetown on Google search results were tracked over time using three different metrics,
providing new insights about its impact.
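As a hypothetical illustration of the low-barrier approach described above, the sketch below fills a page template with Schema.org microdata attributes for a repository item. The item type, properties, and template structure are assumptions for illustration, not DigitalGeorgetown's actual markup.

```python
# Sketch: embedding Schema.org microdata (itemscope/itemtype/itemprop) in an
# item page. The type and properties shown are illustrative; a DSpace theme
# would emit equivalent attributes from its own templates.
from string import Template

ITEM_TEMPLATE = Template("""
<div itemscope itemtype="https://schema.org/ScholarlyArticle">
  <h1 itemprop="name">$title</h1>
  <span itemprop="author">$author</span>
  <time itemprop="datePublished">$date</time>
  <a itemprop="url" href="$url">$url</a>
</div>
""")

print(ITEM_TEMPLATE.substitute(
    title="Example Repository Item",
    author="Jane Scholar",
    date="2016-05-01",
    url="https://repository.example.edu/handle/12345",
))
```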
The Drawings of the Florentine Painters: From Print Catalog to Linked Open Data
Lukas Klic, Matt Miller, Jonathan K. Nelson, Cristina Pattuelli, and Alexandra Provo
The Drawings of The Florentine Painters project created the first online database of Florentine Renaissance drawings by applying Linked
Open Data (LOD) techniques to a foundational text of the same name, first published by Bernard Berenson in 1903 (revised and expanded
editions, 1938 and 1961). The goal was to make Berenson’s catalog information—still an essential information resource today—available in
a machine-readable format, allowing researchers to access the source content through open data services. This paper provides a technical
overview of the methods and processes applied in the conversion of Berenson’s catalog to LOD using the CIDOC-CRM ontology; it also
discusses the different phases of the project, focusing on the challenges and issues of data transformation and publishing. The project was
funded by the Samuel H. Kress Foundation and organized by Villa I Tatti, The Harvard University Center for Italian Renaissance Studies.
Catalog: http://florentinedrawings.itatti.harvard.edu
Data Endpoint: http://data.itatti.harvard.edu
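A hedged sketch of what converting a single catalog entry to CIDOC-CRM triples can look like with rdflib is shown below; the URIs, the class and property choices, and the sample values are illustrative assumptions rather than the project's published model.

```python
# Sketch: one drawing from a print catalog expressed as CIDOC-CRM RDF.
# Class/property choices and values are illustrative, not the project's mapping.
from rdflib import Graph, Namespace, Literal, RDF

CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")
EX = Namespace("http://example.org/drawings/")

g = Graph()
g.bind("crm", CRM)

drawing = EX["drawing/1"]
production = EX["production/1"]
artist = EX["actor/botticelli"]

g.add((drawing, RDF.type, CRM["E22_Man-Made_Object"]))
g.add((drawing, CRM.P3_has_note, Literal("Study of a standing figure")))
g.add((drawing, CRM.P108i_was_produced_by, production))
g.add((production, RDF.type, CRM.E12_Production))
g.add((production, CRM.P14_carried_out_by, artist))

print(g.serialize(format="turtle"))
```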
Linked Data is People: Building a Knowledge Graph to Reshape the Library Staff Directory
Jason A. Clark and Scott W. H. Young
One of our greatest library resources is people. Most libraries have staff directory information published on the web, yet most of this data is
trapped in local silos, PDFs, or unstructured HTML markup. With this in mind, the library informatics team at Montana State University
(MSU) Library set a goal of remaking our people pages by connecting the local staff database to the Linked Open Data (LOD) cloud. In
pursuing linked data integration for library staff profiles, we have realized two primary use cases: improving the search engine optimization
(SEO) for people pages and creating network graph visualizations. In this article, we will focus on the code to build this library graph model
as well as the linked data workflows and ontology expressions developed to support it. Existing linked data work has largely centered
around machine-actionable data and improvements for bots or intelligent software agents. Our work demonstrates that connecting your
staff directory to the LOD cloud can reveal relationships among people in dynamic ways, thereby raising staff visibility and bringing an
increased level of understanding and collaboration potential for one of our primary assets: the people that make the library happen.
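A minimal, hypothetical sketch of the kind of JSON-LD a staff profile page might embed follows; the properties, identifiers, and values are invented for illustration and do not reproduce MSU Library's published graph model.

```python
# Sketch: a schema.org Person description for a staff profile, serialized as
# JSON-LD for embedding in the profile page. All values and links are invented.
import json

person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": "https://www.example.edu/staff/jdoe",
    "name": "Jane Doe",
    "jobTitle": "Metadata Librarian",
    "worksFor": {"@type": "Library", "name": "Example University Library"},
    "knowsAbout": ["linked data", "cataloging"],
    "sameAs": ["https://orcid.org/0000-0000-0000-0000"],
}

print('<script type="application/ld+json">')
print(json.dumps(person, indent=2))
print("</script>")
```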
Recommendations for the application of Schema.org to aggregated Cultural Heritage metadata to increase
relevance and visibility to search engines: the case of Europeana
Richard Wallis, Antoine Isaac, Valentine Charles, and Hugo Manguinhas
Europeana provides access to more than 54 million cultural heritage objects through its portal Europeana Collections. It is crucial for
Europeana to be recognized by search engines as a trusted authoritative repository of cultural heritage objects. Indeed, even though its
portal is the main entry point, most Europeana users come to it via search engines.
Europeana Collections is fuelled by metadata describing cultural objects, represented in the Europeana Data Model (EDM). This paper
presents the research and consequent recommendations for publishing Europeana metadata using the Schema.org vocabulary and best
practices. Schema.org is HTML-embedded metadata consumed by search engines to power rich services (such as the Google Knowledge Graph). Schema.org is an open and widely adopted initiative (used by over 12 million domains), backed by Google, Bing, Yahoo!, and Yandex, for sharing metadata across the web. It underpins the emergence of new web techniques, such as so-called Semantic SEO.
Our research addressed the representation of the embedded metadata as part of the Europeana HTML pages and sitemaps so that the re-use of this data can be optimized.
The practical objective of our work is to produce a Schema.org representation of Europeana resources described in EDM that is as rich as possible and tailored to Europeana's realities and user needs, as well as to the search engines and their users.
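A heavily simplified, hypothetical sketch of what an embedded Schema.org representation of an EDM-described object might look like is shown below as JSON-LD; the type and property choices are illustrative and do not reproduce the paper's actual recommendations.

```python
# Sketch: a Schema.org JSON-LD description of a cultural heritage object of the
# kind EDM describes, suitable for embedding in an HTML item page. The mapping
# shown is illustrative only.
import json

cho = {
    "@context": "https://schema.org",
    "@type": "CreativeWork",
    "@id": "https://www.example.org/item/12345",
    "name": "View of a Canal",
    "creator": {"@type": "Person", "name": "Unknown engraver"},
    "dateCreated": "1780",
    "inLanguage": "nl",
    "provider": {"@type": "Organization", "name": "Example Provider"},
    "thumbnailUrl": "https://www.example.org/thumbs/12345.jpg",
}

print('<script type="application/ld+json">')
print(json.dumps(cho, indent=2))
print("</script>")
```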
Outside The Box: Building a Digital Asset Management Ecosystem for Preservation and Access
Andrew Weidner, Sean Watkins, Bethany Scott, Drew Krewer, Anne Washington, Matthew Richardson
The University of Houston (UH) Libraries made an institutional commitment in late 2015 to migrate the data for its digitized cultural heritage
collections to open source systems for preservation and access: Hydra-in-a-Box, Archivematica, and ArchivesSpace. This article describes
the work that the UH Libraries implementation team has completed to date, including open source tools for streamlining digital curation
workflows, minting and resolving identifiers, and managing SKOS vocabularies. These systems, workflows, and tools, collectively known as
the Bayou City Digital Asset Management System (BCDAMS), represent a novel effort to solve common issues in the digital curation
lifecycle and may serve as a model for other institutions seeking to implement flexible and comprehensive systems for digital preservation
and access.
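Among the tools mentioned above is one for managing SKOS vocabularies; as a hedged illustration (the concept scheme, concept, and labels below are invented), a small rdflib sketch of such a vocabulary entry might look like this:

```python
# Sketch: a minimal SKOS concept scheme of the sort a local vocabulary manager
# might maintain. The scheme, concept, and labels are illustrative only.
from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import SKOS

EX = Namespace("https://vocab.example.edu/bcdams/")

g = Graph()
g.bind("skos", SKOS)

scheme = EX["scheme/subjects"]
concept = EX["concept/houston-ship-channel"]

g.add((scheme, RDF.type, SKOS.ConceptScheme))
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.inScheme, scheme))
g.add((concept, SKOS.prefLabel, Literal("Houston Ship Channel", lang="en")))
g.add((concept, SKOS.altLabel, Literal("Buffalo Bayou Ship Channel", lang="en")))

print(g.serialize(format="turtle"))
```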
Medici is a Web 2.0 environment integrating analysis tools for the auto-curation of un-curated digital data, allowing automatic processing of
input (CH) datasets, and visualization of both data and collections. It offers a simple user interface for dataset preprocessing, previewing,
automatic metadata extraction, user input of metadata and provenance support, storage, archiving and management, representation and
reproduction. Building on previous experience (Medici 1), NCSA and CyI are working towards the improvement of the technical, performance, and functionality aspects of the system. The current version of Medici (Medici 2) is the result of these efforts. It is a scalable, flexible, robust distributed framework with wide data format support (including 3D models and Reflectance Transformation Imaging, or RTI) and metadata functionality. We provide an overview of Medici 2's current features supported by representative use cases, as well as a discussion of future development directions.
Python, Google Sheets, and the Thesaurus for Graphic Materials for Efficient Metadata Project Workflows
Jeremy Bartczak, Ivey Glendon
In 2017, the University of Virginia (U.Va.) will launch a two-year initiative to celebrate the bicentennial anniversary of the University's
founding in 1819. The U.Va. Library is participating in this event by digitizing some 20,000 photographs and negatives that document
student life on the U.Va. grounds in the 1960s and 1970s. Metadata librarians and archivists are well-versed in the challenges associated
with generating digital content and accompanying description within the context of limited resources. This paper describes how technology
and new approaches to metadata design have enabled the University of Virginia’s Metadata Analysis and Design Department to rapidly and
successfully generate accurate description for these digital objects. Python’s pandas module improves efficiency by cleaning and
repurposing data recorded at digitization, while the lxml module builds MODS XML programmatically from CSV tables. A simplified
technique for subject heading selection and assignment in Google Sheets provides a collaborative environment for streamlined metadata
creation and data quality control.
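A hedged sketch of the pandas-plus-lxml pattern the paper describes follows; the CSV file name, its column names, and the (very abbreviated) MODS structure are assumptions for illustration.

```python
# Sketch: clean a CSV of digitization data with pandas, then build minimal
# MODS XML records with lxml. File name, columns, and MODS detail are illustrative.
import pandas as pd
from lxml import etree

MODS_NS = "http://www.loc.gov/mods/v3"

df = pd.read_csv("capture_log.csv")          # hypothetical capture spreadsheet
df["title"] = df["title"].str.strip()        # simple cleanup with pandas
df = df.dropna(subset=["identifier"])

for _, row in df.iterrows():
    mods = etree.Element("{%s}mods" % MODS_NS, nsmap={None: MODS_NS})
    title_info = etree.SubElement(mods, "{%s}titleInfo" % MODS_NS)
    etree.SubElement(title_info, "{%s}title" % MODS_NS).text = row["title"]
    etree.SubElement(mods, "{%s}identifier" % MODS_NS, type="local").text = str(row["identifier"])
    with open(f"{row['identifier']}.xml", "wb") as fh:
        fh.write(etree.tostring(mods, pretty_print=True, xml_declaration=True, encoding="UTF-8"))
```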
Need Help with Your Code? Piloting a Programming and Software Development Consultation Service
Laura Wrubel, Daniel Kerchner, Justin Littman
In the Spring 2016 semester, George Washington University Libraries (GW Libraries) undertook a pilot to provide programming and
software development consultation services for the university community. The consultation services took the form of half hour appointments
conducted by librarians with software development expertise, similar to other reference services offered by GW Libraries. The purpose of
this paper is to provide an overview and assessment of the pilot project.
Bank (https://databank.illinois.edu/), to provide Illinois researchers with a free, self-serve publishing platform that centralizes, preserves, and
provides persistent and reliable access to Illinois research data. This article presents a holistic view of development by discussing our
overarching technical, policy, and interface strategies. By openly presenting our design decisions, the rationales behind those decisions,
and associated challenges, this paper aims to contribute to the library community's work to develop repository services that meet growing
data preservation and sharing needs.
Checking the identity of entities by machine algorithms: the next step to the Hungarian National Namespace
Zsolt Bánki, Tibor Mészáros, Márton Németh, András Simon
The redundancy of entities coming from different sources caused problems during the building of the personal name authorities for the
Petőfi Museum of Literature. It was a top priority to cleanse and unite classificatory records which have different data content but pertain to
the same person without losing any data. As a first step in 2013, we found identities in approximately 80,000 name records, so we merged
the data content of these records. In the second phase a much more complicated algorithm had to be applied to show these identities. We
cleansed the database by uniting approximately 36,000 records. The workflow for automatic detection of authority data tries to follow
human intelligence. The database scripts normalize and examine about 20 kinds of data elements according to information about dates,
localities, occupations and name variations. The result of creating pairs of potentially redundant authority records from the database was a graph, which was condensed to a tree by the human efforts of the museum's curators. With this, the limit of technological identification was reached. For further data cleansing, human intelligence is needed, assisted by regular computerized monitoring based upon the developed algorithm. As a result, the service containing about 620,000 authority name records will be an indispensable foundation for the establishment of the National Name Authorities. This article describes the unification workflow.
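The pairing-then-graph approach described above can be sketched, in a heavily simplified and hypothetical form, as blocking on normalized keys and taking connected components; the normalization rules and sample records below are invented for illustration.

```python
# Sketch: block candidate duplicate authority records on a normalized key,
# then treat matches as edges of a graph whose connected components are
# candidate identity clusters for curators to review. Data is invented.
import itertools
import unicodedata
from collections import defaultdict

import networkx as nx

records = [
    {"id": 1, "name": "Petőfi Sándor", "birth": "1823"},
    {"id": 2, "name": "Petofi Sandor", "birth": "1823"},
    {"id": 3, "name": "Arany János",  "birth": "1817"},
]

def normalize(name: str) -> str:
    # Strip accents and case so near-identical name forms collide.
    stripped = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return " ".join(stripped.lower().split())

blocks = defaultdict(list)
for rec in records:
    blocks[(normalize(rec["name"]), rec["birth"])].append(rec["id"])

graph = nx.Graph()
graph.add_nodes_from(rec["id"] for rec in records)
for ids in blocks.values():
    graph.add_edges_from(itertools.combinations(ids, 2))

clusters = [c for c in nx.connected_components(graph) if len(c) > 1]
print(clusters)  # [{1, 2}] -- a candidate merge for human review
```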
Extracting, Augmenting, and Updating Metadata in Fedora 3 and 4 Using a Local OpenRefine Reconciliation
Service
Ruth Kitchin Tillman
When developing local collections, librarians and archivists often create detailed metadata which then gets stored in collection-specific
silos. At times, the metadata could be used to augment other collections but the software does not provide native support for object
relationship update and augmentation. This article describes a project updating author metadata in one collection using a local
reconciliation service generated from another collection’s authority records. Because the Goddard Library is on the cusp of a migration from
Fedora 3 to Fedora 4, this article addresses the challenges in updating Fedora 3 and ways Fedora 4’s architecture will allow for easier
updates.
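A minimal, hypothetical sketch of a local reconciliation endpoint of the kind described above is shown below, using Flask; the authority data, matching logic, and service metadata are stand-ins, and the response shape follows the general OpenRefine reconciliation pattern of a result list per query key.

```python
# Sketch: a tiny local reconciliation service backed by an in-memory list of
# authority records. Matching is naive substring matching; a real service
# would search the source collection's authority metadata.
import json

from flask import Flask, request, jsonify

app = Flask(__name__)

AUTHORITIES = [
    {"id": "auth/1", "name": "Goddard, Robert H."},
    {"id": "auth/2", "name": "Hubble, Edwin"},
]

@app.route("/reconcile", methods=["GET", "POST"])
def reconcile():
    queries = request.values.get("queries")
    if not queries:
        # Service metadata returned when OpenRefine first registers the endpoint.
        return jsonify({"name": "Local authority reconciliation (sketch)"})
    results = {}
    for key, q in json.loads(queries).items():
        needle = q["query"].lower()
        matches = [
            {"id": a["id"], "name": a["name"], "score": 100, "match": True}
            for a in AUTHORITIES if needle in a["name"].lower()
        ]
        results[key] = {"result": matches}
    return jsonify(results)

if __name__ == "__main__":
    app.run(port=8000)
```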
Peripleo: a Tool for Exploring Heterogeneous Data through the Dimensions of Space and Time
Rainer Simon, Leif Isaksen, Elton Barker, Pau de Soto Cañamares
This article introduces Peripleo, a prototype spatiotemporal search and visualization tool. Peripleo enables users to explore the geographic,
temporal and thematic composition of distributed digital collections in their entirety, and then to progressively filter and drill down to explore
individual records. We provide an overview of Peripleo’s features, and present the underlying technical architecture. Furthermore, we
discuss how datasets that differ vastly in terms of size, content type and theme can be made uniformly accessible through a set of
lightweight metadata conventions we term “connectivity through common references”. Our current demo installation links approximately half
a million records from 25 datasets. These datasets originate from a spectrum of sources, ranging from the small personal photo collection
with 35 records, to the large institutional database with 134,000 objects. The product of research in the Andrew W. Mellon-funded Pelagios
3 project, Peripleo is Open Source software.
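The "connectivity through common references" idea — records from unrelated datasets become connected when they cite the same stable URI, such as a gazetteer place — can be sketched as a simple join on shared reference URIs. The records below are invented; the place URIs follow the Pleiades gazetteer pattern used by Pelagios.

```python
# Sketch: linking records from different datasets purely through the stable
# URIs they reference (here, gazetteer place URIs). Records are invented.
from collections import defaultdict

records = [
    {"dataset": "photo-collection", "id": "p35",
     "references": {"https://pleiades.stoa.org/places/423025"}},
    {"dataset": "museum-db", "id": "obj-9001",
     "references": {"https://pleiades.stoa.org/places/423025",
                    "https://pleiades.stoa.org/places/579885"}},
    {"dataset": "coin-hoards", "id": "h17",
     "references": {"https://pleiades.stoa.org/places/579885"}},
]

by_reference = defaultdict(list)
for rec in records:
    for uri in rec["references"]:
        by_reference[uri].append((rec["dataset"], rec["id"]))

# Any URI cited by more than one record connects otherwise unrelated datasets.
for uri, linked in by_reference.items():
    if len(linked) > 1:
        print(uri, "connects", linked)
```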
Editorial Introduction: It’s All About Data, Except When It’s Not.
Carol Bean
Data capture and use is not new to libraries. We know data isn’t everything, but it is ubiquitous in our work, enabling myriads of new ideas
and projects. Articles in this issue reflect the expansion of data creation, capture, use, and analysis in library systems and services.
Building a Better Book in the Browser (Using Semantic Web technologies and HTML5)
Jason A. Clark and Scott W. H. Young
The library as place and service continues to be shaped by the legacy of the book. The book itself has evolved in recent years, with various
technologies vying to become the next dominant book form. In this article, we discuss the design and development of our prototype
software from Montana State University (MSU) Library for presenting books inside of web browsers. The article outlines the contextual
background and technological potential for publishing traditional book content through the web using open standards. Our prototype
demonstrates the application of HTML5, structured data with RDFa and Schema.org markup, linked data components using JSON-LD, and
an API-driven data model. We examine how this open web model impacts discovery, reading analytics, eBook production, and machine-
readability for libraries considering how to unite software development and publishing.
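As a rough, hypothetical illustration of the structured-data layer mentioned above, the snippet below emits Schema.org markup for a book using RDFa attributes in HTML5; the properties and values are invented and the prototype's actual markup may differ.

```python
# Sketch: Schema.org book metadata expressed as RDFa attributes in HTML5.
# Property selection and values are illustrative only.
from string import Template

BOOK_TEMPLATE = Template("""
<article vocab="https://schema.org/" typeof="Book">
  <h1 property="name">$title</h1>
  <p>By <span property="author" typeof="Person">
    <span property="name">$author</span></span></p>
  <meta property="datePublished" content="$date">
  <link property="license" href="$license">
</article>
""")

print(BOOK_TEMPLATE.substitute(
    title="An Example Open Textbook",
    author="A. N. Author",
    date="2015-01-01",
    license="https://creativecommons.org/licenses/by/3.0/us/",
))
```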
Using a case study at Oregon Health & Science University as a lens to examine the conceptual and technical challenges associated with
automated extraction of access points, we discuss using publicly accessible APIs to extract entities (e.g., people, places, and concepts)
from digital and digitized objects. We describe why Linked Open Data is not well suited for a use case such as ours. We conclude with
recommendations about how this method can be used in archives as well as for other library applications.
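As a hedged sketch of calling a publicly accessible entity-extraction API of the kind discussed above, the snippet below posts text to the DBpedia Spotlight annotation endpoint; the endpoint URL, parameters, and response handling are assumptions that should be checked against the service's current documentation.

```python
# Sketch: extracting entity candidates from item text via a public annotation
# API. The endpoint, parameters, and response shape shown here should be
# verified against the service's current documentation before use.
import requests

TEXT = "The Oregon Health & Science University is located in Portland, Oregon."

resp = requests.get(
    "https://api.dbpedia-spotlight.org/en/annotate",
    params={"text": TEXT, "confidence": 0.5},
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()

for resource in resp.json().get("Resources", []):
    # Each resource carries the matched surface form and a DBpedia URI that
    # could serve as (or be mapped to) an access point.
    print(resource.get("@surfaceForm"), "->", resource.get("@URI"))
```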
Training the Next Generation of Open Source Developers: A Case Study of OSU Libraries & Press’ Technology
Training Program
Evviva Weinraub Lajoie, Trey Terrell and Mike Eaton
The Emerging Technologies & Services department at Oregon State University Libraries & Press has implemented a training program for
our technology student employees on how and why they should engage in open source community development. This article outlines what we have done to implement this program, discusses the benefits we have seen as a result of these changes, and describes what we viewed as necessary to build and promote a culture of engagement in open communities.
Ebooks without Vendors: Using Open Source Software to Create and Share Meaningful Ebook Collections
Matt Weaver
The Community Cookbook project began with wondering how to take local cookbooks in the library’s collection and create a recipe
database. The final website is both a recipe website and collection of ebook versions of local cookbooks. This article will discuss the use of
open source software at every stage in the project, which proves that an open source publishing model is possible for any library.
and the information they contain is often not fully searchable outside the confines of the site. EgoSystem, developed at Los Alamos
National Laboratory (LANL), explores the problems associated with automated discovery of public online identities for people, and the aggregation of the social, institutional, conceptual, and artifact data connected to these identities. EgoSystem starts with basic demographic
information about former employees and uses that information to locate person identities in various popular online systems. Once identified,
their respective social networks, institutional affiliations, artifacts, and associated concepts are retrieved and linked into a graph containing
other found identities. This graph is stored in a Titan graph database and can be explored using the Gremlin graph query/traversal language
and with the EgoSystem Web interface.
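A hedged sketch of the kind of Gremlin traversal such an identity graph supports is shown below, using the gremlinpython driver against a Gremlin Server; the vertex and edge labels, property keys, and connection details are invented, and EgoSystem's own Titan deployment is not assumed to use this driver.

```python
# Sketch: a Gremlin-style traversal over an identity graph. Vertex/edge labels
# and property keys are invented; this assumes a running Gremlin Server at the
# address below, not EgoSystem's original Titan deployment.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Find a former employee's vertex, then walk to the identities and concepts
# linked to it by the discovery process.
terms = (
    g.V()
     .has("person", "name", "Jane Researcher")
     .out("hasIdentity")
     .out("associatedConcept")
     .values("term")
     .toList()
)
print(terms)
conn.close()
```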
This work is licensed under a Creative Commons Attribution 3.0 United States License.