
RDF Journal Compilation

The document summarizes several articles from the Code4Lib Journal about using semantic web technologies like RDF to manage and share library metadata and collections. It discusses projects migrating metadata to RDF models, tools for transforming data into RDF graphs, prototypes for collecting and linking user-generated content to bibliographic records in RDF, and recommender systems using linked open data from music playlists. It also analyzes a survey of institutions publishing and consuming linked data and proposes standards for making library, archive, and museum collections reusable through linked open data principles.


The Code4Lib Journal – Search Results – rdf http://journal.code4lib.org/?s=rdf


Search Results
Showing 30 articles matching "rdf"

The Semantics of Metadata: Avalon Media System and the Move to RDF
Juliet L. Hardesty and Jennifer B. Young
The Avalon Media System (Avalon) provides access and management for digital audio and video collections in libraries and archives. The
open source project is led by the libraries of Indiana University Bloomington and Northwestern University and is funded in part by grants
from The Andrew W. Mellon Foundation and Institute of Museum and Library Services.

Avalon is based on the Samvera Community (formerly Hydra Project) software stack and uses Fedora as the digital repository back end.
The Avalon project team is in the process of migrating digital repositories from Fedora 3 to Fedora 4 and incorporating metadata statements
using the Resource Description Framework (RDF) instead of XML files accompanying the digital objects in the repository. The Avalon team
has worked on the migration path for technical metadata and is now working on the migration paths for structural metadata (PCDM) and
descriptive metadata (from MODS XML to RDF). This paper covers the decisions made to begin using RDF for software development and
offers a window into how Semantic Web technology functions in the real world.
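
For readers unfamiliar with the shift the abstract describes, the sketch below shows what a descriptive statement looks like when expressed directly as RDF rather than in an accompanying XML file. It is illustrative only: the object URI, the choice of Dublin Core terms, and the values are assumptions, not Avalon's actual data model.

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCTERMS, RDF

    # Hypothetical repository object; Avalon's real URIs, classes and predicates may differ.
    item = URIRef("https://example.org/media_objects/abc123")

    g = Graph()
    g.add((item, RDF.type, DCTERMS.BibliographicResource))
    g.add((item, DCTERMS.title, Literal("Oral history interview, 1968")))
    g.add((item, DCTERMS.creator, Literal("Example University Archives")))
    g.add((item, DCTERMS.created, Literal("1968-05-14")))

    # The description travels with the object as RDF statements, not as a MODS XML sidecar.
    print(g.serialize(format="turtle"))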

Data Munging Tools in Preparation for RDF: Catmandu and LODRefine


Christina Harlow
Data munging, or the work of remediating, enhancing and transforming library datasets for new or improved uses, has become more
important and staff-inclusive in many library technology discussions and projects. Many times we know how we want our data to look, as
well as how we want our data to act in discovery interfaces or when exposed, but we are uncertain how to make the data we have into the
data we want. This article introduces and compares two library data munging tools that can help: LODRefine (OpenRefine with the DERI
RDF Extension) and Catmandu.

The strengths and best practices of each tool are discussed in the context of metadata munging use cases for an institution’s metadata
migration workflow. There is a focus on Linked Open Data modeling and transformation applications of each tool, in particular how
metadataists, catalogers, and programmers can create metadata quality reports, enhance existing data with LOD sets, and transform that
data to an RDF model. Integration of these tools with other systems and projects, the use of domain-specific transformation languages, and
the expansion of vocabulary reconciliation services are mentioned.
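
Catmandu and LODRefine have their own languages (Catmandu's Fix language and OpenRefine's GREL), so rather than imitate either, the sketch below illustrates the kind of metadata quality report the abstract mentions using Python's pandas; the field names and records are hypothetical.

    import pandas as pd

    # Hypothetical metadata export; real field names vary by institution and collection.
    records = pd.DataFrame([
        {"id": "rec1", "title": "Annual report", "creator": "Smith, J.", "subject": None},
        {"id": "rec2", "title": None, "creator": "Doe, A.", "subject": "Maps"},
        {"id": "rec3", "title": "Field notes", "creator": None, "subject": "Botany"},
    ])

    # A simple completeness report: how many records lack each descriptive field.
    report = records.drop(columns="id").isna().sum().rename("missing_values")
    print(report)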

Using Semantic Web Technologies to Collaboratively Collect and Share User-Generated Content in Order to
Enrich the Presentation of Bibliographic Records–Development of a Prototype Based on RDF, D2RQ, Jena,
SPARQL and WorldCat’s FRBRization Web Service
Ragnhild Holgersen, Michael Preminger, David Massey
In this article we present a prototype of a semantic web-based framework for collecting and sharing user-generated content (reviews,
ratings, tags, etc.) across different libraries in order to enrich the presentation of bibliographic records. The user-generated data is
remodeled into RDF, utilizing established linked data ontologies. This is done in a semi-automatic manner utilizing the Jena and D2RQ toolkits. For the remodeling, a SPARQL CONSTRUCT statement is tailored for each data source. In the data source used in our prototype, user-
generated content is linked to the relevant books via their ISBN. By remodeling the data according to the FRBR model, and expanding the
RDF graph with data returned by WorldCat’s FRBRization web service, we are able to greatly increase the number of entry points to each
book. We make the social content available through a RESTful web service with ISBN as a parameter. The web service returns a graph of
all user-generated data registered to any edition of the book in question in the RDF/XML format. Libraries using our framework would thus
be able to present relevant social content in association with bibliographic records, even if they hold a different version of a book than the
one that was originally accessed by users. Finally, we connect our RDF graph to the linked open data cloud through the use of Talis’
openlibrary.org SPARQL endpoint.
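
As an illustration of the remodeling step, the sketch below runs a SPARQL CONSTRUCT query over a small in-memory graph with Python's rdflib. The source predicates, the review text, and the target Schema.org shape are assumptions made for the example; they are not the ontologies or D2RQ mappings used in the prototype.

    from rdflib import Graph, Namespace, Literal, URIRef

    # Hypothetical "raw" graph, standing in for what D2RQ exposes from a relational source.
    SRC = Namespace("http://example.org/source/")

    raw = Graph()
    row = URIRef("http://example.org/source/review/42")
    raw.add((row, SRC.isbn, Literal("9780140449136")))
    raw.add((row, SRC.comment, Literal("A classic, worth rereading.")))

    # One CONSTRUCT per data source remodels local fields into a shared vocabulary.
    construct = """
    PREFIX src:    <http://example.org/source/>
    PREFIX schema: <http://schema.org/>
    CONSTRUCT {
      ?review a schema:Review ;
              schema:reviewBody ?text ;
              schema:itemReviewed [ a schema:Book ; schema:isbn ?isbn ] .
    }
    WHERE {
      ?review src:isbn ?isbn ;
              src:comment ?text .
    }
    """

    remodeled = Graph()
    for triple in raw.query(construct):
        remodeled.add(triple)
    print(remodeled.serialize(format="turtle"))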

Content Dissemination from Small-scale Museum and Archival Collections: Community Reusable Semantic Metadata Content Models for Digital Humanities


Avgoustinos Avgousti, Georgios Papaioannou, Feliz Ribeiro Gouveia
This paper highlights the challenges that digital humanities scholars, small museums, and archival collections face in disseminating Cultural Heritage (CH) content. It showcases a solution based on Community Reusable Semantic Metadata Content Models (RMs) available for download from our community website. Installing the RMs extends the functionality of a state-of-the-art Content Management Framework (CMF) towards numismatic collections. Furthermore, the solution encapsulates metadata using the Resource Description Framework in Attributes (RDFa) and the Schema.org vocabulary. Establishing a community around RMs will help the development, upgrading, and sharing of RM models and packages for the benefit of the Cultural Heritage community. A distributed model for Community Reusable Semantic Metadata Content Models will allow the community to grow and improve, serving the needs of, and enabling the infrastructure to scale for, the next generation of humanities scholars.
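
The following sketch shows what RDFa with the Schema.org vocabulary can look like when embedded in a page for a numismatic object. The type, properties, and values are placeholders chosen for illustration, not the content models distributed by the project.

    # Illustrative only: a coin record marked up with RDFa Lite and Schema.org terms.
    coin = {"name": "Tetradrachm of Athens", "date": "c. 450 BCE", "material": "Silver"}

    html = f"""
    <div vocab="http://schema.org/" typeof="CreativeWork">
      <h2 property="name">{coin['name']}</h2>
      <span property="dateCreated">{coin['date']}</span>
      <span property="material">{coin['material']}</span>
    </div>
    """
    print(html)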

Automated Playlist Continuation with Apache PredictionIO


Jim Hahn
The Minrva project team, a software development research group based at the University of Illinois Library, developed a data-focused
recommender system to participate in the creative track of the 2018 ACM RecSys Challenge, which focused on music recommendation. We
describe here the large-scale data processing the Minrva team researched and developed for foundational reconciliation of the Million
Playlist Dataset using external authority data on the web (e.g. VIAF, WikiData). The secondary focus of the research was evaluating and
adapting the processing tools that support data reconciliation. This paper reports on the playlist enrichment process, indexing, and
subsequent recommendation model developed for the music recommendation challenge.
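
As a hedged illustration of name reconciliation against an external authority, the sketch below queries the public Wikidata search API for candidate entities matching an artist string. It stands in for the idea only; the Minrva team's actual reconciliation pipeline and tooling are described in the article itself.

    import requests

    def wikidata_candidates(name, limit=3):
        """Return candidate Wikidata entities for a name string (illustrative only)."""
        resp = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={
                "action": "wbsearchentities",
                "search": name,
                "language": "en",
                "format": "json",
                "limit": limit,
            },
            timeout=10,
        )
        resp.raise_for_status()
        return [(hit["id"], hit.get("description", "")) for hit in resp.json()["search"]]

    print(wikidata_candidates("Nina Simone"))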

Analysis of 2018 International Linked Data Survey for Implementers


Karen Smith-Yoshimura
OCLC Research conducted an International Linked Data Survey for Implementers in 2014 and 2015. Curious about what might have
changed since the last survey, and eager to learn about new projects or services that format metadata as linked data or make subsequent
uses of it, OCLC Research repeated the survey between 17 April and 25 May 2018.

A total of 143 institutions in 23 countries responded to one or more of the surveys. This analysis covers the 104 linked data projects or
services described by the 81 institutions which responded to the 2018 survey—those that publish linked data, consume linked data, or both.
This article provides an overview of the linked data projects or services institutions have implemented or are implementing; what data they
publish and consume; the reasons given for implementing linked data and the barriers encountered; and some advice given by respondents
to those considering implementing a linked data project or service. Differences with previous survey responses are noted, but as the
majority of linked data projects and services described are either not yet in production or were implemented within the last two years, these differences
may reflect new trends rather than changes in implementations.

Wikidata: a platform for your library’s linked open data


Stacy Allison-Cassin and Dan Scott
Seized with the desire to improve the visibility of Canadian music in the world, a ragtag band of librarians led by Stacy Allison-Cassin set
out to host Wikipedia edit-a-thons in the style of Art+Feminism, but with a focus on addressing Canadian music instead. Along the way, they
recognized that Wikidata offered a low-barrier, high-result method of making that data not only visible but reusable as linked open data, and
consequently incorporated Wikidata into their edit-a-thons. This is their story.

FAIR Principles for Library, Archive and Museum Collections: A proposal for standards for reusable collections
Lukas Koster, Saskia Woutersen-Windhouwer
Many heritage institutions would like their collections to be open and reusable but fail to achieve that situation because of organizational,
legal and technological barriers. A set of guidelines and best practices is proposed to facilitate the process of making heritage collections
reusable. These guidelines are based on the FAIR Principles for scholarly output (FAIR data principles [2014]), taking into account a
number of other recent initiatives for making data findable, accessible, interoperable and reusable. The resulting FAIR Principles for
Heritage Library, Archive and Museum Collections focus on three levels: objects, metadata and metadata records. Clarifications and
examples of these proposed principles are presented, as well as recommendations for the assessment of current situations and
implementations of the principles.

Microdata in the IR: A Low-Barrier Approach to Enhancing Discovery of Institutional Repository Materials in
Google
Shayna Pekala
Georgetown University Library curates a multitude of open access resources in its institutional repository and digital collections portal,
DigitalGeorgetown. Over the last several years, the Library has experimented with methods for making these items increasingly visible in
search engine search results. This article describes the Library’s low-barrier approach to applying Schema.org vocabulary to its DSpace institutional repository using microdata, as well as the challenges with and strategies used for assessing this work. The effects of the
application of Schema.org microdata to DigitalGeorgetown on Google search results were tracked over time using three different metrics,
providing new insights about its impact.

The Automagic of the LII’s eCFR


Charlotte Schneider and Sylvia Kwakye
The Legal Information Institute (LII) began providing access to federal legal materials in 1992. This article discusses their work expanding
and improving free public access to federal legal resources in the U.S., particularly developing their eCFR product for the Code of Federal
Regulations, and plans to integrate DocketWrench.

The Drawings of the Florentine Painters: From Print Catalog to Linked Open Data
Lukas Klic, Matt Miller, Jonathan K. Nelson, Cristina Pattuelli, and Alexandra Provo
The Drawings of The Florentine Painters project created the first online database of Florentine Renaissance drawings by applying Linked
Open Data (LOD) techniques to a foundational text of the same name, first published by Bernard Berenson in 1903 (revised and expanded
editions, 1938 and 1961). The goal was to make Berenson’s catalog information—still an essential information resource today—available in
a machine-readable format, allowing researchers to access the source content through open data services. This paper provides a technical
overview of the methods and processes applied in the conversion of Berenson’s catalog to LOD using the CIDOC-CRM ontology; it also
discusses the different phases of the project, focusing on the challenges and issues of data transformation and publishing. The project was
funded by the Samuel H. Kress Foundation and organized by Villa I Tatti, The Harvard University Center for Italian Renaissance Studies.

Catalog: http://florentinedrawings.itatti.harvard.edu
Data Endpoint: http://data.itatti.harvard.edu

Annotation-based enrichment of Digital Objects using open-source frameworks


Marcus Emmanuel Barnes, Natkeeran Ledchumykanthan, Kim Pham, Kirsta Stapelfeldt
The W3C Web Annotation Data Model, Protocol, and Vocabulary unify approaches to annotations across the web, enabling their
aggregation, discovery and persistence over time. In addition, new javascript libraries provide the ability for users to annotate multi-format
content. In this paper, we describe how we have leveraged these developments to provide annotation features alongside Islandora’s
existing preservation, access, and management capabilities. We also discuss our experience developing with the Web Annotation Model as
an open web architecture standard, as well as our approach to integrating mature external annotation libraries. The resulting software (the
Web Annotation Utility Module for Islandora) accommodates annotation across multiple formats. This solution can be used in various digital
scholarship contexts.
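
For orientation, a minimal annotation conforming to the W3C Web Annotation Data Model can be serialized as JSON-LD like the sketch below; the identifiers, target, and selector values are placeholders rather than output of the Islandora utility module.

    import json

    # A minimal annotation per the W3C Web Annotation Data Model; identifiers here are
    # placeholders, not the module's actual URIs.
    annotation = {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": "http://example.org/anno/1",
        "type": "Annotation",
        "body": {
            "type": "TextualBody",
            "value": "Handwritten marginal note, likely a later addition.",
            "format": "text/plain",
        },
        "target": {
            "source": "http://example.org/islandora/object/demo%3A1/datastream/OBJ",
            "selector": {
                "type": "FragmentSelector",
                "value": "xywh=120,80,320,240",
            },
        },
    }

    print(json.dumps(annotation, indent=2))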

Linked Data is People: Building a Knowledge Graph to Reshape the Library Staff Directory
Jason A. Clark and Scott W. H. Young
One of our greatest library resources is people. Most libraries have staff directory information published on the web, yet most of this data is
trapped in local silos, PDFs, or unstructured HTML markup. With this in mind, the library informatics team at Montana State University
(MSU) Library set a goal of remaking our people pages by connecting the local staff database to the Linked Open Data (LOD) cloud. In
pursuing linked data integration for library staff profiles, we have realized two primary use cases: improving the search engine optimization
(SEO) for people pages and creating network graph visualizations. In this article, we will focus on the code to build this library graph model
as well as the linked data workflows and ontology expressions developed to support it. Existing linked data work has largely centered
around machine-actionable data and improvements for bots or intelligent software agents. Our work demonstrates that connecting your
staff directory to the LOD cloud can reveal relationships among people in dynamic ways, thereby raising staff visibility and bringing an
increased level of understanding and collaboration potential for one of our primary assets: the people that make the library happen.
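
A staff profile published as linked data often boils down to a small Schema.org description. The sketch below is a generic example with invented names and URIs; it is not MSU Library's actual graph model or ontology expression.

    import json

    # Hypothetical staff profile; the article's own ontology choices and URIs differ.
    person = {
        "@context": "http://schema.org/",
        "@type": "Person",
        "@id": "https://www.lib.example.edu/directory/jdoe",
        "name": "Jane Doe",
        "jobTitle": "Metadata Librarian",
        "worksFor": {"@type": "Library", "name": "Example State University Library"},
        "knows": [{"@id": "https://www.lib.example.edu/directory/asmith"}],
    }

    print(json.dumps(person, indent=2))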

Recommendations for the application of Schema.org to aggregated Cultural Heritage metadata to increase
relevance and visibility to search engines: the case of Europeana
Richard Wallis, Antoine Isaac, Valentine Charles, and Hugo Manguinhas
Europeana provides access to more than 54 million cultural heritage objects through its portal Europeana Collections. It is crucial for
Europeana to be recognized by search engines as a trusted authoritative repository of cultural heritage objects. Indeed, even though its
portal is the main entry point, most Europeana users come to it via search engines.

Europeana Collections is fuelled by metadata describing cultural objects, represented in the Europeana Data Model (EDM). This paper
presents the research and consequent recommendations for publishing Europeana metadata using the Schema.org vocabulary and best
practices. Schema.org metadata embedded in HTML can be consumed by search engines to power rich services (such as the Google Knowledge Graph). Schema.org is an open and widely adopted initiative (used by over 12 million domains) backed by Google, Bing, Yahoo!, and Yandex for sharing metadata across the web. It underpins the emergence of new web techniques, such as so-called Semantic SEO.

Our research addressed the representation of the embedded metadata as part of the Europeana HTML pages and sitemaps so that the re-use of this data can be optimized.

The practical objective of our work is to produce a Schema.org representation of Europeana resources described in EDM that is as rich as possible and tailored to Europeana’s realities and user needs, as well as to those of search engines and their users.

Outside The Box: Building a Digital Asset Management Ecosystem for Preservation and Access
Andrew Weidner, Sean Watkins, Bethany Scott, Drew Krewer, Anne Washington, Matthew Richardson
The University of Houston (UH) Libraries made an institutional commitment in late 2015 to migrate the data for its digitized cultural heritage
collections to open source systems for preservation and access: Hydra-in-a-Box, Archivematica, and ArchivesSpace. This article describes
the work that the UH Libraries implementation team has completed to date, including open source tools for streamlining digital curation
workflows, minting and resolving identifiers, and managing SKOS vocabularies. These systems, workflows, and tools, collectively known as
the Bayou City Digital Asset Management System (BCDAMS), represent a novel effort to solve common issues in the digital curation
lifecycle and may serve as a model for other institutions seeking to implement flexible and comprehensive systems for digital preservation
and access.

Medici 2: A Scalable Content Management System for Cultural Heritage Datasets


Constantinos Sophocleous, Luigi Marini, Ropertos Georgiou, Mohammed Elfarargy, Kenton McHenry
Digitizing large collections of Cultural Heritage (CH) resources and providing tools for their management, analysis and visualization is
critical to CH research. A key element in achieving the above goal is to provide user-friendly software offering an abstract interface for
interaction with a variety of digital content types. To address these needs, the Medici content management system is being developed in a
collaborative effort between the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign,
Bibliotheca Alexandrina (BA) in Egypt, and the Cyprus Institute (CyI). The project is pursued in the framework of European Project “Linking
Scientific Computing in Europe and Eastern Mediterranean 2” (LinkSCEEM2) and supported by work funded through the U.S. National
Science Foundation (NSF), the U.S. National Archives and Records Administration (NARA), the U.S. National Institutes of Health (NIH), the
U.S. National Endowment for the Humanities (NEH), the U.S. Office of Naval Research (ONR), the U.S. Environmental Protection Agency
(EPA) as well as other private sector efforts.

Medici is a Web 2.0 environment integrating analysis tools for the auto-curation of un-curated digital data, allowing automatic processing of
input (CH) datasets, and visualization of both data and collections. It offers a simple user interface for dataset preprocessing, previewing,
automatic metadata extraction, user input of metadata and provenance support, storage, archiving and management, representation and
reproduction. Building on previous experience (Medici 1), NCSA, and CyI are working towards the improvement of the technical,
performance and functionality aspects of the system. The current version of Medici (Medici 2) is the result of these efforts. It is a scalable,
flexible, robust distributed framework with wide data format support (including 3D models and Reflectance Transformation Imaging-RTI) and
metadata functionality. We provide an overview of Medici 2’s current features supported by representative use cases as well as a
discussion of future development directions.

Python, Google Sheets, and the Thesaurus for Graphic Materials for Efficient Metadata Project Workflows
Jeremy Bartczak, Ivey Glendon
In 2017, the University of Virginia (U.Va.) will launch a two-year initiative to celebrate the bicentennial anniversary of the University’s
founding in 1819. The U.Va. Library is participating in this event by digitizing some 20,000 photographs and negatives that document
student life on the U.Va. grounds in the 1960s and 1970s. Metadata librarians and archivists are well-versed in the challenges associated
with generating digital content and accompanying description within the context of limited resources. This paper describes how technology
and new approaches to metadata design have enabled the University of Virginia’s Metadata Analysis and Design Department to rapidly and
successfully generate accurate description for these digital objects. Python’s pandas module improves efficiency by cleaning and
repurposing data recorded at digitization, while the lxml module builds MODS XML programmatically from CSV tables. A simplified
technique for subject heading selection and assignment in Google Sheets provides a collaborative environment for streamlined metadata
creation and data quality control.
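
The sketch below illustrates the general pattern the abstract describes: pandas cleans a digitization spreadsheet and lxml builds a MODS record per row. The column names and file name are assumptions for the example, not the department's actual workflow.

    import pandas as pd
    from lxml import etree

    MODS_NS = "http://www.loc.gov/mods/v3"

    # Hypothetical digitization spreadsheet; actual column names will differ.
    df = pd.read_csv("capture_log.csv").fillna("")
    df["title"] = df["title"].str.strip()

    for _, row in df.iterrows():
        mods = etree.Element("{%s}mods" % MODS_NS, nsmap={None: MODS_NS})
        title_info = etree.SubElement(mods, "{%s}titleInfo" % MODS_NS)
        etree.SubElement(title_info, "{%s}title" % MODS_NS).text = row["title"]
        subject = etree.SubElement(mods, "{%s}subject" % MODS_NS)
        etree.SubElement(subject, "{%s}topic" % MODS_NS).text = row["subject"]
        print(etree.tostring(mods, pretty_print=True, encoding="unicode"))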

Need Help with Your Code? Piloting a Programming and Software Development Consultation Service
Laura Wrubel, Daniel Kerchner, Justin Littman
In the Spring 2016 semester, George Washington University Libraries (GW Libraries) undertook a pilot to provide programming and
software development consultation services for the university community. The consultation services took the form of half hour appointments
conducted by librarians with software development expertise, similar to other reference services offered by GW Libraries. The purpose of
this paper is to provide an overview and assessment of the pilot project.

Overly Honest Data Repository Development


Colleen Fallaw, Elise Dunham, Elizabeth Wickes, Dena Strong, Ayla Stein, Qian Zhang, Kyle Rimkus, Bill Ingram, Heidi J.
Imker
After a year of development, the library at the University of Illinois at Urbana-Champaign has launched a repository, called the Illinois Data Bank (https://databank.illinois.edu/), to provide Illinois researchers with a free, self-serve publishing platform that centralizes, preserves, and
provides persistent and reliable access to Illinois research data. This article presents a holistic view of development by discussing our
overarching technical, policy, and interface strategies. By openly presenting our design decisions, the rationales behind those decisions,
and associated challenges this paper aims to contribute to the library community’s work to develop repository services that meet growing
data preservation and sharing needs.

Checking the identity of entities by machine algorithms: the next step to the Hungarian National Namespace
Zsolt Bánki, Tibor Mészáros, Márton Németh, András Simon
The redundancy of entities coming from different sources caused problems during the building of the personal name authorities for the
Petőfi Museum of Literature. It was a top priority to cleanse and unite classificatory records which have different data content but pertain to
the same person without losing any data. As a first step, in 2013 we found identities in approximately 80,000 name records and merged the data content of these records. In the second phase, a much more complicated algorithm had to be applied to reveal these identities. We cleansed the database by uniting approximately 36,000 records. The workflow for automatic detection of authority data tries to follow human intelligence. The database scripts normalize and examine about 20 kinds of data elements according to information about dates, localities, occupation and name variations. The result of creating pairs of potentially redundant authority records from the database was a graph, which the curators of the museum manually condensed to a tree. With this, the limit of technological identification was reached. Further data cleansing requires human intelligence, assisted by regular computerized monitoring based upon the developed algorithm. As a result, the service containing about 620,000 authority name records will be an
indispensable foundation to the establishment of the National Name Authorities. This article shows the work process of unification.
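
To make the pairing idea concrete, the sketch below normalizes two data elements (name and birth year) and proposes candidate pairs; the museum's real workflow examines about 20 kinds of elements and ends with curatorial review, so this is only a schematic illustration.

    import itertools
    import unicodedata

    def normalize(value):
        """Crude normalization: strip accents, lowercase, collapse whitespace."""
        folded = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode()
        return " ".join(folded.lower().split())

    # Hypothetical authority records, not data from the Petőfi Museum of Literature.
    records = [
        {"id": 1, "name": "Petőfi Sándor", "birth": "1823"},
        {"id": 2, "name": "Petofi, Sandor", "birth": "1823"},
        {"id": 3, "name": "Arany János", "birth": "1817"},
    ]

    def candidate_pairs(records):
        for a, b in itertools.combinations(records, 2):
            name_a = sorted(normalize(a["name"]).replace(",", "").split())
            name_b = sorted(normalize(b["name"]).replace(",", "").split())
            if a["birth"] == b["birth"] and name_a == name_b:
                yield a["id"], b["id"]

    print(list(candidate_pairs(records)))  # -> [(1, 2)]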

How to Party Like it’s 1999: Emulation for Everyone


Dianne Dietrich, Julia Kim, Morgan McKeehan, and Alison Rhonemus
Emulated access of complex media has long been discussed, but there are very few instances in which complex, interactive, born-digital
emulations are available to researchers. New York Public Library has made 1980s–90s era video games from 5.25″ floppy disks in the Timothy Leary Papers accessible via a DOSBox emulator. These games appear in various stages of development and display the work of at least four of Leary’s collaborators on the games. Fifty-six disk images from the Leary Papers are currently emulated in the reading room. New York University has made late-1990s to mid-2000s era Photoshop files from the Jeremy Blake Papers accessible to researchers. The Blake
Papers include over 300 pieces of media. Cornell University Library was awarded a grant from the NEH to analyze approximately 100 born-
digital artworks created for CD-ROM from the Rose Goldsen Archive of New Media Art to develop preservation workflows, access
strategies, and metadata frameworks. Rhizome has undertaken a number of emulation projects as a major part of its preservation strategy
for born-digital artworks. In cooperation with the University of Freiburg in Germany, Rhizome recently restored several digital artworks for
public access using a cloud-based emulation framework. This framework (bwFLA) has been designed to facilitate the reenactments of
software on a large scale, for internal use or public access. This paper will guide readers through how to implement emulation. Each of the
institutions weigh in on oddities and idiosyncrasies they encountered throughout the process — from accession to access.

Extracting, Augmenting, and Updating Metadata in Fedora 3 and 4 Using a Local OpenRefine Reconciliation
Service
Ruth Kitchin Tillman
When developing local collections, librarians and archivists often create detailed metadata which then gets stored in collection-specific
silos. At times, the metadata could be used to augment other collections but the software does not provide native support for object
relationship update and augmentation. This article describes a project updating author metadata in one collection using a local
reconciliation service generated from another collection’s authority records. Because the Goddard Library is on the cusp of a migration from
Fedora 3 to Fedora 4, this article addresses the challenges in updating Fedora 3 and ways Fedora 4’s architecture will allow for easier
updates.
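
A local reconciliation service of the kind described can be as small as a single web endpoint that answers OpenRefine's reconciliation queries. The sketch below uses Flask and a toy authority list; the route, service name, and matching logic are assumptions, not the Goddard Library's implementation.

    import json
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    # Toy authority list; a real service would be built from another collection's records.
    AUTHORITIES = {"auth-001": "Dryden, Hugh L.", "auth-002": "Goddard, Robert H."}

    @app.route("/reconcile", methods=["GET", "POST"])
    def reconcile():
        queries = request.values.get("queries")
        if not queries:
            # Service metadata handshake expected by OpenRefine.
            return jsonify({"name": "Local authority reconciliation (sketch)"})
        response = {}
        for key, q in json.loads(queries).items():
            needle = q["query"].lower()
            response[key] = {"result": [
                {"id": ident, "name": name, "score": 100, "match": True}
                for ident, name in AUTHORITIES.items() if needle in name.lower()
            ]}
        return jsonify(response)

    if __name__ == "__main__":
        app.run(port=8000)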

Peripleo: a Tool for Exploring Heterogeneous Data through the Dimensions of Space and Time
By Rainer Simon, Leif Isaksen, Elton Barker, Pau de Soto Cañamares
This article introduces Peripleo, a prototype spatiotemporal search and visualization tool. Peripleo enables users to explore the geographic,
temporal and thematic composition of distributed digital collections in their entirety, and then to progressively filter and drill down to explore
individual records. We provide an overview of Peripleo’s features, and present the underlying technical architecture. Furthermore, we
discuss how datasets that differ vastly in terms of size, content type and theme can be made uniformly accessible through a set of
lightweight metadata conventions we term “connectivity through common references”. Our current demo installation links approximately half
a million records from 25 datasets. These datasets originate from a spectrum of sources, ranging from the small personal photo collection
with 35 records, to the large institutional database with 134,000 objects. The product of research in the Andrew W. Mellon-funded Pelagios
3 project, Peripleo is Open Source software.

Editorial Introduction: It’s All About Data, Except When It’s Not.
Carol Bean


Data capture and use is not new to libraries. We know data isn’t everything, but it is ubiquitous in our work, enabling myriads of new ideas
and projects. Articles in this issue reflect the expansion of data creation, capture, use, and analysis in library systems and services.

Building a Better Book in the Browser (Using Semantic Web technologies and HTML5)
Jason A. Clark and Scott W. H. Young
The library as place and service continues to be shaped by the legacy of the book. The book itself has evolved in recent years, with various
technologies vying to become the next dominant book form. In this article, we discuss the design and development of our prototype
software from Montana State University (MSU) Library for presenting books inside of web browsers. The article outlines the contextual
background and technological potential for publishing traditional book content through the web using open standards. Our prototype
demonstrates the application of HTML5, structured data with RDFa and Schema.org markup, linked data components using JSON-LD, and
an API-driven data model. We examine how this open web model impacts discovery, reading analytics, eBook production, and machine-
readability for libraries considering how to unite software development and publishing.

Improving Access to Archival Collections with Automated Entity Extraction


Kyle Banerjee and Max Johnson
The complexity and diversity of archival resources make constructing rich metadata records time consuming and expensive, which in turn
limits access to these valuable materials. However, significant automation of the metadata creation process would dramatically reduce the
cost of providing access points, improve access to individual resources, and establish connections between resources that would otherwise
remain unknown.

Using a case study at Oregon Health & Science University as a lens to examine the conceptual and technical challenges associated with
automated extraction of access points, we discuss using publicly accessible APIs to extract entities (e.g. people, places, concepts)
from digital and digitized objects. We describe why Linked Open Data is not well suited for a use case such as ours. We conclude with
recommendations about how this method can be used in archives as well as for other library applications.

Training the Next Generation of Open Source Developers: A Case Study of OSU Libraries & Press’ Technology
Training Program
Evviva Weinraub Lajoie, Trey Terrell and Mike Eaton
The Emerging Technologies & Services department at Oregon State University Libraries & Press has implemented a training program for
our technology student employees on how and why they should engage in Open Source community development. This article will outline
what they’ve done to implement this program, discuss the benefits they’ve seen as a result of these changes, and will talk about what they
viewed as necessary to build and promote a culture of engagement in open communities.

A Metadata Schema for Geospatial Resource Discovery Use Cases


Darren Hardy and Kim Durante
We introduce a metadata schema that focuses on GIS discovery use cases for patrons in a research library setting. Text search, faceted
refinement, and spatial search and relevancy are among GeoBlacklight’s primary use cases for federated geospatial holdings. The schema
supports a variety of GIS data types and enables contextual, collection-oriented discovery applications as well as traditional portal
applications. One key limitation of GIS resource discovery is the general lack of normative metadata practices, which has led to a
proliferation of metadata schemas and duplicate records. The ISO 19115/19139 and FGDC standards specify metadata formats, but are
intricate, lengthy, and not focused on discovery. Moreover, they require sophisticated authoring environments and cataloging expertise.
Geographic metadata standards target preservation and quality measure use cases, but they do not provide for simple inter-institutional
sharing of metadata for discovery use cases. To this end, our schema reuses elements from Dublin Core and GeoRSS to leverage their
normative semantics, community best practices, open-source software implementations, and extensive examples already deployed in
discovery contexts such as web search and mapping. Finally, we discuss a Solr implementation of the schema using a “geo” extension to
MODS.
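
Because the schema targets discovery in Solr, indexing a record is essentially posting a flat document. The sketch below does so with invented field names loosely modeled on Dublin Core and GeoRSS; consult the published schema for the actual element names.

    import requests

    # Field names below are illustrative stand-ins for Dublin Core and GeoRSS-derived
    # elements; they are not the schema's published names.
    record = {
        "id": "example-finland-roads-2014",
        "dc_title_s": "Roads of Finland, 2014",
        "dc_subject_sm": ["Roads", "Transportation"],
        "dct_provenance_s": "Example University",
        "georss_box_s": "59.8 20.5 70.1 31.6",   # south west north east
        "layer_geom_type_s": "Line",
    }

    resp = requests.post(
        "http://localhost:8983/solr/geoblacklight/update?commit=true",
        json=[record],
        timeout=10,
    )
    resp.raise_for_status()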

Ebooks without Vendors: Using Open Source Software to Create and Share Meaningful Ebook Collections
Matt Weaver
The Community Cookbook project began with wondering how to take local cookbooks in the library’s collection and create a recipe
database. The final website is both a recipe website and collection of ebook versions of local cookbooks. This article will discuss the use of
open source software at every stage in the project, which proves that an open source publishing model is possible for any library.

EgoSystem: Where are our Alumni?


James Powell, Harihar Shankar, Marko Rodriguez, Herbert Van de Sompel
Comprehensive social search on the Internet remains an unsolved problem. Social networking sites tend to be isolated from each other, and the information they contain is often not fully searchable outside the confines of the site. EgoSystem, developed at Los Alamos
National Laboratory (LANL), explores the problems associated with automated discovery of public online identities for people, and the
aggregation of the social, institution, conceptual, and artifact data connected to these identities. EgoSystem starts with basic demographic
information about former employees and uses that information to locate person identities in various popular online systems. Once identified,
their respective social networks, institutional affiliations, artifacts, and associated concepts are retrieved and linked into a graph containing
other found identities. This graph is stored in a Titan graph database and can be explored using the Gremlin graph query/traversal language
and with the EgoSystem Web interface.
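
For readers who have not used Gremlin, the sketch below shows what a traversal over such a graph can look like from Python using the gremlinpython client; the server address, labels, and edge names are placeholders, not EgoSystem's actual schema.

    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    # Connection details and graph schema are placeholders chosen for illustration.
    conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
    g = traversal().withRemote(conn)

    # Find an alumna by name, then walk to the institutions linked to her identity.
    names = (
        g.V().has("person", "name", "Ada Example")
             .out("affiliatedWith")
             .values("name")
             .toList()
    )
    print(names)
    conn.close()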

This work is licensed under a Creative Commons Attribution 3.0 United States License.
