AdrianBrown_2013__PracticalDigitalPrese

This document discusses the challenges and practicalities of digital preservation for various organizations, particularly smaller institutions like libraries and archives. It emphasizes the importance of motivation, resources, and the dispelling of myths surrounding digital preservation, highlighting that it can be achievable even with limited budgets and expertise. The book aims to provide guidance and case studies to help organizations develop effective digital preservation strategies and capabilities.

1

Introduction

1.1 Introduction
Picture a scene: in a county record office somewhere in England, a young
archivist is looking through the morning post. Among the usual enquiry
letters and payments for copies of documents is a mysterious padded
envelope. Opening it reveals five floppy disks of various sizes, accompanied
by a brief covering letter from the office manager of a long-established local
business, explaining that the contents had been discovered during a recent
office refurbishment; since the record office has previously acquired the
historic paper records of the company, perhaps these would also be of
interest? The disks themselves bear only terse labels, such as ‘Minutes,
1988-90’ or ‘customers.dbf’. Some, the archivist recognizes as being 3.5” disks,
while the larger ones seem vaguely familiar from a digital preservation
seminar she attended during her training. On one point she is certain: the
office PCs are not capable of reading any of them. How can she discover what
is actually on the disks, and whether they contain important business records
or junk? And even if they do prove of archival interest, what should the
record office actually do with them?
Meanwhile, a university librarian in the mid-west USA attends a faculty
meeting to discuss the burgeoning institutional repository. Introduced a few
years ago to store PDF copies of academic preprints and postprints, there is
increasing demand from staff to store other kinds of content in a much wider
range of formats, from original research data, to student dissertations and
theses, teaching materials and course notes, and to make that content
available for reuse by others in novel ways. How, the librarian ponders, does
the repository need to be adapted to meet these new requirements, and what
must the library do to ensure the long-term preservation of such a diverse
digital collection?
Finally, in East Africa, a national archivist has just finished reading a report
2013. Facet Publishing.

EBSCO Publishing: eBook Collection (EBSCOhost) printed on 4/14/2025 11:23:06 PM UTC via UNIVERSIDAD NACIONAL ABIERTA Y A DISTANCIA - UNAD 766206; Adrian
Brown; Practical Digital Preservation : A How-to Guide for Organizations of Any Size Account:eds.
from a consultant commissioned to advise on requirements for preserving
electronic records. This is the latest in a series of projects to develop records
management within government, and he knows that the work is crucial to
promoting transparency, empowering citizens by providing them with access
to reliable information, reducing corruption and improving governance
through the use of new technologies. The national archives has achieved
much in recent years, putting in place strong records management processes
and guidance. But how to develop the digital preservation systems necessary
to achieve the report’s ambitious recommendations, with limited budgets and
staff skills, and an unreliable IT infrastructure?
This book is intended to help these people, and the countless other
information managers and curators around the world who are wrestling
with the challenges of preserving digital data, to answer these questions. If I
had been writing it only a few years ago, my first task would have been to
explain the need for digital preservation at length, illustrated no doubt with
celebrated examples of data loss such as the BBC Domesday disks, or NASA’s
Viking probe.1 Today, most information management professionals are all
too aware of the fact that, without active intervention, digital information is
subject to rapid and catastrophic loss – the warnings of an impending
‘Digital Dark Ages’ have served their purpose. Hopefully, they are equally
alive to the enormous benefits of digital preservation, in unlocking the
current and long-term value of that information. Instead, their principal
concern now is how to respond in a practical way to these challenges. There
is a sense that awareness of the solutions has not kept pace with appreciation
of the potential and the problems.
Such solutions as are widely known are generally seen as being the
preserve of major institutions – the national libraries and archives – with
multi-million pound budgets and large numbers of staff at their disposal.
Even if reality often doesn’t match this perception – many national memory
institutions are tackling digital preservation on a comparative shoestring –
there is no doubt that such organizations have been at the vanguard of
developments in the field.
The challenges can sometimes appear overpowering. The extraordinary
growth in the creation of digital information is often described using rather
frightening or negative analogies, such as the ‘digital deluge’ or ‘data
tsunami’. These certainly reflect the common anxieties that information
curators and consumers have about their abilities to manage these
gargantuan volumes of data, and to find and understand the information
they need within. These concerns are compounded by a similarly
overwhelming wave of information generated by the digital preservation
community: no one with any exposure to the field can have escaped a certain
sense of despair at ever keeping up to date with the constant stream of
reports, conferences, blogs, wikis, projects and tweets.
In writing this book, my goal has been to demonstrate that, in reality, it is
not only possible but eminently realistic for organizations of all sizes to put
digital preservation into practice, even with very limited resources and
existing knowledge. I have sought to do so through a combination of practical
guidance, and case studies which reinforce that guidance, illustrating how it
has already been successfully applied in the real world.

1.2 Who is this book for?


This book is intended to be of value to anyone with an interest in the practice
of digital preservation, but is primarily aimed at existing and prospective
practitioners in:

• smaller memory institutions, such as libraries, archives, museums and
galleries, which have a core mission to collect, preserve and provide
access to information or artefacts
• institutional archives and libraries, which collect, preserve and provide
access to the information resources created or used by their organizations
in support of their core mission; examples include business archives and
institutional repositories.

In other words, it is written for the vast range of organizations outside the
national cultural memory institutions that want and need to develop the
ability to collect, preserve and provide access to digital information.
Although it should be of interest to policy makers within these organizations,
it is intended primarily for those who are, or are hoping to be, responsible for
digital preservation at a practical level.
The underlying aim of digital preservation can be stated very simply:

To maintain the object of preservation for as long as required, in a form which
is authentic, and accessible to users.

This book shows how you can build practical solutions to achieve that aim. It
begins by looking at how to approach developing a digital preservation
capability, from raising initial awareness, and gaining the necessary mandate
and resources, to beginning an organized programme of work to put in place
the appropriate people, systems and processes. It then examines in detail
what the practice of digital preservation actually involves, from initially
acquiring content to making it available to users. It should not be assumed
that this requires monolithic IT systems; one of the central arguments of this
book is that digital preservation is an outcome, which can be achieved by
many different means, and at varying levels of complexity, to suit the needs
and resources of the organization in question.

1.3 Minimum requirements


The entry level for digital preservation is actually very low – indeed, the
premise of this book is that it is entirely realistic for small organizations to
implement credible services. However, it must be recognized that there are
minimum requirements for an organization to build a digital preservation
service. These are:

• Motivation: First and foremost, an organization must have the desire to
address the digital preservation challenge. Doing so is likely to be a
lengthy process, by turns as frustrating as it is rewarding, and a
substantial level of motivation is essential to persevere through this.
• Means: Second, an organization must have the wherewithal to turn that
desire into reality. This may take the form of:
□ expertise: to establish the detailed case for digital preservation, define
the organization’s requirements, and oversee their implementation and
future operation
□ financial resources: to fund staff, services and infrastructure
□ infrastructure: to underpin an operational digital preservation
capability.

Of the three, either expertise or financial resources are the most critical:
expertise can make best use of limited resources and help to secure more
resources in future, while money can be used to buy in expertise. The
minimum infrastructure required is very variable but, as will be
demonstrated later in this book, should be within the reach of most
organizations.


1.4 Some digital preservation myths


There are a number of widespread myths and misconceptions about digital
preservation, which together serve to foster the image that it is too scary,
complex and difficult to be contemplated as a practical proposition by smaller
organizations. In particular, it is often perceived that digital preservation:

• can only be tackled by national bodies
• requires huge budgets
• requires deep technical knowledge
• can be left until next year to tackle.

This book serves to counter those myths with some digital preservation realities:

Digital preservation can only be tackled by national bodies


While such institutions have undoubtedly taken the lead in developing
digital preservation as a discipline, the existence of mature, affordable,
practical tools and services means that it is now not only realistic, but also
imperative, for organizations of every size and type to address the issue.

Digital preservation requires huge budgets


You can spend as much or as little on digital preservation as resources allow.
While the US National Archives and Records Administration has spent an
estimated $500 million on building its Electronic Records Archive,2 a working
digital repository was developed at the English Heritage Centre for
Archaeology at the cost of a few hundred pounds and the author’s time (see
Chapter 4, ‘Models for implementing a digital preservation service’). This
book demonstrates how much can be achieved using readily available tools
and resources, as well as with more complex systems.

Digital preservation requires deep technical knowledge


While it can undoubtedly lead into very technical territory, especially at the
cutting edge of research, digital preservation practice does not require deep
technical knowledge. Practitioners today come from hugely varied backgrounds,
ranging from traditional library and archives roles and IT, to astronomy and
archaeology. Adaptability and enthusiasm are the most important characteristics
for any would-be digital archivist. While most have developed their skills on the
job, there is now an excellent range of training opportunities to suit all needs
and budgets, from online tutorials, through seminars and conferences, to longer
training courses and postgraduate qualifications. Digital preservation is also
becoming established as a vital professional skill within information
management training courses. Couple this with a very supportive and
collaboration-minded community, and no one should have cause to fear that
digital preservation skills are inaccessible or difficult to acquire. The
opportunities for training and professional development are discussed in detail
in Chapter 4, ‘Models for implementing a digital preservation service’.

Digital preservation can be left until next year to tackle


This is an issue that organizations need to address urgently, if they are to
realize the enormous benefits, and avoid substantial legal, financial,
operational and reputational risks, as well as the loss of information of great
historical and business value. This is not to say that you must do everything
at once, or that your requirements will be the same as another organization’s
– the maturity model introduced in Chapter 4, ‘Models for implementing a
digital preservation service’, and expanded in Chapter 8, ‘Preserving digital
objects’, illustrates how you can develop your capabilities over time, and to a
level that suits your needs. However, now is the time to begin tackling digital
preservation at a practical level.

1.5 The current situation


So what challenges do small organizations currently face? A survey in 2008
provided an interesting snapshot of the state of readiness across local
authority archives in the UK to preserve digital records.3 There is little to
suggest that the situation has changed greatly since, and it is worth looking
at the results of this survey in some detail, as they illustrate the challenges
faced globally by smaller organizations in general.
Although most archives demonstrated a basic awareness of digital
preservation, and knew (74%) about basic sources of support such as the Digital
Preservation Coalition, the level of more detailed knowledge dropped off very
noticeably beyond that. Around half were aware of the seminal international
standard, the Open Archival Information System (OAIS) Reference Model, and
of key initiatives run by national memory institutions such as the British Library
and The National Archives (TNA). Two-thirds were unaware of other key
international standards, such as PREMIS or METS, and a similar proportion
were unfamiliar with projects of particular relevance to UK local archives, such
as the East of England Digital Archive Regional Pilot4 and Paradigm.5
Nearly half (47%) had a digital preservation policy, which conforms to the
findings of other surveys before and since (see Chapter 2, ‘Making the case
for digital preservation’). However, relatively few had taken the next step of
introducing detailed standards and working practices, such as guidelines for
depositors (16%) or ingest procedures (11%).
Most archives (79%) considered themselves to be reacting to the demands
of depositors, rather than proactively building their digital records capability,
although almost all held some digital material, and only 5% were actually
turning away digital records because of a lack of facilities. Despite their
nascent digital collections, they frequently lacked even basic information
about the nature of that material, such as detailed volumes or file counts. The
information supplied by respondents about the file formats they held is
illuminating: in addition to the ubiquitous image formats resulting from
digitization initiatives, and the expected Office-type formats, there was a
wide range of obsolete formats, such as Lotus 1-2-3 and Claris FileMaker, as
well as specialized formats such as computer-aided design (CAD) and
genealogy data. Many archives also reported holding digital audiovisual
collections. Although unsurprising, given the wide-ranging collecting
policies of many local authority archives, this diversity highlights some
significant preservation challenges. As a result of fairly minimal information
gathering activities at ingest, most archives did not have the information
necessary to undertake any form of preservation planning.
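As an illustration only, the basic information the survey found lacking (volumes, file counts and a rough picture of formats) could be gathered with a very short script. The sketch below uses Python, which is an assumption of this example rather than anything reported by the survey; the function name `inventory` is hypothetical. It treats the file extension as no more than a hint of format, since reliable identification needs a dedicated tool such as DROID.

```python
from collections import Counter
from pathlib import Path


def inventory(root):
    """Return (file_count, total_bytes, extension_counts) for a directory tree.

    Hypothetical helper for illustration; not taken from the survey or the book.
    """
    file_count = 0
    total_bytes = 0
    extensions = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            file_count += 1
            total_bytes += path.stat().st_size
            # The extension is only a rough hint of format, not an identification.
            extensions[path.suffix.lower() or "(none)"] += 1
    return file_count, total_bytes, extensions
```

Calling `inventory` on the folder holding a deposit returns the three figures together, which could then be recorded alongside the accession details at ingest.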
The majority had some form of backed-up, server-based storage, although
87% also had some material on optical media such as CD or DVD; 42% simply
stored the data on its original media, although around half did at least
perform basic checks on ingest, such as testing whether the media could be
read. Only a tiny proportion was undertaking more sophisticated actions,
such as generating checksums or normalizing formats. Only one respondent
had use of a content management system, and one was outsourcing its storage.
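Generating checksums, one of the more sophisticated actions mentioned above, need not be difficult. The following is a minimal sketch in Python, assuming SHA-256 as the algorithm (the survey does not name one); the function names `sha256_of`, `build_manifest` and `changed_files` are hypothetical. It records a checksum for every file in a folder, and can later report any file whose content has changed or which has gone missing.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of one file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files from an earlier manifest that are now altered or missing."""
    current = build_manifest(root)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

A manifest built at ingest and stored separately from the files gives a baseline; re-running `changed_files` against it at intervals is one simple way of checking that stored content remains intact.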
Access is a fundamental requirement for any archive, but two-thirds of
respondents were relying on purely ad hoc arrangements, rather than any
formal user access system. Such online delivery facilities as did exist were
mainly limited to image galleries, and therefore did not support access to
other types of digital material.
Interestingly, less than half of respondents reported close involvement in
the implementation of electronic records management systems, even though
these are likely to be one of the principal sources of digital records for such
archives in the future.
A particularly noteworthy aspect of the survey was the section on barriers
to digital preservation, in which respondents were asked about the main
perceived obstacles. The report identified three groups of these from the
responses: cultural, resource and skills. Perhaps unsurprisingly, funding was
seen as the main barrier, followed jointly by IT support and skills, then
political support. On the other hand, staff motivation, leadership, time and
strategic partnerships were all seen as less significant barriers. While one
should be cautious about drawing too many conclusions from this, it does
suggest that costs and skills are at least perceived as the major obstacles – the
spirit is willing but the funding is weak.
Those respondents who suggested how these barriers might be overcome
were most concerned with gaining institutional buy-in, and developing and
embedding policies and procedures. These essential steps are discussed in
detail in Chapter 2, ‘Making the case for digital preservation’.
More detailed questions about the skills gap yielded a range of
development requirements, from generic management and IT skills to very
specific digital preservation knowledge. These highlight the importance of
access to practical and affordable training.
Another key issue highlighted was the disconnect between archivists and
information and communications technology (ICT) support services, with
relationships in some cases being described as poor or antagonistic. Allied to
a lack of budget provision for, and experience of managing, major IT projects,
this means that although most archives have access to ICT support services,
including developer resources, few are in a position to take advantage of
these facilities to develop digital preservation capabilities.
There was remarkably little consensus among archivists when asked what
their preferences were for providing digital preservation services in future.
Although an in-house repository or regional consortium was preferred by the
greatest number of respondents, almost as many ranked the in-house
solution their least favoured. The only point of consensus was a general
rejection of outsourcing to a commercial provider, although it was unclear
whether this was motivated primarily by perceived budget constraints, a
paucity of plausible commercial services, or as a point of principle.
So what can we conclude about the situation faced by smaller
organizations today? First, the main barriers to developing digital
preservation capabilities are practical – money, skills, leveraging available
resources – rather than the more fundamental obstacles of awareness and
will, although the latter may still apply to parent bodies and other funders.
Second, most organizations have some of the basic building blocks of a
capability in place, and are not allowing the lack of a more comprehensive
capability to stop them from beginning to collect digital material. While such
an approach needs to be taken cautiously – it would be irresponsible to accept
material that one is fundamentally unequipped to preserve – it must also be
encouraged: trying to develop a complete and perfected solution in one step
can only lead to disappointment, and practical experience is essential for
learning.

1.6 A very brief history of digital preservation


Interest in the longevity of digital information and curatorial approaches to
its management have been evident since the early years of the digital
information age, and can be traced back at least to the 1960s, when the first
data archives were established. Designed to manage scientific research data,
and make it accessible to the scholarly community, archives such as the Inter-
University Consortium for Political and Social Research (1962)6 and UK Data
Archive (1967)7 laid much of the groundwork for digital curation as we know
it today.
The advent of personal computers and the internet triggered an explosion
in the creation and use of digital information, which started in earnest in the
early 1980s, gained enormous momentum in the 1990s as a result of the
emergence of the world wide web, and continues unabated to the present
day. Suddenly, the world was producing a plethora of new types of digital
information – from office documents to multimedia, web pages to 3D models,
e-mails to e-books – in hitherto unimaginable quantities. Digital information
had moved from being the preserve of big business and major research
institutions to a fact of everyday life for billions of people.
Concerns about the fragility of digital information crystallized in the
formation in 1994 of the Task Force on Archiving of Digital Information. After
two years of deliberation, this US group published a seminal report,8 which
laid the foundations for most subsequent work in the field, and continues to
shape the agenda even today. Concepts and concerns such as certification of
trusted digital repositories, format registries, cost models, and integrity and

EBSCOhost: eBook Collection (EBSCOhost) printed on 4/14/2025 11:23:06 PM UTC via UNIVERSIDAD NACIONAL ABIERTA Y A DISTANCIA - UNAD. All use subject to
https://www.ebsco.com/terms-of-use.
10 PRACTICAL DIGITAL PRESERVATION

authenticity – which this report first articulated as a coherent set of challenges


– remain the focus of daily discussion within the digital preservation
community today, at conferences, in blogs and on Twitter.
This should not be taken to indicate that the discipline has failed to
progress since the report was published, or to find answers to the searching
questions that it posed. Far from it: digital preservation today is the focus of
an enormously vibrant, active and collaborative community. Indeed, it is
instructive to look briefly at how far that community has come in such a short
period of time. In 1996, when I first began developing a digital archiving
programme at the English Heritage Centre for Archaeology,9 it was possible
to assemble and read virtually everything of note written on the subject in a
few pages of bibliography;10 sixteen years later, even maintaining awareness of
developments in the field is a constant challenge, and reading all the published
outputs is an impossibility.
As with any emerging discipline, two strands of activity are required to
progress: the development of strong theoretical underpinnings and
standards, and the establishment of a diverse and active pool of practitioners,
who can advance and expand the theory through practical application.
The publication of the OAIS Reference Model has proved to be one of the
seminal moments in the development of a coherent conceptual framework for
digital preservation. Originally developed by the space science community in
the 1990s, and released as a draft recommendation by the Consultative
Committee for Space Data Systems in 1996, it rapidly became accepted as a de
facto standard. It was formally published as a full recommendation in 2002,
before being issued as an international standard (ISO 14721:2003), and most
recently updated in 2012.11 It sets out a detailed model of the functions and
processes required of a digital repository, as well as introducing a set of
terminology that has become established as the lingua franca of the digital
preservation community.
Another key area of standardization has been in relation to metadata.
Thanks to the emergence of internationally recognized schemes such as
METS (2001) and PREMIS (2003), the community is well served by a range of
standards tailored to the needs of digital preservation (these are discussed in
detail in Chapter 7, ‘Describing digital objects’).
While OAIS provides a conceptual model for what digital repositories
should do, the widespread development of operational digital preservation
services has led to much discussion about the detailed standards to which
they should adhere in practice. From this has emerged the concept of ‘trusted
digital repositories’; this trend is examined in Chapter 4, ‘Models for
implementing a digital preservation service’.
The more practical development of the discipline has been driven equally
by the efforts of individual institutions in building their own preservation
solutions, and through collaborative research. Projects such as CEDARS
(CURL Exemplars in Digital ARchiveS) in 1998,12 and the Dutch Nationaal
Archief’s Digital Preservation Testbed (2000),13 were highly influential,
applying rigorous scientific principles to the development and testing of
practical digital preservation methods.
The first major digital preservation repositories were built by national
cultural memory institutions, such as the National Library of Australia (2001),
the Koninklijke Bibliotheek, the National Library of the Netherlands (2002)
and the UK National Archives (2003). Today, they are no longer the exclusive
province of such institutions, with repositories proliferating among many
other types and scales of organization, including university libraries, local
archives and business archives.
This has been enabled by the emergence of production-quality digital
repository systems, which provide the technological platforms on which to
build digital preservation services. A number of open-source solutions have
emerged, of which Fedora (1997), EPrints (2000) and DSpace (2002) are the
most widely adopted examples today. In parallel, commercial products
such as Safety Deposit Box (2003) and Rosetta (2008) have been brought to
market, often born out of initial funding from national memory
institutions. Most recently, cloud-based services such as DuraCloud (2011)
and Preservica (2012) offer a new paradigm for providing digital
Preservation-as-a-Service (PraaS), which may be of particular interest to
smaller organizations. These technologies and the options for building
digital repositories are discussed in detail in Chapter 4, ‘Models for
implementing a digital preservation service’, and Appendix 3.
Alongside repository software, the emergence of widely available,
practical preservation tools and services such as the PRONOM technical
registry (2002), JHOVE characterization tool (2003) and DROID format
identification tool (2005) has played an essential role in making digital
preservation a practical proposition for many organizations.
The specialized discipline of web archiving has a history almost as long as
the Web itself. From the foundation of the Internet Archive (1996) and the
Nordic Web Archive (1997) to the wealth of local, national and international
web archiving programmes we see today, the huge volumes of data acquired
have spurred the development of digital repositories capable of managing
and preserving them.14
Since the early 2000s, many advances have come as a result of major
research projects, such as those funded through the Library of Congress’
National Digital Information Infrastructure and Preservation Program
(NDIIPP) (2000),15 and the European Commission’s various research funding
programmes.16 While space does not permit a detailed account of these,
projects such as ERPANET (2001), DELOS (2004), DigitalPreservationEurope
(2006) and Planets (2006) have all had a huge impact on the development of
the state of the art, and this momentum is being carried forward in the current
crop of projects, which are discussed in Chapter 10, ‘Future trends’. Similarly,
NDIIPP has funded the development of major tools and services, including
JHOVE2, LOCKSS and the MetaArchive.
We have also begun to see the emergence of organizations dedicated to the
advancement of the discipline, such as the UK’s Digital Preservation Coalition
(2002)17 and Digital Curation Centre (2004),18 the Dutch Nationale Coalitie
Digitale Duurzaamheid (2008)19 and the international Open Planets
Foundation.20 This last, together with projects such as SPRUCE (Sustainable
PReservation Using Community Engagement),21 signals a growing
movement towards the development of nationally and internationally based
practitioner communities. Agile and enthusiastic, and centred more around
community activities such as hackathons, rather than traditional project and
institutional structures, these have the potential to advance the discipline in
new and exciting ways (as discussed in Chapter 10, ‘Future trends’).
An excellent visual overview of the history of digital preservation,
alongside key IT developments, is provided by the timeline developed by
Cornell University Library, as part of its online tutorial on digital preservation
management.22

1.7 A note on terminology


Digital preservation provides a fertile breeding ground for new terminology,
as well as finding new uses for that which is established. As a young
discipline, its specialist nomenclature has yet to mature and settle – in some
cases, a number of alternative terms have been applied to the same, or
similar, concepts. Furthermore, it bridges a number of long-established
fields, each with their own unique vocabularies. All of this can appear
calculated to confuse newcomers and seasoned practitioners alike. I have
therefore attempted to be clear and consistent in my own use of terminology,
and have provided a glossary in part to clarify the sense in which I have
chosen to use it.
Two terms in particular appear constantly throughout the book, referring
to the subject and means of preservation respectively: digital object and digital
repository. These are so fundamental as to justify exploring them in a little
more detail now.

What is a digital object?


In this book I am using the term ‘digital object’ to signify the thing that we are
seeking to preserve, but what does this phrase really mean? It is worth taking
a moment to consider the nature of these ‘digital objects’, to really understand
what they are, and how they compare with their analogue counterparts.
Indeed, the analogue world is a good place to start. We have little difficulty
identifying and understanding the nature of physical collection objects,
whether they be printed books, parchment rolls or stone sculptures. Their
very physicality provides a natural structure for describing and arranging
them. For example, it is easy to see that there is a different relationship
between the individual pages of a book and the book as a whole, as opposed
to that between two different books by the same author; the volume provides
a natural atomic point of reference. Of course, even in the physical world it is
important to acknowledge differences in approach between the curatorial
disciplines. The hierarchical nature of archival description, for example, is
very different from the more discrete, unitary world of the library catalogue.
However, the physical nature of the material does impose structures that are
much more clearly and rigidly defined than in the digital realm.
This may not be immediately apparent. If we consider a PDF version of a
paper publication, there is a straightforward one-to-one correspondence
between the digital and physical object. However, this represents the simplest
possible case, and conceals a frequently overlooked complication. At one
level, the digital world has a very obvious and simple atomic unit: the file, but
in reality the file is a purely technological artefact, having no direct
relationship with the structure or nature of the information content.
This can be illustrated by considering the varying ways in which the same
information object might be technically represented. We can simplify this by
taking an example that is analogous to an object in the physical world – a
book. An electronic book could exist in a plethora of forms (I deliberately
avoid referring to ‘formats’, for reasons that should become apparent). There
might be the author’s finished ‘manuscript’ version, in Microsoft Word 2000
format. Depending on authorial practice, this might comprise a single Word
file, or multiple Word files – one for each chapter. The Word 2000 files might
subsequently be updated to Word 2007 format. This is a fundamentally
different creature – each file is actually a container format, comprising a series
of separate XML documents. The printed version of the book might be
digitized, resulting in a set of TIFF image files, one for each page. These might
then be amalgamated into a single PDF file, for ease of access. An e-book
version could be created for use on devices such as the Kindle, in specialized
formats such as EPUB or Amazon’s KF8. Finally, we might envisage a web
version of the book, where each page or chapter of the book becomes a
separate web page. In this case, the book is represented as a series of HTML
files, together with a range of additional files, such as cascading stylesheets
and images, which are required to render the pages in a web browser. These
representations are summarized in Table 1.1.

Table 1.1 Alternative representations of a book

Version                 Technical representation
Physical                1 printed volume (comprising 12 chapters and 700 pages)
Word 2000               12 DOC files
Word 2007               12 DOCX files (each containing various XML files)
Digitized masters       700 TIFF files
Digitized access copy   1 PDF file
e-Book                  1 EPUB container file (containing various XML, XHTML and image files)
Web                     12 HTML files, 1 CSS file and 15 GIF images

We can therefore see that our digital object is much more complex and
variable than its physical analogue, which has a very clear cut, discrete
existence. It can comprise one or many files, in the same or different formats;
it can comprise files contained within other files; even the relationship
between the constituent files varies – in some cases, such as with individual
Word documents for chapters, each file serves an equivalent function; in
others, such as the website, a single stylesheet file might be used by every
HTML file, and has a very different function.
And this represents the simpler end of the spectrum; something like a
Geographic Information System (GIS) is a very complex entity, with many
component parts in sophisticated and dynamic relationships, and no
real-world counterpart.
My use of the term ‘digital object’ therefore serves as shorthand to cut
through some of this complexity. Chapter 8, ‘Preserving digital objects’,
delves deeper into the fascinating implications of the digital information
environment, and examines how we can manage these complexities through
the separation of digital objects into information objects (representing the
underlying entity, such as a book) and data objects (the technical components
of that entity, such as files), and the use of concepts such as multiple
manifestations.
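
The distinction between information objects and data objects can be made concrete with a small sketch. The following Python fragment is illustrative only: the class names, file names and the example title are my own assumptions, not a model prescribed by this book or by any particular repository system. It records the book from Table 1.1 as a single information object with several manifestations, each made up of data objects (files).

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """A technical component of an entity, such as a file (illustrative)."""
    name: str
    file_format: str


@dataclass
class Manifestation:
    """One technical representation of an information object."""
    label: str
    data_objects: list[DataObject] = field(default_factory=list)


@dataclass
class InformationObject:
    """The underlying entity being preserved, such as a book."""
    title: str
    manifestations: list[Manifestation] = field(default_factory=list)

    def file_counts(self) -> dict[str, int]:
        """Number of top-level files in each manifestation."""
        return {m.label: len(m.data_objects) for m in self.manifestations}


def numbered(prefix: str, ext: str, fmt: str, count: int) -> list[DataObject]:
    """Build hypothetical file names such as chapter001.doc."""
    return [DataObject(f"{prefix}{i:03d}.{ext}", fmt) for i in range(1, count + 1)]


# File counts follow Table 1.1; files held inside containers (DOCX, EPUB)
# are not modelled here.
book = InformationObject(
    title="Example book",
    manifestations=[
        Manifestation("Word 2000", numbered("chapter", "doc", "DOC", 12)),
        Manifestation("Word 2007", numbered("chapter", "docx", "DOCX", 12)),
        Manifestation("Digitized masters", numbered("page", "tif", "TIFF", 700)),
        Manifestation("Digitized access copy", [DataObject("book.pdf", "PDF")]),
        Manifestation("e-Book", [DataObject("book.epub", "EPUB")]),
        Manifestation(
            "Web",
            numbered("chapter", "html", "HTML", 12)
            + [DataObject("style.css", "CSS")]
            + numbered("image", "gif", "GIF", 15),
        ),
    ],
)

counts = book.file_counts()
```

Here the one information object resolves to anything from a single file to 700, depending on the manifestation, which is exactly why the file alone is a poor unit of description.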

What is a digital repository?


The term ‘digital repository’ conjures visions of vast, complex, expensive and
forbidding IT systems, only viable for major institutions to consider building.
This is very far from the case: as this book will demonstrate, a digital
repository is a concept, capable of being realized in many different forms, to
suit all levels of budget and expertise. For the purposes of this book, the
following definition will be applied:

A digital repository is a combination of people, processes, and technologies
which together provide the means to capture, preserve, and provide access to
digital objects.

In general, the term is therefore used in this book to refer to the body
providing the digital repository function, rather than just the systems
employed at a given point in time to help realize this. In the cases where it is
employed in the narrower sense, this should be apparent from the context.
As previously mentioned, there is a detailed, formal definition of what is
required to provide those means: the OAIS Reference Model. However,
although widely cited, and undoubtedly of great value, especially in
providing a common vocabulary for expressing these concepts, the
complexity and terminology of OAIS can be off-putting. Fundamentally, the
core functions of a digital repository are the same as any memory
organization, and can be expressed very simply: it must be able to acquire
control of new content, make that content available to its designated user
community, and perform the various preservation and management activities
required to continue doing so for as long as required. This is illustrated in
Figure 1.1.

Figure 1.1 Functions of a digital repository

This book describes in detail how smaller organizations can develop the
practical means to perform each of these functions, with relevant case studies
throughout. It begins by looking at what is involved in building a digital
preservation capability, from making the case and securing the necessary
mandate and resources (Chapter 2, ‘Making the case for digital
preservation’), to defining your requirements (Chapter 3, ‘Understanding
your requirements’), and identifying an appropriate model for turning them
into reality (Chapter 4, ‘Models for implementing a digital preservation
service’). It then examines in detail the core repository functions:

• Capture: A repository must have a means to capture new content, and
bring it within its control. This is discussed in Chapters 5, ‘Selecting and
acquiring digital objects’; 6, ‘Accessioning and ingesting digital objects’;
and 7, ‘Describing digital objects’.
• Preservation management: The repository must be able to manage its
content so that it remains available in an accessible and authentic form.
This is addressed in Chapter 8, ‘Preserving digital objects’.
• Access: Any repository must provide a means for its users to discover
and access its content. Chapter 9, ‘Providing access to users’, covers this.
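
To show how little is conceptually required, the three functions can be sketched in a few lines of Python. This is a toy, in-memory illustration under my own assumptions: the class and method names are hypothetical, and the use of a SHA-256 checksum as the single preservation management activity is a simplification chosen for the example. A real repository also depends on the people and processes around the technology.

```python
import hashlib


class MiniRepository:
    """Toy in-memory repository with capture, preservation and access."""

    def __init__(self) -> None:
        self._content: dict[str, bytes] = {}  # identifier -> stored bytes
        self._fixity: dict[str, str] = {}     # identifier -> SHA-256 at capture

    def capture(self, identifier: str, data: bytes) -> str:
        """Bring new content under the repository's control."""
        if identifier in self._content:
            raise ValueError(f"{identifier!r} has already been captured")
        digest = hashlib.sha256(data).hexdigest()
        self._content[identifier] = data
        self._fixity[identifier] = digest
        return digest

    def verify(self, identifier: str) -> bool:
        """Preservation management: check the content is unchanged."""
        current = hashlib.sha256(self._content[identifier]).hexdigest()
        return current == self._fixity[identifier]

    def access(self, identifier: str) -> bytes:
        """Provide content to users, refusing anything that fails its check."""
        if not self.verify(identifier):
            raise IOError(f"{identifier!r} failed its fixity check")
        return self._content[identifier]


repo = MiniRepository()
repo.capture("report-2012", b"annual report text")
retrieved = repo.access("report-2012")
```

The checksum recorded at capture is what allows the later access request to be answered with some confidence that the content is still authentic.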

Digital preservation is a fast-moving world, with practitioners and
researchers continually evolving new ideas, techniques and tools. The final
chapter therefore takes a look at how some of these may develop over the
next few years (Chapter 10, ‘Future trends’). Lastly, the appendices include a
number of useful templates, as well as examples of a wide range of tools and
services which may be of value to digital archivists, with links to further
information.

1.8 Getting the most from this book


Even when focusing on smaller organizations, the diversity of resources,
skills, needs and organizational contexts represented there make it very
difficult to offer practical guidance useful to all: what might seem simplistic
or familiar for one may be overly technical or simply irrelevant to others. I
have therefore tried to provide guidance which is sufficiently detailed to
provide genuine substance for the more technically minded, but which can
also be dipped into by those requiring an overview. At the end of each
chapter, a series of key points summarizes the main recommendations.
No single book can hope to offer a comprehensive account of such a vast
and varied subject. The present volume is no exception, and claims to be no
more than a starting point, an initial guide to the strange, compelling and
rewarding world of digital preservation. However, it includes pointers to
further information at every turn, with links to online sources wherever
possible, so that readers can explore particular subjects in much greater
depth, according to their inclination.
I have also included a large number of case studies throughout, for two
reasons: first, I firmly believe that practical exposition is the best form of
explanation, and second, I hope that demonstrating how smaller
organizations of all kinds have built practical digital preservation solutions
will reinforce my central thesis – digital preservation is a practical
proposition for all.

1.9 Notes
1 Waller and Sharpe (2006) provide further information about these and other
examples.
2 US Government Accountability Office (2010).
3 Boyle, Eveleigh and Needham (2009).
4 MLA East of England and East of England Regional Archive Council (2006) and
MLA East of England (2008).
5 See www.paradigm.ac.uk/.
6 See www.icpsr.umich.edu/icpsrweb/landing.jsp.
7 See www.data-archive.ac.uk/.
8 Garrett and Waters (1996).
9 Brown (2000).
10 See, for example, the bibliography in Brown (2002a).
11 Consultative Committee on Space Data Systems (2012).
12 Two snapshots of the project website are preserved in the UK Web Archive at
www.webarchive.org.uk/ukwa/target/99695/.
13 See, for example, Potter (2002) and the Testbed website, as archived by the
Internet Archive at
http://wayback.archive.org/web/*/http://www.digitaleduurzaamheid.nl.
14 For an overview of the history of web archiving, see Brown (2006), 8–21.
15 See www.digitalpreservation.gov/.
16 See Strodl, Petrov and Rauber (2011) for a detailed history of EC-funded digital
preservation research.
17 See www.dpconline.org/.
18 See www.dcc.ac.uk/.
19 See www.ncdd.nl/en/index.php.
20 See www.openplanetsfoundation.org/.
21 See www.dpconline.org/advocacy/spruce and
http://wiki.opf-labs.org/display/SPR/Home.
22 See www.dpworkshop.org/dpm-eng/timeline/popuptest.html.

2
Making the case for digital preservation

2.1 Introduction
Building a digital preservation service requires resources, including staff time
and skills, budget and technical infrastructure. More fundamentally, it
requires an understanding from the organization that digital preservation is
a high priority, and a commitment to the principles and practice. Securing
such a mandate is therefore critical: with it, you have taken a crucial first step
towards delivering a successful service; without, it will be an uphill battle to
achieve anything.
This chapter describes the drivers for implementing a digital preservation
service, and strategies that you can adopt for making an effective business
case to secure senior management buy-in and resources. It advocates the
development of a digital preservation policy as a first step in building this
case, including a discussion of techniques for quantifying the financial and
non-financial benefits of implementing a successful preservation solution,
and introduces the concept of a digital asset register. Finally, it considers the
essential elements of the business case itself.
Building an effective business case may initially seem daunting, but can be
broken down into a series of simple steps, as illustrated in Figure 2.1.
This chapter considers each of these stages in detail, illustrated with
examples, from understanding the fundamental arguments to use, to
developing a comprehensive business case.

2.2 Understanding the drivers


A good understanding of the drivers for digital preservation is obviously a
prerequisite for developing a compelling business case. Every organization
has its own unique imperatives, but there are many generic arguments which
should be considered.

Collection development
The growing ubiquity of digital ways of working in our business, cultural and
social lives is reflected in the increasing desire by many organizations to
acquire digital content. Whether it be the library moving from print to
electronic journals, or the gallery displaying new forms of digital art, digital
information is becoming a fundamental aspect of collection development.

Corporate memory
Most organizations retain information as part of their ‘corporate memory’, in
the form of institutional archives and libraries. Besides the obvious historical
purpose, these have a wider role in maintaining the accumulated knowledge
and expertise of the institution. This may be especially important if highly
specialized knowledge is critical to the organization over long time periods,
for example in the fields of aerospace engineering or pharmaceuticals. It may
also be crucial if an organization is restructured, or in the commercial
environment when companies merge or are acquired. Needless to say, in a
digital world, the long-term viability of corporate memory depends on
digital preservation.

Figure 2.1 Making the case
[Flowchart: Start → Understand the business drivers → Develop a digital
preservation policy → Develop a digital asset register → Develop a business
case → Resources secured]

User access
Many organizations besides libraries and archives provide information
access to specific groups of users, whether internal staff, specialist
communities or the general public. Those users require the information to be
available in an accessible form, and have expectations about its longevity. If
that information access is dependent on technology, digital preservation
facilities will be required to ensure that those expectations can continue to be
met over the long term.

Information reuse
Organizations are increasingly expecting to reuse and add value to the digital
information resources which they have invested so heavily in creating,
whether public bodies opening up their data to be exploited by third parties,
academic institutions publishing research data, or oil exploration companies
reanalysing old geophysical survey results in the light of modern extraction
technologies. Maintaining these resources in accessible forms is a prerequisite
for such reuse, and a digital preservation service provides the capability and
confidence to achieve this.

Reputational protection
The reputation of an organization may be hard to quantify or value, but
damage to it can have catastrophic consequences. The loss of digital assets
entrusted to its care, or on which its business depends, can have reputational
implications which far exceed the operational impact. Organizations may
also care about how their digital preservation policies and practices compare
with their peers or competitors. For example, many national libraries and
archives, together with universities, have already implemented digital
repositories, and this has undoubtedly provided the impetus for others to
follow suit. While there are advantages to not being at the cutting edge, not
least because those institutions can then benefit from the experiences of
others, there are significant reputational risks in being seen to be failing to
adopt current good practice.

Legal and regulatory compliance


All organizations are subject to legal and regulatory regimes, which require
them to manage their digital information appropriately, and to sustain that
information for as long as required. For example, freedom of information and
privacy legislation require relevant information to be maintained in an
accessible form, while corporate transparency and accountability measures
such as the Sarbanes-Oxley Act, 2002 in the USA and the international Basel
III Accord, as well as financial and health and safety laws, determine how
long public and private organizations are required to retain certain types of
information, and whether such information must be publicly disclosed. In
some cases, such as pension information or asbestos records, retention will be
required for many decades. Enabling legislation for cultural memory
institutions, such as libraries and archives, may create detailed statutory
obligations for preservation, which may or may not explicitly reference
digital materials. As an example, the Legal Deposit Libraries Act, 2003 in the
UK includes provision for future regulations governing the deposit of non-
print publications; such regulations are currently being developed for
websites and electronic journals. Equally, intellectual property legislation
may circumscribe the preservation strategies available to collecting bodies,
for example by preventing the creation of copies or migration to new formats.
There is a very high risk that without proactive intervention to implement
a digital repository and associated digital preservation processes, digital
information will become inaccessible and the organization will be unable to
meet its statutory and regulatory obligations.

Business continuity
The risk of losing access to vital digital information assets is a very real, if
often overlooked, threat to business continuity in the modern world, and
appropriate digital preservation facilities should feature as a vital part of any
business continuity plan. A frequently cited statistic is that 90% of businesses
suffering a major data loss go out of business within two years.1

Efficiencies and savings


Digital preservation can support more efficient ways of working, and
therefore provide attendant savings. This can be particularly important for
smaller organizations needing to make the most effective use of limited
resources. As part of a comprehensive information management strategy, it
can minimize duplication and maximize ease of access; if the authoritative
version of a digital asset is preserved and accessible in a digital repository, the
duplicate copies that tend to proliferate in any organization’s systems can
safely be deleted, reducing storage costs and the scope for confusion for
users. Very significantly, it can avoid nugatory expenditure as a result of data
loss, and enable reuse.

Protecting investment
Organizations may invest very significant resources in the creation and
acquisition of digital information. Many libraries and archives have been
allocating substantial budgets to the digitization of their collections, in order
to broaden access and, potentially, generate additional income. In the private
sector, digital information assets may have huge commercial value. These
investments are at risk, unless protected by active preservation intervention.
Any loss or damage to that information is likely to incur substantial future
costs to either re-create it, or engage in expensive ‘digital archaeology’ to
rescue it. In the worst cases, assets may be irreplaceable or unrecoverable, in
which case the investment would be lost forever.
The earlier sustainability is addressed, the lower the future costs of
preserving digital information; indeed, for new information, it should be
factored in from the point of creation, since it is much more expensive to
retrofit systems to meet preservation requirements than to incorporate
them from the outset.

Supporting digital ways of working


The implementation of a digital repository and other preservation services is
a fundamental prerequisite for any organization to manage its information
electronically; without the assurance that we can organize and preserve
digital information into the future as effectively as we can paper, we can
never fully realize the benefits of digital ways of working. Digital
preservation may support the future realization of benefits and savings from
other activities. For example, many public sector bodies, which currently
publish large amounts of information on paper, are looking to make
substantial savings by moving to online-only publication. Such a transition
must be supported by the digital preservation policies and procedures
necessary to ensure that the electronic publications can be maintained in
accessible form. Equally, these are required to underpin activities such as
digitization, or the implementation of new information management
systems, such as corporate Electronic Document and Records Management
Systems (EDRMS) or Enterprise Content Management Systems (ECMS).
Within the wider context of electronic records management, digital
preservation can also support transparency and accountability within public
administrations.

Rationalizing data storage and implementing shared services


Digital preservation may enable existing arrangements for the storage of
digital resources to be rationalized, for example by replacing a plethora of
niche storage areas with a single repository, realizing efficiency savings in
storage costs and in how solutions are implemented for disaster recovery,
refreshing storage media, and technology migration. It can do so by:

• eliminating unnecessary data duplication
• ensuring that data is not retained for longer than required
• ensuring that data is stored in the environment most suited to its
management requirements.
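
The first of these points lends itself to a simple illustration. The sketch below, in Python, groups files by a checksum of their content so that byte-identical copies can be reviewed. It is a minimal example under my own assumptions: the function name, the file paths and the choice of SHA-256 are illustrative, and it only detects exact copies, not near-duplicates or different versions of the same document.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Return groups of paths whose content is byte-for-byte identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for path, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(path)
    # Only digests shared by more than one path indicate duplication.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


# Hypothetical content from two storage areas.
example = {
    "shared/report_final.doc": b"annual report",
    "home/alice/report_copy.doc": b"annual report",
    "shared/minutes.doc": b"meeting minutes",
}

duplicates = find_duplicates(example)
```

Once the authoritative copy is held in the repository, a report of this kind gives a basis for deciding which of the remaining copies can safely be deleted.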

Archival information has different storage requirements to information in
current business use. While the latter typically demands high-availability,
high-performance storage, the former has much lower requirements in both
of these areas, emphasizing integrity and reliability instead. This can allow
information of long-term value, but limited demand, to be moved to lower-
cost storage environments.
Not all of the drivers discussed above apply in every case, and the detail
varies from organization to organization. A little time spent considering
which drivers are most relevant to a given situation, with concrete examples,
will be amply repaid when it comes to framing first the policies, and
subsequently the business case for implementing a practical digital
preservation solution.

2.3 Developing a policy


Any organization with a serious desire to address digital preservation
should aspire to developing a digital preservation policy as soon as
possible. Not only does this provide a basis from which detailed
requirements can then be identified, and a solid, consistent intellectual
foundation for practical solutions, it also forms an important step in
securing organizational buy-in to the principles and practice. The following
section provides guidance on how to develop an effective, realistic digital
preservation policy.

The need for a policy


A 2005 survey, carried out on behalf of the Digital Preservation Coalition,
found that only 18% of organizations surveyed in the UK had a digital
preservation policy or strategy document.2 It is tempting to make a direct
correlation between this and another finding of the survey – that only 20% of
organizations had provided adequate funding for digital preservation. This is
confirmed by a contemporaneous survey by the Museums, Libraries and
Archives Council (MLA),3 which found that only 23% of respondents had a
digital preservation policy; the survey report went on to note that
‘[organizations were] asked specifically about whether there was funding
allocated for ongoing maintenance of the digital materials created. The reply
was almost unanimously “no”.’4
Today, the situation has undoubtedly improved, as illustrated by a much
larger survey (with more respondents and a greater breadth of geographical
coverage) conducted by the EU-funded Planets project in 2010.5 This
surveyed over 200 organizations from around the world, the majority
European, and found that 48% had a policy, with 47% having the
corresponding budget to begin implementing their policy. The Planets survey
is particularly relevant in having investigated the impact of a policy in some
detail. It is worth quoting its conclusions in full:

There exists a digital preservation divide between the policy haves and the policy
have-nots.
Organizations with a digital preservation policy are more likely to include
digital preservation in their operational, business continuity and financial
planning. They are three times more likely to have secured a budget for digital
preservation, four times more likely to be investing in a solution now and three
times more likely to have a long-term solution already in place.
By contrast, organizations without a digital preservation policy are four times
more likely to have no experience or be unaware of the challenges presented by
digital preservation, three times more likely to have no plans for the long-term
management of digital information, and more than twice as likely to put off
investing in a digital preservation solution for more than two years. The existence
of a digital preservation policy is therefore a vital first step towards
implementing a solution.6

Getting started
While having a policy is therefore a fundamental building block for
building practical solutions, developing one may seem a daunting prospect.
Fortunately, there is a wide range of very useful recent guidance – as well as
some excellent examples of actual policies – available to draw on, and some
of these are discussed later in this chapter.
Furthermore, a standard approach to developing a policy can be adopted
by any organization, as illustrated in Figure 2.2.
The individual steps are discussed in more detail below.

Figure 2.2 Developing a policy
[Flowchart: the sequence from Figure 2.1 (Start; Understand the business
drivers; Develop a digital preservation policy; Develop a digital asset
register; Develop a business case; Resources secured), with the policy step
broken down into: Analyse what already exists → Establish scope and
phasing → Develop the policy → Communicate the policy]

Analyse what already exists
The policy cannot and should not exist in a vacuum; it needs to fit within an
existing organizational context, and take account of current resources and
practice. Before embarking on policy development it is therefore necessary to
review existing policies and strategies within the organization and externally.
These might include business plans, IT strategies, information management
policies, and corporate finance and staffing policies. Some will be specific to
particular kinds of organization: for memory institutions, these might
include collection policies and user access policies, whereas higher
education institutions may have research strategies, and teaching and
learning strategies.
It is also necessary to consider the existing information and IT
infrastructure; although the policy should be technology-independent, being
a statement of principles rather than tied to specific IT products, it does need
to take account of current and planned IT provision and information
management systems. For example, if your IT is outsourced, you need to
consider what services can be called on, and what constraints this imposes.

Establish the scope and phasing


It is essential to be clear about the scope of the policy – which information
resources will be collected and preserved. Consideration should also be given
to the required level of detail, and the timetable for implementing the policy.
For example, a ‘big bang’ approach, where all the different, interdependent
elements of the solution need to be implemented at the same time, is not only
expensive but also complex, and therefore risky. Rather, a staged approach
may be better, starting with a high-level policy, and gradually adding more
detail over time, as part of a phased programme of implementation. It may be
more effective to begin with some ‘quick wins’, which can then build a
momentum for introducing more radical or wholesale changes. For example,
the UK Parliamentary Archives introduced a programme to archive its
websites periodically as a means to acquire important content, gain practical
experience, and raise awareness of digital preservation within Parliament in
advance of work to build a digital repository.

Develop the policy


See ‘What should go into the policy?’ below for a discussion of what to
include when writing your policy.

Communicate the policy


A policy which is not adopted is worthless. It is therefore essential to plan,
implement and review the means by which the policy will be communicated to
those who need to be aware of it. Having such a plan is key whatever the size of
the organization: larger organizations have a range of formal communication
channels to be considered, while in smaller organizations, where informal
communication may be the norm, important information and lines of
responsibility may be lost or unclear without a more formal communication
plan. It may be helpful to begin by assessing the current level of awareness
among stakeholders. This assessment might be undertaken formally as a survey,
or through informal soundings among colleagues. A clear plan for how the policy
will be disseminated should then be developed. This should identify everyone
who needs to know about it, bearing in mind that this may include people not
directly involved in information management, and indeed possibly outside the
organization, such as end-users and peers. Approaches to communicating with
stakeholders are considered in more detail in the next chapter.
The process of raising awareness will then continue throughout the
development of the policy and beyond. As part of this process, the
effectiveness of the communications strategy should be reviewed
periodically. This might be measured directly (for example through a periodic
survey) or indirectly (for example, by looking at changes in the volume of
content submitted for preservation).
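As a rough illustration of the indirect measure, the change in submission
volume between two review periods can be expressed as a percentage. The
sketch below is illustrative only: the function name, the use of simple item
counts per period and the example figures are all assumptions rather than
features of any particular repository system.

```python
def submission_change(previous: int, current: int) -> float:
    """Return the percentage change in items submitted for preservation."""
    if previous <= 0:
        raise ValueError("previous period must contain at least one submission")
    return (current - previous) / previous * 100.0


# Hypothetical counts of items submitted in the periods before and after
# the policy was publicized.
change = submission_change(previous=120, current=150)
print(f"Submissions changed by {change:+.1f}%")
```

A rise of this kind is suggestive rather than conclusive, since other
factors may also affect the volume of content submitted.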

What should go into the policy?


The exact content of a digital preservation policy is dictated by the particular
organizational context, but many elements are common to all policies, and it
is therefore possible to consider a generic model.
First, there are some common principles to consider:

• Longevity: It is to be hoped that a digital preservation policy will remain
relevant and in use for many years. A process of periodic review and
revision is required to ensure this but, to minimize the need for constant
updating, it is desirable that the policy should be as future-proof as
possible. This should be supported by ensuring that the policy is focused
on principles rather than specific implementation details; the policy should
be a statement of ‘what’ and ‘why’, rather than ‘how’.
• Effectiveness: The policy must demonstrate the benefits it will provide,
and the risks it will mitigate.
• Clarity: The policy should be written in language that is accessible to its
intended audience, avoiding unnecessary jargon, and should be well
organized and logical. Its requirements should be unambiguous.
• Practicality: Implementation of the policy must be achievable, given the
resources and expertise available to the organization. It is pointless and
counterproductive to develop a policy that cannot be achieved in any
realistic timeframe. This is especially important for smaller bodies, which
typically work with tighter budget and staff constraints.

A standard digital preservation policy might include the following sections:

• purpose
• context
• scope
• policy principles
• policy requirements
• standards
• roles and responsibilities
• communication
• audit
• review
• glossary.

These are considered in more detail below.
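The list of standard sections can also serve as a simple completeness
checklist for a draft policy. The following is a minimal sketch, assuming the
draft's headings are available as a list of strings; the names
STANDARD_SECTIONS and missing_sections are illustrative, and matching on the
start of each heading is a simplifying assumption.

```python
# Standard sections of a digital preservation policy, as listed above.
STANDARD_SECTIONS = [
    "purpose",
    "context",
    "scope",
    "policy principles",
    "policy requirements",
    "standards",
    "roles and responsibilities",
    "communication",
    "audit",
    "review",
    "glossary",
]


def missing_sections(headings: list[str]) -> list[str]:
    """Return the standard sections that no heading in the draft begins with."""
    present = [h.strip().lower() for h in headings]
    return [
        section
        for section in STANDARD_SECTIONS
        if not any(h.startswith(section) for h in present)
    ]


# Hypothetical headings from an early draft.
draft = ["Purpose", "Scope", "Audit and certification", "Review"]
for section in missing_sections(draft):
    print("Missing:", section)
```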

Purpose
The document should begin with a clear statement of purpose, establishing
the function of the document.

Context
The background and context to the policy should be described. This should
align the policy with organizational objectives and other relevant policies,
strategies and initiatives, as identified through your initial analysis. This is
also an appropriate place to provide some background, summarizing
relevant previous work within the organization, such as digitization or
electronic records management programmes, and anticipated next steps.

Scope
It is essential to clearly establish the scope of the policy. For example, does it
relate only to internally produced content, or material acquired from external
sources? Does it cover born-digital material, digitized documents, or both?
Does it apply to records, publications or other types of digital resource? It
may also be helpful at this point to highlight the diversity of content which
may be covered; this can help to counter misconceptions that the policy
applies more narrowly than is in fact the case, e.g. only to ‘office’ documents
such as e-mails, spreadsheets and word processed text.

Policy principles
The detailed requirements of the policy should be prefaced by the underlying
principles that inform them. This should state the key commitments which
the policy supports, including statements about archival authenticity and
accessibility, define the organization’s preservation objectives, introduce any
key preservation concepts, and discuss acceptable preservation strategies.
The policy must make clear the custodial status of archived content – who
owns it, and who is legally responsible for management, preservation and
providing access. This is essential for supporting legislative and regulatory
compliance (e.g. defining responsibilities under data protection and freedom
of information laws), and is becoming an increasingly complex issue in a
world where some or all of an organization’s digital information management
and preservation services may be contracted out, and where its data may be
hosted externally, for example in the Cloud (see Chapter 8, ‘Preserving digital
objects’, and Chapter 10, ‘Future trends’).

Policy requirements
Perhaps the single most important section of the policy is a statement of the
underpinning policy requirements for digital preservation. The following
areas are likely to be standard:

• Creation and management: Where this is within the control of the
institution, the policy should define principles governing the creation
and management of digital information before preservation. In particular,
it is essential to underline the importance of addressing sustainability
from the outset, for example by making informed choices about file
formats, and setting minimum documentation standards for creators. The
document should refer to any relevant standards, such as records
management policies.
• Appraisal, selection and acquisition: The policy should reference any
relevant appraisal, selection and acquisition policies and procedures, or
records disposal schedules. It should also refer to procedures for
transferring custody from creator to archive, and effecting the physical
ingest of content into a repository. For organizations that acquire external
content, this may include standards for depositors, specifying acceptable
formats, transfer media, documentation etc. (see Chapter 5, ‘Selecting and
acquiring digital objects’).
• Preservation: The policy must set out the high-level requirements for
achieving preservation. This can usefully be subdivided into
requirements for bitstream and logical preservation, which are discussed
in more detail in Chapter 8, ‘Preserving digital objects’.
• Access and reuse: The policy should state the kinds of access that are
required. Will there be public access, or will it be limited to internal
users? Is networked or online access required? Does access need to be
integrated with other business systems, such as an electronic document
and records management system? How will online access integrate with
any wider organizational website? What degree of reuse must be
supported: is the emphasis on providing human-readable versions,
editable versions or machine-readable data? What access restrictions
apply, including copyright implications?
• Infrastructure: Sustainability requirements need to inform the design,
procurement and management of an organization’s IT infrastructure, for
example, when deciding which office software product to move to next;
the policy is a useful place to assert this principle. There is also a more
specific requirement that the infrastructure required for digital
preservation, such as a digital repository, must itself be sustainable for as
long as the digital resources it manages. This has an impact on the IT
architecture adopted. For example, rather than building systems where
the component parts are heavily dependent on one another, it is
preferable to keep them loosely coupled, with well defined, standard
interfaces between them; this can reduce the impact on the overall system
when individual components need to be replaced or upgraded.
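The principle of loose coupling described under Infrastructure can be
sketched in code. In the example below, which is an illustration under stated
assumptions rather than a description of any real repository product, the
repository depends only on a small storage interface, so that the storage
component could be replaced without changing the rest of the system. The
names StorageBackend, InMemoryStorage and Repository are hypothetical.

```python
from typing import Protocol


class StorageBackend(Protocol):
    """The only storage interface the repository relies on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStorage:
    """Stand-in backend; a disk, tape or hosted store could replace it."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class Repository:
    """Depends on the StorageBackend interface, not on a specific product."""

    def __init__(self, storage: StorageBackend) -> None:
        self._storage = storage

    def ingest(self, identifier: str, content: bytes) -> None:
        self._storage.put(identifier, content)

    def retrieve(self, identifier: str) -> bytes:
        return self._storage.get(identifier)


repo = Repository(InMemoryStorage())
repo.ingest("report-001", b"example content")
print(repo.retrieve("report-001"))
```

Because Repository refers only to the interface, upgrading or replacing the
storage component means supplying another class with the same two methods.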

Standards
The policy should identify all internal and external standards that will apply.
These may include formal international standards, such as OAIS, de facto
standards such as PREMIS or METS, sector-specific standards such as MARC
or ISAD(G), or internal documents such as technology standards.7

Roles and responsibilities


The clear assignment and acceptance of roles and responsibilities is critical to
achieving an effective policy. The document should therefore define where
responsibility lies for all aspects of the policy. This will certainly include the
organizational units responsible for curation, IT provision and content
creation. It may also cover external suppliers and service providers.
In smaller organizations, individual staff may need to combine a number
of different roles, and it may not be possible to have dedicated posts with
responsibility for digital preservation. This is not an issue, provided
responsibilities are clearly defined, and sufficient time is allowed within job
descriptions to undertake them.
An organization must ensure that its digital preservation activities are
carried out by sufficient staff with the appropriate skills. The document
should therefore identify how the organization will provide training
opportunities to allow staff to develop, maintain or enhance their digital
preservation expertise. This may include participation in courses,
self-directed learning, attendance at national and international seminars,
workshops and conferences, study visits, internships and working exchanges
with other institutions and professionals. The benefits of joining advocacy
organizations such as the UK’s Digital Preservation Coalition8 and the
Nationale Coalitie Digitale Duurzaamheid in the Netherlands9 should also be
considered. Roles and training are discussed in more detail in Chapter 4,
‘Models for implementing a digital preservation service’.

Communication
It is essential to describe the methods that will be used to communicate the
policy, as discussed earlier. The policy may also commit to raising awareness
of, and providing training in, digital preservation issues within the wider
organization and its user community.

Audit and certification


A policy is only as effective as the extent to which it is followed. It is therefore
important to set up some form of audit mechanism to monitor adoption. This
should be seen as a constructive process, intended not only to measure levels
of compliance, but also to solicit valuable feedback, which can be used to
improve the policy. Audits can be used to assess the effectiveness of a policy’s
implementation, identify future priorities, and inform future reviews of the
policy (see below). The formality of the audit process will vary considerably
from organization to organization; while larger bodies may well have
existing internal audit teams which can be drawn on, smaller organizations
may take a more ad hoc approach. In all cases the principles remain the same.
There are a number of emerging certification standards for digital
repositories, which offer a range of audit regimes, including self-certification.
These are discussed in more detail in Chapter 4, ‘Models for implementing a
digital preservation service’.

Review
Any policy document requires periodic review, to ensure that it remains up
to date and relevant. It is therefore important to define the frequency of
review and a mechanism by which this will be achieved. As discussed above,
a policy should not be subject to very frequent change; as a guide, a review
frequency of every two years would be typical. Reviews will also be required
as a result of major organizational or technology changes.
Consideration should be given to how the review will be undertaken. Who
should be involved? Is external participation desirable? It may be extremely
valuable to invite professional colleagues from other institutions to assist
with this.

Glossary
The policy should be easily understood by non-specialists. In a technical,
jargon-laden field such as digital preservation, a glossary is therefore always
helpful. If intended for an external audience, any organization-specific
acronyms and terms should also be included.

Models and sources for digital preservation policies


The ERPANET project published a policy tool in 2003,10 while a 2008 study
commissioned by the Joint Information Systems Committee (JISC) in the UK11
looks at what should go into an effective digital preservation policy, and
provides very useful practical guidance. Although focused on the UK higher
and further education sectors, it draws widely on policies and implementations
from other sectors and countries and is therefore much more widely applicable.
It provides a model policy and framework, which organizations can easily
adapt to their own circumstances. It considers both high-level policy, and
implementation issues, and includes exemplars of individual clauses.
The UK’s Digital Curation Centre has published a template for digital
preservation policies,12 which includes example policy statements for each
section. It has also developed an online tool for creating the data management
plans required by research funding bodies of their grant-holders.13 While
specific to the higher education research sector, the tool includes much that is
relevant to the development of digital preservation policies more generally. A
US equivalent of the tool is also available.14 The Inter-University Consortium
for Political and Social Research (ICPSR), based at the University of Michigan,
has also developed an outline for a policy framework,15 which is intended as
a model for any organization to use.
The international OpenDOAR directory of open access repositories
website includes a tool for creating repository policies (Figure 2.3).16
Specifically, it provides support for creating metadata, data, content,
submission and, most relevantly, preservation policies. It is worth noting that
OpenDOAR uses a very narrow definition of ‘preservation policy’; in the
context of this discussion, all five of the policy types covered by the tool
would be relevant to a digital preservation policy. The tool provides a simple
web user interface for creating policies. For each type of policy, a range of
options can be configured, using check boxes. For example, the preservation
policy covers retention periods, functional (logical) preservation, file
(bitstream) preservation, withdrawal of items, version control and closure of
content; each of these can be customized to individual needs. Once the
policies have been defined, they can be saved in HTML format for the web,
plain text, and even as a configuration file for the EPrints digital repository
software, enabling automated execution of the policies. Although
OpenDOAR is aimed specifically at the open access repository community,
the tool is more widely applicable, and could certainly be helpful as a starting
point for any organization that is planning a policy.

Figure 2.3 Web form for creating a repository policy in OpenDOAR (University of
Nottingham)

A number of institutional digital preservation policies for libraries,
archives and other kinds of organization are available online, and may be
helpful as models. A far from exhaustive list, biased towards smaller
organizations, includes:

• Archives:
□ Hampshire Record Office (2010)
□ Swiss Federal Archives (2009)
□ UK Parliamentary Archives (2009)17
□ West Yorkshire Archive Service (2007)
• Data services:
□ Arts and Humanities Data Service (2004)18
□ ICPSR (2007)19
□ UK Data Archive (2011)
• Libraries:
□ Columbia University Libraries (2006)
□ National Library of Australia (2013)
□ Yale University Library (2007)
• Museums and galleries:
□ National Museum of Australia (2012)
• Multi-disciplinary organizations:
□ Guildhall Library Manuscripts and London Metropolitan Archives
(2008)
□ Library and Archives Canada (2008)
□ Wellcome Library (2007)20
• Commercial services:
□ Online Computer Library Center (OCLC) (2006).

Once created, the policy needs to be ratified and adopted. The process for this
will vary from organization to organization, but the policy should be
endorsed by a group with sufficient seniority and influence to carry real
weight; if at all possible, it should be approved at board or equivalent level.
This also provides an excellent opportunity to publicize and promote digital
preservation within the organization, and to begin to engage with data
creators and owners, helping them to understand their responsibilities. For
example, the policy could be published on the organization’s website, and
highlighted in relevant promotional materials, presentations and literature.
Having a digital preservation policy provides a firm foundation for
beginning to develop a practical service, and demonstrates organizational
commitment to the principles involved, but it is not usually sufficient in itself
to secure the resources required to put the policy into practice. For this, a more
detailed plan of action and business case will typically be needed. The
remaining sections of this chapter examine how to build such a case.

2.4 Developing a digital asset register


The most powerful arguments are usually those which are supported by
concrete evidence. The case for digital preservation is undoubtedly
strengthened if real examples of digital assets that are important to an
organization and at risk of loss can be found; the benefits of ensuring their
preservation, and the consequential impact arising from their loss, can then
be measured in terms that will resonate most strongly with the organization.
For example, in developing its business case for digital preservation, the
Parliamentary Archives quantified the creation cost and usage of a number of
important digital collections, including Historic Hansard21 and conservation
photographs of works of art,22 to illustrate the value – financial and otherwise
– of the assets that would be protected by a digital repository.
A digital asset register can be a very useful tool for illustrating tangible
risks and benefits. In essence, it is a list of an organization’s digital assets that
analyses the risk of loss, and the impact of that loss, in each case. It should:

• identify digital assets requiring long-term access
• identify the threats to future accessibility
• quantify the risks of those threats materializing
• quantify the costs and other impacts that would be incurred if the threats
materialized
• quantify the benefits to be derived from continued access
• determine a priority for action.

Methods for doing so are described below.


The level of detail applied can be varied to meet individual requirements,
and the work involved in developing and maintaining a register can be as
great or little as the situation demands, and resources permit. Even a very
high-level register, however, describing assets in the broadest terms, can
provide tremendous support when making the case for preservation. This
value is multiplied by the variety of uses to which a digital asset register can
be put, once created – some of these are considered at the end of this section.
The register should include:

• basic information about the asset, such as its name, a brief description,
and an identified business owner
• a basic categorization of the type of asset (e.g. digitized images, database,
office documents, or a website); this will help to inform some of the
generic risks that may need to be considered
• volume information, including the current volume of the asset and,
where applicable, estimates of future growth rates; this helps to give a
sense of scale
• identification of the main vulnerabilities; it is essential to identify the key
threats to the future accessibility of the resource
• identification of the benefits of continued access, and the potential for
reuse; for example, if a particular set of documents is heavily used by
researchers, or a dataset could have future commercial value, this should
be highlighted
• identification of the likely impact if the asset were to be lost or damaged;
this might be reputational, operational or commercial impact
• an estimate of the financial value or other economic impact of the asset;
this might be calculated in a number of ways, including:
□ the original cost of creation of the asset
□ the cost of rescuing the asset, if damaged
□ the cost of re-creating the asset, if lost
□ the potential revenue from its commercial use
• a risk assessment, which should use a numeric assessment of the
probability, impact and proximity of the risk of loss to calculate an
overall risk score. The probability and proximity will be informed by the
vulnerability assessment, while the impact can be determined from the
financial value and non-financial impacts. Although a degree of
subjectivity remains in any risk assessment, this can be minimized
through the use of a standard scoring mechanism.
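To make the structure of a register entry more concrete, the following is a
minimal sketch of how one might be recorded and scored. It covers only a
subset of the fields listed above, and several details are assumptions made
for illustration: the field names, the 1 to 5 rating scales, and the rule
that the overall score is the product of probability, impact and proximity.
The template in Appendix 1 may use a different mechanism. Both example
assets are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AssetEntry:
    """One row of a digital asset register (field names are illustrative)."""

    name: str
    asset_type: str         # e.g. "digitized images", "database"
    business_owner: str
    volume_gb: float
    vulnerabilities: str
    financial_value: float  # e.g. estimated cost of re-creating the asset
    probability: int        # assumed scale: 1 (unlikely) to 5 (almost certain)
    impact: int             # assumed scale: 1 (minor) to 5 (severe)
    proximity: int          # assumed scale: 1 (distant) to 5 (imminent)

    def risk_score(self) -> int:
        """Assumed scoring rule: the product of the three 1-5 ratings."""
        for rating in (self.probability, self.impact, self.proximity):
            if not 1 <= rating <= 5:
                raise ValueError("ratings must be between 1 and 5")
        return self.probability * self.impact * self.proximity


# Two hypothetical entries, for illustration only.
register = [
    AssetEntry(
        name="Digitized photographs",
        asset_type="digitized images",
        business_owner="Collections team",
        volume_gb=500.0,
        vulnerabilities="Single copy on a departmental file server",
        financial_value=20000.0,
        probability=2,
        impact=3,
        proximity=2,
    ),
    AssetEntry(
        name="Legacy database",
        asset_type="database",
        business_owner="Records manager",
        volume_gb=40.0,
        vulnerabilities="Obsolete software, no current documentation",
        financial_value=60000.0,
        probability=4,
        impact=5,
        proximity=3,
    ),
]

# Highest-risk assets first, as a basis for prioritizing action.
ranked = sorted(register, key=lambda entry: entry.risk_score(), reverse=True)
for entry in ranked:
    print(entry.name, entry.risk_score())
```

Applying the same scales and rule to every asset is what limits subjectivity;
the particular numbers matter less than their consistent use.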

An example template for a digital asset register is shown in Appendix 1. Once
the register has been created, it can be used in a number of ways. First, it
should provide as comprehensive and accurate a list of assets requiring
preservation as possible. This is not only a practical tool for the digital
archivist, but also provides concrete evidence of the reality of the threat and
its likely impact; this will prove invaluable in putting together a business
case. Its other functions can include:

• helping to prioritize content for preservation, using the risk score; in the
first instance, this may be used to identify any assets requiring urgent
treatment, in advance of a full digital preservation solution; once a digital
repository has been implemented, it can then form the basis for a
programme to ingest content
• helping to prioritize the development of future preservation strategies
• providing the basis for calculating savings that will accrue from
implementing a digital repository, and the costs of doing nothing; for
example, it can be used to calculate a profile of the opportunity costs for
re-creating or rescuing the identified assets over the life of the project; for
each asset, a proportion of the re-creation cost can be included, based on
the probability of loss; the year(s) in which these costs are assigned can
then be based on the proximity of the threat; this method is explained in
more detail in Appendix 1
• predicting demand for future storage growth
• identifying new potential for reuse of digital assets
• illustrating the breadth and depth of content requiring preservation for
stakeholders, including potential suppliers and users.
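Two of these uses lend themselves to simple calculation. The sketch below
shows one possible reading of the cost profile described above, together with
a basic projection of storage growth. It is not the method set out in
Appendix 1: it assumes that the probability of loss is expressed as a
fraction between 0 and 1, that the proximity of each threat is recorded as a
single project year, and that volumes grow at a compound annual rate. All
names and figures are hypothetical.

```python
def cost_of_doing_nothing(assets, project_years):
    """Return the expected re-creation cost for each project year.

    For each asset, a proportion of its re-creation cost (cost multiplied
    by probability of loss) is assigned to the year in which the threat is
    expected to materialize. Year 1 is the first element of the result.
    """
    profile = [0.0] * project_years
    for asset in assets:
        expected_cost = asset["recreation_cost"] * asset["probability_of_loss"]
        year = asset["threat_year"]
        if 1 <= year <= project_years:
            profile[year - 1] += expected_cost
    return profile


def projected_volume(current_gb, annual_growth_rate, years):
    """Project storage volume, assuming compound annual growth."""
    return current_gb * (1 + annual_growth_rate) ** years


# Hypothetical register extracts.
assets = [
    {"name": "Asset A", "recreation_cost": 40000,
     "probability_of_loss": 0.5, "threat_year": 2},
    {"name": "Asset B", "recreation_cost": 10000,
     "probability_of_loss": 0.25, "threat_year": 1},
    {"name": "Asset C", "recreation_cost": 8000,
     "probability_of_loss": 0.25, "threat_year": 2},
]

print(cost_of_doing_nothing(assets, project_years=3))  # [2500.0, 22000.0, 0.0]
print(projected_volume(100, 0.5, 2))                   # 225.0
```

The yearly totals can then be set against the cost of the proposed
repository when the business case is prepared.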

2.5 Developing a business case


The final step in the process of making the case for digital preservation is to
secure a concrete commitment from senior management, and the necessary
resources to establish a functioning service. Within most organizations, this
will require a formal business case of some description.
Organizations have their own procedures and templates for business
cases. In addition, the level of detail required may vary considerably.
Nonetheless, any business case needs to address certain standard questions.
This section describes a generic template for a business case, and examines
how this can be applied to the specific challenge of digital preservation. Much
of the information from the policy and digital asset register can be reused
here. The SPRUCE project is developing a generic business case for digital
preservation, which may also prove useful.23
A typical business case includes the following sections:

• executive summary
• introduction
□ objectives
□ deliverables
• strategic intent
□ benefits and risks
□ critical success factors
• context
• options assessment
• dependencies
• project organization
• project risks.

These are considered in more detail below.

Executive summary
The main document is likely to be long and detailed, and will need to be
reviewed and approved by a diverse audience, from senior management to
technical staff. A succinct statement (no more than one or two pages) of why
the project is required, what resources are being requested, and what it will
deliver in return is therefore invaluable.

Introduction
The opening section of the business case should establish the background to
the project, and define its raison d’être. This is the place to include a basic
introduction to the challenges of digital preservation, and why they matter to
the organization – the analysis of the organizational drivers, discussed at the
beginning of this chapter, can be used to inform this. It should also clearly
define the fundamental objectives of the project, and concrete deliverables
which will be produced. Example objectives and deliverables for a generic
digital preservation project are given below.

Objectives
• ‘Take urgent action to safeguard the organization’s most vulnerable
digital information.’
• ‘Meet the organization’s legal and regulatory responsibilities, e.g. for data
protection and freedom of information.’
• ‘Ensure that access to digital resources is maintained, throughout their
planned life cycle, preserving both active business information and
information of permanent historical value for future users.’
• ‘Implement processes across the organization to ensure that newly
created information adheres to digital preservation standards.’
• ‘Safeguard the organization’s investment in the creation and maintenance
of digital resources, enabling full benefits realization and avoiding
wasted expenditure in the future (e.g. on expensive digital archaeology).’
• ‘Provide input to other information-related projects to ensure that digital
preservation issues are considered in their planning, thus avoiding or
reducing future costs.’
• ‘Support and underpin all the organization’s activities, programmes and
projects which create or receive material in digital format by ensuring
that access to it can be guaranteed for as long as is needed.’
• ‘Contribute to the reduction of data storage costs by using the most
efficient archival storage technologies.’

Deliverables
• ‘Establish a digital repository for content identified for long-term
preservation, including workflows for ingest of content from a range of
environments, and a baseline for user access to archived content.’
• ‘Implement a technology watch and preservation planning process to
identify and mitigate threats to future access.’
• ‘Introduce technology standards and policies to support the
sustainability of future information.’
• ‘Develop and deliver a range of training and guidance for stakeholders.’

Finally, it will be helpful to summarize any progress to date, including any
‘quick wins’ already achieved.

Strategic intent
This section sets out why the proposed work is necessary to the organization
at a strategic level. It needs to put forward the case for why the work is
required now, and cannot be deferred to some future date. It must explain
how the work would fit into the broader strategic context. If possible, the
relationship to corporate objectives, business plans or strategies should be
defined. This definition should draw on previous work to define the context
for the digital preservation policy.
This section should also analyse the benefits that the project will bring and
the risks it will address, together with the ‘critical success factors’ for the
project. These are considered in more detail below.

Benefits and risks
It is essential to identify and quantify the benefits of digital preservation from
the outset. Clearly, these will be central to building a case for action. More
broadly, they will provide a basis for defining detailed requirements, and a
benchmark against which to assess the success of any solution. These benefits
may take many forms and will differ between organizations.
You should also clearly state the risks to the organization that will be
mitigated by carrying out the work. In many cases, these will be the
inverse of a benefit. Although it is usually preferable to emphasize the
positive reasons for taking action, it is important to recognize that there
are also powerful arguments to be made from the risks of not taking
action. Whether it is more effective to emphasize the risks or the benefits
is a matter of judgement, and will depend on the culture of the
organization in question.

Both risks and benefits derive from the business drivers, discussed at the
beginning of this chapter. Thus the imperative to maintain a corporate
memory would give rise to a benefit – ensuring that the organization has
persistent access to its digital resources – and mitigate a risk – damage or loss
of corporate information.
Some benefits may give rise to savings, e.g. from more efficient use of
electronic storage or improved working practices, and also avoided future
costs, such as having to rescue or re-create lost data. It helps to be as specific
as possible in defining these financial benefits, while remaining realistic –
over-promising savings is never a good idea. It is also useful to include details
of how the figures have been calculated. If a digital asset register is being
used (see above) it can provide the basis for detailing avoided costs.
The JISC-funded project Keeping Research Data Safe (KRDS) in the UK has
developed a toolkit which can help organizations to understand and
demonstrate the benefits, value and impact of digital preservation. This may
prove very helpful when it comes to articulating this part of the business
case.24

Critical success factors
The business case should define the critical success factors for the project.
These are the main criteria against which the success of the project can be
measured, and should be framed as statements describing what a successful
outcome would look like. These are examples of typical critical success
factors:

• ‘An affordable, flexible and scalable digital preservation solution is
implemented to enable persistent access to the organization’s current
digital resources and those which will be created in future years.’
• ‘Content owners can deposit historically significant data for preservation
easily, and users can access archived data effectively.’
• ‘Users trust the organization’s preserved digital assets as being authentic
and reliable.’
• ‘Organizational change is managed and staff are supported as they
develop new skills in creating, managing and accessing digital resources.’

Context
The document should describe the context for digital preservation within and
beyond the organization, including the types of digital resources that need to
be preserved, and reference to other relevant internal and external projects
and programmes. Ideally, it should also provide an overview of the current
market for digital preservation solutions. This will provide a good
introduction to the detailed assessment of the options.

Options assessment
An assessment of the available options, together with a recommended
approach, is one of the crucial elements of any business case. In seeking
management approval for a project, it is essential to demonstrate that all
realistic options have been considered, and to show the basis on which the
recommended option has been selected. A ‘do nothing’ option should always
be included, to provide a comparison for all the positive options.
Each option should be analysed in detail, with a description of the option,
and an assessment of the advantages, disadvantages and predicted
resourcing implications, including staffing and non-staff costs. The range of
options that are typically available to an organization is discussed in depth in
Chapter 4, ‘Models for implementing a digital preservation service’.
As part of this, you need to assess potential sources of funding. These may
include internal funding, partnership funding, or external grants from bodies
such as funding councils or the EU. It is important to include any proposals
for revenue generation, such as charges for depositors or end-users, which
may offset the start-up and running costs.
It may be helpful to provide some form of sensitivity analysis for the
options. This is a way of showing how susceptible each option is to changes
in external factors. It requires two steps: first, identify the key factors that
might vary, for example changing timetables for the project (e.g. to balance
the needs of other projects), variations in the available resources (e.g. arising
from cuts in government funding) and different levels of demand to ingest
content into the repository. Second, identify the implications of these changes
for each option. For example, using an external service provider might enable
you to cope better with changing levels of demand, but offer little flexibility
on budget. An in-house solution might offer the reverse.
The final part of this section should be a clear statement of the
recommended option for which the business case is seeking approval. It
should clearly and simply set out the rationale for choosing the favoured
option, and rejecting the others.

Dependencies
It is essential to identify any dependencies with other projects or operational
activities. Dependencies can go in either direction. For example, having an
operational digital repository might be a dependency for enabling future
content creation projects, such as digitization, to proceed. Equally, a digital
repository project might rely on another project to redesign the
organizational website, to enable public access to archived content. When
describing dependencies, it is important to define the nature and the timing
of the dependency.

Project planning
It may be helpful to give some indication of the approach that will be taken to
structuring the programme of work, if known. For example, the project may
naturally divide into a number of discrete work streams. An outline timetable is
also useful, together with an indication of any governance structure, such as a
project board and the composition of the project team. It is often helpful to adopt
some form of project management methodology. This should be
tailored to the time and expertise you have available, and the size of the project
– it is counterproductive to impose over-elaborate project management
processes on a simple project – but it is always useful to employ a project
management mindset: plan the tasks that need to be undertaken, and the order
in which they must occur, identify the people and resources required to achieve
them, and monitor how events actually unfold in relation to that plan.

Project risks
The business case should identify the key risks to the success of the project.
These are entirely separate from the risks of not undertaking the project,
discussed above. At this stage, it may only be possible to identify high-level
risks – developing and maintaining a detailed project ‘risk register’ will be a
standard project management task once the project itself gets under way, but
these risks can still be expressed in similar form at this stage. Typically, this
includes the following elements:

• a description of the risk
• a risk score
• a risk owner: the person or group responsible for managing the risk
• a summary of the proposed mitigation for the risk.
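
By way of illustration, a high-level risk entry with these four elements could be sketched in Python as follows. The scoring scheme (likelihood multiplied by impact, each on a scale of 1 to 5) is a common convention assumed here rather than one prescribed above, and the class name, field names and example risks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One high-level entry in a project risk register."""
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)
    owner: str       # person or group responsible for managing the risk
    mitigation: str  # summary of the proposed mitigation

    def __post_init__(self):
        # Reject values outside the assumed 1-5 scales.
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Assumed scoring convention: likelihood multiplied by impact.
        return self.likelihood * self.impact


def ranked(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


# Hypothetical entries, for illustration only.
register = [
    Risk("Key staff leave during the project", 2, 4, "Project manager",
         "Document decisions and share knowledge across the team"),
    Risk("Funding is reduced part-way through", 3, 5, "Project sponsor",
         "Phase the work so that each stage delivers usable results"),
]

for risk in ranked(register):
    print(f"{risk.score:>2}  {risk.description} (owner: {risk.owner})")
```

Ranking by score simply draws attention to the risks that most need discussion in the business case; any scoring scheme already in use within the organization can be substituted.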

Appendices
The appendices can be used to provide supporting information. This might
include a copy of the digital asset register, and a detailed options investment
appraisal, analysing the proposed budgets for each option, with costs,
revenues and savings, and discounted cash flows to assess the true value of
the investment over time.
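
As a rough guide to what a discounted cash flow calculation involves, the following Python sketch computes a net present value for each option: every year's net cash flow is divided by (1 + rate) raised to the power of the year number, and the results are summed. The option names, figures and discount rate are hypothetical, and the function name is chosen only for this illustration.

```python
def net_present_value(rate, cash_flows):
    """Discount a series of yearly net cash flows back to today's value.

    cash_flows[0] is the initial year (not discounted); cash_flows[t] is the
    net flow (revenues plus savings, minus costs) in year t.
    """
    return sum(flow / (1 + rate) ** year for year, flow in enumerate(cash_flows))


# Hypothetical figures: a start-up cost in year 0, then net annual benefits.
options = {
    "in-house repository": [-100_000, 40_000, 40_000, 40_000],
    "external service": [-20_000, 10_000, 10_000, 10_000],
}
discount_rate = 0.035  # assumed rate; use whatever your organization mandates

for name, flows in options.items():
    print(f"{name}: {net_present_value(discount_rate, flows):,.2f}")
```

A positive result suggests that, at the chosen discount rate, the option returns more than it costs over the period; comparing the results across options is usually more informative than any single figure.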

2.6 Next steps
The combination of a clearly defined policy for digital preservation, and a
well argued business case for the resources necessary to implement that
policy, should secure an organization’s commitment in both principle and
practice. The time and effort which may be required to achieve this must not
be underestimated; however, the reward will be a practical, achievable route
to developing a digital preservation solution appropriate to the needs of the
organization.
With that mandate secured, you can move on to the next step: defining
your detailed requirements for a digital preservation solution. This process is
described in the next chapter.

2.7 Key points
• Understand the drivers for your organization to undertake digital
preservation: Understanding the risks and benefits that apply to your
particular circumstances will help you to build a compelling argument
for action.
• Create a digital preservation policy: This provides the intellectual
framework for developing your business case and understanding your
requirements.
• Build a digital asset register: This provides the evidential basis for your
business case.
• Develop a detailed business case: This represents the culmination of
your argument, and is intended to secure your mandate.
• Reuse what others have done: There are many excellent examples of
policies and business cases available, which you can draw on.

2.8 Notes
1 Despite its frequent citation, it has proven difficult to confirm the source for this
statistic. While it should therefore be treated with caution, there does appear to
be good evidence to support it.
2 Waller and Sharpe (2006).
3 Simpson (2005).
4 Simpson (2005, 14).
5 Sinclair (2010).
6 Sinclair (2010, 9).
7 A wide range of relevant standards is discussed elsewhere in this book, and listed
in the bibliography.
8 See www.dpconline.org/.
9 See www.ncdd.nl/.
10 ERPANET (2003).
11 Beagrie et al. (2008).
12 See www.dcc.ac.uk/webfm_send/236.
13 See https://dmponline.dcc.ac.uk/.
14 See https://dmp.cdlib.org/.
15 See www.icpsr.umich.edu/files/ICPSR/curation/preservation/policies/dp-policy-outline.pdf.
16 See www.opendoar.org/tools/en/policies.php.
17 Parliamentary Archives (2009).
18 See James (2004) for the preservation policy, although the full range of policies
available at www.ahds.ac.uk/about/reports-and-policies/index.html is of
interest.
19 McGovern (2007).
20 Checkley-Scott and Thompson (2007).
21 See http://hansard.millbanksystems.com/.
22 See www.parliament.uk/about/art-in-parliament/.
23 See http://wiki.opf-labs.org/display/SPR/The+SPRUCE+Business+Case+for+Digital+Preservation.
24 See www.beagrie.com/krds.php.

3
Understanding your requirements

3.1 Introduction
This chapter provides guidance on how to identify and understand your
requirements for digital preservation services, from high-level needs to the
detailed documentation necessary to enable systems to be developed or
procured.
The importance of understanding your requirements as a precursor to
implementing any kind of solution cannot be overstated. To omit this step
should be as unthinkable as attempting to build a house without detailed
architectural plans. Taking the time to do this properly will improve the
quality of the end result. This is even more critical for smaller organizations
with limited resources, where maximizing value for money is vital;
understanding what matters most ensures that those resources can be
invested to achieve the greatest possible impact.
This chapter begins by examining how to develop a set of requirements,
including identifying and engaging with everyone who can and should
contribute, modelling business processes, and drawing on existing work. It
then considers how requirements should be articulated and documented, and
the types of requirement that need to be considered. Finally, it looks at how
the resulting requirements can be applied in practice, as a basis for
developing actual systems and services.

3.2 Identifying stakeholders
An organization’s requirements for digital preservation will not derive from
any individual or single group; a wide variety of parties will have an interest.
These may be key individuals, or those representing the interests of wider
groups with whom it is not possible to engage directly. They may also be
decision makers, who need ultimately to authorize the adoption of those
requirements as a basis for implementation. Identifying these various
stakeholders, and what role they need to play, is therefore an essential first
step in understanding one’s requirements. They will, of course, vary
considerably from case to case, but a number of common categories of
stakeholder should be considered:

• Content creators and managers: The people who actually create or
manage the content to preserve are clearly an essential group. These may
be internal staff or external depositors and publishers, and this
distinction will be a major factor in determining by what means, and
indeed to what extent, they can be identified in advance and consulted.
Understanding the nature of the content that needs to be preserved and
the processes by which it is managed prior to archiving is essential to
inform the design of any solution. Equally, you almost certainly need to
influence this group to adopt working practices and technology
standards which are as conducive to preservation as possible.
• Information managers: Those with responsibility for information
management and curation within the organization are of self-evident
importance as stakeholders. This is the group that will have
responsibility for managing a future digital preservation service, and be
accountable for its successful operation. Defining a solution that meets
their needs should be a given. This may also be the group that will be
required to undergo the most significant and fundamental changes to
their working practices and expertise; managing that change successfully
is therefore vital.
• IT providers: The IT function within an organization will be a key
stakeholder. IT providers will need to understand, and be happy with,
the technical impact of any proposed solution, including its effect on
other systems, and its support needs. They will almost certainly be
required to provide key resources during the design and implementation
of the solution, which may include project managers, architects and
analysts; where an in-house route is being taken (see Chapter 4, ‘Models
for implementing a digital preservation service’), they might also include
developers. In many cases they will also be expected to support any
solution, once it is operational. They will almost certainly play a major
role in defining the technical architecture of the solution, and setting any
technology standards which will need to be followed.
• End-users: The people who will eventually use the preserved content can
arguably be considered the most important stakeholders of all, at least in
the long term – they represent the ultimate motivation for undertaking
digital preservation. Like data creators, these end-users may be internal
or external. Their interests will lie principally in the use and reuse of
archived content, and should shape requirements for how that content
can be discovered and most usefully made available to them.
• Decision makers and funders: Those who ultimately make decisions and
control budgets are clearly a key stakeholder group. It is vital to
understand what information they will require to enable them to make
these decisions, and what arguments are likely to prove most persuasive.
Ideally, a sponsor within this group should be found, who can champion
the cause of digital preservation at a senior level as well as advising on
how best to manage communications with this group.
• Potential suppliers: In many cases, the future solution will rely at least in
part on external suppliers, such as software vendors or other
organizations offering tools or services. While it may be difficult, or
indeed inappropriate, to enter into detailed dialogue with potential
suppliers, and care must be taken not to compromise any future
procurement exercise, some account must be taken of their
capabilities. The market for digital preservation solutions is still small
(see Chapter 4, ‘Models for implementing a digital preservation service’),
and an understanding of the current state of the art is important to
inform one’s requirements. At this stage, engagement with suppliers
might be limited to desktop research into the market, and perhaps
talking to them informally at events such as conferences.

There may be additional, or more specific, groups of stakeholders in
particular organizational contexts, but the categories above should provide a
good starting point. Once the relevant categories of stakeholder have been
identified, they should be translated into actual groups, whether individuals
or teams.
Ultimately, named individuals need to be identified and approached. Care
should be taken to ensure that the identified stakeholders are genuinely
representative. For example, a local archive may have a very large number of
external depositors, and it becomes important to strike a balance between
limiting numbers to a manageable level and adequately covering likely
variations in requirements. It is also essential to make sure that personalities
or politics do not dominate, to the detriment of requirements.

3.3 Talking to stakeholders
The appropriate means of communicating with stakeholders varies,
depending on who they are and what information needs to be elicited from,
or communicated to, them. Some types of stakeholder, such as data creators,
users, IT staff and information managers, may need to be involved in detailed
discussions about requirements, while others, such as decision makers, may
only require periodic, high-level updates on progress. For each of your
stakeholders, you should identify the kind of communication required.
It is essential to remember that communicating with stakeholders is a two-
way process – you must think not only about what you need to find out from
them, but also what you want to tell them. Depending on this, a number of
strategies can be used, including:

• questionnaires and surveys
• structured interviews
• workshops
• collaborative authoring of documents (for example using an institutional
wiki)
• informal conversations.

Table 3.1 illustrates the types of communication that are typically needed
with different types of stakeholder. It shows both the types of information
which may be required from them, and which should be communicated to
them.

3.4 Modelling your processes
Many requirements relate to business processes. For example, ingest is a
process (see Chapter 6, ‘Accessioning and ingesting digital objects’), and the
requirements for ingest functionality in a digital repository derive from the
sequence of activities that make up that process. Analysing the processes
required can therefore be a useful tactic for defining requirements. Processes
themselves derive from policies, and requirements can therefore be seen as
the final expression of organizational will, which is defined at the highest
level in policy and strategy documents, distilled from these into specific
processes and standard operating procedures, before finally being articulated
as requirements for the systems that will support and implement those
processes (Figure 3.1).

Table 3.1 Stakeholder communications

| Stakeholder type | Ask them about... | Tell them about... |
|---|---|---|
| Content creators and managers | what content they create; formats, systems and processes for creation and management; data volumes; business use | information management standards; progress reports |
| Information managers | descriptive standards; information management standards; working practices; hybrid collections | potential changes in practice; progress reports |
| IT providers | technology standards; IT strategy; storage; infrastructure requirements | technical requirements; data volumes; information management standards; progress reports |
| End-users | what content they use; formats, systems and processes for use; reuse; frequency of use | progress reports; content availability |
| Decision makers and funders | formal approvals; funding decisions | business case; resource implications, including costs; milestone achievements |
| Potential suppliers | capabilities; outline costs | procurement process; agreed requirements |

The following example illustrates how this sequence might be put into
practice. We might start with a statement in a digital preservation policy as
follows:

All ingested objects will be in acceptable formats, accompanied by metadata
which meets the repository minimum standard, and free from viruses.

Figure 3.1 From policies to requirements

This might be expanded into a series of processes such as:

• validate formats
□ identify formats
□ check against acceptable list
• validate metadata
• virus check
□ perform initial virus check
□ quarantine
□ perform second virus check.

This in turn might yield the following requirements:

• The repository must provide an automated means to identify the formats
of all files submitted for ingest, to compare the results against a
configurable list of acceptable formats, and reject any files in non-
preferred formats, generating an explanatory message for the
administrator.
• The repository must provide an automated means to validate all
metadata submitted for ingest against a defined repository schema, for
accuracy and completeness. It must report any validation errors to the
administrator.
• The repository must provide a means to check all files submitted for
ingest for malicious software. This check must be performed before and
after a quarantine period of configurable duration, using the latest
available malware definitions on each occasion.
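
To make the first of these requirements more concrete, the following Python sketch shows one way an automated format check might behave. It is a minimal illustration only: it identifies formats from a handful of well-known leading byte signatures, whereas a real repository would rely on a dedicated format identification tool; the function names and the sample submission are hypothetical.

```python
# Leading byte signatures ('magic numbers') for a few common formats.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"II*\x00": "TIFF",  # little-endian TIFF
    b"MM\x00*": "TIFF",  # big-endian TIFF
}


def identify_format(data: bytes):
    """Return a format name based on the file's leading bytes, or None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def check_formats(submission: dict, acceptable: set):
    """Split a submission into accepted files and explanatory rejections.

    `submission` maps file names to file contents; `acceptable` is the
    configurable list of acceptable format names.
    """
    accepted, rejected = [], []
    for filename, data in submission.items():
        fmt = identify_format(data)
        if fmt in acceptable:
            accepted.append(filename)
        else:
            rejected.append(
                f"{filename}: format {fmt or 'unidentified'} "
                "is not on the acceptable list"
            )
    return accepted, rejected


# Hypothetical submission, for illustration only.
submission = {
    "report.pdf": b"%PDF-1.7 example content",
    "scan.tif": b"II*\x00 example content",
    "setup.exe": b"MZ example content",
}
accepted, rejected = check_formats(submission, acceptable={"PDF", "TIFF"})
for message in rejected:
    print("Rejected:", message)  # explanatory message for the administrator
```

Because the acceptable list is passed in as a parameter, it can be changed without altering the checking logic, which is the sense in which the requirement calls for it to be configurable.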

If possible, it may well be fruitful to undertake some detailed modelling
of the underlying business processes; this can range in sophistication and
complexity from the use of simple flow diagrams to formal methodologies
such as the Unified Modelling Language (UML).1 To get the most from
this requires the specialist skills of an experienced business analyst, but it
can still be a valuable exercise without access to such expertise. Though it
may sound daunting, the principle is simply to identify each process
involved in operating a digital preservation service, and then to break it
down into its component activities, analysing in each case who (or which
system) performs the task, where in the sequence that task occurs, and any
prerequisites or outcomes it may have. A process model for the virus
check element of the example above is shown in Figure 6.3 (p. 135).
The resultant models can greatly simplify the task of defining
requirements, as well as provide a means of checking their consistency
and completeness. Each activity in a given process will have an associated
requirement; this might be for a system to undertake an automated step,
such as virus checking, or to enable human input, such as approving a
preservation plan. If the models are complete, and requirements defined
for each step in those models, it is reasonable to assume that the
requirements have been fully captured.
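
This completeness check can be carried out quite mechanically. The Python sketch below, offered only as an illustration, records the ingest process model from the earlier example as a simple mapping, and reports any activity that no requirement covers. The requirement identifiers (R1, R2, R3) and the function name are hypothetical.

```python
# The ingest process model from the example above: process -> activities.
PROCESS_MODEL = {
    "validate formats": ["identify formats", "check against acceptable list"],
    "validate metadata": ["validate metadata"],
    "virus check": [
        "perform initial virus check",
        "quarantine",
        "perform second virus check",
    ],
}

# Hypothetical requirement identifiers, each listing the activities it covers.
REQUIREMENTS = {
    "R1": ["identify formats", "check against acceptable list"],
    "R2": ["validate metadata"],
    "R3": ["perform initial virus check", "perform second virus check"],
}


def uncovered_activities(model, requirements):
    """Return (process, activity) pairs that no requirement covers."""
    covered = {
        activity
        for activities in requirements.values()
        for activity in activities
    }
    return [
        (process, activity)
        for process, activities in model.items()
        for activity in activities
        if activity not in covered
    ]


for process, activity in uncovered_activities(PROCESS_MODEL, REQUIREMENTS):
    print(f"No requirement covers '{activity}' in process '{process}'")
```

In this deliberately incomplete example the quarantine step is reported as uncovered, which would prompt either a new requirement or an extension of R3.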
In future, the definition of formal process models or rules may become
even more significant. A growing area of research within the digital
preservation community is the use of rules to automate preservation
processes. The premise is simple: if we can derive explicit, unambiguous
rules from our policies, those rules can then be implemented in software.
If policies change, these changes are articulated as new or modified rules.2

3.5 Learning from other people’s requirements
Many organizations have already been through this process, and
documented their requirements, and these can therefore provide good
examples to draw on. The following is a non-exhaustive list of examples
that may be helpful:

• Rosenthal et al. (2005) offer an interesting introduction to defining
requirements from the bottom up.
• In 2009, the EU-funded SHAMAN project published an analysis of
requirements for three user communities: memory institutions,
industrial design and engineering, and e-science.3
• The US National Library of Medicine issued a statement of
requirements for a digital repository in 2007, structured around the
classic OAIS functions.4
• In 2010 the US National Archives and Records Administration
published the requirements document for its Electronic Records
Archive programme.5
• Between 2008 and 2010, the Wellcome Library issued a number of
invitations to tender, including detailed statements of requirements,
for its digital repository6 and an associated workflow tracking
system.7
• The JISC-funded Repositories Support Project in the UK issued a
briefing paper in 2008 called ‘Specifying repository requirements’.8
• The various standards for trusted digital repositories (discussed in the
next chapter) provide a useful basis for thinking about requirements.

These examples cover organizations of all sizes. While it may be useful to
start by looking at those most similar to your own context, you should not be
limited by this; for example, a small organization can very usefully adapt
requirements from a national body. Fundamental requirements for digital
preservation are very similar at all scales – most differences appear only in
the detailed implementation. For example, repositories of all sizes have
requirements for virus checking and quarantine.

3.6 Documenting your requirements
Having consulted with stakeholders, looked at what analogous organizations
have done, and analysed the underlying business processes and policies, you
should be well placed to begin formulating a set of requirements. This
process may be best delegated to a small group, with the necessary time and
expertise available.

Getting the right level
When defining requirements, it is important to focus on the desired result,
rather than the means of achieving it. This is sometimes referred to as an
‘outcome-based’ approach.
One of the most difficult aspects of analysing your requirements is to select
the correct level of detail. If requirements are defined at too high a level (e.g.
‘the system must preserve digital objects’) then they become meaningless. If
they are defined at too low a level (e.g. ‘the system must perform integrity
checking with the SHA-1 algorithm, using the SHA-1Generator 1.2 software’)
then they begin to stray into the realms of system design, placing a
straitjacket on potential solutions.
Although highly subjective, a requirement can be considered to be defined
at the right level if it articulates the outcome that needs to be achieved with
enough detail for a potential supplier to suggest a concrete solution, but
doesn’t actually specify how the solution might be provided. A good
requirement defines the problem, but leaves the solution open. Thus, an
integrity checking requirement might be expressed as follows:

• The solution must provide a facility to automatically monitor the
integrity of digital objects and their metadata, including audit trails. The
frequency of integrity checking should be user-configurable.
• The method of integrity checking must be modular, configurable, and
capable of being changed without impact on the wider solution. The
integrity monitoring solution must be scalable, in accordance with the
rate of growth of the repository.
• The solution must detect and repair data integrity errors, and maintain
an audit trail which records the date/time and method used for all
integrity checks, any errors detected, and the date/time, method, and
success or failure of any repair.
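
As a purely illustrative sketch of how a supplier might meet such a requirement, the following Python fragment keeps the checksum algorithm as a configurable parameter and records every check in an audit trail. The function names and the structure of the audit record are assumptions made for this example, not features of any particular product, and the repair of damaged objects is deliberately left out.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, using any algorithm hashlib supports."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Read in chunks so that large objects do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_integrity(path: Path, expected: str, algorithm: str, audit_log: list) -> bool:
    """Compare a file with its recorded checksum and log the outcome."""
    ok = checksum(path, algorithm) == expected
    # Hypothetical audit record: date/time, method and result of the check.
    audit_log.append({
        "object": str(path),
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "method": algorithm,
        "result": "ok" if ok else "error",
    })
    return ok
```

Because the algorithm is passed in by name, it can be changed without touching the rest of the code, which is the kind of modularity the second requirement asks for.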

Types of requirement
Requirements for systems are usually subdivided into ‘functional’ and ‘non-
functional’. Functional requirements describe the desired functionality of a
system – they define what it should do. These are usually accompanied by
non-functional requirements, which define the overall characteristics of the
system, and any constraints or standards that apply to its design. For
example, a functional requirement might state that the system needs to
provide a means to characterize the formats of all digital objects during
ingest. A non-functional requirement might be that the system must support
100 concurrent end-users.
In some cases, organizations also define ‘service’ requirements, for services
required to support the system, such as user training. In other cases these
may be incorporated within the non-functional requirements.
In addition to the requirements themselves, it is helpful to provide
background information to place them in their context. This might briefly
describe the organization and its goals, and the background to the project,
and set out the overarching approach and philosophy that the requirements
reflect.

Developing a requirements catalogue


The ultimate expression of your requirements will be a requirements
catalogue, a document that collates and describes the requirements in a
consistent, detailed manner. Many approaches to documenting requirements
are possible, ranging from the narrative to the highly structured. This is
largely a matter of choice and organizational preference. The following
information will always be required:

• a unique identifier for each requirement, for ease of reference
• a description of each requirement
• a statement of whether each requirement is mandatory, desirable,
optional or for future development
• a statement about the source or derivation of the requirement; this is
important for tracing a requirement back to the underlying processes and
policies from which it arose (see above).
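
For organizations that prefer a highly structured catalogue, the four items above map naturally onto a simple record. The sketch below is one possible way of holding them in Python; the class and field names, the identifier scheme and the sample entry are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    MANDATORY = "mandatory"
    DESIRABLE = "desirable"
    OPTIONAL = "optional"
    FUTURE = "future development"


@dataclass(frozen=True)
class Requirement:
    identifier: str      # unique reference, e.g. "STO-01" (hypothetical scheme)
    description: str     # the requirement itself
    priority: Priority   # mandatory, desirable, optional or future development
    source: str          # policy or process from which the requirement arose


def build_catalogue(requirements):
    """Index requirements by identifier, rejecting duplicate identifiers."""
    catalogue = {}
    for req in requirements:
        if req.identifier in catalogue:
            raise ValueError(f"duplicate identifier: {req.identifier}")
        catalogue[req.identifier] = req
    return catalogue


# Hypothetical entry, based on the integrity checking example earlier on.
example = Requirement(
    identifier="STO-01",
    description="The solution must provide a facility to automatically "
                "monitor the integrity of digital objects and their metadata.",
    priority=Priority.MANDATORY,
    source="Digital preservation policy (illustrative)",
)
catalogue = build_catalogue([example])
```

The same four fields would work equally well as the columns of a spreadsheet; the point is only that each requirement carries its identifier, description, status and source together.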

What is essential is to be precise in the phrasing of requirements, so they are
unambiguous. In particular, the following points should be borne in mind:

• Consistency: Be consistent in the use of words such as ‘must’, ‘may’ and
‘shall’. The convention9 is that ‘must’ or ‘shall’ is used for mandatory
requirements, and ‘may’ for optional ones, with ‘must not’ or ‘shall not’
used to define prohibited behaviour.
• Precision: Be as precise as possible in the language used. Try to avoid
ambiguity, or vague statements, unless you expect to clarify them at a
later stage.
• Level: Frame each requirement at the right level (see above).
• Clarity: Define all specialist terms and acronyms using a glossary. Be
especially aware of those that can have well understood but very
different meanings to different audiences. Classic examples include ‘file’
and ‘archive’, which are very differently understood by archivists and IT
professionals.
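
The consistency point lends itself to a simple automated check. The following sketch, which assumes that each requirement is already flagged as mandatory or not, reports wording that contradicts that status. The function name and messages are invented for this example, and the word matching is deliberately crude: it will not spot subtler ambiguities.

```python
import re

MANDATORY_WORDS = {"must", "shall"}
OPTIONAL_WORDS = {"may"}


def keyword_problems(description, mandatory):
    """Return a list of wording problems for one requirement description."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    uses_mandatory = bool(words & MANDATORY_WORDS)
    uses_optional = bool(words & OPTIONAL_WORDS)

    problems = []
    if mandatory and not uses_mandatory:
        problems.append("mandatory requirement lacks 'must' or 'shall'")
    if mandatory and uses_optional:
        problems.append("mandatory requirement uses 'may'")
    if not mandatory and uses_mandatory:
        problems.append("non-mandatory requirement uses 'must' or 'shall'")
    return problems
```

A check of this kind is no substitute for careful reading, but it can catch slips when a catalogue runs to several hundred entries.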

Functional requirements
The functional requirements for any digital repository will be complex, and
it is important to structure them in a way that aids understanding.
Requirements are normally organized into thematic groups, relating to
specific areas of repository functionality. One approach might be to use the
OAIS functional areas:

• ingest
• data management
• archival storage
• preservation planning
• administration
• access.

As these may not be entirely self-explanatory, especially beyond the digital
preservation practitioner community, some organizations have opted for a
simpler approach. As an example, the Parliamentary Archives divided its
functional requirements into the following headings:

• ingest
• cataloguing and metadata
• bitstream preservation
• logical preservation
• access
• administration
• capacity.

Whatever structure is used, it should provide a logical place to fit every
requirement – if you find you have lots of ‘miscellaneous’ requirements, this
is probably a sign that the structure is not quite right.

Non-functional requirements
These cover requirements that are not directly related to the capabilities of the
system. They tend to be similar for any IT system, and typically include areas
such as:

• Usability and accessibility: This may include any web accessibility or
user interface standards to be followed, and any requirements to meet
disability legislation. It might also include specifying meaningful error
reporting.
• Desktop infrastructure: This defines requirements relating to any
desktop client provided by the system. Typically, this will need to be
compatible with the organization’s standard desktop operating system, or
be browser-based.

• Server infrastructure: This describes any server architecture on which
the system would operate. This may define requirements not only for the
server environment, including standard operating systems, but also for
any database components, and for storage and network infrastructure. It
may include any constraints or requirements around the use of
technologies such as virtualization, the Cloud, or grid computing.
• Performance: This defines how quickly the system must perform certain
tasks, such as ingesting data, or providing access.
• Resilience: This sets the expected resilience of the system, including
availability, and recovery times in the event of failures.
• Operational: This covers a range of features which may be required to
support day-to-day operation and administration of the system. These
might include supporting a test environment, back-up, storage
management, and upgrade and patch management.
• Application integration: This defines generic features required to
support integration with other systems, such as provision of an
Application Programming Interface (API) or Software Development Kit
(SDK).
• Enterprise search: This details any requirements to support enterprise
search – a single means for users to search across multiple systems.
• Sustainability: This covers requirements to ensure that the system is
itself sustainable. This is essential for future-proofing the repository, and
might include specifying a modular architecture with clearly defined
interfaces and the ability to export the complete content of the repository,
including all metadata in open formats.
• Compliance: This defines any legal and regulatory requirements that the
system must support, for example on the processing of personal
information.

Non-functional requirements are typically much more detailed when
procuring a fully fledged repository software platform; for more basic
implementations they can be fairly simple.

Service requirements
These describe requirements for services to support the implementation and
use of the system, rather than the system itself. They also tend to be generic,
and typically include:

• Consultancy: This defines any specialist consultancy services which may
be required. These might include creating new ingest workflows,
integrating with other systems, or adding additional preservation tools
such as format converters or characterization utilities.
• Implementation: This describes how the system will be implemented,
including the project management approach in use. This may include
defining key points of contact with suppliers, and arrangements for
reporting progress and issues.
• Ongoing support and maintenance: This defines the required
arrangements for supporting the solution, including technical support.
A supplier’s support systems will need to integrate with any in-house IT
support. For example, first-line support may be provided by an in-house
helpdesk, with more technical support calls being passed to the supplier.
As many solutions may make use of third-party tools, it will be essential
to clarify how these will be supported.
• Change management: This describes changes to culture and working
practices within the organization which are required for the successful
adoption of the system, and identifies any support for this expected from
a supplier.
• Training: This defines any training needed, for example for IT staff,
curators or end-users, including any that a supplier will be expected to
provide. This should include start-up and ongoing training; it may be
useful to specify a train-the-trainer approach, to reduce future costs and
dependence on a supplier.
• Documentation: This defines the documentation required, such as user
guides, maintenance guides, technical documentation and training
materials. It should also cover rights to use and modify documents
provided by suppliers, and how they will be updated.
• Design and configuration: This describes how the supplier will be
expected to contribute to any system design, and to configure the system
for operational use.
• Testing: This covers all types of testing required before a system is ready
to go live, including system testing and user acceptance testing. This
should include any documents that a supplier will be expected to
provide, such as test scripts.

3.7 How to use your requirements


The requirements catalogue forms the basis for identifying, designing and
implementing an appropriate solution. The range of options for this is
discussed in detail in the next chapter; however, all will require either the
procurement of products or services, in-house development, or some
combination thereof. As the formal articulation of an organization’s needs,
the requirements catalogue provides an invaluable tool for communicating
those needs to suppliers, developers, funders and others, and a benchmark
against which the developing reality of a system can continually be tested.
The next section shows how it can be applied in either scenario.

Procurement: developing an invitation to tender


If systems or services are to be procured, the requirements catalogue will
form a key part of the formal documentation needed for this. Specifically, it
will be the centrepiece of a ‘call for bids’ or ‘invitation to tender’ (ITT),
detailing the requirements against which potential suppliers will be
evaluated and selected. An ITT typically adds two things to a requirements
catalogue: first, it includes information about the tender process, and detailed
terms and conditions that will apply to the resultant contract; second, it
includes evaluation criteria for every requirement, so that tenderers’
responses can be assessed consistently. These essentially turn the requirement
into a question: if there is a requirement for x, the accompanying evaluation
criterion might be ‘The tenderer should explain how it will achieve x’.
The actual procurement process is described in Chapter 4, ‘Models for
implementing a digital preservation service’. Once a supplier has been
selected and awarded a contract to supply the solution, the implementation
stage will begin with the development of a detailed design for the solution,
followed by development, installation, integration and configuration, testing
and, finally, deployment of the live system. The requirements catalogue
should be referred to throughout the design stage, to ensure that the final
system design actually meets all the requirements. It normally also forms the
basis for ‘user acceptance testing’ – the process by which the customer
satisfies themselves that the finished solution really does fulfil the
requirements they originally specified. The implementation stage is
discussed in more detail in Chapter 4.
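
One way of keeping the catalogue in view during user acceptance testing is to record which requirements each test covers, and to list any that no test addresses. The sketch below assumes that test cases are held as simple records with a `covers` field listing requirement identifiers; both the field name and the identifiers are hypothetical.

```python
def uncovered_requirements(requirement_ids, test_cases):
    """Return the requirement identifiers that no acceptance test refers to."""
    covered = {rid for case in test_cases for rid in case["covers"]}
    return sorted(set(requirement_ids) - covered)


# Hypothetical identifiers and test cases, for illustration only.
requirement_ids = ["ING-01", "STO-01", "ACC-01"]
test_cases = [
    {"name": "UAT-1", "covers": ["ING-01"]},
    {"name": "UAT-2", "covers": ["ING-01", "ACC-01"]},
]
gaps = uncovered_requirements(requirement_ids, test_cases)
```

Here `gaps` identifies the storage requirement as untested, prompting either a new test or a conscious decision to accept the gap.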

Developing an in-house specification


The requirements catalogue is no less important if a solution is to be
developed in-house. The only difference is that there will be no need to
incorporate it into an ITT, or go through a procurement exercise. However, a
similar evaluation process could still be used to consider alternative options.
The requirements catalogue can then be used directly to guide the design,
building and testing of the system.

3.8 Conclusion
Defining a set of requirements is a fundamental prerequisite for developing a
digital preservation capability. Given its importance, it is essential to take the
time necessary to ensure that the requirements catalogue is comprehensive,
sufficiently detailed and – most crucially of all – accurately reflects the
individual needs of the organization. Having such a statement of
requirements will contribute immensely to the likelihood of developing a
useful, practical and sustainable digital preservation capability.

3.9 Key points


• Take time to develop your requirements catalogue: Don’t rush this step,
or try to second guess requirements. Consult as widely as possible, and
use process modelling to drill down into the detail.
• Don’t reinvent the wheel: Your requirements catalogue should be based
on your policies, rather than standing in isolation. In addition, you can
draw on the wealth of readily available requirements documents which
other organizations have developed.
• Think about outcomes rather than solutions: Requirements say what
you need, not how to achieve it.
• Use your requirements catalogue to its full advantage: Having invested
so much time and energy in its creation, you should seek to extract every
last drop of value from it. Use it to choose the best solution, help build
that solution, and verify that the results really do fulfil your needs.

3.10 Notes
1 For more information on UML see www.uml.org/.
2 Examples include the iRods system (see https://www.irods.org/) and the
SHAMAN project (see http://shaman-ip.eu/).
3 Innocenti et al. (2009).
4 National Library of Medicine Digital Repository Working Group (2007).
5 Electronic Records Archives Program Management Office (2010).
6 Wellcome Library (2008).
7 Wellcome Library (2010).
8 Repositories Support Project (2008).
9 Bradner (1997).

4
Models for implementing a digital
preservation service

4.1 Introduction
Many different models are possible for operating a digital preservation service –
there are options to suit every size and type of organization, from national bodies
with substantial dedicated budgets and teams, to the smallest organization
seeking to achieve something practical at minimal cost, and without specialist
skills. This chapter analyses the range of possible options, including bespoke, in-
house and outsourced solutions. It assesses the pros and cons of the alternatives,
and considers which elements of a service may be most suited to certain
approaches, and under what circumstances. It also considers the current and
developing market for providing these solutions. Next, it looks at the process of
implementing the chosen solution, and some of the practicalities of operating a
digital repository, including the roles required, and availability of suitable
training. It then examines the notion of ‘trusted’ digital repositories and proposes
a ‘maturity model’ for digital preservation, which enables organizations to assess
their capabilities and create a realistic roadmap for developing them to the
required level. The alternative models are illustrated by a series of case studies.

4.2 Options
Digital preservation is a comparatively new discipline, and models for good
practice, including the technologies and services required to support them,
therefore exist at varying levels of maturity. Approaches to providing the
fundamental elements of a digital repository are now well established, but
some of the techniques and technologies required to deliver advanced
preservation functions, especially for newer and more complex types of
digital content, remain in their infancy.
This section analyses the available options in detail, assessing the
respective strengths and weaknesses of each.

Do nothing
Any analysis of options should always include the status quo – not only does
this provide a baseline against which other, more positive, options can be
assessed, but it also allows a true comparison of the implications of not taking
action. The costs of doing nothing include the continued burden of maintaining
archival data on inappropriate storage infrastructure, and the costs of re-
creating, or failing to preserve, digital resources that would be lost as a result
of inaction, as described in Chapter 2, ‘Making the case for digital preservation’.
This option assumes no development of digital preservation services,
including zero investment and staffing. Table 4.1 sets out the pros and cons of
the do nothing option.

Table 4.1 The do nothing option

Pros
• no investment required
• no organizational change required
• keeps options open for future investment

Cons
• does not meet any of the business objectives for digital preservation, or
the expectations of key stakeholders
• loss of digital resources in the short to medium term, and therefore the
failure to provide users with the information they need
• loss of corporate records in digital form, with the associated
reputational, governance and heritage damage that would result, including
failure to comply with legislative, regulatory and information security
requirements
• additional cost of continuing to manage archival data on high-cost storage
infrastructure
• loss of investment in digitization projects which it may not be possible to
replicate for financial or conservation reasons
• additional costs incurred by undertaking digital archaeology to recover
lost resources or in repeat digitization to replace them
• additional costs incurred as individual IT systems reach obsolescence and
valuable content needs to be preserved in a reactive, ad hoc fashion

Costs
Although the cost of investment is obviously zero, the net costs are potentially very high,
especially over the longer term – the costs and other impacts of repairing, re-creating or
losing information are generally much higher than those of preserving it.

The minimal repository


It is possible to build a functioning digital repository without any elaborate
tools or systems, and indeed this will be the most realistic option for many
smaller organizations. Such a repository will typically use simple, readily
available tools and existing infrastructure:

• Ingest: This could be a manual, or semi-manual, process, using a variety
of free tools (as described in Chapter 6, ‘Selecting and acquiring digital
content’, and Appendix 3).
• Metadata: All or part of this might be stored in a simple spreadsheet or
database. Alternatively, descriptive metadata could be stored in an
existing catalogue system, with technical metadata stored as text files
alongside the content.
• Storage: This might use existing network storage space, with some form
of back-up, or removable media.
• Preservation: This will also typically be a manual, or only partially
automated process, using free tools and services of the kinds described in
Chapter 8, ‘Preserving digital objects’, and Appendix 3.
• Access: This will typically be provided on demand, via a terminal in the
reading room, or remotely on removable media.
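
To give a sense of how little tooling such a repository needs, the following sketch uses only the Python standard library to build a simple manifest: a CSV file, readable in any spreadsheet, listing each file with its size and checksum. The function name, column headings and choice of SHA-256 are assumptions made for this example rather than a recommended standard.

```python
import csv
import hashlib
from pathlib import Path


def write_manifest(content_dir: Path, manifest_path: Path) -> int:
    """Record relative path, size and SHA-256 checksum for every file."""
    rows = []
    for path in sorted(content_dir.rglob("*")):
        if path.is_file():
            data = path.read_bytes()  # adequate for small files; stream large ones
            rows.append([
                path.relative_to(content_dir).as_posix(),
                len(data),
                hashlib.sha256(data).hexdigest(),
            ])
    with manifest_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "size_bytes", "sha256"])
        writer.writerows(rows)
    return len(rows)
```

The manifest is best written outside the content directory, so that it is not itself listed on a later run; comparing a fresh manifest with a stored one then provides a basic integrity check.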

The 2012 Future Proofing project, carried out by the University of London with
JISC funding, explores the possibilities of such a minimal approach in detail,
using open-source tools to perform a variety of ingest, preservation, access and
other repository management workflows;1 an OCLC report of the same date2
and a series of blog posts by Chris Prom3 provide further recommendations for
implementing a basic repository. The case study of the Centre for Archaeology
at the end of this chapter provides an example of this approach in practice.
Table 4.2 sets out the pros and cons of the minimal repository option.

Table 4.2 The minimal repository option

Pros
• low cost
• begins to develop the infrastructure, skills and experience needed for
digital preservation within the organization
• flexible and customizable to local need
• can be developed incrementally
• allows action to be taken in the short to medium term, providing a flexible
solution, which can be adapted over time to take advantage of maturing
markets and service providers, and advances in digital preservation research
• there is no reliance on a single company or service provider

Cons
• may be difficult to provide the full range of functionality
• not scalable to large volumes – entails a comparatively high level of
manual input
• may require technical skills to set up
• minimal or no support is available for many of the tools

Costs
By using only free and existing software, and existing hardware, the costs of this approach
should be minimal, with staff time and training likely to be the most significant elements.

Developing a bespoke solution


Some organizations develop their own bespoke repositories, either using in-
house development teams of programmers or by commissioning external
software developers. The result will be a solution tailored to your precise
requirements, but this is likely to be the highest-cost option – software
development is expensive. In addition, developing a digital repository platform
from scratch is a complex exercise, and is likely to require several generations of
the technology to achieve a mature, stable product. This approach was quite
common among large memory institutions in the early days of digital
preservation, when third-party solutions were not as readily available; today, it
is unlikely to be appropriate unless you have unique requirements which
cannot be met by current technologies. Even then, in most cases it is more
economical to adapt existing tools than build something entirely new.
As an example, when the UK National Archives (TNA) began developing
its digital repository in 2001, the only viable option to meet its requirements
was to procure a bespoke solution, and it commissioned a commercial
software developer accordingly. The resultant product was successful,
winning the inaugural Digital Preservation Award in 2004 and the 2011
Queen’s Award for Enterprise in Innovation, but required substantial
investment. TNA subsequently licensed the technology back to the developer,
to use as a basis for creating a commercial product – Safety Deposit Box (SDB)
(discussed later in this chapter and in Appendix 3). Table 4.3 sets out the pros
and cons of the bespoke option.

Table 4.3 The bespoke option

Pros
• offers most flexible solution, tailored exactly to the organization’s needs

Cons
• very expensive and time consuming to develop
• very complex
• very high risk – technology is unlikely to be stable or mature for several
generations
• less opportunity to collaborate or learn from others

Costs
This is a very expensive option in terms of development and ongoing support costs.

Using open-source software


Repositories can be developed entirely using existing open-source tools (as
described later in this chapter and in Appendix 3). Such tools are typically
free to use, but this does not mean that this approach is without cost: you are
likely to require some development resource (either in-house or external) to
adapt, configure and support the tools. Your organization will also bear the
entire risk, rather than sharing it with suppliers. This approach is widely
adopted in universities and the larger cultural memory institutions, which
typically have access to in-house developers. The case studies of the LSE
Library and Burritt Library at the end of this chapter provide good examples
of using open-source software, albeit within hybrid solutions. Table 4.4 sets
out the pros and cons of the open-source software option.

Table 4.4 The open-source software option

Pros
• offers a very flexible solution, adapted to the organization’s needs
• hands-on experience develops organization’s ability to act as an
intelligent customer
• there may be benefits to be gained from becoming part of a user community
for the chosen solution with the potential for shared service developments,
training, support and recruitment in future

Cons
• potentially costly in terms of staff
• can require a significant element of bespoke software development and
enhancement
• high risk, because of the immaturity of some technologies
• minimal or no support is available for many of the products
• substantial management overhead
• organization may not have expertise in, or currently support, many of the
technologies involved

Costs
Highly variable, depending on the availability and cost of system development and
support resources.

Procuring a commercial solution


A common option is to procure a commercial off-the-shelf digital repository
solution, with contracted-out support and services. The small but growing
market for such products is discussed later in this chapter. This can be
characterized as a high-cost, low-risk option, and much of its success depends
on the development of a strong relationship with the supplier. An example of
this approach is given in the case study of the Wellcome Library at the end of
this chapter. Table 4.5 sets out the pros and cons of the commercial option.

Table 4.5 The commercial option

Pros
• flexible and customizable
• provides all functionality in a single, supported package
• high level of support available
• may provide access to an established user community, whereby you can
benefit from others’ experience

Cons
• typically high cost
• commercial products may not meet all requirements ‘out of the box’, and may
require configuration or customization
• creates a dependency on an external supplier
• may introduce proprietary aspects to the repository
• may require a lengthy procurement process

Costs
Commercial products are likely to come with a comparatively high price tag. You typically
need to pay a one-off licence fee and annual support costs. On top of this, you should
expect the supplier to charge for customization and configuration of the product to suit
your requirements – in some cases, this may account for the single largest cost element.

Outsourcing the service

This approach involves contracting a third-party provider to supply a full
digital preservation service to meet your requirements. This is potentially a
low-risk option, since it places all responsibility for implementation with
the service provider. At the same time, it may introduce risks associated
with the lack of direct control over the repository. Customers are wholly
dependent on the supplier for the service; for institutions where
preservation is a core business, this may be undesirable. Services may also
be provided under very short-term contracts, with equally brief notice
periods on either side; if the supplier chooses to terminate the contract, or
indeed to leave the market, their customers may have very little time to
source an alternative provider. In a still very new market, there is a real
danger that there may not even be any credible alternative suppliers, which
could leave an institution in a very difficult situation.
At present, this is perhaps the least mature sector of the market, except in
specialist areas, such as web archiving. However, more comprehensive
Preservation-as-a-Service (PraaS) models are also beginning to emerge; some
of the early players in this market are discussed later in this chapter, while the
development of this trend is addressed in Chapter 10, ‘Future trends’.
Outsourcing is more commonplace for providing elements of the repository
infrastructure. For example, many organizations host their servers in third-
party data centres, or outsource their storage to a managed service, or the
Cloud. The implications of the latter are considered in more detail in Chapter
8, ‘Preserving digital objects’.
This option minimizes the direct impact on the organization, insulating it
from the changes required for implementation, and the need to support
particular technologies. It also has a very low barrier to entry, making it
particularly attractive to smaller organizations; however, it does require the
organization to have the capacity to oversee management of the contract,
which may require new resources and expertise, for example within the ICT
and information management functions.
A good example of a widely used outsourced service, albeit one which
only addresses the capture and hosting of a single type of content, is Archive-
It, a web archiving facility provided by the Internet Archive, and described in
Chapter 5, ‘Selecting and acquiring digital objects’. This is used by a wide
variety of small organizations from around the world to carry out web
archiving, as exemplified by the case study on the Greater Manchester
Archivists Group at the end of this chapter. Table 4.6 sets out the pros and
cons of the outsourced option.

Table 4.6 The outsourced option

Pros
• outsourcing accesses specialist skills and experience which may not be
available within the organization, and allows action to be taken pending the
development of internal capability
• defers or avoids the need for the organization to develop or manage
infrastructure for digital preservation
• for specific activities, such as web archiving, contractors can offer
excellent value for money
• outsourced services can be very quick to set up, and potentially very easily
scaled up or down to meet changing demands
• outsourcing to a service provider who is actively undertaking research in
the field means that the organization could benefit from that innovation
without incurring additional development costs
• as future demand grows, further service providers may well appear
Cons
• service providers may not be available to fulfil all the organization’s
requirements
• the market is immature, with a small number of potential service providers
• it would be difficult for the organization to maintain an intelligent
customer capability without practical experience
• outsourcing may not be an option for sensitive content
• the costs of contracting out may be higher in the medium term than
implementing a service in-house
• there is a danger of losing or failing to develop the skills base within the
organization
• a lack of adequate exit strategies may result in being locked into a contract
for longer than is desirable
• risk of overdependence on a single supplier until the market develops further
• services are often offered on very short-term contracts, with limited notice
periods
• governance arrangements are likely to be more complex
Costs
Outsourcing requires no up-front capital investment, operating either on an annual charge
or a pay-per-use basis. Unless the unit costs of the service decline over time at a similar
rate to alternative options, this can prove an expensive option over the long term, but it
can also significantly lower the financial barrier to entry.
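The cost trade-off described for the outsourced option can be made concrete with some simple arithmetic. The following Python sketch is illustrative only: the function name, the prices and the rate of decline are hypothetical assumptions, not figures from any supplier. It compares the cumulative cost of a pay-per-use service, which needs no capital investment, with an in-house option that has an up-front cost but running costs that fall each year.

```python
def cumulative_costs(upfront, annual, annual_decline, years):
    """Return the running total cost at the end of each year.

    upfront        -- one-off capital investment (0 for an outsourced service)
    annual         -- running cost in the first year
    annual_decline -- fraction by which the running cost falls each year
    years          -- number of years to model
    """
    totals = []
    total = upfront
    cost = annual
    for _ in range(years):
        total += cost
        totals.append(round(total, 2))
        cost *= (1 - annual_decline)
    return totals


# Hypothetical figures, chosen purely for illustration.
outsourced = cumulative_costs(upfront=0, annual=10_000, annual_decline=0.0, years=10)
in_house = cumulative_costs(upfront=30_000, annual=5_000, annual_decline=0.10, years=10)

# First year (counting from 1) in which the in-house option is cheaper overall,
# or None if that never happens within the modelled period.
break_even = next(
    (year for year, (o, i) in enumerate(zip(outsourced, in_house), start=1) if i < o),
    None,
)
```

With these assumed figures the outsourced service is cheaper for the first five years and the in-house option becomes cheaper overall in year six; different assumptions about price decline will move that point considerably.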


Partnership approaches
Another option is the collaborative approach, whereby a number of
organizations with a common set of requirements establish a partnership to
develop and share services. These partnerships may be set up in various
ways, with different degrees of formality:

• Informal arrangements: At the least formal end of the spectrum, there
might be a simple understanding, without any written agreement,
perhaps based on a long-standing existing arrangement. However, such
an approach is unlikely to be suitable as a basis for the long-term
preservation of digital material, since it offers no guarantees or
safeguards for future preservation or access.
• Formal agreements: There are various kinds of formal written agreement
which can be defined between two or more parties. These include
memoranda of understanding (MoUs) and letters of agreement, which
usually stop short of forming a legally binding contract, the consortium
agreement, which is normally legally binding, and a formal contract. An
example model for consortium agreements from the UK is provided by
the Lambert Toolkit, intended for universities and companies that wish to
undertake collaborative research projects with each other.4
• Establishing a separate legal entity: Perhaps the most complex form of
partnership is to establish a new legal entity which represents the
shared interests of the partners. This might take the form of a non-
profit entity, such as a charity, trust, foundation or private company
limited by guarantee. The precise forms of non-profit organization
allowed vary from country to country, but most enjoy tax exempt
status. An existing legal entity might also play host to a partnership
entity which has an independent existence from its members, but is not
itself legally constituted. An example of this approach is the
MetaArchive Cooperative, discussed in detail in a case study at the end
of this chapter.

Partnerships can also operate in different ways:

• The partners may jointly procure a service from a third party, using any
of the other options discussed in this section.
• They may establish a distributed infrastructure. This might involve each
partner hosting a copy of all, or part, of the system, using technologies
such as LOCKSS (see later in this chapter), or partners providing
different services, according to their skills and resources.
• One partner may operate the service on behalf of the others.

Table 4.7 sets out the pros and cons of the partnership option.

Table 4.7 The partnership option

Pros
• economies of scale
• greater leverage with suppliers
• pooling of resources
• sharing of best practice and expertise
• potentially sustainable beyond the life of individual partners
• potential for distributed infrastructure, improving resilience
Cons
• misalignment of partners’ objectives
• withdrawal of partners
• lowest common denominator applies
• exit strategies may be more difficult
• formal agreements may be complex to negotiate
Costs
Because of the economies of scale, a partnership approach should result in lower costs
per partner to achieve a given result than a single partner could achieve in isolation.

Hybrid approaches
In many cases, a hybrid approach may actually be the most appropriate option.
In this scenario, various elements of the solution are developed in-house,
procured as commercial off-the-shelf products, or outsourced as appropriate.
This can provide the most flexible and cost-effective solution, but may be
complex to develop and operate, requiring careful planning and integration to
ensure that the components work together correctly. The case studies on the
LSE Library and Burritt Library at the end of the chapter exemplify this
approach. Table 4.8 sets out the pros and cons of the hybrid option.

Table 4.8 The hybrid option

Pros
• As for individual options above
Cons
• As for individual options above
Costs
Highly variable, depending on the mix of elements used, but offers the opportunity to
create the most cost-effective long-term solution.

4.3 The current market


The market for digital preservation solutions is still comparatively new, but
also growing and evolving rapidly. There has been relatively little analysis of
the sector, although the Planets project published a white paper based on a
survey of vendors in 2009.5 The potential size of the market was estimated in
2011 as being worth in excess of $1 billion,6 which supports the view that this
growth will continue.
There are already a large number of solutions available, capable of
fulfilling a wide range of requirements, and suited to a variety of scenarios.
Many of these have released a number of stable, production-quality versions,
and can be considered mature tools; some have been available for over a
decade.
It is not the role of this book to recommend particular products,
commercial or otherwise. Nonetheless, it is instructive to mention some of the
solutions currently available, with the proviso that inclusion here does not
constitute an endorsement. The range of available tools is described in much
more detail in Appendix 3.

Commercial products
There is a small, but well established community of commercial suppliers,
offering customized off-the-shelf solutions. In many cases, these represent
commercial versions of systems originally developed for specific national
libraries and archives. With a comparatively small amount of integration and
customization, these can offer a generic digital preservation solution, which
can then be extended through further development to provide whatever
additional functionality the customer may wish. Increasingly, these solutions
are being enhanced to comply with developing international standards, and
to use emerging third-party services. It is also possible to contract the
majority of support and administration to the supplier, minimizing the
impact on in-house IT support, and other IT-enabled programmes. The main
commercial products, including Rosetta from Ex Libris, and Tessella’s SDB
(Figure 4.1), are discussed in more detail in Appendix 3.

Open-source products
A number of open-source technologies have been developed, primarily
within the higher education community. Although few of these individually
provide a complete digital preservation solution, some institutions are using
them as the building blocks to develop their own systems. Open-source tools
can also be used to complement commercial solutions.


Figure 4.1 Safety Deposit Box (Tessella Ltd)

A number of complete open-source digital repository management systems
are available, although these vary in the level of preservation functionality
which they offer directly. The most widely used systems include DSpace,
EPrints, Fedora and LOCKSS, but a number of new platforms, offering more
advanced preservation functionality, are now emerging. A more detailed
description of all of these is provided in Appendix 3.
In addition to these repository systems, a number of toolkits and
individual utilities have been developed, either by individual organizations,
or under the auspices of collaborative projects. These can potentially be used
to add preservation functionality to existing repository systems. They include
characterization tools such as DROID and JHOVE, ingest tools like the
Curator’s Workbench, forensic tools, including BitCurator, metadata tools
such as Archivists’ Toolkit, and migration tools like Xena. Again, a wide range
of current tools are discussed in Appendix 3.

Service providers
A nascent community of providers is beginning to offer a range of digital
preservation services on both a commercial and a free to use basis. These
services vary considerably in scope and maturity, and this area of the market
is subject to particularly rapid development (trends in this space are
discussed in Chapter 10). Some services, such as the UK National Archives’
PRONOM registry,7 have become widely adopted as de facto standards, and
are likely to form part of many solutions. Service providers may specialize in
one particular service, or type of content, or may offer more comprehensive
preservation services, which seek to provide all the functions of a digital
repository. Others provide consultancy services tailored to the needs of
individual clients. These permutations are discussed below.

Specialist services
Perhaps the largest range of suppliers can be found providing services
relating to specific types of content, such as web archiving and audiovisual
material. They also tend to specialize in certain types of activity, such as
capture or format migration. For example, non-profit organizations such as
the Internet Archive and Internet Memory Foundation, and commercial
entities such as Hanzo Archives all offer services to capture, store and
provide public access to web content (see Chapter 5, ‘Selecting and acquiring
digital objects’). Other services focus on particular repository functions,
exemplified by technical registries such as PRONOM and the Library of
Congress Digital Formats site.8

Comprehensive services
A growing number of suppliers are beginning to offer a full range of digital
repository services. In some cases, suppliers have emerged to meet the needs
of specific communities. For example, the UK Data Archive (UKDA)9 and the
ICPSR10 provide data archive services to the international social sciences
community. Furthermore, UKDA is part of a network of specialist data
services established to support the archaeology, history, literature,
performing arts and visual arts communities within UK higher and further
education.11 The international library community is served by services such
as Portico, which preserves e-journals, e-books and digitized historical
collections on behalf of publishers and libraries,12 and the OCLC Digital
Archive, which is primarily designed to preserve the outputs from library
digitization projects.13
More generic services are also now appearing. For example, Chronopolis
is a digital repository service provided under the management of the San
Diego Supercomputer Center.14 For a standard annual charge ($2,200 per
terabyte in 2012), Chronopolis provides ingest, validation, replication to three
sites within the USA, integrity monitoring, reporting and transmission of
content back to the data provider when and if needed.
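Integrity monitoring across replicated copies generally rests on comparing checksums. The following Python sketch shows the general idea under stated assumptions: three local directories stand in for three replica sites, and the function names `sha256_of` and `check_replicas` are hypothetical. It is not a description of how Chronopolis or any other service is actually implemented.

```python
import hashlib
import tempfile
from collections import Counter
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_replicas(relative_name, replica_roots):
    """Compare one file across several replica directories.

    Returns (majority_digest, suspect_roots): the digest shared by most
    replicas, and the roots whose copy is missing or differs from it.
    """
    digests = {}
    for root in replica_roots:
        candidate = Path(root) / relative_name
        digests[str(root)] = sha256_of(candidate) if candidate.is_file() else None
    counts = Counter(d for d in digests.values() if d is not None)
    if not counts:
        # No replica holds the file at all: every root is suspect.
        return None, sorted(digests)
    majority, _ = counts.most_common(1)[0]
    suspects = sorted(root for root, d in digests.items() if d != majority)
    return majority, suspects


# Demonstration: temporary directories stand in for three replica sites.
with tempfile.TemporaryDirectory() as base:
    roots = []
    for site in ("site_a", "site_b", "site_c"):
        root = Path(base) / site
        root.mkdir()
        (root / "object.bin").write_bytes(b"preserved content")
        roots.append(str(root))

    clean = check_replicas("object.bin", roots)

    # Simulate silent corruption of one copy, then check again.
    (Path(roots[2]) / "object.bin").write_bytes(b"damaged content")
    damaged = check_replicas("object.bin", roots)
```

Holding three copies matters here: with only two, a mismatch shows that something is wrong but not which copy to trust, whereas a third copy allows a majority comparison.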
A number of cloud-based commercial services have also emerged.
DuraCloud is provided by DuraSpace, the non-profit organization which also
maintains the DSpace and Fedora Commons repository technologies (see
above), and provides ingest, storage, integrity checking and online access,
including image serving and media streaming for audiovisual content.15
DuraCloud uses multiple cloud storage providers, acting as a broker on
behalf of customers, and offers four service plans, with annual pricing (as of
2012) starting at $1,500 per terabyte. In 2012, the Preservica service was
launched by Tessella. Preservica offers PraaS, built on the SDB platform and
hosted in the Cloud on Amazon’s S3 service.16 It is offered on a pay-as-you-
go model, although pricing details have not been made public at the time of
writing. The Preservica literature suggests that customers will be able to
specify which territory their data will be hosted in, which will mitigate some
of the legal and governance issues raised by the Cloud. However, the use of
a single, US-headquartered cloud provider may still be problematical for
some potential customers. Preservica is unusual in providing SDB’s logical
preservation functionality, including preservation planning and format
migration – most other service providers leave this to the customer, although
they may offer some support at extra cost.

Consultancy services
A number of providers offer consultancy services to assist organizations with
their preservation needs. Typically, these involve targeted projects, for
example to audit existing holdings, elicit requirements, develop policies and
procedures, or advise on standards. To benefit from consultancy – which can
be a costly option – it is essential to have a very clear, focused brief and to
choose both the project and the consultant with great care. However, at its
best such consultancy can bring an impartial and expert perspective to issues.

Partnerships
There are some excellent examples of the partnership model in practice,
many based on LOCKSS technology (see Appendix 3). To illustrate:

• Eight US universities, which form part of the Association of Southeastern
Research Libraries, are collaborating to preserve each other’s electronic
theses and dissertations using a private LOCKSS network.17
• CLOCKSS (Controlled LOCKSS) is an international, not-for-profit
community partnership between libraries and publishers, which uses a
private network based on LOCKSS technology to provide a distributed
archive for electronic scholarly content.18 As of 2012, CLOCKSS has over
150 participating libraries, and provides 12 archive nodes, hosted by
libraries in Asia, Australia, Europe and North America. Content is
harvested from over 100 publishers and stored across the archive nodes;
when a ‘trigger event’ occurs, such as the publisher going out of business,
or ceasing to provide access to a title or its back issues, the CLOCKSS
network makes that content freely available to everyone. CLOCKSS is
funded through the annual subscriptions paid by participating libraries
and publishers, but is seeking to raise an endowment, which would
secure its long-term operation.
• The Alabama Digital Preservation Network (ADPNet) provides a low-
cost distributed digital preservation service for cultural heritage
organizations in Alabama. Founded in 2006, and using a private LOCKSS
network, it is operated by seven higher education institutions.19
• Between 2004 and 2006, two UK local authorities (Bedfordshire and
Hertfordshire) worked with the UKDA on a pilot project to investigate
the issues involved in establishing a regional archive and records
management service for digital records.20 This aimed to identify possible
solutions, and establish the basis for a business case for a digital
preservation strategy in the east of England. It was followed in 2007–8 by
a second phase of work, to survey the use of digital media by
organizations regularly depositing records with the Bedfordshire and
Luton Archives and Records Service (BLARS).21 The studies provided a
useful analysis of the challenges of collaborative approaches in the UK
archival sector, but have not so far led to any operational partnerships.

Another partnership initiative, the MetaArchive, is considered in detail in a
case study at the end of this chapter.


4.4 Approaches to procurement


If your preferred option entails purchasing a commercial product or service,
it is likely that you will need to undertake some form of procurement
exercise. Public sector organizations are likely to be more constrained in their
procurement options than the private sector. For example, within the EU,
public bodies are required to comply with a number of EU directives, most
notably Directive 2004/18, which covers contracts for public works, public
supply and public service.22 In the USA, federal procurement is principally
governed by the Federal Acquisition Regulation.23 Other countries typically
have equivalent legislation.
Private sector organizations are free to approach procurement in whatever
manner they wish, within the bounds of the law, but most still have clearly
defined procurement policies, which share many of the features of public
sector procurement. It is beyond the scope of this book to describe in detail
the procurement rules that may apply in particular circumstances or
jurisdictions. However, it is possible to make some general observations
about the kinds of approach that may be considered.
Your requirements catalogue (see Chapter 3, ‘Understanding your
requirements’) should be your starting point. The procurement process is
then essentially a case of asking potential suppliers to define how they would
meet those requirements, and evaluating their responses. Depending on the
rarity of the service or product being procured, it may be necessary either to
take steps to ensure that as many potential suppliers as possible respond, or
to limit the respondents to a manageable number. It is common practice to
issue some form of prior notice, alerting suppliers to an upcoming
procurement. Where the number or quality of suppliers may be an issue, an
initial pre-selection stage may also be taken to restrict the number of bidders;
since the evaluation of full tenders is the most complex and time-consuming
part of the process, this can considerably reduce costs and simplify the overall
process.
The main phase of procurement involves the issue of an invitation to tender (ITT) – a formal and
detailed definition of the service or product required, the format and nature
of responses required, the process by which tenders will be evaluated, and
the terms and conditions under which a contract will be awarded. At the core
of this is therefore the requirements catalogue. Bidders respond by
submitting tenders, which are evaluated against the statement of
requirements, including a financial evaluation of the cost of each proposal.
At the conclusion of the tender evaluation, a preferred supplier will be
identified. It will also often be necessary to provide feedback to unsuccessful
suppliers.
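One common way to evaluate tenders against a requirements catalogue is a weighted scoring matrix that combines a quality score with a price score. The Python sketch below is a minimal illustration of that idea, not a prescribed method: the requirement identifiers, weights, 0 to 5 scoring scale, supplier names, prices and the 70/30 quality-to-price split are all hypothetical assumptions, and the rules that apply to a real procurement may dictate a different approach.

```python
def evaluate_tenders(weights, tenders, quality_share=0.7):
    """Rank tenders by a combined quality and price score (0 to 100).

    weights       -- {requirement_id: weight}, from the requirements catalogue
    tenders       -- {supplier: {"scores": {requirement_id: 0..5}, "price": number}}
    quality_share -- fraction of the final score allotted to quality
    """
    max_quality = 5 * sum(weights.values())
    lowest_price = min(t["price"] for t in tenders.values())
    results = {}
    for supplier, tender in tenders.items():
        # Weighted quality score as a fraction of the maximum achievable.
        quality = sum(
            weights[req] * tender["scores"].get(req, 0) for req in weights
        ) / max_quality
        # The cheapest tender scores 1.0; dearer tenders score proportionally less.
        price = lowest_price / tender["price"]
        results[supplier] = round(
            100 * (quality_share * quality + (1 - quality_share) * price), 1
        )
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


# Hypothetical requirement weights and tender responses.
weights = {"R1 ingest": 3, "R2 fixity checking": 3, "R3 format migration": 1}
tenders = {
    "Supplier A": {
        "scores": {"R1 ingest": 5, "R2 fixity checking": 4, "R3 format migration": 2},
        "price": 50_000,
    },
    "Supplier B": {
        "scores": {"R1 ingest": 3, "R2 fixity checking": 3, "R3 format migration": 5},
        "price": 40_000,
    },
}
ranking = evaluate_tenders(weights, tenders)
```

In this invented example Supplier A ranks first despite the higher price, because it scores better on the two most heavily weighted requirements. Publishing the weights and the scoring method in the ITT also makes it easier to give feedback to unsuccessful suppliers.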

4.5 Implementation
Having selected an option and, if appropriate, undertaken any necessary
procurement, the final step is to actually implement the chosen solution,
which typically involves the following stages:

• Detailed design: As discussed in Chapter 3, ‘Understanding your
requirements’, the requirements which formed the basis for choosing a
solution should not themselves constitute the design for that solution.
Consequently, once a solution has been chosen, it is usually necessary to
undertake some degree of design to finalize the details of how it will
work in practice. This is necessary even if using an off-the-shelf product,
since this still needs to be configured to your particular circumstances,
and may require a degree of customization. In many cases, design will be
undertaken by your chosen supplier, or your IT support function. The
design stage may not be required if the service is to be entirely
outsourced, since the shape of that solution may either be predefined or
have been satisfactorily specified in the supplier’s response to any
procurement exercise.
• Development: Almost every option requires some activity to put in place
the relevant tools and structure. This may be as simple as installing some
simple tools and setting up some folder structures, or as complex as
undertaking software development and configuration.
• Testing: No system works exactly as planned from the outset, and
thorough testing is a critical stage of implementation. Your requirements
catalogue should once again be central to this process: you need to test
whether or not each requirement identified has been successfully met by
the solution. Such testing can be relatively informal, or fairly elaborate,
involving written test scripts; however you approach it, you should at
least document whether or not each requirement has been successfully
tested, and how this was achieved. This testing against user requirements
is often referred to as user acceptance testing (UAT). For IT systems,
additional system testing may be required to ensure that any new
hardware and software will operate correctly within your IT
environment, and will not have an adverse impact on other systems.
• ‘Go live’: Once the solution has passed testing, it is ready to be put into
operation. Precisely what this entails depends on the nature of the
solution, and the organizational context: for many small organizations
this may simply be a decision point, but if you have a supported IT
infrastructure any change to it is likely to require some kind of formal
process.
• Review: It is important to remember that going live is not the end of the
story – you should also plan to review how well the operational system is
performing at regular intervals thereafter. The purpose of such reviews is
twofold: at a practical level it is a means to identify issues, suggest and
make improvements, and learn lessons; strategically, it is an opportunity
to demonstrate the practical benefits being achieved to stakeholders (as
identified in Chapter 3, ‘Understanding your requirements’). This is
crucial to ensuring their continued support.
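The record-keeping recommended for the testing stage above need not be elaborate. As a minimal sketch, the following Python code logs, for each entry in a requirements catalogue, whether it has been tested, how, and with what result. The class name `RequirementTest`, its fields and the sample requirements are hypothetical; a spreadsheet with the same columns would serve equally well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequirementTest:
    requirement_id: str             # identifier from the requirements catalogue
    description: str                # what the requirement asks for
    method: str = ""                # how the requirement was tested
    passed: Optional[bool] = None   # None means not yet tested


def summarize(records):
    """Group requirement identifiers by test outcome."""
    summary = {"passed": [], "failed": [], "untested": []}
    for record in records:
        if record.passed is None:
            summary["untested"].append(record.requirement_id)
        elif record.passed:
            summary["passed"].append(record.requirement_id)
        else:
            summary["failed"].append(record.requirement_id)
    return summary


def ready_to_go_live(records):
    """True only if every requirement has been tested and has passed."""
    return all(record.passed is True for record in records)


# Hypothetical entries, for illustration only.
log = [
    RequirementTest("R1", "Ingest a deposited folder", "Test script 1", True),
    RequirementTest("R2", "Verify checksums on ingest", "Test script 2", False),
    RequirementTest("R3", "Export content on request"),
]
summary = summarize(log)
go_live = ready_to_go_live(log)
```

Keeping untested requirements distinct from failed ones makes it clear at the ‘go live’ decision point what remains to be checked, and the same log can be revisited during later reviews.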

The benefit of managing the whole development of a digital preservation
service, from initiation to implementation, as a project has already been
discussed in Chapter 2, ‘Making the case for digital preservation’. Such an
approach is likely to prove its worth most especially during the
implementation stage, since this is the point at which it will be most crucial
to marshal staff and resources in order to achieve tasks to specific deadlines.

4.6 Operating a digital repository


You need to prepare for the practicalities of going live by putting in place the
procedures and staff required to operate the digital preservation service. The
detail of what is likely to be involved in this occupies most of the remainder
of this book. One of the most critical considerations is to ensure that you have
identified the staff and roles required, and that those staff have the necessary
skills.

Staff roles
While the exact nature and number of staff roles required to operate a digital
repository, and how you choose to provide them, will be very specific to your
particular circumstances, there are a number of generic role types which you
are likely to need:

• Ingester: Manages individual accession and ingest processes from start to
finish, including any necessary liaison with depositors. This role is
usually performed by librarians or archivists with suitable training.
• Cataloguer: Ensures that descriptive metadata is created and captured to
appropriate standards, either during or post-ingest. This role is usually
undertaken by existing cataloguing staff. In some cases, it may be
combined with the ingester role.
• Repository manager: Manages the repository function, including ingest,
preservation and access. This is usually the most specialized repository
role, and may be filled by suitably trained curatorial staff, or a specialist
digital archivist.
• System support: Supports users of the repository. This is normally
referred to as first-line support – more complex issues may be referred to
system administrators or suppliers. Where possible, this should be
integrated with any existing IT helpdesk support but, in a small
organization, it might be combined with the repository manager role.
• System administrator: Manages the IT systems and infrastructure on
which the repository depends. Tasks may include second-line support,
database administration, and managing storage and user accounts. This
role is normally performed by IT staff.

Training
The widespread availability of high-quality, affordable training in digital
preservation theory and practice, at a variety of levels, is essential to
organizations of all sizes, but especially for smaller bodies with limited
training budgets and a particular need to develop existing staff with new
skills. While the provision of such training varies greatly from country to
country, it is becoming ever more widely available.

Graduate and postgraduate training


Digital preservation is increasingly addressed to some extent in graduate and
postgraduate information management training courses. Long considered a
marginal element, there is now growing recognition that digital preservation
awareness and techniques are a core element of the modern information
management professional’s skill set, and one which employers expect when
recruiting new staff.

The majority of postgraduate library and archives courses now explicitly
cover digital preservation and digital information management in some
depth. Courses with a focus on digital collections management are also
beginning to appear. For example, the Humanities Advanced Technology and
Information Institute (HATII) at Glasgow University specifically addresses
digital preservation as part of its undergraduate degree in digital media and
information studies, as well as postgraduate courses in computer forensics
and e-discovery, information management and forensics, information
management and preservation, and museum studies,24 while the University
of Arizona’s Graduate Certificate in Digital Information Management
includes a specific course on digital preservation.25 There are also increasing
opportunities to research digital preservation at a doctoral level, such as
King’s College London’s doctoral programme for digital humanities
research.26

In-career training
Substantive training courses are also required for existing staff who need to
develop new expertise. A number are available, typically lasting between two
and four days, which provide a thorough introduction to the principles and
practice of digital preservation, and are highly recommended for anyone
seeking to develop their practical skills and knowledge.
These may include online teaching materials which are accessible to all,
and very valuable for those unable to attend the face-to-face teaching.
Notable training courses include:

• Digital Preservation Management Workshops and Tutorial: Currently
hosted by MIT Libraries, but based on the seminal training programme
developed by Cornell University and subsequently ICPSR, this offers a
combination of three-day workshops, an online tutorial, and
supplementary one or two-day topical workshops. While the workshops
will primarily be of interest to US institutions, the award-winning online
tutorial, which is available in English, French and Italian, is highly
recommended for all.27
• The Digital Preservation Training Programme (DPTP): The DPTP is one
of the longest-running digital preservation training courses. Operated by
the University of London Computer Centre, the two or three-day course
is usually taught twice a year, at a variety of locations around the UK.
Centred on the OAIS model and modular in form, it uses a mixture of
lectures, practical exercises and discussions to provide a thorough
grounding in the basics. The Digital Preservation Coalition frequently
offers a limited number of scholarships to support its members in
attending the course.28
• Digital Preservation in a Box: An online training toolkit, developed by
the US National Digital Stewardship Alliance’s Outreach Working Group,
this acts as a portal to a wide range of training resources.29
• Data Intelligence 4 Librarians: Designed primarily for librarians, this
new course is operated by the 3TU.Datacentrum, a Dutch scientific data
archive, in conjunction with a number of Dutch universities. The four-
day course covers data management, technical skills, and acquisition and
selection through a combination of online and face-to-face teaching.30

Short (one or two-day) introductory courses are also available for staff
requiring a basic familiarity with the concepts, but not the in-depth
knowledge provided by longer courses, or for those wishing to develop their
knowledge in particular areas. These are often run by advocacy and training
bodies such as the UK’s Digital Preservation Coalition31 and Digital
Curation Centre,32 the Dutch Nationale Coalitie Digitale Duurzaamheid33
and the international Open Planets Foundation.34 The Library of Congress
maintains an online calendar of digital preservation training opportunities35
as part of its Digital Preservation Outreach and Education programme.
Within this initiative, it has also developed a baseline curriculum and is
building a train-the-trainer network.

4.7 Trusted digital repositories


The concept of ‘trust’ has assumed a position of critical importance for many
in the digital preservation community, and the notion of trusted digital
repositories is widely used and, perhaps on occasion, abused. It is therefore
useful to understand how this concept is being applied, and what its practical
implications may be.
Digital repositories provide services in two directions – to those who
deposit content with them for preservation, and to those who consume that
preserved content. Both communities rely on the repository to fulfil certain
obligations, which may be explicitly stated or implicitly understood, in order
to provide those services; the effectiveness of a repository depends on the
extent to which it is trusted by those communities to do so. A repository that
is not trusted by potential depositors to manage their content responsibly and
effectively will receive little material to preserve; one that users do not
trust to provide access to usable, reliable records will not be
consulted.
The debate on trusted digital repositories is therefore about what
organizations must do to develop and maintain that trust.36 It is concerned
with those attributes of a repository on which trust is built. A repository has
no control over the extent to which it is trusted – all it can do is to take steps
that are likely to engender that trust. It is, therefore, more meaningful to talk
about trustworthy repositories.
The concept of trust is not unique to the digital world; cultural memory
institutions have always relied on it. However, digital information brings it
into especially sharp relief: the ease with which it can be published, copied,
altered and destroyed has led many organizations to take a much more
proactive and self-conscious attitude to trust than may previously have been
the case.
In its seminal 1996 report, the Task Force on Archiving of Digital
Information referred frequently to the importance of trust in the archival
process, noting that ‘a critical component of digital archiving infrastructure is
the existence of a sufficient number of trusted organizations capable of
storing, migrating, and providing access to digital collections.’37 It went on to
suggest that ‘a formal process of certification, in which digital archives meet
or exceed the standards and criteria of an independent certifying agency,
would serve to establish an overall climate of value and trust about the
prospects of preserving digital information.’38 The Task Force recognized that
there were two models for certification – one based on third-party inspection,
the other on users’ validation of a repository’s stated adherence to standards
– without advocating one over the other. It recommended that a dialogue be
instituted between relevant organizations as to how such a certification
process might be developed.
Perhaps understandably, over the next few years such a dialogue took
second place to the need to develop a sound theoretical and practical basis for
digital preservation. Such a basis was provided by the publication in 2002 of
the OAIS Reference Model,39 which also foresaw the need for follow-on
standards, including those for certification of archives.40 The next major step
forward took place in the same year, when OCLC and the Research Libraries
Group (RLG) published a report on the attributes and responsibilities of a
‘trusted digital repository’.41 Since then, a number of substantive initiatives
have emerged, including:

• TRAC: The Trustworthy Repositories Audit & Certification Criteria and
Checklist (TRAC) was published in 2007 by the Task Force on Digital
Repository Certification, and has subsequently become an international
de facto standard.42
• nestor: At the same time, the German nestor (Network of Expertise in
long-term STORage) project43 was developing its catalogue of criteria for
trusted digital repositories.44 Although co-ordinated with the TRAC
standard, the work of nestor is focused on the particular requirements of
libraries, archives and museums in Germany. It has now been published
as a standard by the German National Bureau of Standards (DIN 31644).
• DRAMBORA: Also published in 2007, the Digital Repository Audit
Method Based on Risk Assessment (DRAMBORA) toolkit provides a risk-
based methodology for repository audit developed by the UK’s Digital
Curation Centre, and the EU-funded DigitalPreservationEurope project.45
While drawing on, and intended to complement, the work of TRAC and
nestor, DRAMBORA focuses on the practical application of audit
methodologies, based on self-assessment. It was developed in part to
address perceived difficulties in the practical application of the existing
schemes, such as the lack of metrics for measuring and comparing an
organization’s compliance.
• PLATTER: The PLATTER toolkit was published in 2008 by the
DigitalPreservationEurope project.46 This provides a checklist and
guidance to help repositories plan their objectives, including the
achievement of ‘trusted’ status. It provides a means for repositories to
classify themselves, in order to be able to compare their policies and
practices with similar repositories; defines a series of ‘strategic objective
plans’ within which a repository can codify its objectives; and proposes a
planning cycle, whereby organizations can define, realize, review and
refine their objectives.
• Data Seal of Approval: The 2010 Data Seal of Approval (DSA) is
intended to provide a comparatively lightweight assessment process,
aimed primarily at repositories of structured research data, and
comprising 16 criteria. The repository assesses itself, and this is subject to
external review by the DSA Board.47


Since 2007, work has been under way in various quarters to formalize these
initiatives as de jure standards. ISO 16363, published in 2012, provides an
audit and certification standard, primarily based on TRAC and broadly
equivalent to DIN 31644, while ISO 16919, which is still under development,
will define requirements for audit and certification bodies. Both have been
developed under the auspices of the Consultative Committee on Space Data
Systems (CCSDS).
In 2010, a Memorandum of Understanding (MoU) sponsored by the European Commission was signed
between the three groups currently working on standards for trusted digital
repositories – CCSDS, DIN and DSA – to develop a co-ordinated approach
within Europe. This established a European Framework for Audit and
Certification of Digital Repositories, with three levels of certification:

• basic certification for repositories which obtain DSA certification
• extended certification for repositories which obtain basic certification,
and additionally perform an externally reviewed and publicly available
self-audit based on ISO 16363 or DIN 31644
• formal certification for repositories which obtain basic certification, and
additionally obtain full external audit and certification based on ISO
16363 or DIN 31644.
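The three levels amount to a simple decision rule, which can be sketched in Python. This is an illustration only: the function, parameter and class names are assumptions introduced here and are not defined by the framework itself.

```python
from enum import Enum


class Certification(Enum):
    NONE = "none"
    BASIC = "basic"
    EXTENDED = "extended"
    FORMAL = "formal"


def certification_level(has_dsa: bool,
                        reviewed_public_self_audit: bool = False,
                        full_external_audit: bool = False) -> Certification:
    """Return the highest framework level a repository qualifies for.

    Both audit flags refer to audits based on ISO 16363 or DIN 31644.
    """
    # Every level builds on basic (DSA) certification.
    if not has_dsa:
        return Certification.NONE
    if full_external_audit:
        return Certification.FORMAL
    if reviewed_public_self_audit:
        return Certification.EXTENDED
    return Certification.BASIC


print(certification_level(has_dsa=True, reviewed_public_self_audit=True).value)
```

Note that a repository without DSA certification is reported as having no level at all, whatever other audits it has undergone, because both higher levels presuppose the basic one.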

A series of test audits using the framework were undertaken in 2012 by the
APARSEN project, leading to recommendations for refinements.48 The
framework should go a long way towards unifying the various strands of
certification activity, at least within Europe, although practical approaches to
external review and audit are still evolving. In the meantime, the
DRAMBORA self-assessment method, drawing on the standards within this
framework, is most likely to be helpful to smaller organizations considering
a suitable implementation model.
The relationships between the main certification schemes are illustrated in
Figure 4.2.
The various trusted digital repository schemes can serve a number of
practical purposes, including:

• articulating the benefits of digital preservation in a business case (see
Chapter 2)
• informing the development of your requirements (as discussed in the
previous chapter)
• measuring the success of your repository, once operational, and
demonstrating this to your stakeholders.

Figure 4.2 The relationships between the main certification schemes for digital
repositories

4.8 A digital preservation maturity model


Maturity models provide a means for organizations to assess their
capabilities in a particular area against a benchmark standard. A good
example of this is the UK Office of Government Commerce’s Prince2 Maturity
Model (P2MM),49 a framework for assessing an organization’s project
management capabilities. This defines a five-step scale of ‘maturity levels’,
corresponding to the sophistication of an organization’s processes, and seven
‘process perspectives’, which define the areas of business function being
assessed, such as financial management or organizational governance.
This chapter proposes a maturity model for assessing an organization’s
digital preservation capabilities, which draws on the P2MM methodology,
together with the various emerging standards for trusted digital repository
certification.50 It then discusses how you can use such a model to review and
plan your organization’s approach to building its own capability.

Maturity levels for digital preservation


The development of any new capability within an organization usually
follows a similar path (illustrated in Figure 4.3): it begins with a developing
awareness of the need for that capability and the steps required to acquire it,
and ends with the realization of that capability, which may potentially be at
varying levels of sophistication, ranging from the achievement of minimum
standards to best practice.

Figure 4.3 Developing capability

With any process, it is possible to envisage a baseline, minimal version, a
fully developed version, and an optimal version, in which the process
includes proactive steps to monitor its performance and identify required
improvements.
This path can be broken down into a series of discrete maturity levels, as
shown in Table 4.9.

Table 4.9 Maturity levels

Stage       Maturity level        Description
Awareness   0 No awareness        The organization has no awareness of either the need
                                  for the process or basic principles for applying it.
            1 Awareness           The organization is aware of the need to develop the
                                  process, and has an understanding of basic principles.
            2 Roadmap             The organization has a defined roadmap for developing
                                  the process.
Capability  3 Basic process       The organization has implemented a basic process.
            4 Managed process     The organization has implemented a comprehensive,
                                  managed process, which reacts to changing
                                  circumstances.
            5 Optimized process   The organization undertakes continuous process
                                  improvement, with proactive management.

For any given process, in any given organization, we can measure which
maturity level applies. We can also use this scale to define the level to which
the organization should aspire. It is not a given that everyone should strive
for Level 5 in every process – this might well be excessive in certain situations.
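For readers who wish to record assessments in a script or spreadsheet macro, the scale in Table 4.9 can be treated as an ordered set of values. The following Python sketch is one possible encoding; the class and function names are assumptions made for illustration.

```python
from enum import IntEnum


class MaturityLevel(IntEnum):
    NO_AWARENESS = 0
    AWARENESS = 1
    ROADMAP = 2
    BASIC_PROCESS = 3
    MANAGED_PROCESS = 4
    OPTIMIZED_PROCESS = 5

    @property
    def stage(self) -> str:
        # Levels 0-2 form the Awareness stage; levels 3-5 the Capability stage.
        return "Awareness" if self < MaturityLevel.BASIC_PROCESS else "Capability"


def meets_target(current: MaturityLevel, target: MaturityLevel) -> bool:
    """True if the measured level is at or above the level aspired to."""
    return current >= target


print(MaturityLevel.ROADMAP.stage)
print(meets_target(MaturityLevel.BASIC_PROCESS, MaturityLevel.MANAGED_PROCESS))
```

Because the target is passed in explicitly, the sketch does not assume that Level 5 is the goal; an organization is free to set Level 3 as the target for a given process.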


Digital preservation process perspectives


Having defined our generic maturity levels, we can now identify the set of
processes that constitute a digital preservation capability. There are many
ways in which we might choose to do this. A good starting point is provided
by the various standards, discussed in the previous section, which are
emerging to define best practice for digital preservation, by identifying the
attributes that make a ‘trusted digital repository’. Although each standard
may arrange and define these attributes in slightly different ways, they can
all be mapped to a common set of requirements.
Table 4.10 distils these core requirements into a set of ten process
perspectives.

Table 4.10 Process perspectives

   Process perspective        Definition
A  Organizational viability   Governance, organizational structure and resourcing of
                              the repository, including financial and staff management
B  Stakeholder engagement     Processes to engage with stakeholders within and external
                              to the repository, including content depositors and users
C  Legal basis                Management of contractual, licensing, and other legal
                              rights and responsibilities
D  Policy framework           Policies, strategies, and procedures which govern the
                              operation and management of the repository
E  Acquisition and ingest     Processes to acquire and ingest content into a repository
F  Bitstream preservation     Processes to ensure preservation at the bitstream level of
                              all stored content over time
G  Logical preservation       Processes to ensure the continued accessibility of the
                              logical content over time
H  Metadata management        Processes to create and maintain all metadata required to
                              support management and use of the repository
I  Dissemination              Processes to enable discovery and dissemination of stored
                              content within the designated user community
J  Infrastructure             Physical and technical infrastructure, including security,
                              required to support the repository

Together, these process perspectives define the set of resources, policies,
processes and systems that are required to provide a digital preservation
capability. A more detailed definition can be written for each maturity level
of every process perspective: to illustrate the principle, we can consider what
might constitute a ‘basic’ level of maturity (i.e. Level 3) for each of these
processes (Table 4.11).


Table 4.11 A basic preservation capability

   Process                    Level 3 definition
A  Organizational viability   • Staff have assigned responsibilities, and the time to
                                undertake them
                              • A suitable budget has been allocated
                              • Staff development requirements have been identified
                                and funded
B  Stakeholder engagement     • Key stakeholders have been identified
                              • Objectives and methods of communication have been
                                identified
C  Legal basis                • Key legal rights and responsibilities, together with
                                their owners, have been identified
D  Policy framework           • A written, approved digital preservation policy exists
E  Acquisition and ingest     • An acquisition policy exists which defines the types of
                                digital content which may be acquired
                              • A documented accession and ingest procedure exists,
                                including basic guidance for depositors
                              • Some individual tools are used to support accession
                                and ingest
F  Bitstream preservation     • Dedicated storage space on a network drive,
                                workstation, or removable media
                              • At least three copies maintained of each object, with
                                back-up to removable media
                              • Basic integrity checking performed
                              • Virus checking performed
                              • Existing access controls and security processes applied
G  Logical preservation       • Basic characterization capability exists, allowing at
                                least format identification
                              • Ad hoc preservation planning takes place
                              • Ad hoc preservation actions can be performed if
                                required
                              • Ability to manage multiple manifestations of digital
                                objects
H  Metadata management        • Documented minimum metadata requirement exists
                              • Consistent approach to organization of data and
                                metadata implemented
                              • Metadata stored in a variety of forms using
                                spreadsheets, text files or simple databases
                              • Capability exists to maintain persistent links between
                                data and metadata
                              • Persistent unique identifiers are assigned and
                                maintained for all digital objects
I  Dissemination              • Basic finding aids exist for all digital content
                              • Users can view or download data and metadata, either
                                online or on-site
J  Infrastructure             • Sufficient storage capacity is available, and plans
                                exist to meet future storage needs
                              • IT systems are documented, supported and fit for
                                purpose


This might be considered the minimum standard for any organization to
provide a genuine digital preservation service. For many, this may be an
entirely appropriate target, sufficient to meet their objectives. Others may
wish to achieve greater levels of capability in particular areas. Example
definitions for all three capability levels (Levels 3–5) within each process
perspective are provided in Appendix 2.
An organization can then be assessed according to which maturity level it
has achieved for each process perspective. An example of such an assessment
is illustrated in Figure 4.4.

Figure 4.4 Measuring maturity levels for different process perspectives

Organizations may find it helpful to assess themselves against such a maturity
model at various points, to understand their current level of maturity, define
their aspirations, and review their progress towards achieving those goals. As
previously mentioned, it should not be assumed that these aspirations should
always be to the highest possible level; in practice, most organizations wish to
define different target levels for different process perspectives, and in many
cases a relatively modest level may be entirely appropriate. The value of such
maturity models lies primarily in providing a framework for thinking about
digital preservation as a broad spectrum of acceptable capabilities, rather than
a single, and almost certainly unobtainable, ideal of curatorial perfection. By
doing so, it should help organizations to think about what ‘good enough’
preservation looks like, in their own particular circumstances.
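A self-assessment of this kind can be recorded very simply. The Python sketch below compares a current level with a target level for each of the ten process perspectives in Table 4.10 and reports only the shortfalls. The function name and all of the scores are invented for illustration; they do not describe any real organization.

```python
PERSPECTIVES = {
    "A": "Organizational viability",
    "B": "Stakeholder engagement",
    "C": "Legal basis",
    "D": "Policy framework",
    "E": "Acquisition and ingest",
    "F": "Bitstream preservation",
    "G": "Logical preservation",
    "H": "Metadata management",
    "I": "Dissemination",
    "J": "Infrastructure",
}


def maturity_gaps(current: dict[str, int], target: dict[str, int]) -> dict[str, int]:
    """Return, per process perspective, how many levels it falls short of target."""
    gaps = {}
    for key in PERSPECTIVES:
        shortfall = target[key] - current[key]
        if shortfall > 0:
            gaps[key] = shortfall
    return gaps


# Invented example scores on the 0-5 scale; targets differ by perspective.
current = {"A": 3, "B": 2, "C": 3, "D": 1, "E": 3, "F": 4, "G": 2, "H": 3, "I": 3, "J": 4}
target = {"A": 3, "B": 3, "C": 3, "D": 3, "E": 4, "F": 4, "G": 3, "H": 3, "I": 3, "J": 4}

for key, shortfall in maturity_gaps(current, target).items():
    print(f"{key} {PERSPECTIVES[key]}: {shortfall} level(s) below target")
```

Setting a separate target for each perspective, rather than a single overall goal, reflects the point made above: a modest level may be entirely appropriate in some areas.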


The value of maturity models for digital preservation is increasingly being
recognized, with a number of other approaches now being mooted. For
example, Becker et al. (2011) take an approach rooted in the methodologies of
IT enterprise architecture, while Tessella plc (2012) proposes a model focused
on assessing the maturity of repository software systems. Meanwhile, the
National Digital Stewardship Alliance is developing the concept of levels of
digital preservation.51

4.9 Case studies


The following case studies illustrate the practical application of each of the
models for implementing a digital repository discussed in this chapter.

Case study 1 The minimal approach applied to English Heritage’s digital archaeological archives
The in-house archaeological team at English Heritage, which advises the UK
Government on England’s historic environment, has been generating digital
archives since the 1970s, and had evolved a range of procedures for managing
them.52 In the late 1990s, the then Centre for Archaeology (CfA) began to
devise a comprehensive digital archiving strategy,53 and an accompanying
programme to put this strategy into practice.54 Having no dedicated
resources, this programme had to be developed and operated with a minimal
budget and only a proportion of the time of one staff member (the author). As
such, the minimal approach was the only available option.
The Digital Archiving Programme (DAP) had the advantage of an
internally generated archive, with a very consistent structure and range of
formats. Detailed procedures were developed for appraising new content and
preparing it for accession. The digital repository itself centred on a metadata
database called CAMS (the CfA Metadata System), developed by the author
using Microsoft Access 97. This included custom modules written in Visual
Basic to automate key parts of the ingest process. Its underlying information
model also incorporated the concept of a digital object possessing multiple
technical manifestations, as discussed in detail in Chapter 8, ‘Preserving
digital objects’. This was prompted by the decision to normalize on ingest to
a restricted set of archival file formats, to simplify future preservation
management, while also always retaining a manifestation in its original
format. The longevity of the CfA digital archives meant that they included a
number of obsolete formats, such as word processed documents created
using WordStar software. The DAP therefore undertook extensive format
migration, using freeware conversion tools.
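The idea of one digital object with several technical manifestations can be sketched as a small data model. The Python below is not the CAMS schema, which was implemented in Microsoft Access; the class names, field names, role labels and the example record (including the choice of PDF as a target format) are assumptions made purely to illustrate the concept.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Manifestation:
    file_format: str                      # e.g. "WordStar"
    role: str                             # "original" or "normalized" (assumed labels)
    derived_from: Optional[str] = None    # source format, if created by migration
    files: List[str] = field(default_factory=list)


@dataclass
class DigitalObject:
    identifier: str
    title: str
    manifestations: List[Manifestation] = field(default_factory=list)

    def original(self) -> Manifestation:
        """Return the manifestation retained in its original format."""
        return next(m for m in self.manifestations if m.role == "original")


# Invented example: an obsolete word-processed document and its migrated form.
report = DigitalObject(
    identifier="obj-0001",
    title="Excavation report",
    manifestations=[
        Manifestation("WordStar", "original", files=["REPORT.WS"]),
        Manifestation("PDF", "normalized", derived_from="WordStar",
                      files=["report.pdf"]),
    ],
)

print(report.original().file_format)
```

Recording the source format on the derived manifestation keeps the migration pathway visible, so the original can always be traced and, if necessary, migrated again.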
The first stage of accession was to create a full metadata record for the new
content in CAMS (Figure 4.5). Descriptive metadata for the collection and its
constituent objects were first entered manually, together with technical
information about the extant manifestations of each object, and the migration
pathways used to create them. File level metadata was then generated
automatically by CAMS. The user selected the folder containing the data to
be ingested, and CAMS would automatically capture basic technical
information (file name, file size, file date and path name) and generate an
MD5 checksum for every file. It also used custom code to automatically
identify the format of each file using internal signatures, and extract key
technical metadata, such as the pixel dimensions of images – this module
relied on a rudimentary internal format registry, and was thus a precursor for
PRONOM and DROID. A manual step was then required to associate each
file with its parent manifestation.
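The automated part of this step is easy to reproduce with modern scripting tools. The following Python sketch is not the original CAMS code, which was written in Visual Basic; it simply shows the same kind of capture (file name, size, date, path and MD5 checksum) together with a very small signature table for format identification. The function names and the choice of signatures are assumptions for illustration, and a real format registry holds far more entries.

```python
import hashlib
import os
from datetime import datetime, timezone

# A deliberately tiny table of internal signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
]


def identify_format(header: bytes) -> str:
    """Match the first bytes of a file against the signature table."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute an MD5 checksum without loading the whole file into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture_file_metadata(folder: str) -> list:
    """Walk a folder and return one metadata record per file."""
    records = []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            path = os.path.join(root, name)
            info = os.stat(path)
            with open(path, "rb") as handle:
                header = handle.read(16)
            records.append({
                "file_name": name,
                "path_name": path,
                "file_size": info.st_size,
                "file_date": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc).isoformat(),
                "md5": md5_of(path),
                "format": identify_format(header),
            })
    return records
```

Identifying formats from internal signatures rather than file extensions matters because extensions are easily changed or lost, whereas the leading bytes of a file are part of its content.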
As the final stage of ingest, three copies of the data were written to
removable media – a preservation master on digital linear tape, with a
security copy and working copy on CD-R – and stored in geographically
separate locations. These media choices reflected the available options in
2000, and were designed to avoid using a single media type. At the time, the
comparatively high cost and low capacity of network storage also dictated
the use of removable media.55

Figure 4.5 The CAMS main switchboard and object manifestations screens (English
Heritage)

The CfA digital archive made only limited provision for public access at
the time – this element of its function was being developed in conjunction
with the National Monuments Record and the Archaeology Data Service, the
bodies with primary responsibility for access to archaeological archives in
England – although archival content and metadata could be made available
for dissemination on demand. Beyond this, the core functions of a digital
repository, including accession and ingest, and preservation management,
were all achieved operationally, using off-the-shelf technology (albeit with a
small degree of custom development) and minimal resources.

Case study 2 The commercial off-the-shelf approach at the Wellcome Library
The Wellcome Library is part of the Wellcome Trust, an independent charity
that funds biomedical research and works to support the public
understanding of science. The Library’s stated aims are to provide ‘insight
and information to anyone seeking to understand medicine and its role in
society, past and present’.56 It comprises one of the largest resources for the
study of medical history in the world, with more than 38,000 visits each year
from users including academics, students, health professionals,
consumers, journalists, artists and members of the general public. As well as
printed collections the Library holds extensive series of archive and
manuscript material, and iconographic, moving image and sound collections.
The Wellcome Library has been awarded ‘designated’ status by the Museums,
Libraries and Archives Council, in recognition that its collections are a vital
part of the UK’s national cultural and artistic heritage.
In 2009 the Library began a five-year programme of transformation, to
create a digital library that will enable online open access to its collections,
and develop it as a major cultural destination as well as an internationally
significant academic research library. It plans to do so through targeted
acquisition of new content, strategic digitization of its analogue holdings, and
by providing expert interpretation of its holdings. Its digitization plans are
ambitious – it intends to digitize over 30 million images in five years. The
finished library will comprise a digital repository to manage and preserve its
digital assets, systems to support the creation and ingest of diverse types of
digitized and born-digital content, and an online public access interface to the
collection.57
A number of the components required to build the Wellcome’s digital
library were already in place before the formal programme started, including
its catalogues and online search systems. Work to develop a digital repository
was also already well advanced. For this repository, the Library opted for a
commercial, off-the-shelf solution, and after a procurement exercise selected
Tessella’s SDB. The initial implementation of the system was launched in
2009. However, as more ambitious plans for the development of the digital
library coalesced, it became clear that the then-current version of SDB did not
meet all the anticipated requirements for the management of digitized
content. After a feasibility study58 conducted in 2010 concluded that SDB
could be extended to meet those requirements, Wellcome commissioned
Tessella to deliver a new version, SDB 4, in 2011. The enhanced repository
went ‘live’ in 2012. SDB provides a number of vital pieces of functionality for
the production and management of digital content including:

• automated workflows to ingest new content into SDB
• a database to store administrative and descriptive metadata describing
the objects stored in SDB
• workflows to characterize digital objects, using tools such as JHOVE,
DROID and PRONOM (see Chapter 6, ‘Accessioning and ingesting digital
objects’)
• the ability to share data with other systems, such as the workflow
tracking system Goobi
• workflows to undertake preservation planning and format migration,
building on the Planets framework
• continuous integrity checking of all stored content
• a programmatic interface to provide access to content, which will be used
by the digital delivery system
• a means to automatically export administrative metadata, such as unique
SDB identifiers, which is then used by other systems to deliver files to
users
• administrative tools to enable library staff to manage the repository.


SDB provides the software to manage the repository, but another key
component that had to be developed was its storage system. For this, the
Library uses the Wellcome Trust’s existing corporate storage infrastructure, a
storage area network (SAN) using fast hard disks, which are mirrored
between sites to improve data security. Each file ingested into the repository
is stored on the main server and mirrored to a second server, as well as being
subject to a regular back-up routine. SDB periodically checks the integrity of
every file on the main server – in the event of a failure being detected, the
damaged file can be repaired from the mirror copy.
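
The check-and-repair cycle described above can be illustrated with a minimal sketch. It assumes that a SHA-256 checksum was recorded for each file at ingest and that the mirror copy is reachable as an ordinary file path; the function names are hypothetical and do not represent SDB's own interface.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_and_repair(primary: Path, mirror: Path, expected: str) -> str:
    """Verify the primary copy and restore it from the mirror if damaged.

    Returns "ok", "repaired" or "unrecoverable".
    """
    if primary.exists() and sha256_of(primary) == expected:
        return "ok"
    # Only trust the mirror if it still matches the recorded checksum.
    if mirror.exists() and sha256_of(mirror) == expected:
        shutil.copyfile(mirror, primary)
        return "repaired"
    return "unrecoverable"
```

Checking the mirror against the recorded checksum before copying avoids overwriting a damaged primary file with an equally damaged mirror.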
One reason for using hard disk as the primary storage medium for the
repository is that, in order to reduce the volumes of digitized data being
stored, the Library uses a single manifestation of each digital object, in
JPEG2000 format, to serve as both the preservation master and access copy.
The presentation system requests JPEG2000 format content from SDB and
converts it on the fly to JPEG images for presentation. This makes it
imperative for the repository to be able to retrieve data as quickly as possible,
rather than opting for a potentially cheaper, but also much slower, technology
such as tape. For additional speed of delivery, a dynamic cache holds and
delivers the most frequently requested content without the need to query SDB.
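
The idea of a cache sitting in front of the repository can be sketched as follows. The text does not say how the Library's cache decides what to keep, so the least-recently-used policy here is an assumption, and `fetch` is a hypothetical stand-in for whatever call retrieves content from the repository.

```python
from collections import OrderedDict
from typing import Callable


class DeliveryCache:
    """A small least-recently-used cache in front of a slower repository."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 1000):
        self._fetch = fetch  # hypothetical call that queries the repository
        self._capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, object_id: str) -> bytes:
        """Return content from the cache, querying the repository on a miss."""
        if object_id in self._items:
            # Mark the item as recently used and serve it without a query.
            self._items.move_to_end(object_id)
            self.hits += 1
            return self._items[object_id]
        self.misses += 1
        data = self._fetch(object_id)
        self._items[object_id] = data
        if len(self._items) > self._capacity:
            # Evict the item that has gone unused for longest.
            self._items.popitem(last=False)
        return data
```

Frequently requested objects stay in memory, so repeated requests for popular content do not reach the repository at all.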
Being based on enterprise-grade commodity hardware, the Library’s
storage technology is also easily scalable, an important consideration when
contemplating mass digitization programmes. The choice of storage
technology was based on the recommendation of the Trust’s in-house IT
department, working closely with the Digital Services staff to understand
their requirements. Similarly, procurement of the repository software
involved a very inclusive project team, representing all interested parties and
with a wide range of specialisms. Such a collaborative approach, which
secured buy-in from IT and other departments from the outset, is essential to
the success of a project of this nature.

Case Study 3 The outsourced approach at Greater Manchester Archivists Group

In 2007, a small group of archivists from archive services within the Greater
Manchester area in the UK began investigating digital preservation, and
especially solutions for archiving websites of local interest. After consultation
with district archive services, it was determined that a number of categories
of website should be considered for archiving, including local government
websites, community websites, the diocesan website, and sites or pages
relating to specific events or topics of local interest.
The group defined its requirements for web archiving, considering issues
such as scope, quality control, storage, end-user access and usability. After
assessing the available options, the group decided to instigate a pilot project
using the Internet Archive’s Archive-It service (described in Chapter 5, ‘Selecting
and acquiring digital objects’). The pilot was undertaken in March 2008,
archiving 15 local government and diocesan websites. By virtue of being
incorporated within the Internet Archive, each was also made publicly available.
The group found Archive-It to be a user-friendly, flexible, all-in-one
solution with good support provided by the Internet Archive and the wider
community of users. The cost of a basic subscription would also be affordable
if shared between the participating organizations but, without a formal
partnership agreement, it proved difficult to establish a sustainable funding
model.
Since completing the pilot, the group has instead chosen to work with the
British Library’s UK Web Archive, which collects sites from across the UK
web domain.59 While the group does not control the archiving process,
instead focusing on suggesting sites for archiving and gaining the necessary
permissions from the owners, it has found this to be an effective approach: at
least 15 sites have been selected for archiving in this way, and the group has
also been using social media to encourage community groups to nominate
their own sites for archiving. Unlike Archive-It, the UK Web Archive also has
the advantage of providing long-term preservation for the content.
In 2011, the Greater Manchester Archivists Group participated in a pilot
project led by the UK National Archives (TNA) to explore a web archiving
model which could be used to preserve local online resources. As part of this,
three websites were archived and made available through the TNA web
archive.60 In 2012, the Greater Manchester Archives and Local Studies
Partnership was established, which will provide a framework for
collaboration and strategic development between archives and local studies
services in Greater Manchester. Underpinned by a formal MoU, this should
make it much easier to develop collaborative projects in future.61
This case study illustrates the effectiveness of collaborative approaches,
and the cost-effectiveness and flexibility possible from using third-party
services. It also shows the importance of developing a strong framework for
partnership, with the necessary senior management buy-in, and of being
adaptable to changing circumstances and opportunities.62


Case Study 4 The hybrid approach at the Burritt Library


The Burritt Library at Central Connecticut State University (CCSU) began
investigating digital preservation solutions out of necessity in 2009, as a result
of embarking on a new digitization programme, which was accumulating
large numbers of uncompressed image files.63 The library wished to
implement a solution that complied with OAIS and PREMIS (see Chapter 7,
‘Describing digital objects’), and began by surveying a range of available
options. The first to be investigated was a commercial service, OCLC’s Digital
Archive, but this was discounted on grounds of cost, and concerns as to
whether it would be suitable for objects not contained within the Library’s
digital collections management system. Next, the Library considered
LOCKSS but, although this was perceived to require much lower direct costs,
the Library was unable to find partners for its preferred option of a private
LOCKSS network, and did not wish to implement such a solution alone,
without the benefit of vendor support.
At this point, the Library decided to investigate an in-house approach,
beginning with storage, and was inspired to look at the Cloud as an option.
An initial comparison of costs between OCLC and Amazon’s S3 cloud storage
service suggested that the latter would be less than one-third of the cost of the
former.
A custom database was designed to manage archival content, using the
open-source MySQL database software. To generate PREMIS metadata, the
Library chose Statistics New Zealand’s prototype PREMIS Creation Tool,
which uses JHOVE, DROID and the National Library of New Zealand’s
Metadata Extractor. However, in order to increase automation, they are
considering moving to the File Information Tool Set (FITS), which integrates
DROID, JHOVE and the NLNZ Metadata Extractor with a number of other
characterization tools.
The Library of Congress’ BagIt tool (see Chapter 7, ‘Describing digital
objects’) is used to package objects and metadata into a standard Submission
Information Package (SIP) for ingest. A set of simple scripts was then written
to run the metadata extractors automatically, package everything into SIPs
(see Chapter 6, ‘Accessioning and ingesting digital objects’) and run antivirus
checks. Standard back-up software is then used to periodically transfer the
SIPs to the S3 storage.
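
As a rough illustration of the packaging step, the sketch below builds a minimal BagIt-style package using only the Python standard library. It is not the Burritt Library's script or the Library of Congress tool: the function name is hypothetical, and the version string and the choice of SHA-256 are assumptions. Metadata extraction, antivirus checks and the transfer to S3 are left out.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_files: list[Path], bag_dir: Path) -> Path:
    """Copy files into bag_dir/data and write a declaration and manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=False)

    manifest_lines = []
    for source in sorted(source_files):
        target = data_dir / source.name
        shutil.copyfile(source, target)
        # Reads each file whole; chunked hashing would suit large images better.
        checksum = hashlib.sha256(target.read_bytes()).hexdigest()
        manifest_lines.append(f"{checksum}  data/{source.name}")

    # Bag declaration (the version string here is an assumption).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # Payload manifest: one "checksum  path" line per file.
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir
```

Because the manifest travels with the payload, whoever receives the package can recompute the checksums and confirm that nothing changed in transit.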
The Library reports that the system is operating satisfactorily, although it
is seeking greater levels of automation. Staff are also investigating
Archivematica as a more fully featured solution (see Appendix 3). The
current system lacks any form of preservation planning or action facilities, or
end-user access. Nonetheless, it shows that an institution can accomplish
much with minimal resources – CCSU’s system is built on open-source
software and expertise available in any small IT department, coupled with
one commercial service. By taking this approach, the staff have been able to
collect content and gain valuable experience in the operation of a digital repository.64

Case Study 5 The hybrid approach at the LSE Library


The London School of Economics and Political Science (LSE) is one of the
foremost universities for the social sciences in the world. The LSE Library (the
British Library of Political and Economic Science) was founded in 1896, as a
library for the university as well as a resource for researchers in the social
sciences, and is one of five designated national research libraries in the UK.65
Containing over 4 million books and journals, it collects comprehensively in
economics and the other core social sciences, while its special collections
include government publications, the publications of inter-government
organizations, historical pamphlets and statistics. It also includes the archives
of the Fabian Society, the Liberal Party, and the personal papers of individual
politicians.
The LSE Library has a growing digital collection, which it categorizes into
three types:

• outputs from digitization projects
• born-digital archives
• research outputs.

The composition of these is constantly evolving; for example, within the
born-digital category the Library is scoping the acquisition of official
publications in electronic format, as well as investigating web archiving. The
research outputs are managed in three institutional repositories, running on
an EPrints platform:

• LSE Research Online is an open access repository of LSE staff research,
including journal articles, book chapters and working papers.
• LSE Theses Online contains completed PhD theses of LSE postgraduate
students.
• LSE Learning Resources Online holds electronic teaching materials.


In 2009, the Library launched its Digital Library Management and
Infrastructure Development Programme, a major initiative to build the
Library’s capacity to collect, manage, preserve and provide access to its
digital collections. In contrast to the Wellcome Library, the LSE Library chose
to build their digital library using predominantly open-source components,
but integrated with existing systems, including commercial products. Thus
their approach is representative of the hybrid strategy.
The relationship between the established repositories and the new digital
library is expected to evolve over time. In the first instance, the Library is
exploring how to provide in-place preservation services to them, while
retaining existing interfaces and workflows; in the long term, a greater degree
of convergence is anticipated.
A notable aspect of the Library’s approach has been that although a core
team was established to undertake the research and development for the
system, its operation is being embedded within existing teams across the
Library. Thus, for example, the IT team will support the systems and
infrastructure; librarians and archivists will undertake collections
management; and, perhaps uniquely, the existing collection care team will
assume responsibility for ingest and digital preservation alongside the
analogue collections.
The Library began by undertaking an audit of its digital collections,
including a risk assessment using the DRAMBORA tool. The Library chose a
representative cross-section of risks, ranging from high-level organizational
threats to low-level technical risks. The intent was to capture a snapshot of
the overall state of the collections, and identify priority areas for action, rather
than comprehensively analyse each risk in every area. The audit process
identified ten key risks, which were documented in the form of a risk register.
The Library also undertook an initial analysis of their user requirements,
in parallel with an investigation of current best practice within the wider
digital preservation community. Based on this work, staff began to draft a
detailed set of functional requirements, as well as a draft metadata
specification. At the same time, they began development of a set of digital
collection policies, including deposit agreements and content licensing
policies, which would contribute additional requirements into the latter
stages of the programme. Rather than create a standalone digital preservation
policy, the Library is planning to update its existing collection preservation
policy to cover digital material. This exemplifies its approach to embedding
digital preservation operations across the Library, rather than regarding it as
a separate, silo activity. It also illustrates how, at least in part, staff are
developing and refining policies to reflect the lessons of practical experience.
While this book generally advocates working from policy to practice, rather
than vice versa, this is an important reminder that policy must always be
rooted in, and refined in the light of, practice.
From these activities there emerged an overarching technical strategy,
based on the incremental development of a modular digital library
architecture. Shared components would be built to provide basic
functionality required across the system, such as storage, integrity checking,
and managing unique identifiers, while more specific tools would be
employed to suit the specialized needs of different types of material in other
areas, such as ingest and user access. Such a modular approach would also
make it easier to integrate existing systems, such as catalogues. The Library
did not wish to undertake new research and development itself, but to make
use of current best-of-breed technologies developed by others. Nonetheless,
it acknowledged that significant technical resource would still be required to
customize, integrate and configure these varied components.
The Library’s functional requirements comprised 24 criteria grouped into
seven functional areas approximating to the OAIS model, as follows:

• data model
• ingest
• data management
• administration
• metadata
• storage
• access.

Staff undertook a detailed comparison of the three major open-source digital
repository platforms available at the time (DSpace, EPrints and Fedora)
against these requirements. The latest version of each system was installed on
an identically specified virtual machine, tested, and scored on a
‘red–amber–green’ scale against each functional criterion.
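
A red–amber–green comparison of this kind can be tabulated very simply. In the sketch below, the platform names, criteria and ratings are all placeholders invented for illustration; they are not the Library's criteria or results.

```python
from collections import Counter

# Placeholder criteria and ratings, invented for illustration only;
# they are not the Library's actual criteria or results.
RATINGS = {
    "platform_a": {"ingest": "green", "storage": "amber", "access": "green"},
    "platform_b": {"ingest": "amber", "storage": "green", "access": "red"},
}


def summarize(ratings: dict[str, dict[str, str]]) -> dict[str, Counter]:
    """Count red, amber and green ratings for each candidate platform."""
    return {name: Counter(scores.values()) for name, scores in ratings.items()}
```

Counting ratings per platform gives a quick overview, although a single red against an essential criterion may matter more than the totals suggest.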
The LSE Library team drew some interesting conclusions from their study.
They noted a significant difference in approach between Fedora and the other
two systems, concluding that Fedora was inherently more flexible, albeit at
the cost of greater work to set up a usable repository. They felt that DSpace
and EPrints were focused on the management of open access publications,
