Preserving digital materials 2nd Edition Ross Harvey
Author(s): Ross Harvey
ISBN(s): 9783110253696, 3110253690
Edition: 2
Year: 2011
Language: english
Ross Harvey
Preserving Digital Materials
Current Topics
in Library and Information Practice

De Gruyter Saur
Ross Harvey

Preserving
Digital Materials
2nd Edition

De Gruyter Saur
ISBN 978-3-11-025368-9
e-ISBN 978-3-11-025369-6
ISSN 2191-2742

Library of Congress Cataloging-in-Publication Data

Harvey, D. R. (Douglas Ross), 1951-


Preserving digital materials / Ross Harvey. -- 2nd ed.
p. cm. -- (Current topics in library and information practice)
Includes bibliographical references and index.
ISBN 978-3-11-025368-9 (acid-free paper) -- ISBN 978-3-11-025369-6 (ebook)
1. Digital preservation. I. Title.
Z701.3.C65H37 2011
025.8'4--dc23
2011032053

Bibliographic information published by the Deutsche Nationalbibliothek


The Deutsche Nationalbibliothek lists this publication in the Deutsche
Nationalbibliografie; detailed bibliographic data are available in the Internet
at http://dnb.d-nb.de.
© 2012 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Typesetting: Dr. Rainer Ostermann, München
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
Printed on acid-free paper
Printed in Germany
www.degruyter.com
Contents

List of figures

Introduction

Chapter 1
What is Preservation in the Digital Age? Changing Preservation Paradigms
  Introduction
  Changing paradigms
  The need for a new preservation paradigm
  Changing definitions
  Preservation definitions in the digital world
  What exactly are we trying to preserve?
  How long are we preserving them for?
  What strategies and actions do we apply?
  Conclusion

Chapter 2
Why do we Preserve? Who Should do it?
  Introduction
  Why preserve digital materials?
  Professional imperatives
  New stakeholders
  How much data have we lost?
  Current state of awareness of digital preservation problems
  Conclusion

Chapter 3
Why There’s a Problem: Digital Artifacts and Digital Objects
  Introduction
  Modes of digital death
  Digital storage media
    Magnetic media
    Optical disks
    The future for digital storage media
  Digital objects – more than digital artifacts
    Loss of functionality of access devices
    Loss of manipulation and presentation capabilities
    Weak links in the documentation chain and loss of contextual information
  Conclusion

Chapter 4
Selection for Preservation – The Critical Decision
  Introduction
  Selection for preservation, cultural heritage, and professional practice
  Selection criteria traditionally used by libraries and archives
  Why traditional selection criteria do not apply to digital materials
  IPR, context, stakeholders, and lifecycle models
    Intellectual property rights and legal deposit
    Context and community
    Stakeholder input
    Value of lifecycle models
  Developing selection frameworks for preserving digital materials
    Some selection frameworks
  How much to select?
  Conclusion

Chapter 5
What Attributes of Digital Materials Do We Preserve?
  Introduction
  Digital materials, technology, and data
  The importance of preserving context
  The OAIS Reference Model
  The role of metadata
    Preservation metadata
    Preservation metadata standards
  Persistent identifiers
  Authenticity
    Significant properties
  Research into authenticity
    Functional Requirements for Evidence in Recordkeeping Project (Pittsburgh)
    InterPARES
  Trusted digital repositories
  Conclusion

Chapter 6
Overview of Digital Preservation Strategies
  Introduction
  Historical overview
  Who is doing what?
  Criteria for effective strategies and practices
  Broader concerns
    Standards
    Planning
    Policies
    Sustainability
  Typologies of principles, strategies, and practices
  A typology of digital preservation?
  Conclusion

Chapter 7
‘Preserve Technology’ Approaches: Tried and Tested Methods
  Introduction
  ‘Non-solutions’
    Do nothing
    Storage and handling practices
    Durable/persistent digital storage media
    Analogue backups
    Digital archaeology and digital forensics
  ‘Preserve technology’ approaches
    Technology preservation
    Technology watch
    Emulation
    The Universal Virtual Computer
  Conclusion

Chapter 8
‘Preserve Objects’ Approaches: New Frontiers?
  Introduction
  ‘Preserve Objects’ approaches
  Bit-stream copying, refreshing, and replication
    Bit-stream copying
    Refreshing
    Replication
  Standard data formats
    File format registries
    Standardizing file formats
    Restricting the range of file formats
    Developing archival file formats
  XML
  Migration
  Viewers and migration on request
  Encapsulation
  Storage
  Combining principles, strategies, and practices
  Conclusion

Chapter 9
Digital Preservation Initiatives and Collaborations
  Introduction
  Collaboration
  Typologies of digital preservation initiatives
  International initiatives and collaborations
    International services
      The Internet Archive
      JSTOR
      DuraSpace
      LOCKSS
      MetaArchive Cooperative
    International alliances
      UNESCO
      PADI
      OCLC
      CAMiLEON
      International Internet Preservation Consortium
  Regional initiatives and collaborations
    Regional services
      NEDLIB
    Regional alliances
      ERPANET
      European Commission-funded projects
      Digital Recordkeeping Initiative
  National initiatives and collaborations
    National services
      AHDS
      Florida Digital Archive
    National alliances
      Digital Curation Centre
      Digital Preservation Coalition
      NDIIPP
      National Digital Stewardship Alliance
      HathiTrust
  Sectoral initiatives and collaborations
    Sectoral services
      Cedars
    Sectoral alliances
      JISC
  Conclusion

Chapter 10
Challenges for the Future of Digital Preservation
  Introduction
  What have we learned so far?
  Four major challenges
    Challenge 1: managing digital preservation
    Challenge 2: funding digital preservation
    Challenge 3: peopling digital preservation
    Challenge 4: making digital preservation fit
  Research and digital preservation
  Conclusion: the future of digital preservation

Bibliography
Index

List of Figures

Figure 1.1 Selected Definitions
Figure 1.2 Selected Definitions
Figure 3.1 Threats to Digital Continuity
Figure 3.2 Comparison of Data Carriers
Figure 3.3 Sample Generic Figures for Lifetimes of Media
Figure 5.1 Deciding on Essential Elements
Figure 6.1 Factors to Consider when Selecting Digital Preservation Technologies
Figure 7.1 Environmental Storage Conditions for Some Digital Storage Media
Figure 7.2 Storage and Handling of Digital Storage Media
Figure 8.1 Formats – Open and Proprietary
Figure 9.1 Initiatives and Collaborations

Introduction
This book is a revision of Preserving Digital Materials published in 2005
(Harvey, 2005b). The first edition was well received, one author describing it
as a ‘comprehensive examination of the landscape of preservation’ (Ross, 2007).
An update is timely, because there has been significant change in the field
since 2005 as the experience of practitioners expands and the findings of
researchers accumulate. This second edition has the same aims as the first: to
provide an introduction to the preservation of digital materials in order to inform
practice in cultural heritage institutions, and to provide a framework within
which to reflect on digital preservation issues. It is intended for a similar
audience as the first edition – information professionals who seek a reference text,
practitioners who want to reflect on the issues, and students in the field of digital
preservation. It differs from the first edition in four principal respects:
– It provides a more international perspective. The first edition described
Australian activities in detail, whereas the second edition gives attention to
major initiatives in the UK, the EU and the US since 2005.
– It expands the audience to include information professionals working in
environments other than libraries and recordkeeping organizations, as well
as those who create digital materials. Digital preservation is considered
increasingly as being within the purview of scientists, scholars and
individuals, not just of professionals employed in the institutional settings of
libraries and archives.
– It takes account of developments since 2005. These include, in particular,
the consolidation of digital preservation practice (so that we can now begin
to discuss ‘standard’ practice), developments that result from the activities
of bodies such as the Joint Information Systems Committee (JISC) and the
Digital Curation Centre (DCC) in the UK, and research and development
projects funded by the EU and, in the US, the Library of Congress and the
National Science Foundation (NSF). Specific topics that are added to or
given greater emphasis in the second edition include cost modeling and the
cost of digital preservation, skills identification and education and training
requirements and initiatives, personal digital archiving, and models such as
lifecycle models and the OAIS Reference Model.
– It also takes account of significant publications since 2005, such as
Long-Term Preservation of Digital Documents: Principles and Practices
(Borghoff et al., 2006), Digital Preservation (Deegan and Tanner, 2006),
Preserving Digital Information (Gladney, 2007), the Workbook on Digital Private
Papers (Paradigm Project, 2008), and my own Digital Curation (Harvey,
2010).

Preserving Digital Materials investigates the current practice of those who
preserve digital materials – information professionals (librarians,
recordkeeping professionals, museum professionals), scholars and scientists, and
individuals. A strong claim can be made that preservation of digital materials is the
single most serious issue faced by information professionals and that it is also
of considerable importance to scholars and scientists, as well as to individuals.
Much information about digital preservation is available in print and on the
web, and the quantity is increasing, but most practitioners do not have the time
or the technical expertise to evaluate and synthesize it. Preserving Digital
Materials fills this gap in the literature. It provides an introduction to the
principles, strategies and practices applied by information professionals (librarians,
recordkeeping professionals, museum professionals), scholars and scientists,
and individuals to the preservation of digital materials. It aims to improve digital
preservation practice by focusing on current practice, taking stock of what we
know about the principles, strategies and practices that prevail, and describing
the outcomes of recent and current research.
Digital preservation poses many challenges. From a university newsletter
comes this comment: ‘why not embrace the digital future now? The issue of
preservation is one of the main obstacles’ (Shaw, 2010). There is, increasingly,
comment on digital preservation concerns in the popular press and the
blogosphere, of which the following is just one of many examples: Thorpe (2011)
reports in The Observer on the ‘Race to save digital art from the rapid pace of
technological change’. The National Library of Australia’s Digital Preservation
Policy neatly sums up the challenges that institution faces in keeping digital
information materials accessible, all of which are faced by a very wide range
of institutions and individuals:

– The volume of materials to be maintained
– The diverse and frequently changing range of file formats and standards, and the
changing availability of hardware, software and other technology required for access
– Widespread use of relatively unstable carriers, subject to short-term media
deterioration and data corruption or loss
– The need for preservation decisions to be made early in the life cycle of digital objects
– For some materials, relatively long delays between their creation and their being
acquired and controlled …
– Uncertainty about the significant properties or essential characteristics that must be
maintained for some digital resources
– The need to maintain relationships between objects, between parts of complex
objects which may be in different formats, and between objects and the metadata that
describes them
– The recurring nature of many of the threats and the short replacement cycles for …
infrastructure for managing digital collections
– Uncertainty about the strategies and techniques most likely to be effective, and the
significant time required to plan and implement any currently available strategies
for such diverse and large collections
– The likely high costs of taking action, and the likely high costs of delaying or not
taking action (including the likelihood of loss of access)
– A mismatch between funding cycles and long term preservation commitments, even
for long existing institutions …, leading to the possibility that some preservation
commitments may have to be given priority over others
– Intellectual property and other rights-based constraints on preservation processes
and on the provision of access
– Administrative complexities in ensuring timely action is taken that will be
cost-effective over very long periods of time
– The need to develop and maintain suitable knowledge and systems to deal with
these challenges (National Library of Australia, 2008).

Failure to address these challenges results in the loss of significant quantities
of digital information. The paradigms that shape the environment in which we
now function have changed. By comparison with traditional practice in cultural
heritage institutions, significantly different issues are raised by the increasing
reliance in today’s society on information in digital form, both born-digital and
digitized from paper, film, or other media. The benign neglect that may have
sufficed in the past to preserve information is no longer enough; active
intervention is required.
The preservation of digital materials poses many challenges for which
pre-digital paradigms offer little assistance. One challenge is that preserving digital
materials requires constant maintenance, relying on complex hardware and
software that are frequently upgraded or replaced. Another challenge is the
increasing quantity and complexity of digital materials, taxing library and
archive systems designed to manage small numbers of simple documents. The
range of stakeholders who have an interest in maintaining digital materials for
use into the future is wide. A 2003 report begins with the words

The need for digital preservation touches all our lives, whether we work in commercial
or public sector institutions, engage in e-commerce, participate in e-government, or use
a digital camera. In all these instances we use, trust and create e-content, and expect
that this content will remain accessible to allow us to validate claims, trace what we
have done, or pass a record to future generations (NSF-DELOS Working Group on
Digital Archiving and Preservation, 2003, p.i).

These words remain as relevant now as when they were written almost ten
years ago.
We cannot expect a technological quick fix. We now appreciate that the
challenges of maintaining digital materials so they remain accessible in the
future are not just technological. They are equally bound up with
organizational infrastructure, resourcing, and legal factors, and we have not yet got
the balance right. These and other factors combine to make the task difficult,
although there are clear pointers to the way ahead. As Breeding (2010, p.32)
notes, ‘while the current state of the art in digital preservation falls short of an
ideal system that guarantees permanent survival, much has been done to address
the vulnerabilities inherent in digital content’.
Both the library community and the recordkeeping community (archivists
and records managers), as well as an increasing number of other groups, are
energetically seeking solutions to the challenges of digital preservation. Over
the last decade there has been increased sharing of the outcomes of research
and practice. Developments in one community have considerable potential to
assist practice in other information and heritage communities. This book goes
some way towards addressing this need by providing examples from several
different communities.
Although much high-quality information is available to information
professionals concerned with preserving materials in digital form, most notably on
the web, its sheer volume causes problems for busy information professionals,
scholars and scientists, and individuals who wish to understand the issues and
learn about strategies and practices for digital preservation. Preserving Digital
Materials is written for these time-poor information professionals, scholars
and scientists, and individuals. Its synthesis of current information, research
and perspectives about digital preservation from a wide range of sources
across many areas of practice makes it of interest to a wide range of readers –
from preservation administrators and managers who want a professional refer-
ence text to thinking practitioners who wish to reflect on the issues that digital
preservation raises in their professional practice. It will also be of interest to
students.
The reader should note two features of this book. First, Preserving Digital
Materials is not a how-to-do-it manual: although it includes information about
practical applications, it is not the place to learn how to apply the technical
procedures of digital preservation. Second, it is not primarily concerned with
digitization and makes little distinction between information that is born-digital
and information that is digitized from physical media.
This book addresses four key questions which give the text its four-part
structure:

1. Why do we preserve digital materials?
2. What digital materials do we preserve?
3. How do we preserve digital materials?
4. How do we manage digital preservation?

Chapters 1 to 3 address the first question: why do we preserve digital materials?
These chapters examine key definitions and their relationship to ways of think-
ing about digital preservation, note some of the reasons why preservation is a
strong professional imperative for librarians, recordkeepers, scholars and scien-
tists, and individuals, indicate the extent of the preservation problem for digital
materials, and look at the reasons why a digital preservation problem exists.
The question ‘what digital materials do we preserve?’ is investigated in chapters
4 and 5. Chapter 4 examines the issues of selection of digital materials for
preservation, and chapter 5 notes the questions about the attributes of digital
materials we need to preserve. The question ‘how do we preserve digital mate-
rials?’ is covered in chapters 6, 7 and 8. An overview of digital preservation
strategies is provided in chapter 6, and chapters 7 and 8 describe specific strate-
gies. Aspects of the question ‘how do we manage digital preservation?’ are
noted in chapters 9 and 10. Chapter 9 describes major digital preservation ini-
tiatives and collaborations. Chapter 10 examines some of the issues that digital
preservation faces in the future.
The reader should be aware that this book presents a Western view of
preservation, a view not necessarily embraced by all cultures. The reader should
also note the words in Article 9 of the UNESCO Charter on the Preservation
of the Digital Heritage:

The digital heritage is inherently unlimited by time, geography, culture or format. It is
culture-specific, but potentially accessible to every person in the world. Minorities may
speak to majorities, the individual to a global audience. The digital heritage of all regions,
countries and communities should be preserved and made accessible, creating over
time a balanced and equitable representation of all peoples, nations, cultures and lan-
guages (UNESCO, 2004).

The first edition of Preserving Digital Materials (2005) used many Australian
examples, because Australian practice in digital preservation – from the library,
recordkeeping, audiovisual archiving, data archiving and geoscience sectors –
was often at the forefront of international best practice. This second edition of
Preserving Digital Materials provides a more international perspective, noting
major initiatives in the UK, the EU and the US since 2005. It is possible to do
this in 2011 because of the considerable quantity of material reported by these
and many other initiatives and readily available on web sites, in conference
proceedings and from other public sources.
As noted above, there is a considerable amount of high-quality information
available about preserving materials in digital form, much of it available on the
web. The accessibility that this provides is countered by the impermanence of
much web material, as noted in several chapters in this book. All URLs in this
book were correct at the time of writing.
The first edition of this book acknowledged my indebtedness to many
people, and these debts remain. In producing the first edition I benefited from
discussions with many colleagues at that time. In particular, I acknowledged
the following individuals for their ideas and support: Tony Dean for suggesting
the example of Piltdown Man; Liz Reuben, Matthew Davies, Stephen Ellis and
Rachel Salmond for case studies; Alan Howell, of the State Library of Victoria,
and staff of the National Library of Australia, in particular Pam Gatenby, Colin
Webb, Kevin Bradley and Margaret Phillips, for their assistance with clarifying
concepts. Some of the material in the first edition was based on interviews
with Australian digital preservation experts, whose assistance and encouragement
were invaluable: Toby Burrows, Mathew Davies, Ray Edmondson, Stephen
Ellis, Alan Howell, Maggie Jones, Gavan McCarthy, Simon Pockley, Howard
Quenault, Lloyd Sokvitne, Paul Tresize, and Andrew Wilson. Heather Brown
and Peter Jenkins provided examples, and their assistance and the permission
of the State Library of South Australia were gratefully acknowledged. Thanks
were due to Ken Thibodeau, CLIR, ERPANET and UNESCO for permission
to use their material. I acknowledged my gratitude to my then employer, Charles
Sturt University, which supported me by providing study leave in 2003. I was
fortunate to be based at the National Library of Australia as a National Library
Fellow from March to June 2003 and I greatly appreciated the generous sup-
port of its then Director-General, Jan Fullerton, and other staff of the National
Library. Finally, I acknowledged the unfailing support of Rachel Salmond in
this and others of my endeavours.
In writing the second edition of this book I have incurred new debts. In
addition to those noted for the first edition, I gratefully acknowledge students
who have enrolled in my courses on digital preservation at Yonsei University,
Seoul, and the Graduate School of Library and Information Science, Simmons
College, Boston. My ideas have been informed by conversations with people too
numerous to name, but I wish to particularly thank Jeannette Bastian, Michèle
Cloonan, Joy Davidson, Cal Lee, Michael Lesk, Martha Mahard, Seamus Ross,
Anne Sauer, Shelby Sanett, and Terry Plum. I am grateful for the support of
my current employer, the Graduate School of Library and Information Science,
Simmons College, Boston.
I must again acknowledge the unfailing support of Rachel Salmond. I owe
Rachel more than I can adequately express here for her help over three decades,
her editorial assistance and her patience with me as the preparation of this
book took over normal schedules.
Chapter 1
What is Preservation in the Digital Age?
Changing Preservation Paradigms
Introduction
To preserve, as the dictionary reminds us is to keep
safe … to maintain unchanged … to keep or maintain
intact. But the rapid obsolescence of information
technology entails the probability that any digital
object maintained unchanged for any length of time
will become inaccessible (Thibodeau, 1999)

Any discussion about the preservation of digital materials must begin with the
consideration of two interlinked areas: changing preservation paradigms, and
definitions of terms. Without a clear understanding of what we are discussing,
the potential for confusion is too great. In library and recordkeeping practice
we are moving rapidly from collection-based models, whose principles and
practices have been developed over many centuries, to models where collec-
tions are not of paramount importance and where what matters is the extent of
access provided to information resources, whether they are managed locally or
remotely. Archivists have considered, debated, and sometimes applied the con-
cept of non-custodial archives, where there is no central collection, to accom-
modate the massive increase in numbers of digital records. Librarians manage
hybrid libraries, consisting of both physical collections and distributed digital
information resources, and digital libraries. Other stakeholders with a keen inter-
est in digital preservation manage digital information in specific subject areas,
such as geospatial data or social science data. In the past this material, where it
existed, was maintained as collections of paper and other physical objects. The
practices developed and applied in libraries and archives are still largely based
on managing physical collections and cannot be applied automatically to man-
aging digital collections.
The changing models of library and recordkeeping practice require new
definitions. The old terms do not always convey useful meanings in the digital
environment and can be misleading and, on occasion, even harmful. In library
and recordkeeping practice we are changing from a preservation paradigm
where primary emphasis is placed on preserving the physical object (the arti-
fact as carrier of the information we wish to retain, for example, a CD) to one
where there is no physical carrier to preserve. What, then, does the term pres-
ervation mean in the digital environment? How has its meaning changed?
What are the implications of these changes? The phrase benign neglect pro-
vides an example of a concept that is helpful in the pre-digital preservation
paradigm but is harmful in the new. It refers to the concept that many informa-
tion carriers made of organic materials (most notably paper-based artifacts)
will not deteriorate rapidly if they are left undisturbed. For digital materials
this concept is positively harmful. One thing we understand about information
in digital form is that actions must be applied almost from the moment it is
created, if it is to survive. Pre-digital paradigm definitions do not accommodate
new forms, such as works of art that incorporate digital technologies, and time-
defined creative enterprises such as performance art.
This chapter examines the effect of digital information on ‘traditional’
librarianship and recordkeeping paradigms, noting the need for a new preser-
vation paradigm in an environment that is dynamic and has many stakeholders,
often with competing interests. It considers the differences between born-
digital and digitized information, and defines key terms.

Changing paradigms
It is now commonplace to hear or read that we live in an information society,
of which one main characteristic is the widespread and increasing use of net-
worked computing, which relies on data. This is revolutionizing the way in
which large parts of the world’s population live, work and play, and how
libraries, archives, museums and other institutions concerned with preserving
documentary heritage function and are managed. New expectations of these
institutions are evolving.
The significance of these changes is readily illustrated by just one example.
The internet is rapidly becoming the first choice for people who are searching
for information on a subject, and a new verb, to google (derived directly from
Google, the name of a widely used internet search engine) has entered our
vocabulary. The sheer size and rapid rate of the internet’s growth mean that
no systems have been developed to provide comprehensive access to it. The
systems that do exist are embryonic and experimental, and the quality of the
information available on the web is variable. Attempts to estimate the rate of
the internet’s growth have included counting the number of domain names
over several years. There has been a dramatic increase in the number of domain
names since 1994, when only a small number were registered, rising to around
100 million at the start of 2001 and, ten years later, to almost 800 million in
2010 (Internet Systems Consortium, 2011).
These major changes – it is not too extreme to call it a revolution – raise
the question of how to keep the digital materials we decide are worth keeping.
The ever-increasing quantities being produced do not assist us in finding an
answer. ‘According to a recent study by market-research company IDC … the
size of the information universe is currently 800,000 petabytes. … but it’s just
a down payment on next year’s total, which will reach 1.2 million petabytes, or
1.2 zettabytes. If these growth rates continue, by 2020 the digital universe will
total 35 zettabytes, or 44 times more than in 2009’ (Tweney, 2010). Nor does
the rapidity with which changes in computer and information technology occur.
The challenges are new and complex for nearly all aspects of librarianship and
recordkeeping, including preservation.
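The arithmetic behind the quoted figures can be checked directly. The following is a minimal sketch in Python, assuming decimal (SI) prefixes and assuming that the 800,000 petabyte figure is the 2009 baseline, as the ‘44 times’ comparison implies. The variable names are illustrative, and the implied annual growth rate is a derived figure, not one given by Tweney.

```python
# Figures as quoted from Tweney (2010). Assumption: decimal (SI) prefixes,
# so 1 zettabyte = 1,000,000 petabytes.
PETABYTES_PER_ZETTABYTE = 1_000_000

baseline_pb = 800_000          # assumed here to be the 2009 figure
following_year_pb = 1_200_000  # "next year's total"
projected_2020_zb = 35         # projection for 2020

baseline_zb = baseline_pb / PETABYTES_PER_ZETTABYTE
following_year_zb = following_year_pb / PETABYTES_PER_ZETTABYTE

# Ratio of the 2020 projection to the baseline (the "44 times" claim).
growth_factor = projected_2020_zb / baseline_zb

# Implied compound annual growth rate over the years from 2009 to 2020.
years = 2020 - 2009
annual_growth = growth_factor ** (1 / years) - 1

print(f"baseline: {baseline_zb} ZB, following year: {following_year_zb} ZB")
print(f"2020 projection is about {growth_factor:.1f} times the baseline")
print(f"implied annual growth: about {annual_growth:.0%}")
```

On these assumptions the projected 2020 total is roughly 44 times the baseline, which corresponds to compound growth of about 41 per cent a year over eleven years.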
There have also been changes in the ways in which information is pro-
duced and becomes available to communities of users. The internet is only one
of these ways. In the pre-digital (print) environment the processes of creation,
reproduction and distribution were separate and different; now, ‘technology
tends to erase distinctions between the separate processes of creation, repro-
duction and distribution that characterize the classic industrial model of print
commodities’ (Nurnberg, 1995, p.21). This has significant implications for
preservation, especially in terms of who takes responsibility for it and at what
stage preservation actions are first applied. For instance, in the industrial-mode
print world, acquiring the artifact – the book – so that it could be preserved
occurred by means such as legal deposit legislation, requiring publishers to
provide copies to libraries for preservation and other purposes. If the creator is
now also the publisher and distributor, as is often the case in the digital world,
who has the responsibility of acquiring the information? These points are noted
in more detail later in this book.
New ways of working and new structures are developing. Cyberscholarship
(known also as e-science or e-research) is based on ready access to digital mate-
rials and applies computing techniques to analyze, visualize and present results.
This research is typically highly collaborative, being based on the use of large
data sets produced and shared by international communities of scholars. The
practices developed in this cyberscholarship environment are significantly dif-
ferent from traditional practices. Other characteristics of cyberscholarship also
illustrate different practices. The enhanced ability to compute large quantities of
data, such as using visualizations and simulations, provides new possibilities,
some of which can be seen in the Electronic Cultural Atlas Initiative (ecai.org).
The generation of large quantities of data places heavy demands on how data are
stored and managed. Heavy emphasis is placed on sharing and re-using digital
information. All of these factors place different demands on how digital informa-
tion is managed, including on its preservation over time to ensure it remains
available and usable in the future. A 2008 study carried out for the Association
of Research Libraries provides examples of cyberscholarship in humanities, so-
cial sciences and scientific/technical/medical subject areas in the US (Maron and
Kirby Smith, 2008). Changing information practices in the humanities, in the
UK specifically, are described by Bulger and her colleagues (Bulger et al., 2011).
Cyberinfrastructure refers to the computer networks, libraries and archives,
online repositories and other resources needed to support cyberscholarship.
These include easy-to-use, effective applications and services to locate, manage,
analyze, visualize and store data, and sufficient numbers of people skilled
in managing large quantities of data. (Borgman (2007) provides an excellent
overview of cyberscholarship and cyberinfrastructure.) Much of the current
discussion about cyberscholarship and cyberinfrastructure is centered on the
roles that research libraries will play. Walters and Skinner (2011, p.5) see
research libraries being ‘repositioned as vibrant knowledge branches that reach
throughout their campuses to provide curatorial guidance and expertise for
digital content, wherever it may be created and maintained’. New structures
will continue to evolve and develop.

The need for a new preservation paradigm


The digital revolution is changing the professional practice of librarians, record-
keepers, and indeed all other information professionals. The paradigm shift in
how libraries, archives and information agencies conduct their activities can be
crudely (and naively) described as the change from acquiring, storing and pro-
viding access to information resources in physical forms, to acquiring, storing
and providing access to digital information resources. Preservation activities
are central to this new paradigm.
The pre-digital preservation paradigm is based on principles such as the
following:

– When materials are treated, the treatments should, when possible, be re-
versible
– Whenever possible or appropriate, the originals should be preserved; only
materials that are untreatable should be reformatted
– Library materials should be preserved for as long as possible
– Efforts should be put into preventive conservation, and aimed at providing
appropriate storage and handling of artifacts
– Benign neglect may be the best treatment (derived from Cloonan (1993,
p.596), Harvey (1993, pp.14,140), and Bastian, Cloonan and Harvey (2011,
pp.612-613)).

The definitions associated with the old preservation paradigm are firmly rooted
in the conservation of artifacts – the physical objects that carry the information
content. In fact, the term ‘materials conservation’ is sometimes used, especially
by museums. The definitions provided in the IFLA Principles for the Care and
Handling of Library Materials (Adcock, 1998), widely adopted in the library
and recordkeeping contexts, articulate principles firmly based on maintenance
of the physical artifact. The definition of Conservation notes that its aims are
to ‘slow deterioration and prolong the life of an object’, and that of Archival
quality emphasizes the longevity and stability of materials in the words ‘a
material, product, or process is durable and/or chemically stable, that it has a long
life, and can therefore be used for preservation purposes’ (Adcock, 1998, p.4).
Medium/media is defined as ‘the material on which information is recorded.
Sometimes also refers to the actual material used to record the image’ (Adcock,
1998, p.5). The point here is that the old preservation paradigm considers
information content and the carrier as one and the same, although this changed
in the declining years of the old paradigm as large-scale copying programmes,
especially microfilming programmes, were implemented. To these definitions
we should add Restoration (from the 1986 version of the IFLA Principles):
‘Denotes those techniques and judgements used by technical staff engaged in
the making good of library and archive materials damaged by time, use and
other factors’ (Dureau and Clements, 1986, p.2).
Pre-digital paradigm thinking does not transfer well to the digital environ-
ment. This is easily illustrated. Taking just one example, the emphasis on
keeping the physical carrier (the diskette, CD, magnetic tape, hard drive, flash
drive) does not work because these carriers quickly become obsolete, are
closely linked to specific hardware and software drivers which also quickly
become obsolete, are easy to corrupt, and deteriorate rapidly (Bastian, Cloonan
and Harvey, 2011, p.611). Cox indicates how a different way of thinking is
needed in the recordkeeping community and suggests four elements of a new
preservation paradigm for electronic records. He notes, for instance, that record-
keepers ‘have long seen centralization or custody of records as crucial to their
work’, but this is not feasible for electronic records; transferring electronic
records to the custody of archives ‘may undermine their very long-term use’
(Cox, 2001, p.95).
As Marcum noted in her preface to a 2002 survey about the state of pres-
ervation programmes in American college and research libraries,

the information landscape has changed, thanks to the digital revolution. Libraries are
working to integrate access to print materials with access to digital materials. There is
likewise a challenge to integrate the preservation of analog and digital materials. Preser-
vation specialists have been trained to work with print-based materials, and they are
justifiably concerned about the increased complexity of the new preservation agenda
(Kenney and Stam, 2002, p.v).

Marcum’s description of the situation is still accurate ten years later. Addition-
ally research libraries are seeking new roles as experts in the curation of digital
materials (Walters and Skinner, 2011).
What is the new preservation agenda? How has the preservation paradigm
changed to accommodate it? Pre-digital preservation paradigm thinking does
include some useful understanding of digital preservation. For example, it
recognizes that copying (as in refreshing from tape to tape) is the basis of digital
preservation; this recognition is encapsulated in British Standard BS4783 Part 2,
dating from 1988, which focuses on procedures for refreshing digital data by
copying them from magnetic tape to magnetic tape at regular intervals and on
the best storage and handling of these tapes. The old paradigm does not, how-
ever, engender an understanding of the complexity of copying – which is more
than simply preserving a bit-stream, but must take account of a wide range of
other attributes of the digital object that also need to be preserved.
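The refreshing procedure described above, copying data from one carrier to another, can be illustrated in outline. The sketch below, in Python, is an illustration under stated assumptions rather than anything prescribed by BS4783: the function names sha256_of and refresh_copy are hypothetical, ordinary files stand in for tapes, and a SHA-256 digest is used to confirm that the copy is bit-for-bit identical to its source.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source, destination):
    """Copy source to destination and confirm the two bit-streams match.

    Returns the shared digest; raises IOError if the copy differs.
    """
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"refresh failed: {destination} differs from {source}")
    return after


# Usage: refresh a small file from an 'old carrier' to a 'new carrier'.
# Both are just files in a temporary directory (an assumption of this sketch).
with tempfile.TemporaryDirectory() as workdir:
    old = Path(workdir) / "old_carrier.dat"
    new = Path(workdir) / "new_carrier.dat"
    old.write_bytes(b"digital object payload")
    digest = refresh_copy(old, new)
    copies_match = new.read_bytes() == old.read_bytes()
```

A check of this kind establishes only that the bit-stream survived the copy. It says nothing about the other attributes of the digital object, such as whether software still exists to render it, which is the limitation of old-paradigm thinking noted above.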
In 1998 Hedstrom provided early recognition of the need for new preserva-
tion paradigms and of the enormity of the challenges faced, noting that ‘digital
preservation adds a new set of challenges for libraries and archives to the exist-
ing task of preserving a legacy of materials in traditional formats’ (Hedstrom,
1998, p.192). However, old-paradigm preservation thinking led to the charac-
terization of thinking about digital preservation as ‘a myopic focus on technical
problems (such as preserving digital objects) and a concomitant neglect of the
bigger picture (for example, public policy, among other issues)’ (Cloonan,
2001, p.232). Old-paradigm thinking does not suffice because not only are there
additional technical challenges, there are also new challenges resulting from
the quantities of digital information being produced. In a prescient comment
Hedstrom noted: ‘Our ability to create, amass, and share digital materials far
exceeds our current capacity to preserve even that small amount with continuing
value’ (Hedstrom, 1998, p.192). Specific examples from scientific areas, such
as determining long-term global climate change, and biomedicine, are pro-
vided in a 2003 report which contends that ‘much more digital content is avail-
able and worth preserving’ (Workshop on Research Challenges in Digital
Archiving and Long-term Preservation, 2003, p.2). The situation has not
changed significantly in the intervening years. New ways of thinking about
preservation and new skills are still needed.
At this point it is useful to consider some of the key elements of new para-
digm preservation thinking. The first of these is the need to actively maintain
digital information over time from the moment of its creation. Interruptions in
the management of a digital collection will mean that there is no collection left
to manage. Unlike most collections of physical objects, collections of digital
materials require ‘constant maintenance and elaborate “life-support” systems
to remain viable’ (Workshop on Research Challenges in Digital Archiving and
Long-term Preservation, 2003, p.7). In addition to these technical issues there
are other issues – social and institutional (political in the broadest sense) – for
‘even the most ideal technological solutions will require management and
support from institutions that go through changes in direction, purpose, and
funding’ (Workshop on Research Challenges in Digital Archiving and Long-
term Preservation, 2003, p.7). The three-legged stool model of digital preser-
vation, in which the legs represent technology, organization, and resources as
equally important components of supporting digital preservation, describes this
well (McGovern, 2007a).
Further key elements are the scale and nature of the digital information we
wish to maintain into the future and the preservation challenges these pose.
The complexities of the variety of digital materials are described in this way:

Digital objects worthy of preservation include databases, documents, sound and video
recordings, images, and dynamic multi-media productions. These entities are created
on many different types of media and stored in a wide variety of formats. Despite a
steady drop in storage costs, the recent influx of digital information and its growing
complexity exceeds the archiving capacity of most organizations (Workshop on Research
Challenges in Digital Archiving and Long-term Preservation, 2003, p.7).

Arising from these key factors is the need for new kinds of skills. Current
preservation skills and techniques are labour-intensive and, even where ap-
propriate, do not scale up to the massive quantities of digital materials we are
already encountering. The problem cannot simply be addressed by technologi-
cal means. What kind of person will implement the new policies and develop
the new procedures required to maintain digital materials effectively into the
future? New kinds of positions, requiring new skill sets, are already being established
in libraries and archives. Among key selection criteria for a Digital
Archivist at the MIT Libraries in Cambridge, Massachusetts, advertised in
May 2011, were:

– Demonstrated knowledge of digital archival and records management theory
and practice including issues related to intellectual property, content
management, access, and preservation
– Demonstrated knowledge of data storage methods, media and security
– Experience with digital repository platform(s) and with XML and digital
content creation/transformation tools
– Demonstrated knowledge of descriptive metadata standards including
MARC, DACS, Dublin Core and of data structure standards relevant to
archival control of digital collection material (examples: EAD, Dublin
Core, MODS, METS, PREMIS, or VRA Core)
– Experience with relational databases.

People with these skill sets are still in short supply. But it is more than new
skills that is required. We need to redefine the field of preservation and the
terms we use to describe preservation activities.

Changing definitions
Pre-digital preservation paradigm definitions do not convey useful meanings
when they are applied to digital preservation. What are they?

Currently ‘conservation’ is the more specific term and is particularly used in relation to
specific objects, whereas ‘preservation’ is a broader concept covering conservation as
well as actions relating to protection, maintenance, and restoration of library collections.
The eminent British conservator, Christopher Clarkson, emphasizes this broader aspect
when he states that preservation ‘encompasses every facet of library life’: it is, he says,
‘preventive medicine ... the concern of everyone who walks into, or works in, a library.’
For Clarkson conservation is ‘the specialized process of making safe, or to a certain
degree usable, fragile period objects’ and restoration ‘expresses rather extensive rebuilding
and replacement by modern materials within a period object, catering for a future
of more robust use.’ He neatly distinguishes the three terms by relating them to the extent
of operations applied to an item: ‘restoration implies major alterations, conservation
minimal and preservation none’ (Harvey, 1993, pp.6-7).

These definitions are based on the assumptions that deteriorated materials or
artifacts can be made good by restoring them, and that we can slow down, per-
haps even halt, the rate of deterioration by taking appropriate measures, such
as paying attention to careful handling and high-quality storage. The principles
on which old-paradigm preservation practice is based, and in which these defi-
nitions are rooted, are almost entirely oriented towards artifacts – preserving
the media in which the information is stored. These principles were modified
as large-scale mass preservation treatments and practices, such as mass deacidi-
fication or microfilming, were implemented. For example, reformatting pro-
grammes (microfilming and photocopying) moved the paradigm towards a
recognition that information content could be preserved without also having to
preserve the original media on which the content was carried.
But these definitions simply do not work with digital materials. In a per-
ceptive conjecture about how preservation will change in the future, Cloonan
asked in 1993 ‘Which principles will be practiced? ... Will our very concept of
permanence change in the next generation ...? The answer ... is likely to be
“yes”’ (Cloonan, 1993, p.602). She suggested that our preservation concerns in
the future will be about the preservation of knowledge, not the preservation of
individual items, and that ‘we must continue to save as much information as
possible, regardless of the format or the means by which it is stored and dis-
seminated’ (Cloonan, 1993, p.603). Elsewhere, Cloonan conjectured that, while
it is easy to indicate what a library preservation programme consists of – ‘disaster
recovery planning, collection development policies, environmental controls,
integrated pest management, proper storage, physical treatment, reformatting,
migration, staff and user education, and the like’ – we are still not any closer to
what preservation is really about, what its ‘essence’ is. ‘Is preservation merely
a set of actions? Is it a way of seeing? Or a way of interpreting information? Is
copying preservation? Is reformatting?’ (Cloonan, 2001, pp.232-233).
The very stuff we seek to preserve needs better definition. Preserving the
media will not suffice; but what exactly is it that we want to preserve? Is it digi-
tal materials (as this book’s title suggests), digital objects, digital records, digital
documents, data? We need definitions that can be commonly understood by
those who create, manage and use digital materials. The Digital Curation Centre’s
Curation Lifecycle Model (Digital Curation Centre, 2008) gives definitions for
the terms data, digital object and database that accommodate the range of digital
materials we seek to preserve. Data is ‘any information in binary digital form’
and includes digital objects and databases. Digital objects can be simple (‘such
as textual files, images or sound files, along with their related identifiers and
metadata’) or complex (‘made by combining a number of other digital objects,
such as websites’). Databases are ‘structured collections of records or data stored
in a computer system’.
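These three definitions can be expressed as a simple data model. The following Python sketch is illustrative only: the class and field names are hypothetical and are not part of the Curation Lifecycle Model itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class SimpleDigitalObject:
    """A single file (text, image, sound) with its identifier and metadata."""
    identifier: str
    content: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class ComplexDigitalObject:
    """An object made by combining other digital objects, such as a website."""
    identifier: str
    parts: List[Union["SimpleDigitalObject", "ComplexDigitalObject"]] = field(
        default_factory=list
    )
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Database:
    """A structured collection of records stored in a computer system."""
    identifier: str
    records: List[Dict[str, str]] = field(default_factory=list)


def count_simple_objects(obj):
    """Count the simple objects contained in a simple or complex object."""
    if isinstance(obj, SimpleDigitalObject):
        return 1
    return sum(count_simple_objects(part) for part in obj.parts)


# Usage: a website modelled as a complex object with two simple parts.
page = SimpleDigitalObject("page-1", b"<html>...</html>", {"format": "text/html"})
logo = SimpleDigitalObject("logo-1", b"\x89PNG", {"format": "image/png"})
website = ComplexDigitalObject("site-1", parts=[page, logo])
simple_total = count_simple_objects(website)
```

Because a complex object may itself contain complex objects, the count is recursive. The same nesting is one reason why preserving a website involves more than preserving any one of its files.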
Conway provided an early description of the changes from old to new
preservation paradigms and the development of new definitions in terms of
transformation, suggesting that consensus about ‘a set of fundamental princi-
ples that should govern the management of available resources in a mature
preservation program’ has been reached in the analogue world and that they
persist in the digital world. They ‘in essence, define the priorities for extending
the useful life of information resources. These concepts are longevity, choice,
quality, integrity, and accessibility’ (Conway, 2000, p.22).
These concepts have been transformed to accommodate the preservation of
digital information. Longevity has altered from a focus on extending the life of
physical media to one on ‘the life expectancy of the access system’. Choice
(selection of material to be preserved) is no longer a decision made later in the
life cycle of an item, but has become, for digital materials, ‘an ongoing process
intimately connected to the active use of the digital files’. Integrity, based on
‘the authenticity, or truthfulness, of the information content of an item’, no
longer has maintaining the physical medium as its primary emphasis, but now
is about developing procedures that allow us to ensure and be assured that no
changes have been made. In the digital world, access to the artifact is clearly no
longer sufficient; what is required, suggests Conway, is access to ‘a high quality,
high value, well-protected, and fully integrated digital product’ (Conway,
2000, p.27). These concepts continue to be explored further in relation to elec-
tronic records by the InterPARES Project (InterPARES, 1999-).
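One procedure of the kind Conway describes, for being assured that no changes have been made, can be sketched as a checksum comparison. This is a minimal illustration only: the text does not name a particular technique, and the function names and sample content below are hypothetical.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a bit-stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded earlier."""
    return sha256_digest(data) == recorded_digest


# Hypothetical content standing in for a preserved file.
original = b"Minutes of the meeting held on 3 March."
recorded = sha256_digest(original)  # kept alongside the object

# A single altered character produces a different digest.
altered = b"Minutes of the meeting held on 4 March."

print(is_unchanged(original, recorded))  # True
print(is_unchanged(altered, recorded))   # False
```

A matching digest shows only that the bits are the same as when the digest was recorded; it says nothing about whether the content was truthful to begin with.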
The changing paradigms and definitions also entail major shifts in the con-
ceptualization of preservation. Preservation implies alteration; Cloonan (2001,
p.235) refers to the ‘paradox’ of preservation: ‘it is impossible to keep things
the same forever. To conserve, preserve, or restore is to alter’. If we seek to
keep a digital object unchanged, we require, in effect, that the technology on
which it was developed to operate (or another technology designed to emulate
the original) is available. Over time this reduces access to the digital objects
we are attempting to maintain by preserving them. As Thibodeau, Moore and
Baru (2000, p. 113) suggest, ‘many users would not be pleased if, in order to
access digital objects that had been preserved across the last 30 years, they had
to learn to use the PL1 programming language or Model 204 database software’.
An implication of this paradox is that we should accept some degree of
change in digital materials as we preserve them over time. The real issue is:
how much change is acceptable?
Change is also apparent in the ways that information professionals define
preservation. In a 2002 survey of records managers and archivists who deal
with electronic records, Cloonan and Sanett (2002, p.73) posed the questions:
‘What is the meaning of preservation? Does the meaning change when it is
applied to electronic rather than paper-based records?’. They noted that:

It is clear that professionals are revising their definitions of preservation from a once-
and-forever approach for paper-based materials to an all-the-time approach for digital
materials. Preservation must now accommodate both media and access systems …
while we once tended to think about preserving materials for a particular period of time
– for example, permanent/durable paper was expected to last for five hundred years –
we now think about retaining digital media for a period of continuing value (Cloonan
and Sanett, 2002, p.93).

Interviewees expressed dissatisfaction with the term digital preservation, suggesting
that other terms such as long-term retention are more suitable (Cloonan
and Sanett, 2002, p.74). Definitions of preservation provided by survey re-
spondents, Cloonan and Sanett (2002, p.85) concluded, ‘demonstrate a shift
taking place from defining preservation as a once-and-forever approach for
paper-based materials, to an all-the-time approach for digital materials’. This
shift implies an acceptance that preservation may even begin before a record has
been created.
Debate has continued about whether digital preservation sufficiently de-
scribes what needs to occur for digital materials to be accessible over time, on
the grounds that more than just aspects of preservation need to be encompassed
by whatever term is used. The terms digital curation and digital stewardship
have been proposed and are gaining acceptance because they describe more than
just preservation, referring also to ‘the creation, collection, organization [and]
dissemination’ of digital objects (Bastian, Cloonan and Harvey 2011, p.609).

Preservation definitions in the digital world


What, then, are viable definitions of preservation in a digital world? What
concepts do these definitions need to accommodate? What difficulties do we
encounter in trying to develop definitions that will assist us in our attempts to
preserve digital materials?
The first point to note is the dynamic nature of the field. We must be pre-
pared to change our paradigms and the definitions that we develop from them to
address its changing nature. In a discussion of how archivists’ thinking can bet-
ter inform digital preservation, Gilliland-Swetland (2000, p.v) comments that

the paradigms of any of the information professions come up short when compared with
the scope of the issues continuously emerging in the digital environment. An overarching
dynamic paradigm – that adopts, adapts, develops, and sheds principles and practices
of the constituent information communities as necessary – needs to be created.

Our new definitions need to accommodate the idea of information being
preserved independent of the media on which it resides. This is now well
accepted; old preservation paradigm practices were well acquainted with it
through microfilming programmes. It is no longer a fact that the original has
‘more integrity and veracity than a copy’ (Cloonan, 2001, pp.236-237); instead,
in the digital world, we need to look further to define what attributes of digital
objects we wish to maintain over time.
Definitions should also accommodate the social and organizational aspects
of digital preservation, the ‘public policy, economic, political, social or educa-
tional perspective’ (Cloonan, 2001, p.238). Old-paradigm definitions certainly
recognized that there was more to preservation than the technical aspects – the
IFLA Principles suggest it also encompasses ‘managerial and financial con-
siderations, ... staffing levels [and] policies’ (Adcock, 1998, p.5) – but defini-
tions of digital preservation need to go much further. They must be extended
because of yet another factor, the need to start preserving digital materials
almost from the moment of their creation, and even, some suggest, before they
are created.
Preservation in the pre-digital paradigm was usually applied retrospec-
tively. Conservation procedures were applied to artifacts only after the artifact,
or the information contained in or on it, had been deemed to be of significance
and therefore worth preserving for use in the future. For example, books printed
before 1800 are typically considered to be significant because they are the
product of handcraft production techniques and, therefore, no two items are
identical; and some artifacts are preserved because they attain iconic status –
the Magna Carta, the US Declaration of Independence. We could also rely on
benign neglect, where lack of action did not usually harm the item (assuming
certain factors such as low use were in play) and did not significantly affect the
likelihood of its survival. This concept no longer works for digital materials.
The 3.5-inch diskette, once very common, provides a good example. Informa-
tion on it was likely to become unreadable for many reasons: the diskette may
have been stored in conditions too humid or too hot; the drive to read it may
have been superseded by newer technology and may no longer be readily
available; the driver software for that drive may no longer be easy to find. After
a period of time, the diskette is rendered unusable. Active preservation needs
to start close to the time of creation of digital information if there is to be any
certainty that that information will be accessible in the future.
Other concepts need to be accommodated by the new definitions. For digital
materials ‘their preservation must be an integral element of the initial design of
systems and projects’ (Ross, 2000, p.13), but this is not usually the case. Digital
materials exist in a bewilderingly large number of formats; there is still little
standardization. The most significant concept is that the preservation of digital
materials is much more than the preservation of information content or physical
carrier:

it is about preserving the intellectual integrity of information objects, including capturing
information about the various contexts within which information is created, organized,
and used; organic relationships with other information objects; and characteristics that
provide meaning and evidential value (Gilliland-Swetland, 2000, p.29).

Preserving the original bit-stream is only one part of the problem; equally im-
portant is the requirement to preserve ‘the means of interpreting, reading and
utilizing the bit stream’ (Deegan and Tanner, 2002).
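Why the bit-stream alone is not enough can be shown with a small sketch: the same stored bytes yield different text depending on the character encoding used to interpret them. The byte values and the two encodings are chosen purely for illustration and do not come from the sources cited.

```python
# One stored byte sequence, unchanged throughout.
bit_stream = bytes([0x63, 0x61, 0x66, 0xE9])

# The same bytes interpreted under two different character encodings.
as_latin1 = bit_stream.decode("latin-1")
as_cp437 = bit_stream.decode("cp437")

print(as_latin1)  # café
print(as_cp437)   # cafΘ
```

The bits are intact in both cases; what differs is the means of interpreting them, which is why a record of the intended encoding has to be kept with the object.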
The difficulties of definition are not helped by disciplinary differences.
There are, for instance, differences in the way archivists and librarians use terms.
Some terms, such as integrity and authenticity, arise from the world of archives
and were not, until recently, usually associated with the work of librarians.
These differences, however, pale in comparison with the significantly different
definitions used in the IT industry. How IT professionals think about the long-
term storage of data is a question that assumes importance for digital preserva-
tion because of the heavy reliance that information professionals place on their
skills and services. There is abundant evidence that they think very differently
about preservation. Definitions of archive, archiving and archival storage give
us some indication of the mindset of IT professionals. A selection of online
dictionaries of information technology indicates that the terms are used in two
ways:

1. The process of moving data to a different kind of storage medium: for ex-
ample, ‘archive … 2 verb to put data in storage … on backing storage (such
as magnetic tape rather than a hard disk)’ (Collins, 2002)
2. The process of backing up data for long-term storage: for example, ‘ar-
chive (v.) To copy files to a long-term storage medium for backup … On
smaller systems archiving is synonymous with backing up’ (Webopedia,
2011).

Few of the definitions located display any interest or concern with the reasons
why long-term storage might be required, although one earlier definition is a
notable exception: ‘archiving Long term storage of information on electronic
media. Information is archived for legal, security or historical reasons, rather
than for regular processing or retrieval’ (Gunton, 1993, p.11). Perhaps the mind-
set of IT professionals is better indicated by this excerpt: ‘You detect data that’s
not needed online and move it an off-shore store. When someone wants to use
it, go find the off-line media and restore the data’ (Faulds and Challinor, 1998,
p.280).
There is no indication in these definitions of the period of time that long-term
refers to, yet this is a crucial point for those who are concerned with preservation.
While it is not especially helpful to define long-term in terms of a specific
number of months or years, some awareness of the problems is required
in the definition, as the OAIS (Open Archival Information System) Reference
Model’s definition of long-term indicates:

A period of time long enough for there to be concern about the impacts of changing
technologies, including support for new media and data formats, and of a changing user
community, on the information being held in a repository (International Organization
for Standardization, 2003, p.1-11).

Definitions have changed as we come to know more about how to preserve
digital objects, as the example of time – how long we want to preserve material
for – illustrates. Long-term, initially incorporating elements of old-paradigm
thinking of indefinitely, or as long as possible – as in ‘digital preservation
means retaining digital image collections in a usable and interpretable form for
the long term’ (Kenney and Rieger, 2000, p.135) – is now more commonly
defined in the preservation community, in terms derived from the archival
community, as the period during which the information remains of value. Such
a definition is more helpful than the commonly encountered phrase ‘over time’
– as in ‘ensuring the integrity of information over time’ (Gilliland-Swetland,
2000, p.22), and ‘digital preservation [is] the processes and activities which
stabilize and protect reformatted and “born digital” authentic electronic mate-
rials in forms which are retrievable, readable, and usable over time’ (Cloonan
and Sanett, 2002, p.95). Selected definitions from the influential publications
Preservation Management of Digital Materials: The Handbook (Digital Pres-
ervation Coalition, 2008) and the UNESCO Guidelines for the Preservation of
Digital Heritage (UNESCO, 2003) are presented in Figure 1.1.

Preservation Management of Digital Materials (Digital Preservation Coalition, 2008)

Access ... continued, ongoing usability of a digital resource, retaining all qualities of
authenticity, accuracy and functionality deemed to be essential for the purposes the
digital material was created and/or acquired for (p.24)
Authenticity The digital material is what it purports to be. In the case of electronic
records, it refers to the trustworthiness of the electronic record as a record. In the case
of ‘born digital’ and digitized materials, it refers to the fact that whatever is being
cited is the same as it was when it was first created unless the accompanying metadata
indicates any changes. Confidence in the authenticity of digital materials over time is
particularly crucial owing to the ease with which alterations can be made (p.24)
Digital Materials A broad term encompassing digital surrogates created as a result of
converting analogue materials to digital form (digitization), and ‘born digital’ for
which there has never been and is never intended to be an analogue equivalent, and
digital records (p.24)
Digital Preservation Refers to the series of managed activities necessary to ensure
continued access to digital materials for as long as necessary. Digital preservation is
defined very broadly for the purposes of this study and refers to all of the actions
required to maintain access to digital materials beyond the limits of media failure or
technological change ... (p.24)

Guidelines for the Preservation of Digital Heritage (UNESCO, 2003)

Accessibility The ability to access the essential, authentic meaning or purpose of a
digital object (p.157)
Digital materials cannot be said to be preserved if access is lost. The purpose of
preservation is to maintain the ability to present the essential elements of authentic
digital materials (p.21)
Authenticity Quality of genuineness and trustworthiness of some digital materials, as
being what they purport to be, either as an original object or as a reliable copy derived
by fully documented processes from an original (p.157)
Digital heritage Those digital materials that are valued sufficiently to be retained for
future access and use (p.157)
Digital materials is generally used here as a preferred term covering items of digital
heritage at a general level. In some places, digital object or digital resource have also
been used. These terms have been used interchangeably and generically (p.20)
Digital preservation The processes of maintaining accessibility of digital objects over
time (p.157)
… is used to describe the processes involved in maintaining information and other kinds
of heritage that exist in a digital form. In these Guidelines, it does not refer to the use
of digital imaging or capture techniques to make copies of non-digital items, even if
that is done for preservation purposes … (p.20)
Information Packages ... Preservation depends on maintaining digital objects and any
information and tools that would be needed in order to access and understand them.
Together, these can be considered to form an information package that must be managed
either as a single object or as a virtual package (with the object and associated
information tools linked but stored separately) (p.39)
Preservation program The set of arrangements, and those responsible for them, that are
put in place to manage digital materials for ongoing accessibility (p.158).
… is used to refer to any set of coherent arrangements aimed at preserving digital
objects (p.20)

Figure 1.1: Selected Definitions (From Digital Preservation Coalition, 2008; UNESCO,
2003)

These definitions assist us by providing useful starting points for an extended
discussion of digital preservation. In particular, they address some significant
questions that we need guidance on:

– What exactly are we trying to preserve?
– How long are we preserving them for?
– What strategies and actions do we need to apply?

What exactly are we trying to preserve?


One of UNESCO’s thematic areas is culture, and, within that, heritage. There-
fore the UNESCO Guidelines are primarily concerned with digital heritage,
‘those digital materials that are valued sufficiently to be retained for future
access and use’. Although this statement is too general to help us decide pre-
cisely what it is we want to preserve, it does introduce the essential concept of
selection, of deciding value – in this case, that digital material on which high
value is placed. This is the subject of Chapter 4.
More specific in the UNESCO Guidelines and in Preservation Management
of Digital Materials: The Handbook are the definitions of digital materials –
the specific digital items, objects or resources that we are concerned with. Both
publications use digital materials, suggesting a high level of consensus; this
term will, therefore, be adopted in this book, despite its suggestion of pre-digital
paradigm thinking in the physicality of the word materials, as things with a
physical presence. Both sets of definitions categorically state that they are not
concerned with the use of digitization of analogue materials as a preservation
technique. The Digital Preservation Coalition’s handbook (2008, pp.24-25)
‘specifically excludes the potential use of digital technology to preserve the
original artefacts through digitisation’, and the UNESCO Guidelines (2003,
p.20) are equally adamant, stating that digital preservation ‘does not refer to
the use of digital imaging or capture techniques to make copies of non-digital
items, even if that is done for preservation purposes’. Such statements are
worth making firmly because of the misconception still too commonly encoun-
tered in the information professions that digitizing of analogue materials, usually
photographs or paper-based material, is sufficient for preservation purposes.
This is not the case, as a 2010 report of LIBER members (Ligue des Biblio-
thèques Européennes de Recherche, representing European research libraries)
indicates:

Making the digitised material available and visible online is only one of the challenges
faced ... Another lies in assuring long-term access to them. Digitised materials – like
other digital data – are also fragile items and need special measures and arrangements
in order to be accessible despite technological change. While the preservation of paper
documents is well understood and is supported by a well-established infrastructure and a
profession of librarians and other experts, the preservation of digital objects in general
and digitised material in particular is a relatively new task for libraries and poses great
challenges in terms of the expertise and resources required (Bergau, 2010, p.6).

In terms of how they are preserved, though, the definitions in these two sources
make no distinction between born-digital materials and digital materials created
by digitizing analogue materials. This is acknowledged in the Digital Preserva-
tion Coalition’s definition of digital materials, which covers both ‘digital sur-
rogates created as a result of converting analogue materials to digital form
(digitisation), and “born digital” for which there has never been and is never
intended to be an analogue equivalent, and digital records’ (Digital Preservation
Coalition, 2008, p.24). These definitions also make clear that it is not only the
bit-stream that we seek to preserve. In order to ensure access in the future to
digital materials, we also need to take account of other attributes of digital
materials. The UNESCO Guidelines indicate this in the definition of information
packages, which comes from the OAIS Reference Model (discussed in Chapter 5). In addition
to the bit-stream, which is typically ‘not understandable or re-presentable’
by itself, ‘any information and tools that would be needed in order to access
and understand’ the digital materials must also be preserved (UNESCO, 2003,
p.39).
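The idea of an information package can be sketched as a simple data structure that keeps the bit-stream together with the information needed to interpret it. This is an illustrative assumption only: the class and field names are hypothetical and are not taken from the UNESCO Guidelines or the OAIS Reference Model.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    """A digital object plus what is needed to access and understand it."""

    bit_stream: bytes
    # Hypothetical field: details needed to interpret the bit-stream.
    representation_info: dict = field(default_factory=dict)
    # Hypothetical flag: False would model a 'virtual package', with the
    # object and its associated information linked but stored separately.
    stored_together: bool = True

    def render(self) -> str:
        """Re-present the object as text using the recorded encoding."""
        encoding = self.representation_info["encoding"]
        return self.bit_stream.decode(encoding)


package = InformationPackage(
    bit_stream="Digital heritage".encode("utf-16"),
    representation_info={"format": "plain text", "encoding": "utf-16"},
)
print(package.render())  # Digital heritage
```

If the representation information were lost, the bytes would still exist but `render` could not be relied on to reproduce the original text.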
The definitions are also very clear about the need to maintain other attributes
of digital materials. To ensure that digital materials remain usable in the future,
access to them is required – and not simply access, but access to ‘all qualities
of authenticity, accuracy and functionality’ (Digital Preservation Coalition,
2008, p.24). This, in turn, requires definitions of authenticity, expressed by the
UNESCO Guidelines as the ‘quality of genuineness and trustworthiness of
some digital materials, as being what they purport to be, either as an original
object or as a reliable copy derived by fully documented processes from an
original’ (UNESCO, 2003, p.157). (Note the emphasis on the significance of
full documentation to ensure authenticity; this has important implications for
digital preservation, noted in Chapter 5.) Four further definitions in the
UNESCO Guidelines clarify and emphasize these requirements (see Figure
1.2). Digital materials can be considered as physical objects, logical objects, or
conceptual objects. The physical object is the artifact (for example, the disk-
ette, the CD, or the magnetic tape whose physical characteristics store in or on
it the bit-stream – that is, the logical object). These are given sense when they
are used by humans and are labeled conceptual objects – ‘what we deal with in
the real world’ (Thibodeau, 2002, p.8). Essential elements (now more commonly
known as significant properties) of digital materials enable us to re-present the
materials in the manner in which they were originally intended.

Guidelines for the Preservation of Digital Heritage (UNESCO, 2003)

Conceptual objects Digital objects as humans interact with them in a human-understandable
form (p.157)
Essential elements The elements, characteristics and attributes of a given digital object
that must be preserved in order to re-present its essential meaning or purpose. Also
called significant properties (pp.157-158)
Logical objects Digital objects as computer encoding, underlying conceptual objects
(p.158)
Physical objects Digital objects as physical phenomena that record the logical encoding,
such as polarity states in magnetic media, or reflectivity states in optical media (p.158)

Figure 1.2: Selected Definitions (From UNESCO, 2003)

How long are we preserving them for?


Although the definitions in the UNESCO Guidelines do not provide specific
guidance about the length of time we preserve digital materials, the Digital
Preservation Coalition’s handbook assists us with its articulation of long-term,
medium-term and short-term preservation. Long-term preservation aims to
provide indefinite access to digital materials, or at least to the information con-
tained in them. Continued access to digital materials for a defined time (but not
indefinitely) is medium-term preservation: here, the time period is long enough
to encompass changes in technology. Short-term preservation is, in part, defined
by changes in technology: access to digital materials is maintained until technological
changes make them inaccessible, or for a period during which the material
is likely to be in use but which is relatively short (Digital Preservation Coali-
tion, 2008, p.25). Definitions like these provide helpful ways of thinking about
digital preservation programmes, for example, about resource allocation and
the long-term resource implications of embarking on long-term preservation.

What strategies and actions do we apply?


The definitions in these two sources make us aware, in a general sense, of the
components of a digital preservation programme. In order to achieve the aim of
a digital preservation programme (‘maintaining accessibility of digital objects
over time’ (UNESCO, 2003, p.20), or ‘to ensure continued access to digital
materials for as long as necessary’), various processes forming a ‘series of
managed activities’ (Digital Preservation Coalition, 2008, p.24) are required.
They need to form ‘a set of coherent arrangements’ (UNESCO, 2003, p.20).
These are noted in Chapters 7 to 10.

Conclusion
This chapter has introduced some of the key concepts that are reshaping preser-
vation practice in the digital environment. It notes the need for new ways of
thinking about preservation and poses three key questions that need to be con-
sidered when we think about the preservation of digital materials:

– What exactly are we trying to preserve?
– How long are we preserving them for?
– What strategies and actions do we need to apply?

These questions and other themes introduced in Chapter 1 are explored in the
rest of this book.
Chapter 2
Why do we Preserve? Who Should do it?

Introduction
Society, of course, has a vital interest in preserving
materials that document issues, concerns, ideas, dis-
course and events ... The ability of a culture to survive
into the future depends on the richness and acuity of
its members’ sense of history (Task Force on Archiv-
ing of Digital Information, 1996, p.1)

Preservation is commonly perceived to be the responsibility of large, well-resourced
institutions such as national libraries and archives, state libraries, and
university and research libraries. This perception is no longer valid in the digital
age. It may have been a legitimate view in the days when expensive conserva-
tion laboratories were considered a necessary requirement for a successful
preservation programme and when computer installations were expensive and
few in number, but the current reality is very different. Documentary materials
in digital form are now being created at all levels of society. Responsibility for
the preservation of these digital materials must be shared among creators and
users of digital information, and not remain solely the concern of librarians
and archivists.
This chapter investigates three questions:

– Why should digital materials be preserved?
– Who has responsibility for their preservation?
– How significant is the problem of digital preservation?

While the continuing roles of the institutions traditionally identified as responsible
for preservation – libraries, archives and museums – are noted, attention
is also paid to the roles of the much wider range of stakeholders who must also
participate if digital preservation is to be effective.

Why preserve digital materials?


The reasons for preserving knowledge are variously described in terms of duties,
obligations and benefits. Preservation is based on the notion that, because man
learns from the past, ‘evidence of the past therefore has considerable signifi-
cance to the human race and is worth saving’ (Harvey, 1993, p.6). Not only is
preservation worth doing, it is also, some suggest, a duty. Agresto, former head
of the US National Endowment for the Humanities, suggested that ‘we have a
human obligation not to forget’ (cited in Harvey, 1993, p.7) and that preserva-
tion is essential for the well-being of democracies that depend ‘on knowledge
and the diffusion of knowledge’ and on ‘knowledge shared’ (Harvey, 1993,
p.7). Even greater claims are made: ‘the ability of a culture to survive into the
future’ depends on the preservation of knowledge (Task Force on Archiving of
Digital Information, 1996, p.1).
The cultural and political imperatives that have led to preservation being
considered as fundamental have been explored in books, such as Lowenthal’s
The Past is a Foreign Country (Lowenthal, 1985) and Taylor’s Cultural Selec-
tion (Taylor, 1996), which persuade us that preservation is not simply the con-
cern of a limited number of cultural heritage institutions and professions, but
has dimensions that have significant impact, both limiting and sustaining, on
most aspects of society. There is, in fact, no single reason why we preserve
knowledge. Preservation, suggests Cloonan (2001, p.231), ‘has a life force
fueled by many (often disparate) sources’.
None of these reasons change when we consider the preservation of
knowledge encoded in digital materials, but the rhetoric alters to emphasize
economic rationales. Preserving digital materials is essential. If we do not attend
to it ‘what is at stake is the loss of data representing billions of dollars of investment
in new information technology, new scientific discoveries, and new
information on which our economic prosperity and national security depend’
(NDIIPP, 2011, p.1). Evidential and accountability reasons are also commonly
given: ‘we expect that this [digital] content will remain accessible to allow us to
validate claims, trace what we have done, or pass a record to future generations’,
states the NSF-DELOS Working Group on Digital Archiving and Preservation
(2003, p.[i]), who also specify five conditions for preservation, any one of
which is sufficient to provide a benefit to society:

– If unique information objects that are vulnerable and sensitive and therefore subject
to risks can be preserved and protected;
– If preservation ensures long-term accessibility for researchers and the public;
– If preservation fosters the accountability of governments and organisations;
– If there is an economic or societal advantage in re-using information, or
– If there is a legal requirement to keep it (NSF-DELOS Working Group on Digital
Archiving and Preservation, 2003, p.3).

Considerable attention is being directed to the need to preserve scientific data.
‘Data are the foundation on which scientific, engineering, and medical knowl-
edge is built’, notes the Committee on Ensuring the Utility and Integrity of
Research Data in a Digital Age (2009, p.ix). Their report examines in detail
how digital technologies have changed scientific research, leading to the genera-
tion of very large quantities of data that need to be accessible into the future.
There is also a widespread appreciation that the preservation scene is changing
in significant ways. Lyman and Kahle (of Internet Archive fame) note that
recordkeeping, which, combined with archival preservation, is the basis of
historical memory, was greatly facilitated by print, and institutions such as uni-
versities, publishers and museums collect, organize and preserve ‘the historical
memory that gives culture continuity and depth’. But this is changing:

What are, and will be, the social contexts and institutions for preserving digital docu-
ments? Indeed, what new kinds of institutions are possible in cyberspace, and what
technologies will support them? What kind of new social contexts and institutions
should be invented for cyberspace? (Lyman and Kahle, 1998).

Such new social contexts are emerging and their digital content is deemed
worth preserving. The Library of Congress’s work in preserving Twitter content
(Watters, 2011) and the Schlesinger Library’s in preserving blogs (Dunn, 2009)
are two examples.
The very aims of preservation are also being questioned – ‘What are we
preserving? For whom? And why?’ (NSF-DELOS Working Group on Digital
Archiving and Preservation, 2003, p.2) – and the expanded number of stake-
holders in the digital age means that a range of different interests must be con-
sidered.

Professional imperatives
What do these changes mean for libraries and archives? Have there been signifi-
cant changes in their practices?
At one level, there has been little change. Libraries are still, to use Deanna
Marcum’s words, ‘society’s stewards of cultural and intellectual resources’
(Kenney and Stam, 2002, p.v). Preservation is nothing less than core business
for libraries that maintain collections for use in the future. One typical view of
the preservation role of libraries is Gorman’s statement:

Libraries have a duty to preserve and make available all the records of humankind.
That is a unique burden. No other group of people has ever been as successful in pre-
serving the records of the past and no other group of people has that mission today ... Let
there be no mistake: if we librarians do not rise to the occasion, successive generations
will know less and have access to less for the first time in human history. This is not a
challenge from which we can shrink or a mission in which we can fail (Gorman, 1997).

The arguments for preservation as a core activity in libraries are traditionally
articulated as maintaining the collection for access for a period of time (which
varies according to the type of library and the number of years material is
required to remain accessible), as making good sense economically by allowing
items to be used longer before they wear out, and as the ‘just in case’ argument:
‘It cannot easily be predicted what will be of interest to researchers in the
future. Preserving current collections is the best way to serve future users’
(Adcock, 1998, p.8).
Similarly, there has been no fundamental change in archivists’ secure under-
standing of their preservation responsibilities. They have typically placed the
physical care of their collections at least on a par with, if not at a higher level
of importance than, the provision of access to those collections. This ‘physical
defence of archives’ was indeed considered paramount by the British archivist
Sir Hilary Jenkinson, who formulated this influential statement in 1922:

The duties of the Archivist ... are primary and secondary. In the first place he has to
take all possible precautions for the safeguarding of his Archives and for their custody
... Subject to the discharge of these duties he has in the second place to provide to the
best of his ability for the needs of historians and other research workers. But the position
of primary and secondary must not be reversed (Jenkinson, 1965, p.15).

There is ample evidence in the archives literature to support the generalization
that archivists have thought more thoroughly about their professional practice
and have articulated it more clearly than have librarians. Gilliland-Swetland
examines how archival principles provide powerful ways of thinking about the
preservation of digital materials. ‘Implementing the archival perspective in the
digital environment’, she suggests, encompasses

– Working with information creators to identify requirements for the long-term man-
agement of information;
– Identifying the roles and responsibilities of those who create, manage, provide
access to, and preserve information;
– Ensuring the creation and preservation of reliable and authentic materials;
– Understanding that information can be dynamic in terms of form, accumulation,
value attribution, and primary and secondary use; …
– Identifying evidence in materials and addressing the evidential needs of materials
and their users through archival appraisal, description, and preservation activities
(Gilliland-Swetland, 2000, p.21).

Nor should we forget the specific legal reasons for preservation. In the case of
archives these reasons are often connected to administrative and political ac-
countability. For some types of libraries, statutory responsibilities require that
preservation is their core business. National libraries, for example, have a stat-
utory responsibility for collecting and safeguarding access to information pub-
lished in their countries.
While the traditional preservation responsibilities of libraries and archives
may remain the same, there has been significant change in the ways in which
they are interpreted and operationalized as digital materials have become prevalent.
In their report about repositioning research libraries Walters and Skinner
(2011, p.57) state firmly that

very few research libraries should have more than half of their infrastructure devoted to
physical collections at this point in time. The library needs to think of digital curation as a
core function of the library and to invest financial and other resources into it accordingly.

New expertise and new perspectives are required, without discarding the prin-
ciples of preservation developed for non-digital materials. Of greatest signifi-
cance is the need to engage with other stakeholders and ‘form new alliances and
partnerships’ (Webb, 2000). These new stakeholders may not always embrace
engagement willingly and will often need to be convinced of their roles, as
Hilton, Thompson and Walters (2010) point out when writing of donations of
digital material to the Wellcome Library in the UK.
Smith cogently summarizes the concepts in this section:

Society has always created objects and records describing its activities, and it has con-
sciously preserved them in a permanent way … Cultural institutions are recognised
custodians of this collective memory: archives, librar[ies] and museums play a vital
role in organizing, preserving and providing access to the cultural, intellectual and his-
torical resources of society. They have established formal preservation programs for
traditional materials and they understand how to safeguard both the contextual circum-
stances and the authenticity and integrity of the objects and information placed in their
care … It is now evident that the computer has changed forever the way information is
created, managed, archived and accessed, and that digital information is now an integral
part of our cultural and intellectual heritage. However the institutions that have tradi-
tionally been responsible for preserving information now face major technical, organiza-
tional, resource, and legal challenges in taking on the preservation of digital holdings
(B. Smith, 2002, pp.133-134).

New stakeholders
The new challenges of digital preservation call for the involvement of new par-
ticipants. No longer are librarians and archivists the main groups concerned
with preserving digital materials: it is increasingly evident that the cultural heri-
tage institutions traditionally charged with responsibility for preserving materials
cannot continue to carry this responsibility in the digital age without widening
the range of partners in their endeavours. Scholars and scientists who, increas-
ingly, base their research on large data sets, drug companies who need to prove
ownership of intellectual property, lawyers who must keep secure evidence in
digital form, are but a few of myriad potential stakeholders.
Not only are new kinds of stakeholders claiming an interest or claiming
control, but higher levels of collaboration among stakeholders are also com-
monly understood to be necessary for digital preservation to be effective.
Narrowly focused localized solutions are not considered likely to be the most
effective. Cooperation ‘can enhance the productive capacity of a limited sup-
ply of digital preservation funds, by building shared resources, eliminating
redundancies, and exploiting economies of scale’ (Lavoie and Dempsey, 2004).
The preservation of digital materials has become ‘essentially a distributed proc-
ess’ where ‘traditional demarcations do not apply’ and one for which ‘an inter-
disciplinary approach is necessary’ (Shenton, 2000, p.164).
Collaboration is considered more and more as the only way in which viable
and sustainable solutions can be developed, as the problems are well beyond
the scope of even the largest and most well-resourced single institution.
(UNESCO (2003, Chapter 11) explores collaboration in more detail, and Chapter 9
of this book provides examples of collaborative activities.)
Who, more specifically, are these new stakeholders? What are their preser-
vation roles in an increasingly digital environment? An early indication was
provided by the Task Force on Archiving of Digital Information, whose in-
fluential 1996 report set much of the digital preservation agenda for the fol-
lowing decade. This report suggested that ‘intense interactions among the
parties with stakes in digital information are providing the opportunity and
stimulus for new stakeholders to emerge and add value, and for the relation-
ships and division of labor among existing stakeholders to assume new forms’.
It proposed two principles, the first that information creators, providers and
owners ‘have initial responsibility for archiving their digital information ob-
jects and thereby assuming the long-term preservation of these objects’, and
the second that, where this mechanism fails or becomes unworkable, ‘certified
digital archives have the right and duty’ to preserve digital materials (Task
Force on Archiving of Digital Information, 1996, pp.19-20). Since 1996 the
landscape of digital preservation has become clearer and we now see significant
levels of collaboration, strong emphasis on and involvement of data creators as
first-line preservers, and the development of certified digital archives (trusted
digital repositories).
In addition to data creators and certified digital archives, there are other
new stakeholders. They include commercial services, government agencies,
individuals, rights holders, beneficiaries, funding agencies, and users (Hodge and
Frangakis, 2004, p.15). ‘Hardware and software developers, publishers, produc-
ers, and distributors of digital materials as well as other private sector partners’
(UNESCO, 2004, Article 10) can also be added to the list. All stakeholders
are learning how to work together, learning first to understand the languages
of other disciplines and then working out how complementary skills can fit
together. Collaboration was of course not unknown in the old preservation
paradigm, one example being the collaboration of scholars and librarians to
identify the core literature in specific discipline areas for microfilming and
scanning projects (see Gwinn (1993) for an example in agriculture). The extent
and nature of collaborative activity has, however, intensified. It is encapsulated
in the ‘Community Watch and Participation’ action in the influential DCC
Curation Lifecycle Model (Digital Curation Centre, 2008) where collaboration
to develop shared standards, tools and software is specifically noted. Chapter 9
notes many examples of collaborative digital preservation activities.
The role of scientists and scholars as stakeholders in digital preservation is
increasingly recognized. The way in which scholars’ work is being transformed,
most notably in the sciences but also in the social sciences and the humanities,
is well documented. The editorial of a 2011 issue of D-Lib Magazine notes that
‘The management of research data in a digital networked world is increasingly
recognized as a significant challenge, a significant opportunity, and absolutely
essential to the conduct of scientific research in the 21st century’ (Lannom,
2011). The predominantly data-driven approaches to science create and use
very large data sets and, consequently, increasing attention is being paid to the
long-term preservation of scientific data (see, for example, the DCC SCARP
project (www.dcc.ac.uk/projects/scarp) and the Alliance for Permanent Access
(www.alliancepermanentaccess.org)).
The dissemination of scholarly output provides an example of the trans-
formation of scholarship. Many scholars no longer rely solely on the formal
mechanisms of printed publications for disseminating their research. They
place more and more emphasis on other mechanisms, such as pre-print archives
in high-energy physics and in mathematics, institutional repositories (often
university-based), the development of web sites and social media sites based
around communities of scholars, and the requirements of scholarly journals
that data supporting published research are deposited in a public archive, all of
which have implications for the preservation of their scholarly output.
Digital preservation is increasingly also of concern to individuals. Personal
information – ‘social and personal memory’ (B. Smith, 2002, p.135) – is being
created and stored in digital form on digital media such as CDs and flash
drives ‘with the mistaken belief that this will ensure that those memories will
always be available to them for consultation’ (B. Smith, 2002, p.135). As recog-
nition of the precariousness of personal digital materials grows, so too does the
interest of individuals in understanding the issues and responding to them.
Individuals have been identified as essential to any discussion of digital pres-
ervation. How we archive our personal digital materials has become a topic of
scholarly investigation (see, for example, a synthesis of the outcomes of the
Digital Lives research project that ran from 2007 to 2009, which indicates the
extent of the research and literature about this field (John et al., 2010)), the focus
of conferences (Personal Digital Archiving conferences were held in 2010 and
2011), the topic of publications (for example, Lee, 2011) and the target of advice
(the Library of Congress ran a Personal Archiving Day in 2011 and provides
advice (www.digitalpreservation.gov/you) for keeping personal information in
digital form).
The keeping of personal correspondence provides an informative illustration
of the major changes that are occurring. Its value as a source of historical
information has long been recognized, but the widespread shift from writing
letters to the use of email has diminished the likelihood that personal corre-
spondence will remain accessible to future historians. Lukesh asked in 1999,
‘Where will our understandings of today and, more critically, the next century be,
if this rich source of information is no longer available?’, as scientists, scholars,
historians and almost everyone increasingly use email (Lukesh, 1999). This
concern has continued to be expressed in actions to archive emails, for example,
at Harvard University (Goethals and Gogel, 2010). Pre-digital paradigm pres-
ervation conventions mean that creators of emails expect librarians and archi-
vists to collect and maintain materials ascertained to be of long-term value,
usually well after the time of the materials’ creation. But because emails, like all
digital materials, become inaccessible quickly, for reasons given in Chapter 3,
the determination of long-term value cannot be made well after the time the
emails were created. With digital materials, suggests Smith, ‘the critical depend-
ency of preservation on good stewardship begins with the act of creation, and
the creator has a decisive role in the longevity of the digital object’ (Smith,
2003, pp.2-3). For most creators of information in any form this is a new role.
Publishers are another stakeholder group whose responsibilities and roles
change as more of their output is distributed in digital form. Some national
libraries have developed cooperative arrangements with publishers to ensure
that preservation responsibilities for digital publications are understood and
shared. One example is the 2002 agreement between Elsevier Science and
the Koninklijke Bibliotheek (the National Library of the Netherlands)
through which the Library receives digital copies of all journals on Elsevier’s
ScienceDirect web platform. The level of participation by publishers in
CLOCKSS, a cooperation of publishers and libraries to preserve e-journals
(noted in more detail in Chapter 9), indicates increasing understanding of digital
preservation issues by publisher stakeholders.
Doubts have been expressed about the interest and willingness of for-profit
organizations to participate in digital preservation initiatives. Search engine
companies, for instance, ‘are not in the business of long-term archiving of the
web or even a portion of it, nor should they be expected to take on this respon-
sibility’. The entertainment industry is increasingly digital, and its products,
audio and video, have a well-established place as ‘critical resources for research,
historical documentaries, and cultural coherence resources’. Even given the
prevailing market-driven political ethos, it is difficult to envisage a situation
where market forces will be sufficient to ensure the preservation of this digital
material. The opposite is more likely to apply: ‘in some cases market forces
work against long-term preservation by locking customers into proprietary
formats and systems’ (Workshop on Research Challenges in Digital Archiving
and Long-term Preservation, 2003, pp.x-xi).

How much data have we lost?


It is helpful to have some understanding of how extensive the preservation
problem is, although estimates must remain very inexact. We know something
of its parameters for the traditional artifacts, mainly paper-based, that make up
the collections of libraries and archives. Studies of paper deterioration have
been carried out since the 1950s, after Barrow’s influential study Deterioration
of Book Stock: Causes and Remedies (Barrow, 1959) raised alarm within the
library community. Later studies refined the conclusions of this study. Although
the evidence allows only general statements to be made, it is commonly
thought that paper embrittlement affects in the order of 30 per cent of the col-
lections of large research libraries in the United States. (This is explained in
more detail in Harvey, 1993, pp.9-10.) Nor is the problem of deterioration of
the artifacts typically found in traditional collections limited to paper. All
materials deteriorate: photographs, nitrate film, and cellulose acetate film are
but a few examples.
The concern of this book, however, is with digital materials. What is the
extent of the preservation problem for these materials? The dramatic increase
in the volume of digital materials (noted in Chapter 1) means that the quantity
of digital content worth preserving is also increasing dramatically, and our
ability to manage and preserve it is far outpaced by the rate of increase. It
was suggested in 2002 that the last 25 years have been a ‘scenario of data loss
and poor records that has dogged our progress’ and that, if this is not reversed,
‘the human record of the early 21st century may be unreadable’ (Deegan and
Tanner, 2002).
Rhetoric of this kind is common in the literature, but is regrettably poorly
supported with specifics and evidence. Alarmist descriptions abound: there
will be ‘a digital black hole … truly a digital dark age from which information
may never reappear’ (Deegan and Tanner, 2002) if we do not address the
problem, and we will become an amnesiac society. (An early use of this
evocative term in relation to digital materials was by Sturges (1990), whose
comments are still worth reading today.) Although we may be in ‘the best
documented era in history’ much of the documentation of the era, according to
some, has been lost – ‘the first email message, chat group session, and web
site’ (Vogt-O’Connor, 1999) are gone.
An extensive literature survey in 2004 located relatively few documented
examples of data loss. Usually the literature notes only general categories of
digital material that are, or are thought to be, at risk. One of these comes from
the US federal government, for whom, O’Mahony notes, it was the norm,
when web sites or internet files were changed, to overwrite the old information.
Consequently ‘the public is now experiencing losses of government informa-
tion, on a scale similar to that of the catastrophic fire of 1921 [in which the
1890 census records were destroyed], on what seems to be a regular basis’
(O’Mahony, 1998, p.108). Another example is of business records in electronic
form. A study of the companies that were relocated after the World Trade Center
bombing in 1993 found that 40 per cent ceased trading, a major reason being
the loss of key business records in electronic form, and ‘43 per cent of compa-
nies which lose their data close down’ (cited in Ross and Gow, 1999, p.iii).
Some of the documented examples are of data recovery, in the sense that data
thought to have become inaccessible were recoverable, or those thought to
have been lost were only mislaid. Ross and Gow note four case studies of data
recovery: the Challenger Space Shuttle Tapes, Hurricane Marilyn (the Virgin
Islands, September 1995), video image recovery from damaged 8mm recorders
(from a crashed fighter plane), and German unification and the recovery of
electronic records from East Germany (Ross and Gow, 1999, pp.39-42). Another
documented example is the Functional Requirements for Evidence in Record-
keeping project administered by the University of Pittsburgh. The web site
with the working files of this project was accidentally destroyed, but because it
had not been updated since 1996 it can be accessed through the Internet Archive; it
can also be seen at www.archimuse.com/papers/nhprc.
From the literature it is only possible to conclude, as the authors of the
Digital Preservation Coalition’s handbook (2008, p.32) did for the UK, that the
evidence of data loss is ‘as yet only largely anecdotal ... [but] it is certain that
many potentially valuable digital materials have already been lost’.
We should document some specific examples to counter the charge of being
alarmist. Useful questions to pose are: Is the problem of digital preservation as
great as we have assumed? How much digital information has been lost and
how much has been compromised? To what extent have the data been com-
promised? There are excellent reasons to put some effort into attempting to
quantify the extent of digital information loss or compromise or, at the very
least, document some specific examples to supplement the still limited number
of case studies. The desirability of more documented examples and case studies
has been recognized for some time. For example, Ross and Gow concluded
that ‘information about data loss, recovery, and risk is very difficult to acquire
… more case studies about data loss and rescue need to be collected’ (Ross
and Gow, 1999, p.vi).
The term data loss used here also includes data that are compromised; they
are degraded to the extent that their quality is affected. (The phrases loss of data
integrity or loss of data authenticity might also be used.) The data may still be
accessible, but we have no clear idea of what they mean, what software was
used to create them, and so on.
The question ‘How much digital information has been lost and how much
has been compromised?’ is impossible to answer. No general estimates of
quantity based on solid evidence (as opposed to conjecture) have been located,
and few specific examples or carefully documented case studies seem to exist.
The same examples are presented, even when they are no longer in the ‘lost or
compromised’ category: the BBC’s Domesday Project, NASA data, the Viking
Mars mission, the Combat Area Casualty file containing prisoner of war and
missing in action information for the Vietnam war, the first email, the first web
site, as described in more detail below. Attempting to answer the question
requires, first, a consideration of the issue of selection for preservation (noted
in more detail in Chapter 4). A common argument is that anything significant
is likely to be maintained anyway; so should we be concerned about the rest?
Some of the examples that follow assume that the first (email, web site, and so
on) is worth preserving; but is this necessarily the case? It is often a view
developed in hindsight. Betts tells us that Ray Tomlinson, principal engineer at
BBN Technologies in Cambridge, Massachusetts did not save the first network
email ever sent in 1972 because ‘it just didn’t seem worth saving … Even if
backup tapes did exist, they might not be readable. They were just mag tapes,
and after seven or eight years, the oxide starts falling off, especially from tapes
of that era’ (Betts, 1999).
The few specific examples located indicate how great the
problem of loss or compromise of digital materials could be. The most often
quoted, indeed overused, examples are those cited in the 1996 report of the Task
Force on Archiving of Digital Information. Because they have been reported
very widely since, they warrant quoting at some length. The report notes the
case of the US Census of 1960.

In 1976, the National Archives identified seven series of aggregated data from the 1960
Census as having long-term historical value. A large portion of the selected records,
however, resided on tapes that the Bureau could read only with a UNIVAC type-II-A
tape drive. By the mid-seventies, that particular tape drive was long obsolete, and the
Census Bureau faced a significant engineering challenge in preserving the data from
the UNIVAC type II-A tapes. By 1979, the Bureau had successfully copied onto industry-
standard tapes nearly all the data judged then to have long-term value (Task Force on
Archiving of Digital Information, 1996, p.2).

The report notes other lost examples, one of them the first email message ‘sent
either from the Massachusetts Institute of Technology, the Carnegie Institute
of Technology or Cambridge University’ in 1964 (Task Force on Archiving of
Digital Information, 1996, p.3).
Rothenberg reminds us of some examples noted in a 1990 US House of
Representatives report:

hundreds of reels of tape from the Department of Health and Human Services; files
from the National Commission on Marijuana and Drug Abuse, the Public Land Law
Review Commission, the President’s Commission on School Finance, and the National
Commission on Consumer Finance; the Combat Area Casualty file containing POW
and MIA information for the Vietnam war; herbicide information needed to analyze the
impact of Agent Orange; and many others (Rothenberg, 1999b, pp.1-2).
Discovering Diverse Content Through
Random Scribd Documents
TRANSCRIBER’S NOTES
Typos fixed; non-standard spelling and dialect
retained.
*** END OF THE PROJECT GUTENBERG EBOOK THE READER'S
GUIDE TO THE ENCYCLOPAEDIA BRITANNICA ***

Updated editions will replace the previous one—the old editions


will be renamed.

Creating the works from print editions not protected by U.S.


copyright law means that no one owns a United States
copyright in these works, so the Foundation (and you!) can copy
and distribute it in the United States without permission and
without paying copyright royalties. Special rules, set forth in the
General Terms of Use part of this license, apply to copying and
distributing Project Gutenberg™ electronic works to protect the
PROJECT GUTENBERG™ concept and trademark. Project
Gutenberg is a registered trademark, and may not be used if
you charge for an eBook, except by following the terms of the
trademark license, including paying royalties for use of the
Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such
as creation of derivative works, reports, performances and
research. Project Gutenberg eBooks may be modified and
printed and given away—you may do practically ANYTHING in
the United States with eBooks not protected by U.S. copyright
law. Redistribution is subject to the trademark license, especially
commercial redistribution.

START: FULL LICENSE


THE FULL PROJECT GUTENBERG LICENSE
PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK

To protect the Project Gutenberg™ mission of promoting the


free distribution of electronic works, by using or distributing this
work (or any other work associated in any way with the phrase
“Project Gutenberg”), you agree to comply with all the terms of
the Full Project Gutenberg™ License available with this file or
online at www.gutenberg.org/license.

Section 1. General Terms of Use and


Redistributing Project Gutenberg™
electronic works
1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand,
agree to and accept all the terms of this license and intellectual
property (trademark/copyright) agreement. If you do not agree
to abide by all the terms of this agreement, you must cease
using and return or destroy all copies of Project Gutenberg™
electronic works in your possession. If you paid a fee for
obtaining a copy of or access to a Project Gutenberg™
electronic work and you do not agree to be bound by the terms
of this agreement, you may obtain a refund from the person or
entity to whom you paid the fee as set forth in paragraph 1.E.8.

1.B. “Project Gutenberg” is a registered trademark. It may only


be used on or associated in any way with an electronic work by
people who agree to be bound by the terms of this agreement.
There are a few things that you can do with most Project
Gutenberg™ electronic works even without complying with the
full terms of this agreement. See paragraph 1.C below. There
are a lot of things you can do with Project Gutenberg™
electronic works if you follow the terms of this agreement and
help preserve free future access to Project Gutenberg™
electronic works. See paragraph 1.E below.
1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright
law in the United States and you are located in the United
States, we do not claim a right to prevent you from copying,
distributing, performing, displaying or creating derivative works
based on the work as long as all references to Project
Gutenberg are removed. Of course, we hope that you will
support the Project Gutenberg™ mission of promoting free
access to electronic works by freely sharing Project Gutenberg™
works in compliance with the terms of this agreement for
keeping the Project Gutenberg™ name associated with the
work. You can easily comply with the terms of this agreement
by keeping this work in the same format with its attached full
Project Gutenberg™ License when you share it without charge
with others.

1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside
the United States, check the laws of your country in addition to
the terms of this agreement before downloading, copying,
displaying, performing, distributing or creating derivative works
based on this work or any other Project Gutenberg™ work. The
Foundation makes no representations concerning the copyright
status of any work in any country other than the United States.

1.E. Unless you have removed all references to Project
Gutenberg:

1.E.1. The following sentence, with active links to, or other
immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project
Gutenberg™ work (any work on which the phrase “Project
Gutenberg” appears, or with which the phrase “Project
Gutenberg” is associated) is accessed, displayed, performed,
viewed, copied or distributed:

This eBook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and
with almost no restrictions whatsoever. You may copy it,
give it away or re-use it under the terms of the Project
Gutenberg License included with this eBook or online at
www.gutenberg.org. If you are not located in the United
States, you will have to check the laws of the country
where you are located before using this eBook.

1.E.2. If an individual Project Gutenberg™ electronic work is
derived from texts not protected by U.S. copyright law (does not
contain a notice indicating that it is posted with permission of
the copyright holder), the work can be copied and distributed to
anyone in the United States without paying any fees or charges.
If you are redistributing or providing access to a work with the
phrase “Project Gutenberg” associated with or appearing on the
work, you must comply either with the requirements of
paragraphs 1.E.1 through 1.E.7 or obtain permission for the use
of the work and the Project Gutenberg™ trademark as set forth
in paragraphs 1.E.8 or 1.E.9.

1.E.3. If an individual Project Gutenberg™ electronic work is
posted with the permission of the copyright holder, your use and
distribution must comply with both paragraphs 1.E.1 through
1.E.7 and any additional terms imposed by the copyright holder.
Additional terms will be linked to the Project Gutenberg™
License for all works posted with the permission of the copyright
holder found at the beginning of this work.

1.E.4. Do not unlink or detach or remove the full Project
Gutenberg™ License terms from this work, or any files
containing a part of this work or any other work associated with
Project Gutenberg™.

1.E.5. Do not copy, display, perform, distribute or redistribute
this electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the
Project Gutenberg™ License.

1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if
you provide access to or distribute copies of a Project
Gutenberg™ work in a format other than “Plain Vanilla ASCII” or
other format used in the official version posted on the official
Project Gutenberg™ website (www.gutenberg.org), you must,
at no additional cost, fee or expense to the user, provide a copy,
a means of exporting a copy, or a means of obtaining a copy
upon request, of the work in its original “Plain Vanilla ASCII” or
other form. Any alternate format must include the full Project
Gutenberg™ License as specified in paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™
works unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or
providing access to or distributing Project Gutenberg™
electronic works provided that:

• You pay a royalty fee of 20% of the gross profits you derive
from the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”

• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of receipt
that s/he does not agree to the terms of the full Project
Gutenberg™ License. You must require such a user to return or
destroy all copies of the works possessed in a physical medium
and discontinue all use of and all access to other copies of
Project Gutenberg™ works.

• You provide, in accordance with paragraph 1.F.3, a full refund of
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.

• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.

1.E.9. If you wish to charge a fee or distribute a Project
Gutenberg™ electronic work or group of works on different
terms than are set forth in this agreement, you must obtain
permission in writing from the Project Gutenberg Literary
Archive Foundation, the manager of the Project Gutenberg™
trademark. Contact the Foundation as set forth in Section 3
below.

1.F.

1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on,
transcribe and proofread works not protected by U.S. copyright
law in creating the Project Gutenberg™ collection. Despite these
efforts, Project Gutenberg™ electronic works, and the medium
on which they may be stored, may contain “Defects,” such as,
but not limited to, incomplete, inaccurate or corrupt data,
transcription errors, a copyright or other intellectual property
infringement, a defective or damaged disk or other medium, a
computer virus, or computer codes that damage or cannot be
read by your equipment.

1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except
for the “Right of Replacement or Refund” described in
paragraph 1.F.3, the Project Gutenberg Literary Archive
Foundation, the owner of the Project Gutenberg™ trademark,
and any other party distributing a Project Gutenberg™ electronic
work under this agreement, disclaim all liability to you for
damages, costs and expenses, including legal fees. YOU AGREE
THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT
LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT
EXCEPT THOSE PROVIDED IN PARAGRAPH 1.F.3. YOU AGREE
THAT THE FOUNDATION, THE TRADEMARK OWNER, AND ANY
DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE
TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL,
PUNITIVE OR INCIDENTAL DAMAGES EVEN IF YOU GIVE
NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.

1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you
discover a defect in this electronic work within 90 days of
receiving it, you can receive a refund of the money (if any) you
paid for it by sending a written explanation to the person you
received the work from. If you received the work on a physical
medium, you must return the medium with your written
explanation. The person or entity that provided you with the
defective work may elect to provide a replacement copy in lieu
of a refund. If you received the work electronically, the person
or entity providing it to you may choose to give you a second
opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund
in writing without further opportunities to fix the problem.

1.F.4. Except for the limited right of replacement or refund set
forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’,
WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.

1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of
damages. If any disclaimer or limitation set forth in this
agreement violates the law of the state applicable to this
agreement, the agreement shall be interpreted to make the
maximum disclaimer or limitation permitted by the applicable
state law. The invalidity or unenforceability of any provision of
this agreement shall not void the remaining provisions.

1.F.6. INDEMNITY - You agree to indemnify and hold the
Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and
distribution of Project Gutenberg™ electronic works, harmless
from all liability, costs and expenses, including legal fees, that
arise directly or indirectly from any of the following which you
do or cause to occur: (a) distribution of this or any Project
Gutenberg™ work, (b) alteration, modification, or additions or
deletions to any Project Gutenberg™ work, and (c) any Defect
you cause.

Section 2. Information about the Mission of Project Gutenberg™

Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new
computers. It exists because of the efforts of hundreds of
volunteers and donations from people in all walks of life.

Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project
Gutenberg™’s goals and ensuring that the Project Gutenberg™
collection will remain freely available for generations to come. In
2001, the Project Gutenberg Literary Archive Foundation was
created to provide a secure and permanent future for Project
Gutenberg™ and future generations. To learn more about the
Project Gutenberg Literary Archive Foundation and how your
efforts and donations can help, see Sections 3 and 4 and the
Foundation information page at www.gutenberg.org.

Section 3. Information about the Project Gutenberg Literary Archive Foundation

The Project Gutenberg Literary Archive Foundation is a non-
profit 501(c)(3) educational corporation organized under the
laws of the state of Mississippi and granted tax exempt status
by the Internal Revenue Service. The Foundation’s EIN or
federal tax identification number is 64-6221541. Contributions
to the Project Gutenberg Literary Archive Foundation are tax
deductible to the full extent permitted by U.S. federal laws and
your state’s laws.

The Foundation’s business office is located at 809 North 1500
West, Salt Lake City, UT 84116, (801) 596-1887. Email contact
links and up to date contact information can be found at the
Foundation’s website and official page at
www.gutenberg.org/contact

Section 4. Information about Donations to the Project Gutenberg Literary Archive Foundation

Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission
of increasing the number of public domain and licensed works
that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated
equipment. Many small donations ($1 to $5,000) are particularly
important to maintaining tax exempt status with the IRS.

The Foundation is committed to complying with the laws
regulating charities and charitable donations in all 50 states of
the United States. Compliance requirements are not uniform
and it takes a considerable effort, much paperwork and many
fees to meet and keep up with these requirements. We do not
solicit donations in locations where we have not received written
confirmation of compliance. To SEND DONATIONS or determine
the status of compliance for any particular state visit
www.gutenberg.org/donate.

While we cannot and do not solicit contributions from states
where we have not met the solicitation requirements, we know
of no prohibition against accepting unsolicited donations from
donors in such states who approach us with offers to donate.

International donations are gratefully accepted, but we cannot
make any statements concerning tax treatment of donations
received from outside the United States. U.S. laws alone swamp
our small staff.

Please check the Project Gutenberg web pages for current
donation methods and addresses. Donations are accepted in a
number of other ways including checks, online payments and
credit card donations. To donate, please visit:
www.gutenberg.org/donate.

Section 5. General Information About Project Gutenberg™ electronic works

Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could
be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose
network of volunteer support.

Project Gutenberg™ eBooks are often created from several
printed editions, all of which are confirmed as not protected by
copyright in the U.S. unless a copyright notice is included. Thus,
we do not necessarily keep eBooks in compliance with any
particular paper edition.

Most people start at our website which has the main PG search
facility: www.gutenberg.org.

This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg
Literary Archive Foundation, how to help produce our new
eBooks, and how to subscribe to our email newsletter to hear
about new eBooks.
