Digitisation Scoping Study
2. Pamphlet collections
2.2 Collection profiles
2.3 Recommended strategies
3. Digital datasets
3.1 Digital images
3.2 OCR text
3.3 Metadata
3.4 Production and QA workflow
4. Project workflow
4.1 Workflow
4.2 Work plan
4.3 Work packages
5. Conclusions and recommendations
Appendix B – Glossary
1. Introduction
1.1 Background to the 19th Century Pamphlets Online scoping study
A short scoping study was undertaken in July and August 2006 to clarify some
elements of a bid under the JISC Digitisation Programme (Phase 2). This study
received £6,239 in funding from the JISC Digitisation Programme. Several partners
to the bid also made contributions in kind.
1.1.1 Aims
The study had three main aims:
2. To provide information for a detailed Project Plan should the bid be successful
1.1.2 Activities
The scoping activities took five main forms:
1. Surveys into the format and condition of the pamphlets and the extent of
duplication between collections
3. Tests of scanning and OCRing 19th century pamphlets and analogous material
4. Discussions among partners and with external parties (e.g. over workflow,
technical specifications, transportation of original materials and data, and
potential for linking with related projects)
1.1.3 Deliverables
The initial plan for this scoping study proposed the following deliverables:
19th Century Pamphlets Online
Digitisation Scoping Study
■ Short narrative report covering any additional findings – this current document
Some of these deliverables have already been included in the revised project
proposal. They are all included here for completeness and for those who do not
have access to that document.
The University of Bristol and other contributors license all the deliverables of this
study to the JISC non-exclusively and in perpetuity.
This section provides a brief summary of the proposed project for those without
access to the bid document. It provides the context for this report and its
deliverables.
The proposal is for Phase 1 of a larger 19th Century Pamphlets Online project. That
larger project has the vision of providing researchers, learners and teachers with
online access to the most significant pamphlet collections held in UK research
libraries. Phase 1 aims to digitise a proportion of the pamphlets that are held
within CURL [1] libraries and searchable via Copac [2]. This first phase concentrates
on collections with a strong political, economic and social focus, and is subtitled:
Pamphlets as a Guide to the Parliamentary Debates of the 19th Century.
[1] See www.curl.ac.uk
[2] See http://copac.ac.uk/about
This scoping study has confirmed that the proposed project is viable, can be
managed within the allotted timescale (2 years, from January 2007 to December
2008), and would be extensible. Findings from this study have informed the revision
of the bid for stage 2, particularly its methodology (e.g. standards and work
packages) and its budget (which has risen). Much of the content of this report has
found a place within the revised bid.
[3] See www.bopcris.ac.uk
[4] See www.jstor.org
[5] See www.mimas.ac.uk
[6] See www.jorum.ac.uk
[7] See www.curl.ac.uk/rslpguide/guidehp.htm
[8] See www.google.co.uk
[9] See http://scholar.google.com
[10] See www.crossref.org
The deliverables of this scoping study have taken a very concrete form (criteria,
standards, strategies, diagrams, etc). All the deliverables promised in the scoping
study plan (see 1.1.3) are included in this report, along with several others. Here is
a full list, along with a brief summary of their findings or approach:
1. Results of a survey into the format and condition of the pamphlets (see 2.1.1).
The survey found that many pamphlets would pose challenges for scanning.
It also found a great deal of variation among the collections. An important
finding was that previously estimated page averages were too low, requiring
a reduction in the estimated number of pamphlets to be digitised within this
phase.
2. Results of a survey into the extent of duplication across the collections (see
2.1.2). The survey found significant duplication across some of the collections,
requiring some means of de-duplication, and resulting in a reduction in the
number of pamphlets expected from some collections.
3. Profiles of seven 19th century pamphlet collections (see 2.2). These identify
key issues likely to affect the scheduling and scan-time for the collections.
4. Selection strategy and criteria (see 2.3.1). As five whole collections have been
pre-selected for inclusion in the project, de-selection criteria are outlined, along
with selection criteria for the two remaining collections.
5. Copyright strategy and workflow (see 2.3.2). The proposed strategy requires
library partners to take primary responsibility for identifying and dealing with
any copyright concerns, but the project will provide support.
7. Gantt chart showing scheduling of collections (see 2.3.4). This chart shows
one possible schedule for selecting and digitising the seven collections, based
on their characteristics and the capacity of BOPCRIS.
8. Image capture standards (see 3.1). A mix of bitonal, greyscale and colour capture
is proposed in order to balance the need to provide easy access to intellectual
content with that of representing the pamphlets as historical objects. The
chosen formats are standard and conform to the JISC IE [11] and Minerva [12] technical
guidelines.
9. OCR benchmarks (see 3.2). Based on tests, the project expects to achieve an
average accuracy, per character, of 97-98%. Additional specialist software
would be used to maintain high accuracy levels across more difficult material.
10. Metadata standards (see 3.3). A suite of established and new XML-based
metadata standards is proposed for the archival and delivery datasets. The
chosen standards conform to IE and Minerva guidelines.
11. Description of production workflow and quality assurance (QA) (see 3.4).
Presents an overview of the production workflow, indicating how QA fits into the
workflow.
12. Diagram showing overall project workflow (see 4.1). Presents a graphical
representation of the flow of original pamphlets and digitised datasets,
indicating how these relate to other workflows and work packages.
13. Gantt chart showing work plan (see 4.2). Presents a graphical view of work
packages and associated activities.
[11] See www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/standards/
[12] See www.minervaeurope.org/publications/technicalguidelines/tablecontents.htm
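To illustrate what a per-character accuracy figure such as the 97-98% OCR benchmark in item 9 means in practice, the sketch below scores OCR output against a hand-corrected transcription. It assumes accuracy is measured as one minus the edit distance divided by the length of the transcription. This is one common measure, but the study does not state which measure was used in the tests, so the formula and the function names here are assumptions for illustration only.

```python
def edit_distance(reference: str, ocr_text: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions separating the two strings."""
    previous = list(range(len(ocr_text) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, ocr_char in enumerate(ocr_text, start=1):
            cost = 0 if ref_char == ocr_char else 1
            current.append(min(
                previous[j] + 1,         # reference character missing from OCR
                current[j - 1] + 1,      # extra character in OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_text: str) -> float:
    """Share of reference characters recovered correctly (assumed measure)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    errors = edit_distance(reference, ocr_text)
    return max(0.0, 1.0 - errors / len(reference))
```

On this measure, 97% accuracy on a page of, say, 2,000 characters would still leave about 60 character errors, which is why the benchmark matters for full-text searching.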
Apart from these tangible deliverables, important outcomes of the scoping study
were the good working relationships established between project partners (e.g.
libraries and the project team, BOPCRIS and JSTOR). The partners worked very
successfully together in completing the work required for this study in a short
timeframe. Partners were actively involved in the definition and negotiation of the
approaches chosen for the project. The combination of a determined methodology
and established relationships would be a powerful enabler in any extension of this
work or development of similar joint projects.
1. That scoping studies such as this are undertaken for all large digitisation projects of
this nature, especially where multiple partners are involved.
2. That the 19th Century Pamphlets Online project, should it proceed, evaluate the
findings and approaches chosen in this study in light of the practical reality of the
project, and disseminate its findings.
3. That this report has a wider application than this current project and should be
made available to others undertaking similar work.
2. Pamphlet collections
This section provides profiles of the collections, including their condition,
duplication with other collections, and factors influencing their selection and
scheduling for scanning. It draws on survey work, visits and correspondence with
those responsible for the collections.
2.1 describes two collection-based surveys and their findings; 2.2 presents brief
profiles of the individual collections, drawing on the surveys, visits and additional
information; 2.3 recommends strategies for addressing selection, copyright,
de-duplication, and the scheduling of the collections, based on information in the
preceding sections. A later section of this report shows how these activities fit
within the overall project work plan and workflow (section 4).
In preparation for the initial project proposal, contributing libraries were asked
to provide information about the size and condition of their collections. Libraries
supplied a total number of pamphlets for their collections. Total page numbers
were then estimated by multiplying the number of pamphlets by an average of 25
pages per pamphlet (for 3 collections) or 35 pages per pamphlet (for 4 collections,
which expected their average to be higher than 25). Based on these calculations, the initial
project bid estimated that 1 million pages (the BOPCRIS capacity for this project)
would approximate 30,000 pamphlets.
As part of the scoping work for the revised project proposal, a survey was
undertaken to collect more accurate page statistics and determine other factors
likely to affect scan time and image or OCR quality.
Methodology
Libraries were asked to check a random selection of 100 items drawn from the
collections chosen for inclusion in this project. In five cases a computer program
was used to generate this selection from lists supplied by the libraries; in two
cases (Durham and UCL) the sampling was made on a systematic basis by library
staff, but without sight of the pamphlet or its title.
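The two sampling approaches described above can be sketched as follows. The study does not describe the program that was actually used, so the function names, the seed and the shelfmark list are hypothetical, and "systematic" is assumed here to mean taking every k-th item from the list.

```python
import random


def draw_random_sample(item_ids, sample_size=100, seed=None):
    """Return sample_size distinct items chosen uniformly at random."""
    items = list(item_ids)
    if len(items) < sample_size:
        raise ValueError("list is shorter than the requested sample")
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return rng.sample(items, sample_size)


def draw_systematic_sample(item_ids, sample_size=100):
    """Take every k-th item so that the sample spans the whole list."""
    items = list(item_ids)
    if len(items) < sample_size:
        raise ValueError("list is shorter than the requested sample")
    step = len(items) // sample_size
    return items[::step][:sample_size]


# Hypothetical shelfmark list standing in for a list supplied by a library.
shelfmarks = [f"PAM-{n:05d}" for n in range(1, 1501)]
random_sample = draw_random_sample(shelfmarks, seed=2006)
systematic_sample = draw_systematic_sample(shelfmarks)
```

Either way the person choosing does not see the pamphlet or its title, which is what keeps the sample from being biased towards items in better condition.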
Each library was asked to check a range of characteristics, such as size, location,
condition of binding, and presence of greyscale or colour illustrations. These
criteria were drawn up by Julian Ball (BOPCRIS Manager) and the author. They
were based on experience and some initial testing in Southampton (see Section 3
below).
The following criteria were chosen for the format and condition survey:
A copy of a page from the survey form is included as Appendix A. Questions 1-18
on this form relate to this survey and follow the order of the criteria listed above.
As the survey began it became clear that some collections included significant
proportions of non-English language pamphlets, so libraries were additionally
asked to gather this information. This was not always possible where the survey
work was already underway.
This survey was undertaken during 4-11 August, with some libraries spending up
to 10 hours on it. This time was provided as a partner contribution to the scoping
study.
[Image: LSE pamphlets. Image with permission from the LSE.]
Results
The tables below provide findings from the format and condition survey. General
discussion is provided in this section, with the significant characteristics of each
collection detailed in section 2.2 below.
Table 2.1.1:A
Location and page averages (based on answers to questions 2 and 9)
Collection | Location | Number of items checked [Note 1] | Total pages found | Average pages per pamphlet
Bristol | Off-site store (83%) and on-site archive (17%) | 100 | 4548 | 45.5
Durham [Note 2] | On-site archive | 94 | 6141 | 65.3
Liverpool | On-site archive | 99 | 4246 | 42.8
LSE | On-site store | 100 | 4564 | 45.6
Manchester | On-site store | 100 | 3466 | 34.7
Newcastle | On-site store | 100 | 2903 | 29.0
UCL | On-site archive | 99 | 4173 | 42.2
Note 1. In some cases fewer than 100 items were checked, due to missing items or time constraints.
Note 2. Durham’s sample included some 20th century items and other forms of publication, which may account for its higher page average.
Table 2.1.1:B
Grey and colour pages (based on answers to questions 9, 10 and 11)
Collection | Pamphlets containing greyscale | Total grey pages | % of total pages that are grey | Pamphlets containing colour | Total colour pages
Bristol | 10% | 108 | 2.4% | 2% | 2
Durham [Note 1] | 16% | 256 | 4.2% | 3% | 3
Liverpool | 0% | 0 | 0% | 3% | 3
LSE | 6% | 55 | 1.2% | 1% | 6
Manchester | 11% | 38 | 1.1% | 10% | 10
Newcastle | 6% | 22 | 0.8% | 0% | 0
UCL | 7% | 12 | 0.3% | 1% | 2
Note 1. Durham’s sample included some 20th century items and other forms of publication, which may account for the higher number of grey pages.
Table 2.1.1:C
Binding condition (based on answers to questions 3-6)
Table 2.1.1:D
Pamphlets with foldings, annotations and adverts (based on answers to questions 12-15)
Pamphlets folded to fit volumes (left) and fold-outs (right). Images with permission from UCL.
Table 2.1.1:E
Pamphlets with unusual typefaces, multiple copies and foreign languages (based on answers to questions 16-17 and an
additional question1)
Note 1. The foreign language check was introduced after the survey had begun when it appeared this would be significant for some collections.
Consequently, not all libraries recorded this information.
Table 2.1.1:F
Size, loose or missing pages and overall suitability for scanning (based on questions 1, 7-8, 18)
Note 1. Durham’s sample included some 20th century items and other forms of publication, which may account for the larger sizes.
Discussion
Some care needs to be taken in drawing conclusions based on this survey, since
only 100 (or fewer) items were assessed from each collection. For the smaller
collections (Durham and Liverpool) this represents 6 or 7% of the collection;
but for the larger collections (LSE and Bristol) it is less than 1%. Nonetheless,
it provides better information than the previous estimates and enables some
profiling to be done. Should the project go ahead it would enable some of these
statistics to be checked and would provide an indication of how accurate and useful
this approach to characterising collections is.
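The caution above can be made concrete with a short calculation. The sketch below computes the sampling fraction for the two largest collections, using the Copac estimates quoted in the duplication discussion (15,989 for the LSE and 22,150 for Bristol), and a rough 95% margin of error for a percentage observed in a 100-item sample. The margin uses the standard normal approximation for a proportion; applying it here is an assumption made for illustration and is not a calculation made in the study.

```python
import math


def sampling_fraction(sample_size: int, collection_size: int) -> float:
    """Share of the collection that was actually inspected."""
    return sample_size / collection_size


def margin_of_error(proportion: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion
    (normal approximation, ignoring the finite population correction)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


copac_estimates = {"LSE": 15_989, "Bristol": 22_150}
for name, size in copac_estimates.items():
    print(f"{name}: {sampling_fraction(100, size):.2%} of the collection sampled")

print(f"10% observed in 100 items: +/- {margin_of_error(0.10, 100):.1%}")
```

Under these assumptions, a characteristic seen in 10% of a 100-item sample could plausibly apply to anywhere between roughly 4% and 16% of the full collection, so the survey percentages are best read as indications rather than precise measurements.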
Section 2.2 provides a brief profile of each collection and highlights characteristics
identified by the condition survey that are likely to impact upon scan time. These
and other factors noted in that section have influenced the scheduling of the
collections in 2.3.4 below (both their order and allocation of time).
One of the most important findings from the survey was that the page averages for
these collections were much higher than the 25-35 previously estimated. Because
the project had set the number of pages to capture at 1 million (the BOPCRIS
capacity for this project), this meant reducing the overall number of pamphlets
we would expect to capture. The revised bid recalculated each collection’s page
numbers based on the averages found in this survey (see Table 2.1.1:A above for
averages). When combined with a reduction for anticipated duplication (see next
section), this reduced the overall number of pamphlets from nearly 30,000 to just
over 23,000. Table 2.1.2:B, below, presents new estimates for each collection. Note
that because selections are to be made from the LSE and Bristol, the numbers
of pamphlets from these two collections were adjusted until 1 million pages was
achieved.
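The effect of the higher page averages on a fixed page budget can be shown with simple arithmetic. The sketch below works out how many pamphlets fit within 1 million pages at the originally assumed averages and at the averages found in the survey. It treats each average as if it applied to the whole budget, which is a simplification: the revised bid applied each collection's own average to that collection's pamphlet count, and those counts are not reproduced here, so this does not recreate the bid's overall figure.

```python
import math

PAGE_CAPACITY = 1_000_000  # BOPCRIS capacity for this project


def pamphlets_within_capacity(avg_pages: float, capacity: int = PAGE_CAPACITY) -> int:
    """Number of whole pamphlets of the given average length that fit."""
    return math.floor(capacity / avg_pages)


# Averages assumed in the initial bid.
for avg in (25, 35):
    print(f"{avg} pages per pamphlet: {pamphlets_within_capacity(avg):,} pamphlets")

# Averages found in the format and condition survey (Table 2.1.1:A).
surveyed_averages = {
    "Bristol": 45.5, "Durham": 65.3, "Liverpool": 42.8, "LSE": 45.6,
    "Manchester": 34.7, "Newcastle": 29.0, "UCL": 42.2,
}
for name, avg in surveyed_averages.items():
    print(f"{name} ({avg} pages): {pamphlets_within_capacity(avg):,} pamphlets")
```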
Methodology
It was hoped that duplication could be gauged by an automated means using the
CURL [13] or Copac [14] databases. This did not prove possible due to the existence
of multiple records and the lack of time and resources to develop suitable tools
for comparison. A part of the work of MIMAS in Work Packages 3 and 8 of the
project (see 4.3 below) will be to develop such tools to enable all libraries holding
a pamphlet that has been digitised to be provided with links. In the absence of an
automated means, the same 100-item sample used for the condition survey (2.1.1,
above) was checked by libraries against Copac for duplicates: (a) across the six
other libraries contributing to Phase 1 and (b) against all holdings on Copac. This
check was included as questions 19 and 20 on the survey form (see Appendix A).
[13] See www.curl.ac.uk/database
[14] See http://copac.ac.uk
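The manual check described above amounts to classifying each sampled item by where else it is held. The sketch below shows one way this tally could be automated once holdings data are available. It is not the tool MIMAS is to develop: the function names and the input format (a set of holding libraries per item) are hypothetical, and "Library X" stands in for any non-partner library.

```python
from collections import Counter

PARTNERS = frozenset(
    {"Bristol", "Durham", "Liverpool", "LSE", "Manchester", "Newcastle", "UCL"}
)


def classify_holdings(own_library, holding_libraries):
    """Flag whether an item is unique or duplicated elsewhere."""
    others = set(holding_libraries) - {own_library}
    return {
        "unique_on_copac": not others,
        "duplicated_in_partner": bool(others & PARTNERS),
        "duplicated_in_non_partner": bool(others - PARTNERS),
    }


def tally(own_library, sample_holdings):
    """Percentage of sampled items falling into each category."""
    counts = Counter()
    for holdings in sample_holdings:
        for category, flagged in classify_holdings(own_library, holdings).items():
            counts[category] += flagged  # True counts as 1, False as 0
    total = len(sample_holdings)
    return {category: 100 * counts[category] / total for category in counts}
```

Note that in this sketch an item held by both a partner and a non-partner library counts in both duplicate categories, so the percentages for a collection need not add up to 100.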
The duplication survey took place alongside the format and condition survey,
during 4-11 August. In some libraries the same staff members completed both
surveys; in others, special collections staff undertook the format and condition
survey (questions 1-18) and cataloguing staff undertook the duplication survey
(questions 19-20). Checking 100 records on Copac took up to 6 hours for some
libraries. This time was provided as a partner contribution to the scoping study.
Results
The table below presents findings from the duplication survey. General discussion
is provided in this section, with comments relating to individual collections
discussed in section 2.2 below.
Table 2.1.2:A
Duplication survey results (based on questions 19 and 20 of survey form)
Collection | Unique on Copac | Duplicated within any partner library | Duplicated within individual partner libraries | Duplicated within any non-partner libraries
Bristol | 23% | 32% | LSE (19%); Manchester (10%); Liverpool (6%); Newcastle (3%); Durham (1%); UCL (0%) [Note 1] | 70%
Durham | 41% | 40% | LSE (20%); Bristol (19%); Manchester (18%); Liverpool (4%); Newcastle (4%); UCL (1%) [Note 1] | 53%
Liverpool | 24% | 45% | Bristol (28%); LSE (24%); Manchester (16%); Durham (4%); UCL (2%) [Note 1]; Newcastle (0%) | 61%
LSE | 37% | 40% | UCL (22%) [Note 1]; Bristol (13%); Manchester (9%); Durham (5%); Liverpool (3%); Newcastle (3%) | 50%
Manchester | 44% | 21% | Bristol (10%); LSE (8%); Liverpool (5%); Newcastle (2%); Durham (1%); UCL (1%) [Note 1] | 55%
Newcastle | 25% | 44% | Bristol (28%); LSE (18%); Liverpool (8%); Manchester (8%); Durham (2%); UCL (0%) [Note 1] | 61%
UCL [Note 1] | 32% | 28% | LSE (12%); Manchester (10%); Bristol (4%); Durham (3%); Liverpool (3%); Newcastle (1%) | 57%
Note 1. There has been a delay in loading some of UCL’s records into the CURL/Copac databases, so the duplication with its collection is not fully represented here. UCL’s records will be loaded before the project commences.
Discussion
As previously mentioned, care needs to be taken in drawing conclusions based
on this survey, since only 100 (or fewer) items were assessed from each collection.
Nonetheless, it provides better information than previous guesses and enables
some estimation to be done. Should the project go ahead it would enable the
accuracy and usefulness of this approach to be evaluated.
There is a fairly high level of duplication with the LSE and Bristol, which have large
19th century pamphlet collections (an estimated 15,989 and 22,150, respectively,
on Copac). However, the project intends to make a selection from these two
collections rather than capture them in their entirety, so the overlap with these
collections can be compensated for in the selection process.
It is important to note, however, that the records for UCL’s 19th century pamphlets
are not yet fully loaded onto the CURL and Copac databases. It is likely that there is
higher duplication with this collection than is apparent in the survey. UCL expect to
have their full records on Copac before the project start date (January 2007).
For the purposes of putting together the revised bid we have assumed a duplication
of half of the total amount found ‘duplicated within any partner collection’ for the
Durham, Liverpool, Manchester, Newcastle and UCL collections and reduced
the number of pamphlets we expect to capture accordingly. It may be that this
reduction is larger than is necessary and we would seek to refine these estimates
as the project proceeds. As selections are being made from Bristol and LSE,
their numbers were not reduced for duplication, but were adjusted upwards to
compensate for the reduction across the other collections, in order to make the
collection up to 1 million pages.
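The duplication assumption above can be written out as a short calculation. The sketch below halves each collection's "duplicated within any partner library" rate from Table 2.1.2:A to obtain the assumed reduction. The collection size passed to `expected_pamphlets` is purely illustrative (1,250 is the approximate figure quoted for Durham's 19th century items in its collection profile), and the function names are hypothetical.

```python
# "Duplicated within any partner library" rates from Table 2.1.2:A.
partner_duplication = {
    "Durham": 0.40,
    "Liverpool": 0.45,
    "Manchester": 0.21,
    "Newcastle": 0.44,
    "UCL": 0.28,
}


def assumed_reduction(duplication_rate: float) -> float:
    """The revised bid assumed half of the surveyed partner duplication."""
    return duplication_rate / 2


def expected_pamphlets(collection_size: int, duplication_rate: float) -> int:
    """Pamphlets expected to be captured after the assumed reduction."""
    return round(collection_size * (1 - assumed_reduction(duplication_rate)))


for name, rate in partner_duplication.items():
    print(f"{name}: assumed reduction of {assumed_reduction(rate):.1%}")

# Illustrative only: a collection of about 1,250 items at Durham's rate.
print(expected_pamphlets(1250, partner_duplication["Durham"]))
```

Bristol and LSE are left out of the dictionary because their numbers were not reduced for duplication but adjusted upwards to reach the 1 million page total.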
The table below shows the combined effects of the higher page averages (see 2.1.1
above) and the reductions made to account for duplication. Although it is hoped
that this is closer than the previous estimates, it is likely that further adjustments
will need to be made as the project proceeds and the numbers of pamphlets may
rise or fall.
Table 2.1.2:B
Effects of higher page averages and reduction for duplication
This section provides profiles for each of the seven collections, including overviews
of the collection content and a list of significant points that emerged from the
surveys, or from visits and discussions with collection contacts. All collections
were visited during the scoping study.
■ Collections were presented with the option of doing their own scanning for this
project, but all are happy to use BOPCRIS as a consortium scanning service
■ Everyone was happy with the suggested transport arrangements (Harrow Green [15],
Momart [16] or equivalent) and most preferred the transport company to pack
the pamphlets
■ Everyone was happy with the level of insurance cover provided by the transport
firms we asked to quote (minimum of £100,000 per consignment) and by
BOPCRIS whilst on premises (minimum of £100,000)
■ Everyone was happy with the storage being offered by BOPCRIS,
which is temperature controlled, has low ultraviolet levels and is only
accessible by authorised personnel
■ No one said they would require the return of original items from BOPCRIS if
requested by users whilst away: each would notify users of their collection’s
absence and be happy to make do with photocopies or digital images (if already
scanned) from BOPCRIS
■ Collections will have the option to receive digital datasets relating to their own
items (images, metadata and OCR), but few intend to take up this option within
the life of the project (Durham, Manchester and possibly UCL)
[15] See www.harrowgreen.com
[16] See www.momart.co.uk
University of Bristol
National Liberal Club Pamphlets (selection from this collection)
Overview of this collection
These pamphlets are from the libraries of, amongst others, Charles Bradlaugh, John Noble, the Liberation Society, the
Land Nationalisation Society and the Cobden Club. There are also many individual items given by W.E. Gladstone and
other prominent politicians. The collection is especially strong on 19th century commerce, economics, finance, politics,
religion and sociology. It includes publications by and about not only the Liberal Party, but also the Conservative and
Labour Parties.
For more information about the collection, please see:
www.curl.ac.uk/rslpguide/BristolNatLibClub.htm.
Key points emerging from surveys, visits and correspondence
■ Although this is a Liberal Club collection, its inclusion of pamphlets from other parties will provide a good political
representation
■ Collection is largely held off-site (83%) which is likely to slow the selection
■ Collection contains a lot of separate pamphlets (92%) which is likely to slow scanning
■ There is a high proportion of in-margin stitching (18% of the separates) which will slow scanning and may affect the
quality of imaging (due to page bowing)
■ The high proportion of separates means the collection may be useful in replacing poor quality copies bound into
other collections (we note the high duplication of other collections with Bristol)
■ The proportion of bound volumes is small (8%), but there may be difficulties capturing these because of the
tightness of binding (63% of bound volumes cannot be opened flat)
■ Bristol have no particular timing issues, but the collection will need to be broken into consignments because of the
volume and the added time required for selection
■ The bulk of the Bristol material should be staged towards the end of the project to fill gaps and replace poor
duplicates
Durham University
Earl Grey Pamphlets (all 19th Century pamphlets from collection)
Overview of this collection
This is a family collection accumulated largely by the 2nd, 3rd and 4th Earls Grey. Charles was Foreign Secretary
in 1806-1807 and Prime Minister 1830-1834. Henry George was Under-Secretary for Home Affairs in 1830,
Under-Secretary for the Colonies in 1830-1834, Secretary at War in 1835-1839 and Secretary of State for the Colonies in
1846-1852. Albert Henry George was Administrator of Rhodesia in 1896-1897 and Governor-General of Canada in 1904-1911.
The Greys were strongly interested in parliamentary reform, colonial affairs and Catholic emancipation.
For more information about the collection, please see:
www.curl.ac.uk/rslpguide/searinst.htm#D
Key points emerging from surveys, visits and correspondence
■ The Earl Grey Pamphlets collection is not owned by Durham but on loan from the family. Lord Howick, the current
owner, has given permission for the collection to be included within the project
■ The condition survey found surprisingly high page averages (65), greyscale counts (16% of pamphlets, 4.2% of
pages), and larger sized items (44% were nearer to A4 than A5). Unfortunately the survey was skewed by the
presence of some 20th century material and some books, journal issues, and official publications within the sample
(these form part of the Earl Grey collection). With this material excluded, as this project would do, the number of
items will drop to about 1,250 and these statistics would be likely to change.
■ None of Durham’s pamphlets are bound; all are held separately in archive boxes, which will require special storage and
handling and will slow the scanning
■ There is a reasonable proportion of in-margin stitching (13% of items) which will slow scanning and may affect the
quality of imaging (due to page bowing)
■ Although not asked for in the survey criteria, the person completing the form for Durham noted that 7 items (out of
94) contained uncut pages. Southampton will need to cut these pages for scanning or seek duplicates (Durham have
approved cutting)
University of Liverpool
Knowsley Pamphlet Collection (all 19th Century pamphlets from collection)
Overview of this collection
This is a family collection, reflecting the careers of the Earls of Derby. Edward George was successively Irish Secretary
(1830-33), Colonial Secretary (1833-34 and 1841-44) and three times Prime Minister (1852, 1858-59 and 1866-68). His
career was summarised by Disraeli as follows: “He abolished slavery, he educated Ireland, he reformed parliament”.
His son, Edward Henry, 15th Earl of Derby (1826-1893) was Colonial Secretary and later Indian secretary in his father’s
administration of 1858-59.
For more information about the collection, please see:
www.curl.ac.uk/rslpguide/LiverpoolKnowsley.htm
Key points emerging from surveys, visits and correspondence
■ The Knowsley collection is 100% bound in volumes which are uniform (consistent binding, pages trimmed to size)
and open easily, factors which could speed the scanning process
■ However, a large proportion of these pamphlets (47%) are in volumes with loose boards, which will require special
care and handling and the use of a specialist scanner
■ Although there is high duplication, this is largely with Bristol and LSE and so can be avoided through the selection
and de-duplication processes
■ This collection would need to be scheduled in 2007, because the collection is due to be wrapped for building work in
2008
■ Liverpool would prefer that the collection goes after Easter, since the first part of the year is the busiest for this
material, but will be flexible
LSE
Pamphlet Collection (selection from this collection)
Overview of this collection
This collection includes a comprehensive set of material from political parties, including election manifestos and
political cartoons. Issues in British political history include the corn laws, land question, Church and state and home
rule for Ireland. There is a wealth of material on the co-operative movement, including the Cooperative Women’s Guild,
and from long-standing pressure groups such as the Fabian Society and organisations which have long disappeared
such as the Cobden Club, the Imperial Federation Defence Committee, the Poor Law Reform Association, the
Workhouse Visiting Society and the Liberty and Property Defence League.
For more information about the collection, please see:
www.lse.ac.uk/library/pamphlets/
Key points emerging from surveys, visits and correspondence
■ The LSE pamphlet collection is 100% bound, which is good for scanning but not so good for selection – we will try to
select by the volume
■ The condition survey suggested that a high proportion of pamphlets might not be available for scanning due to their
condition (7%), but this can be avoided through the selection process
■ There is a high proportion of foreign language pamphlets, but this is also likely to be avoided through the selection
process (which will focus on UK debates)
■ Work is planned for the store where the pamphlets are housed in either 2007 or early 2008 – this will fit well with
the project because LSE needs to be scheduled late in the project in order to pick up duplicates and fill gaps around
other collections
■ As this is a larger collection, it will need to be broken into consignments
University of Manchester
Foreign & Commonwealth Office Pamphlet Collection (all 19th Century pamphlets from collection)
Overview of this collection
This collection is on deposit from The Foreign and Commonwealth Office (FCO). It comprises two earlier collections.
(1) The Foreign Office pamphlet collection, consisting largely of pamphlets acquired by British ambassadors overseas
and sent back to London as being of value for the formulation of policy. This collection is rich in material from South
America (where the British government was the formal arbitrator in boundary disputes), the Near East (both the
last century of the Turkish Empire and the growth of Zionism) and the various great European “Questions”, from the
Congress of Vienna through to the aftermath of the creation of the German Empire. (2) The Colonial Office pamphlet
collection, consisting chiefly of local imprints including, e.g., unique early Australiana.
For more information about the collection, please see:
www.curl.ac.uk/rslpguide/ManchesterForeign.htm
Key points emerging from surveys, visits and correspondence
■ The Foreign and Commonwealth Office pamphlet collection is not owned by the University of Manchester but is on
loan from the FCO. Permission has been obtained for the collection to be included within the project
■ The larger Foreign Office component in this collection is bound (71%) and often in tight binding (28%), which will
require special scanning
■ The smaller Colonial Office component is unbound, but was held in folders with pillars, which have occasionally
punched through the text – this might necessitate replacement with duplicates, if available
■ A very large proportion of the separate pamphlets also have in-margin stitching (45%), which will slow scanning and
may affect the quality of imaging (due to page bowing)
■ During the visit, many older font styles were found (e.g. long s’s, diphthongs, and ligatures) – these will require the
use of specialist OCR software
■ There is a high proportion of foreign language material (23%), including non-Latin scripts (1%). Specialist OCR
software will be required for foreign languages, but it cannot recognise the pamphlets in non-Latin scripts.
■ This collection contains a higher proportion of foldouts and (consequently) colour pages than other collections
– these will require specialist scanning
■ There is a high proportion of annotations (10%), which will require greyscale scanning
■ There is a higher proportion of unique items than others, so there is less reduction for duplication, but also less
chance of finding replacements for poor copies
University of Newcastle –
Cowen Tracts (all 19th Century pamphlets from collection)
Overview of this collection
This is a personal collection of Newcastle-born Joseph Cowen (1829-1900), Member of Parliament (MP) and social
reformer. On his father’s death in 1873, he was elected in his place as MP for Newcastle and, though he came into
conflict with both the Parliamentary and local Liberal parties, he remained MP for Newcastle until he retired in 1886.
The Cowen Tracts date, in the main, from Cowen’s active years of the late 1840s to early 1880s, though there is some
earlier and later material. The topics covered largely reflect his main interests of social, educational and economic
issues.
For more information about the collection, please see:
www.curl.ac.uk/rslpguide/NewcastleCowenTracts.htm
Key points emerging from surveys, visits and correspondence
■ The Cowen collection is 100% bound (in three types of binding) and will generally open flat (3% will not)
■ There are some potential issues with loose boards (15%), which will require special care and handling and a
specialist scanner
■ There is a high proportion of annotations (9%), which will require greyscale scanning
■ There are some unusual typefaces (4%) which will require use of special OCR software
■ Newcastle is happy for the collection to be done in one consignment and there are no particular timing issues
UCL –
Hume Tracts (all 19th Century pamphlets from collection)
Overview of this collection
This is the personal collection of Joseph Hume (1777-1855), Radical Member of Parliament. Its broad subject-matter
reflects the major political, economic and social developments and reforms taking place in Britain in the early part
of the nineteenth century, and includes some of the causes championed by Hume during his parliamentary career,
such as universal suffrage, Catholic emancipation, a reduction in the power of the Anglican church and an end to
imprisonment for debt.
For more information about the collection, please see:
www.curl.ac.uk/rslpguide/UCLHume.htm
Key points emerging from surveys, visits and correspondence
■ The Hume collection is very well bound (100%), with no loose boards or issues with opening flat
■ A very high percentage of the pamphlets are annotated (43%), which will require grey or colour scanning, take longer to scan,
and may pose some OCR issues
■ There are some personal items (e.g. letters) bound between pamphlets – these will be excluded from the project
■ The collection contains a high proportion of fold-outs (9%), which will require specialist scanning
■ There is a low level of duplication with other partner libraries suggesting this is a fairly unique collection
■ The entire collection is pre-1855, so there are no copyright issues for this collection.
The key points identified for these collections have influenced the proposed
scheduling made in section 2.3.4 below.
■ The collections can be linked with the collection-level descriptions in the online
Guide to 19th Century Pamphlets;17 and found in collection-based searches.
De-selection criteria
For five collections, then, the appropriate strategy is not a selection strategy, but
a de-selection strategy. Although the goal is to capture as much as is possible of
these collections, the project anticipates de-selecting some pamphlets from digital
capture, for these reasons:
■ The pamphlet was published outside the bounds of the 19th century (either
earlier or later). We note that extending into the 20th century would greatly
increase copyright issues.
■ The pamphlet has already been captured or is better captured from another
collection (see De-Duplication Strategy, 2.3.3 below)
17 See at www.curl.ac.uk/rslpguide/guidehp.htm
Note that the project will not be deselecting on the basis of difficulties in scanning
or OCR. BOPCRIS has specialist scanners that can capture any material a library is
willing to send.
Selection criteria
For the remaining two collections (Bristol and LSE), a positive selection of
pamphlets will be made and will be based on the following criteria:
■ Their relevance to themes of the great 19th century debates (e.g. universal
suffrage, relationship of church and state, colonial policy), as identified by
the collection curators, and the academics and teachers involved in the
management and steering groups.
■ Their usefulness in addressing gaps in the digital collection (e.g. themes not
well covered, formats not represented, particular authors who should have
a voice). Again, these gaps will be identified by curators, researchers and
teachers.
■ Feedback and demand from collection users: the bulk of Bristol and LSE
material will be selected later in the project, by which stage there will be
material available online and the possibility of tracking usage and surveying
users.
■ Replacements for copies held in the smaller collections that are in too poor or
fragile a state to scan.
■ Class 2. Items potentially within copyright due to age but of unknown status
(e.g. because the death dates of the author or identity of their inheritors are not
easily discoverable)
■ Class 3. Items known to be in copyright (based on age or the known dates of the
author)
Figure 2.3.2
Suggested Copyright Workflow
[Flowchart not reproduced. Recoverable elements: the decision “Is author known or discoverable through reasonable enquiry?” (Yes/No) and the outcomes “Class 1: Out of copyright – library can send” and “Class 2: Might be in copyright – library takes responsibility and liability”.]
*This is a conservative cut-off – the project will seek legal advice on this.
Note that this chart assumes a conservative cut-off publication date of 1857 (150yrs)
in case of an uncertain death date. We are aware of other projects that are using 120
years (i.e. 1887) as their cut-off and will seek advice on this during the initial stages of
project planning. It is anticipated that the vast majority of pamphlets in this collection
will belong to Class 1. Many of the pamphlet records checked on Copac include dates
for their authors, which will assist in determining the pamphlet’s copyright status.
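The classification described above can be sketched in a few lines of Python. This is an illustration only, not the project's agreed procedure: the function name, the reference year of 2007 (implied by "1857 (150yrs)") and the life-plus-70-years test for authors with known death dates are all assumptions, and the cut-off itself was still subject to legal advice.

```python
from typing import Optional

REFERENCE_YEAR = 2007             # assumption: the year implied by "1857 (150yrs)"
CONSERVATIVE_CUTOFF_YEARS = 150   # the report's conservative figure; 120 is the alternative it mentions
POST_MORTEM_TERM = 70             # assumption: life-plus-70 term applied when the death date is known


def copyright_class(publication_year: int,
                    author_death_year: Optional[int] = None,
                    cutoff_years: int = CONSERVATIVE_CUTOFF_YEARS,
                    reference_year: int = REFERENCE_YEAR) -> int:
    """Return 1 (out of copyright), 2 (unknown status) or 3 (in copyright)."""
    if author_death_year is not None:
        # Known death date: test the post-mortem term directly.
        if author_death_year + POST_MORTEM_TERM < reference_year:
            return 1
        return 3
    # Unknown death date: fall back on the publication-date cut-off.
    if publication_year <= reference_year - cutoff_years:
        return 1
    return 2
```

Under these assumptions an 1870 pamphlet with no author dates falls into Class 2 with the 150-year cut-off but into Class 1 with the 120-year cut-off, which is why the choice of cut-off matters for the volume of material a library can send without further checks.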
If, despite this strategy, issues arise with any of the pamphlets, the project would
seek to remove them from the digital collection.
Because the project is selecting from the two larger collections (Bristol and LSE)
there is an opportunity to de-duplicate these at the selection stage – provided there
is some means of catching duplication when the selection is being made.
There are several approaches that might be taken to de-duplication for the smaller
collections:
1. Do not de-duplicate. This may make sense given the focus on individual
collections and the level of annotations of some pamphlets. However, this must
also be balanced against the project’s aim to create as large and accessible a
selection of pamphlets as possible. Where there are a handful of personally
annotated copies in different collections, the occasional duplicate may be
permitted. But as a general policy, the approach of ignoring duplication would
lead to wastage and reduce the number of unique resources that could be
captured and delivered.
As a library prepares its collection (or selection) it searches the database and
identifies any duplicate records in other collections. It then checks the status of
these items.
1. Not checked – the default; for items that have not yet been checked or selected
from a collection
6. Scanned – pamphlet has been scanned and is awaiting signoff and return
If the librarian finds a duplicate with a status of 1-3, then the item should be sent
(unless that library’s own copy is also missing, not suitable or in poor condition). If
another copy has a status of 5-8, then the item should be returned to its place on
the shelf (or ignored within the volume) and the appropriate status recorded in the
database.
Some libraries may still wish to check their duplicate for any significant
annotations and make a case with the project team for a duplication.
useful in marking which parts of a bound volume are to be scanned. The project
would supply a record slip template and recommend that it be printed on acid-free
or pH-neutral paper.
The diagram below shows how de-duplication might be achieved within the library
workflow using the proposed database.
Figure 2.3.3
Suggested De-duplication Workflow
[Flowchart, summarised from its recoverable labels:]
1. Check project database: has pamphlet already been “sent”, “received”, “scanned”, or “returned”? If yes, set status to “duplicate – not sent” in database.
2. If no, check your copy of item: is pamphlet found? If no, set status to “missing” in database.
3. If yes, is pamphlet too fragile to send? If yes, set status to “not suitable” in database.
4. If no, [remainder of chart not reproduced].
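The decision sequence in Figure 2.3.3 can be expressed as a short Python function. This is a sketch under stated assumptions: the status strings are those quoted in the figure, but the function name, its arguments and the final "sent" outcome (the point at which the reproduced chart breaks off) are hypothetical.

```python
from typing import Iterable

# Status labels quoted in Figure 2.3.3.
ALREADY_HANDLED = {"sent", "received", "scanned", "returned"}
DUPLICATE_NOT_SENT = "duplicate – not sent"
MISSING = "missing"
NOT_SUITABLE = "not suitable"
SENT = "sent"  # assumption: the outcome once every check has passed


def dedup_status(other_copy_statuses: Iterable[str],
                 found: bool,
                 too_fragile: bool) -> str:
    """Return the status a library would record for its own copy."""
    # Another library's copy is already in the pipeline: do not send this one.
    if any(status in ALREADY_HANDLED for status in other_copy_statuses):
        return DUPLICATE_NOT_SENT
    # Otherwise the library checks its own copy.
    if not found:
        return MISSING
    if too_fragile:
        return NOT_SUITABLE
    return SENT
```

A library wishing to argue for a deliberate duplication (for example because of significant annotations) would need an override that this sketch does not model.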
We have indicated single consignments for the smaller collections and two
consignments each for the larger collections, although some of the smaller
collections may also be broken in half or sent as two van loads. The collections we
are doing in their entirety are scheduled earlier, and those from which selections
are made (LSE and Bristol) generally occur towards the end of the schedule. This
will enable the selections to be more informed and to minimise duplication. It will
also enable the project to adjust the numbers from LSE and Bristol to ensure that
the 1 million page target is met.
It must be stressed that this is one possible scheduling for the collections. Should
the project go ahead the timetable would be negotiated with library partners.
Figure 2.3.4
A Scanning Schedule
3. Digital datasets
This section describes the production of the digital datasets, focusing on the
technical standards being proposed for this project. It describes: (1) the capture of
the digital images; (2) generation of OCR text; (3) creation of metadata; and (4) the
overall production workflow and Quality Assurance (QA) activities.
This project will use the full range of book scanners available within the digitisation
laboratory at BOPCRIS:
■ Digitising Line Suprascan colour scanner – for colour capture and for grey or
bitonal capture from volumes or individual pamphlets requiring special support
(e.g. loose boards or tight binding) – pictured centre
■ Digitising Line robotic colour scanner – for sturdy bound volumes where the
bulk of the volume is being scanned and the pamphlets are of similar paper
weight and have been trimmed to fit the volume – pictured right
Images with permission from BOPCRIS and model
All BOPCRIS staff are trained in handling materials appropriately and in operating
the scanners. Given the nature of the collections, only a small proportion of the
19th century pamphlets could be captured on the robotic scanner, with most
requiring a PS7000 or Suprascan scanner. This has increased the cost of digital
capture in the revised bid.
In the course of this scoping study, video- and phone-conferences were held
with JSTOR to discuss the digital capture specifications. These discussions were
informed by sample images BOPCRIS had generated from the Bristol collection or
from similar materials.
JSTOR’s standard approach is to capture pages of text as 600dpi bitonal scans and
pages with grey or colour (e.g. illustrations or photographs) as 300dpi 8-bit grey or
24-bit colour scans. JSTOR’s delivery images are downsized from the TIFF masters
and delivered as GIF images (for text) or JPEG images (where grey or colour is
present). The latter images are often created by overlaying and compositing 8-bit
or 24-bit illustrations with bitonal text.
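JSTOR's standard capture rule, as described above, can be written as a small lookup together with the arithmetic for raw image size. This is a minimal sketch: the class and function names are hypothetical, and the size calculation covers raw pixel data only, ignoring TIFF headers, row padding and compression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureSpec:
    dpi: int
    bit_depth: int
    mode: str


def capture_spec(has_colour: bool, has_grey: bool) -> CaptureSpec:
    """Pick a capture setting following JSTOR's standard approach."""
    if has_colour:
        return CaptureSpec(dpi=300, bit_depth=24, mode="colour")
    if has_grey:
        return CaptureSpec(dpi=300, bit_depth=8, mode="grey")
    return CaptureSpec(dpi=600, bit_depth=1, mode="bitonal")


def raw_bytes(width_in: float, height_in: float, spec: CaptureSpec) -> int:
    """Approximate size of the raw pixel data for a page of the given size in inches."""
    pixels = round(width_in * spec.dpi) * round(height_in * spec.dpi)
    return pixels * spec.bit_depth // 8
```

Per square inch this gives 45,000 bytes for 600dpi bitonal, 90,000 bytes for 300dpi grey and 270,000 bytes for 300dpi colour, so a high share of annotated or colour pages affects storage as well as scanning time.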
The table below indicates the digital image standards adopted for this project.
Table 3.1
Image specifications
There was some discussion during the scoping study about where it was best
to undertake the OCR work: BOPCRIS or JSTOR. JSTOR generally requires its
vendors to use PrimeOCR18 while BOPCRIS has Abbyy19 software built into its
Agora production system20. Pages from some of the sample pamphlets were
OCRed and the data was shared with JSTOR. This data was in the .idx file format
Agora generates using Abbyy, which includes word coordinates (not currently used
by JSTOR). In addition to the standard Abbyy 8.0 software, an ‘Old English’ add-on
was trialled during the scoping study. This specialist software is designed to
capture older typefaces.
It was agreed that the OCR was best done by BOPCRIS as a part of its production
workflow, using Abbyy and Abbyy Old English where necessary (selectively used,
because its high license cost is based on the number of pages OCRed). JSTOR
would be supplied with both plain text (.txt) and the coordinated text (.idx). JSTOR
would use the text files initially, but would be likely to use the .idx files if it moved to a delivery
system that highlights words from the text (it anticipates doing so in the future).
Several tests were made to determine the level of accuracy achievable from the
sample pamphlets. The tests suggested that a high level was possible for much
of the material, with even difficult texts achieving up to 99.9% character accuracy.
Further tests would be done were the project to proceed, but these initial tests
suggested an average character accuracy of 97-98% for the 600dpi bitonal images
and 300dpi grey/colour images specified for this project. For this project and
material JSTOR would apply an average accuracy level rather than a minimum
acceptance level and were happy with the averages BOPCRIS were obtaining. While
rescanning or re-OCRing may occasionally be necessary, the project does not
envisage having to re-key any data.
18 See at www.primerecognition.com
19 See at www.abbyy.com
20 See at www.agora.de/eng/index.html
Currently any problems with the OCR are picked up visually and the accuracy levels
determined manually by comparing the OCR output with the original page or its
digital image. BOPCRIS hope to introduce accuracy monitoring software into their
automated workflow at an early stage of this project. This software would make it
easier to detect any issues and change the variables in order to optimise the OCR.
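The report does not state how character accuracy was calculated. One common approach, shown here purely as an assumption, is to count the edit operations needed to turn the OCR output into a hand-checked transcription and express them as a proportion of the transcription's length. The function names below are hypothetical, and the sketch is not the accuracy monitoring software BOPCRIS intended to introduce.

```python
from typing import Iterable, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of reference characters recovered correctly, between 0.0 and 1.0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


def average_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Mean accuracy over (reference, ocr_output) pairs, one pair per page."""
    scores = [character_accuracy(ref, ocr) for ref, ocr in pairs]
    return sum(scores) / len(scores)
```

Averaging over pages mirrors JSTOR's stated preference for an average accuracy level rather than a minimum acceptance level: a single poorly recognised title page lowers the mean slightly but does not by itself fail a batch.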
The following tables show examples of OCR drawn from an early 19th century
satirical pamphlet. The first two show the same text scanned at 600dpi and 300dpi
(the latter would usually be used where illustrations are present). Both have
OCRed well for this typical text. The third table (3.2:C) shows the title page for this
pamphlet and illustrates the impact of complex fonts. These usually occur on title
pages, where they will be compensated for by the presence of bibliographic metadata.
Table 3.2:A
600dpi bitonal using Abbyy 8.0 – 97.6% character accuracy
Table 3.2:B
300dpi greyscale using Abbyy 8.0 – 98.7% character accuracy
Table 3.2:C
300dpi greyscale using Abbyy 8.0 – 75.8% character accuracy
3.3 Metadata
BOPCRIS has a German production system called Agora21, which uses its own
proprietary XML (Extensible Markup Language) metadata format. This metadata
would be customised to ensure that all the necessary information is incorporated.
Once complete, it would be exported and transformed (via software routines)
into standards-compliant XML. This standard metadata would be delivered to
JSTOR for archiving and delivery to libraries or JISC (on request) and for further
transformation into JSTOR’s own metadata standard for delivery to end users.
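The export-and-transform step might look like the following Python sketch. The input record layout (`record`, `title`, `author`, `year`) is invented for illustration, since the Agora format is proprietary and not described in this report. MODS is used as the target only because it is among the standards footnoted for Table 3.3 (footnote 22); treating it as the descriptive target is an assumption.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def to_mods(source_xml: str) -> str:
    """Map a hypothetical production-system record onto a minimal MODS record."""
    src = ET.fromstring(source_xml)
    ET.register_namespace("mods", MODS_NS)

    def q(tag: str) -> str:
        # Qualify a tag name with the MODS namespace.
        return f"{{{MODS_NS}}}{tag}"

    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = src.findtext("title", default="")

    author = src.findtext("author")
    if author:
        name = ET.SubElement(mods, q("name"))
        ET.SubElement(name, q("namePart")).text = author

    year = src.findtext("year")
    if year:
        origin = ET.SubElement(mods, q("originInfo"))
        ET.SubElement(origin, q("dateIssued")).text = year

    return ET.tostring(mods, encoding="unicode")
```

A production routine would also need to validate its output against the published schema and carry technical and preservation metadata, which this sketch omits.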
The table below details the metadata standards adopted for this project.
21 See at www.agora.de/eng/index.html
Table 3.3
Metadata specifications
22 See at www.loc.gov/standards/mods
23 See at www.loc.gov/standards/marcxml
24 See at www.loc.gov/standards/mix
25 See at www.niso.org/standards/resources/Z39_87_trial_use.pdf
26 See at www.loc.gov/standards/premis
27 See at www.loc.gov/standards/mets
28 See at http://dtd.nlm.nih.gov/tag-library/2.1/index.html
Quality Assurance (QA) is a key part of any digitisation workflow (see TASI’s QA
documentation29). For this project, BOPCRIS would undertake QA at several stages
during the production phase, and then further QA would be done by JSTOR when
they receive the dataset.
1. Images are logged onto the Agora production system and passed on to
Scanning Operators.
2. Scanning Operators scan each page, checking as they go (the first QA, for
images). Images are rescanned as necessary. Once complete, the set of files
is passed on to Indexers.
3. Indexers check that all the pages are present and that the images are of good
quality (the second QA, for images). If there are any issues they request a
rescan from a Scanning Operator.
4. Indexers initiate the XML generation, which incorporates the data necessary for
later export and transformation into the standards described in 3.3.
5. Indexers identify any non-English language or old English fonts and flag these
in the production system so that the appropriate software settings are triggered
when the images are OCRed. The dataset then enters the automatic OCR
workflow.
[Image caption: Scanning Operator. Image with permission of BOPCRIS & model.]
6. The production system picks up the images and metadata, OCRs the images,
and generates associated .idx and .txt files.
8. JSTOR do further QA on the images and OCR (the fourth QA, for images and
OCR), checking an average of 10% of images and OCR files. JSTOR would liaise
with BOPCRIS to address any issues. Because some rescanning might be
necessary, collections would be held at Southampton until signed off by JSTOR.
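The 10% check in the final step could be drawn as a simple random sample. This is a sketch only: the report does not say how JSTOR selects the files it checks, and the function name and seed parameter are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def qa_sample(filenames: Sequence[str],
              fraction: float = 0.10,
              seed: Optional[int] = None) -> List[str]:
    """Return a random subset of files for manual checking, in sorted order."""
    if not filenames:
        return []
    # Always check at least one file, however small the batch.
    size = max(1, round(len(filenames) * fraction))
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return sorted(rng.sample(list(filenames), size))
```

Recording the seed alongside each batch would allow the same sample to be regenerated if a query arose after sign-off.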
Figure 4.1:A in the next section of this report includes a diagrammatic version of
this workflow.
29 See at www.tasi.ac.uk/advice/creating/quality.html
4. Project workflow
This final section describes the overall workflow for the project, bringing together
several elements from previous sections of this report. It includes: a project
workflow diagram (4.1); a work plan, showing the project’s main activities and work
packages (WP) on a Gantt chart (4.2); and a detailed description of the work to be
undertaken within each work package (4.3).
4.1 Workflow
The following diagram illustrates the overall workflow proposed for this project,
from pamphlet selection to discovery by users. The major work packages (WP4-8)
are also indicated.
Figure 4.1
Project Workflow
[Flowchart, summarised from its recoverable labels:]
Work Package 4 - Libraries
■ Pamphlet is selected and checked on the project database
■ Is pamphlet de-selected due to copyright? (See copyright workflow in 2.3.2.) If yes, move to next item
■ Is pamphlet de-duplicated? (See de-duplication workflow in 2.3.3.) If yes, move to next item
■ If no, prepare pamphlet for transport to BOPCRIS
Work Package 5 - BOPCRIS
■ Pamphlet is received by BOPCRIS and logged onto production system
■ Scanning, with QA (see 3.4 for description of this workflow)
■ Indexing, with QA: image QA, metadata generation, OCR initiation
■ OCR, with QA: automatic generation and QA of OCR text
■ Pamphlet is signed off; pamphlet is returned to store
■ Dataset is transferred to JSTOR via hard disk or FTP
■ QA; dataset approved
■ [...] and available to libraries or JISC
■ Digital pamphlet available within JSTOR collection; user search
The Gantt chart on the next page outlines the likely timing of the main activities
and work packages (WP) proposed for this project. This timetable would be
confirmed in the Project Plan to be prepared under WP2. Section 4.3 describes
each work package in detail.
Figure 4.2
Proposed Work Plan
Should the bid succeed, some work would be initiated immediately: (a) staffing
secondments and recruitment would be organised; (b) the project management
groups (described in WP2) would be constituted; and (c) library partners would
be asked to identify materials that may be in copyright and begin any clearance
required (see 2.3.2 above for a workflow to identify copyright issues).
This scoping study has provided much ground work for the project and would help
inform a more complete and detailed Project Plan. Other early documentation
would include Memoranda of Understanding (MoUs) between partners and a
project website.
The project would conform to the JISC’s Project Management Guidelines30 and
closely follow the PRINCE2 methodology31. The core Project Team would be
comprised of: a Project Manager (0.5 – to be seconded from TASI at the University
of Bristol); a Technical Project Manager (0.5 – the current manager of BOPCRIS);
and a Project Officer (1.0 – also from BOPCRIS). The Project Manager would take
responsibility for the overall project monitoring, risk management, reporting,
liaison and dissemination. The Technical Project Manager would be responsible
for managing the production processes, quality assurance, and liaison with other
partners over technical standards. The Project Officer would receive the pamphlets
and track them through the BOPCRIS production system. They would also maintain
the database described in section 2.3.3 above, which is used to aid the selection
and de-duplication of pamphlets. A Software Developer (0.5) would also be
employed to undertake WP3 development and provide support for other packages.
This core team would be supported by two groups: (a) the Project Management
Group, which would include the Project Director (chair), project managers, and
representation from among the partners and JISC, and would meet regularly and
as required to oversee the project and manage any exceptional circumstances; and
(b) the Project Steering Group, which would offer a strategic oversight. The Steering
Group would be chaired externally and be comprised of the Project Director and
managers, representation from the partners, JISC, research councils, and senior
members of the research, teaching, and information communities. The Steering
Group would meet at the beginning, middle, and near the end of the project. It
would provide advice and contribute to maintaining a high level of visibility for the
project within the UK and internationally.
30 See at www.jisc.ac.uk/proj_manguide.html
31 See at www.ogc.gov.uk/methods_prince_2.asp
WP3 Development
Timeframe: Jan 2007-Aug 2007
Objective: To make adjustments to existing systems or develop new systems to support the major work packages: i.e. WP4-8.
Lead: Various
Main outputs: adjustments to the BOPCRIS production system, JSTOR delivery system, and Copac database; project database for libraries.
The selection and flow of materials from seven libraries to Southampton requires
careful management. This study has explored the issues by assessing the volume
and condition of the pamphlets (both will affect scanning time) and the extent
of duplication across the primary partner collections. It also discussed with
libraries any issues that would affect the scheduling of collections (e.g. whether
the collections should go at once or in batches). As a result of these investigations
we have presented a possible schedule in section 2.3.4 above. The final timetable
would be agreed with libraries at the beginning of the project.
WP5 Production
Timeframe: Mar 2007-Nov 2008
Objective: To create high-quality digital images, metadata and OCR.
Lead: BOPCRIS
Main outputs: Datasets comprising standards-compliant images, metadata and OCR text.
WP6 Delivery
Timeframe: Apr 2007-Feb 2009
Objective: To effectively deliver the collection to users.
Lead: BOPCRIS
Main outputs: Online collection of ca. 23,000 digital pamphlets.
Once the BOPCRIS datasets are received JSTOR would apply their own quality
assurance processes on the image and OCR files, checking an average of 10% of
each. They would work closely with BOPCRIS to address any issues discovered.
Once approved, an archival dataset would be preserved (WP7) and the data
transformed (images resized and metadata mapped) to create a delivery dataset
for incorporation into JSTOR’s systems. JSTOR would also generate URLs and
Digital Object Identifiers (DOIs), passing these with the corresponding library
and record IDs to MIMAS for incorporation into Copac and distribution to libraries
(WP8). In addition, JSTOR would take responsibility for facilitating pathways into
the collection through its linking arrangements with organisations such as Google,
the History Cooperative32 and RePEC33, and its participation in CrossRef34. The full
OCR text of the pamphlets would be exposed to Google’s indexing spider, enabling
pamphlets to be found via a standard Google Web search.
WP7 Preservation
Timeframe: Apr 2007-
Objective: To ensure the long-term preservation (including future upgrades and migrations as technology changes) and accessibility of the material.
Lead: JSTOR
Main outputs: Archive of image, metadata and OCR datasets.
JSTOR would receive from BOPCRIS a very rich dataset, including large archival
images and standards-compliant metadata, with accompanying full text. This
dataset would be archived by JSTOR and made available to contributing libraries or
the JISC upon request.
WP8 Linking
Timeframe: Apr 2007-
Objective: To achieve linking from Copac, from collection descriptions on the RSLP Project website, and from the individual OPACs of libraries holding the pamphlets.
Lead: MIMAS
Main outputs: Hyperlinked records.
MIMAS would take the URLs and DOIs supplied by JSTOR and use them to update
the CURL and Copac databases and make available links or duplicate records to
partner libraries so they can update their own catalogues. MIMAS would develop
software to generate these additional records and also (as far as is possible)
identify all records in the database describing the same item so that all relevant
libraries could be informed. Any library participating in CURL/Copac would be able
to download a record or link to items held in the pamphlet collection. Collection-
based searching would also be provided by MIMAS via the Guide to 19th Century
Pamphlets hosted on the CURL website35. This would enable users to limit their
search to a particular collection or, if they prefer, to just the digitised content of
that collection.
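The linking step described above, in which each URL and DOI is attached to every catalogue record that describes the same item, can be sketched as follows. The field names (`record_id`, `library`, `item_key`, `url`, `doi`) and the idea of a precomputed `item_key` for matching records are assumptions made for illustration; the software MIMAS would develop is not specified in this report.

```python
from collections import defaultdict
from typing import Dict, List


def build_link_updates(jstor_links: List[dict],
                       catalogue_records: List[dict]) -> Dict[str, List[dict]]:
    """Group link updates by library so each partner can update its catalogue."""
    # Index catalogue records by the item they describe.
    records_by_item = defaultdict(list)
    for rec in catalogue_records:
        records_by_item[rec["item_key"]].append(rec)
    item_of_record = {rec["record_id"]: rec["item_key"] for rec in catalogue_records}

    updates: Dict[str, List[dict]] = defaultdict(list)
    for link in jstor_links:
        item_key = item_of_record.get(link["record_id"])
        if item_key is None:
            continue  # record not in the database; left for manual follow-up
        # Every record for the same item receives the link, not only the scanned copy.
        for rec in records_by_item[item_key]:
            updates[rec["library"]].append(
                {"record_id": rec["record_id"], "url": link["url"], "doi": link["doi"]}
            )
    return dict(updates)
```

In practice the hard part is the matching itself, which the report qualifies with "as far as is possible"; the sketch assumes it has already been done.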
The Project Manager would take responsibility for this work package, creating
or commissioning webpages and publicity materials, writing papers, making
presentations, and coordinating an event in the summer of 2008. This event
would be likely to take the form of a one-day seminar and to include a formal
launch of the collection (which by this stage would include a significant amount of
content). The Project Manager would also (a) prepare an online ‘toolkit’ or suite of
resources to assist with the future selection, digitisation, delivery and preservation
of pamphlet literature, and (b) commission a Research Officer to create a set of
e-learning resources to encourage the use of the resource within teaching. These
e-learning resources would include additional web content for the Guide to 19th
Century Pamphlets and a sample learning package for deposit within the JORUM
repository.
WP10 Evaluation
Timeframe: Oct-Dec 2008
Objective: To commission an external evaluation study whose assessment and recommendations will be incorporated in the Final Report.
Lead: Southampton
Main outputs: Evaluation Report.
5. Conclusions and
recommendations
This whole report might be regarded as a set of recommendations for the
proposed project (especially sections 2.3, 3 and 4), so this final section will not
reiterate everything previously presented. However, some general conclusions and
recommendations can be usefully made here:
■ The scoping study has confirmed that the proposed project is viable, could be
managed within the allotted timescale (2 years, from January 2007-December
2008), and would be extensible.
■ Although the study has not addressed every single aspect of the project, it has
covered much ground and provided a good base from which to build a full and
detailed Project Plan.
■ All of the deliverables promised in the scoping study plan (see 1.1.3) have
been achieved and are included within this report, along with much additional
information.
■ Some information gathered in the process of the study has not been included
in this report for reasons of confidentiality (e.g. costs and contractual
agreements). Much of this information has been provided in the revised bid
and it will be important in preparing the detailed Project Plan and developing
Memoranda of Understanding should the project go ahead.
■ In addition to providing much information, the scoping study has been important
in establishing relationships between partners (e.g. BOPCRIS and JSTOR),
which would stand the project in good stead if it proceeds.
■ The proposed project has many complexities and the scoping study activities
have proved valuable in addressing many of these (e.g. image capture and OCR
standards, approach to de-duplication). It is recommended that scoping
studies such as this are undertaken for all large digitisation projects of this
nature, especially where multiple partners are involved.
■ While the study has been able to provide a better base to certain assumptions
(e.g. page averages), some information can only be fully established in
practice. If the project proceeds, it would be able to test some of the findings
and assumptions made here, and to evaluate the chosen methodologies. It
■ Although the scoping study and this report have focused on a particular project,
some of its findings and approaches are expected to be of wider interest and
value. For example, we now have a clearer understanding of the condition of
19th century pamphlet collections – and of the challenges involved in digitising
them. It is recommended that this report should be made available to others
undertaking similar work.
Appendix A – Survey Sheet
The table below reproduces the printed survey sheet used in the collection
assessment surveys described in section 2.1 above. An Excel spreadsheet was also
provided and used in the submission and analysis of results.
Sample number: 1, 2, 3, 4 (one column per sample; each question is answered for each sample)
Unique identifier for item
1. Would you happily send item as is for scanning? (Tick if would send)
2. Where is item located: Open shelves; On-site store; On-site archival; Off-site store; Off-site archival; Other (specify)
3. Is it in bound volume or separate? (Tick if in volume)
4. If bound, does volume open flat (180 degrees)? (Tick if vol. opens flat)
5. If bound, are there loose boards? (Tick if loose boards)
6. If separate, is there stitching in margin? (Tick if margin stitching)
7. Are there loose pages? (Tick if loose pages)
8. Are there any obviously missing pages? (Tick if missing pages)
14. Are there adverts?        Tick if adverts       Tick if adverts       Tick if adverts       Tick if adverts
15. Are there annotations?    Tick if annotations   Tick if annotations   Tick if annotations   Tick if annotations
16. Is a Gothic or unusual    Tick if Gothic etc    Tick if Gothic etc    Tick if Gothic etc    Tick if Gothic etc
typeface the main body text?
17. Do you have multiple      Tick if multiples     Tick if multiples     Tick if multiples     Tick if multiples
copies?
18. Which page size is        Circle: A5, A4, A3    Circle: A5, A4, A3    Circle: A5, A4, A3    Circle: A5, A4, A3
closest?
19. Does another library
partner have a copy:
Bristol                       Bristol               Bristol               Bristol               Bristol
Appendix B – Glossary
BOPCRIS British Official Publications Collaborative Reader Information Service
(www.bopcris.ac.uk)
FE Further Education
HE Higher Education
QA Quality Assurance
WP Work Package
AHDS provides services to aid the creation, use and preservation of digital collections in the arts and humanities and is partially funded by JISC.
www.ahds.ac.uk
BUFVC – The British Universities Film and Video Council promotes the use of moving image and audio resources and provides very useful information
regarding digitisation from these formats. Partially funded by JISC.
www.bufvc.ac.uk
Digital Curation Centre provides a national focus for research into curation issues and promotes expertise and good practice, both national and
international, for the management of all research outputs in digital format. Partially funded by JISC. www.dcc.ac.uk/index.html
Digital Preservation Coalition fosters joint action on preservation of digital resources in the UK to secure our global digital memory and knowledge
base. www.dpconline.org
HEDS Digitisation Services provides papers and advice on planning and costing digitisation projects along with a complete digitisation service.
www.heds-digital.com
The JISC Legal Information Service provides legal resources for further and higher education. Its website is an excellent starting point for
information on copyright and intellectual property. Funded by JISC.
www.jisclegal.ac.uk/ipr/IntellectualProperty.htm
TASI provides advice and guidance on the management of digitisation projects and creating, delivering and using digital images across all subject
areas and is funded by JISC. www.tasi.ac.uk
TechDis provides an advice and information resource via extensive web-based databases and an email helpdesk. These resources should be the first
port of call for anyone in education who has a question relating to disability and technology. Funded by JISC. www.techdis.ac.uk
UKOLN provides information and standards on how resources can interoperate and automated tools for testing website accessibility. Partially funded
by JISC. www.ukoln.ac.uk