CONTENT DIGITIZATION
1.0 INTRODUCTION
In this first unit of the course we are going to learn what digitization is, along with other topics related to it. We will see the various advantages and disadvantages of digitization, and you will learn the proper way to store data while digitizing. You will also find detailed descriptions of compression, the various file formats, data models and compression software.
Digitization is the conversion of non-digital data to digital format. In digitization, information is arranged into units called bits. Digitizing information makes it easier to preserve, access and share. Text and images can be digitized similarly. For example, a scanner is used to digitize text and images: it converts the text or image into an image file, such as a bitmap. Optical character recognition (OCR) software then scans the text image for light and dark areas in order to identify each alphabetic letter or numeric digit, and converts each character into an ASCII code.
Another disadvantage of digitization is that it requires expert staff, and such additional resources often increase the cost of digitization projects. A further limitation is that only a part of the analog object can be represented in digital format, so digitization can never completely capture the original.
Digitization policies vary between organizations. During material selection, materials which are in high demand benefit most from the accessibility that digitization offers. Material which has been used little may be stored on magnetic tape, while the improved accessibility of disk storage may be selected for high-demand and strategic online material; the method of storing digital information depends on the type of retrieval method used.
1.2.3 Storage for digitization
It is important to choose the right physical medium for storing information, and it should be maintained under stable environmental conditions so that there is no failure during storage. Storing similar material in multiple locations and taking regular backups provide protection against loss due to media failure, human error or both. Using a stable, standard format makes the data easier to maintain and easier to convert later when required.
Magnetic tapes are often used to store little-used material, but the improved accessibility of disk storage may be selected for high-demand and strategic online material. The storage method also depends on the type of retrieval method used: on this basis, digital information can be stored online, near-line or offline.
Lossy compression is not a suitable method for storing digital data over long periods of time, because in lossy compression data may be lost forever during decompression or during migration. When an image is compressed and then decompressed, the result often differs from the original: the decompressed image is not identical to the original scanned image. This is what is meant by lossy compression.
Check your progress-1
What is digitization?
How digitization is useful?
What are the two compression methods?
What is lossy compression?
1.4 COMPRESSION
Compression is the process of decreasing the number of bits needed to represent data. Compressing data is useful because it can save storage capacity, speed up file transfer, and decrease costs for storage hardware and network bandwidth. For example, text compression can be as simple as removing all unwanted characters and inserting a single repeat character to indicate a string of repeated characters. Compression can reduce a text file to 50 per cent of its original size, and often significantly less. Compressed data can be understood only if the receiver knows the decoding method; thus compressed data communication is beneficial only if both the sender and the recipient of the information know the method of encoding adopted.
Data compression means bit-rate reduction: encoding information using fewer bits than the original representation. Thus, compression is the process of decreasing the size of a data file. Compression can be lossy or lossless.
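The "repeat character" idea mentioned above can be made concrete with a few lines of code. The following is a minimal run-length encoding sketch in Python (our own illustration, not from the original text; the function names are hypothetical):

    def rle_encode(text):
        # Collapse runs of repeated characters into (count, character) pairs
        runs = []
        if not text:
            return runs
        current, count = text[0], 1
        for ch in text[1:]:
            if ch == current:
                count += 1
            else:
                runs.append((count, current))
                current, count = ch, 1
        runs.append((count, current))
        return runs

    def rle_decode(runs):
        # Rebuild the original string from the (count, character) pairs
        return "".join(ch * count for count, ch in runs)

    print(rle_encode("aaaabbc"))              # [(4, 'a'), (2, 'b'), (1, 'c')]
    print(rle_decode(rle_encode("aaaabbc")))  # 'aaaabbc'

Note that this only pays off when the input actually contains long runs; practical lossless compressors combine several such models.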
1.4.1 Advantages of compression
• Faster Transfers
Compressed data can be transferred faster to and from the disk. File compression itself, however, is a mathematically intensive operation: compressing hundreds of megabytes of files can take less than an hour, depending on your computer's speed. Small files are easy to compress; large files take more time for compression and decompression.
• Disk Space Savings
Data compression increases effective disk bandwidth and reduces resource usage. If you have many small files on your hard drive, it is advisable to compress them into one or more files having a smaller combined size than the originals.
• Easy Downloads
Compression is useful if you need to send several files as an email attachment. It is easier and more convenient to compress many files into a single file and attach that one file to the email.
1.4.2 Disadvantages of compression
• Data compression can only be used if both the transmitting and receiving modems support the same compression procedure.
• Data damage can occur while decompressing compressed data.
• Video compression mechanisms may require expensive hardware for the video to be decompressed.
Image compression is different from compressing raw binary data. General-purpose compression programs can be used to compress images, but the result may differ from the original. JPEG is an image file format that supports lossy image compression, while formats such as GIF and PNG use lossless compression.
Image compression reduces the size in bytes of a graphics file without significantly degrading the quality of the image. It also reduces the time required for images to be sent over the Internet or downloaded from web pages. There are many different ways in which image files can be compressed; the most common formats are:
• JPEG
• GIF
• PNG
The JPEG format is good for photographs, but not so good for high-contrast pictures such as screenshots or computer art. It exploits the fact that the human eye cannot detect small changes in color. The format recompresses the image each time it is saved, and repeated saving loses quality, so you should always work with uncompressed formats before saving to the target format.
The Graphics Interchange Format (GIF) is based on limiting the colors used in the image. Typically up to 256 colors are used to make the palette, a table assigning up to 256 colors to the numbers 0-255. The pixel data for the image is then stored using the 8-bit number that represents each color's position in the table.
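The palette mechanism can be reproduced in code. Below is a minimal sketch using the third-party Pillow library (an assumption on our part: Pillow must be installed, and photo.png is a hypothetical input file):

    from PIL import Image

    img = Image.open("photo.png")   # hypothetical 24-bit input image
    # Reduce to an adaptive palette of at most 256 colours, as GIF does;
    # each pixel becomes an 8-bit index into that palette table
    indexed = img.convert("P", palette=Image.ADAPTIVE, colors=256)
    indexed.save("photo.gif")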
Portable Network Graphics (PNG) uses lossless data compression. PNG is an open format that was created to improve upon GIF, which was not open and carried licensing costs for developers using it; this was a motivating factor in the uptake of PNG.
(b) Text
Text makes up a very large part of the files that digital technology users create, so being able to compress text for storage or transmission is extremely important; there are clear benefits, for example, in compressing a Microsoft Word file. It is advisable to use a lossless method for text compression, meaning that no data is lost when the text is compressed.
You can use a zip file to compress text. To create a .zip file, right-click a file such as srajan.doc and then click the appropriate command on the shortcut menu, such as 'Add to zip file'; srajan.zip is created.
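The same result can be scripted. A minimal sketch using Python's standard zipfile module (srajan.doc is the example file named in the text above):

    import zipfile

    # Create srajan.zip containing srajan.doc, compressed with DEFLATE
    with zipfile.ZipFile("srajan.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write("srajan.doc")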
General-purpose compressors such as zip, gzip, bzip2 and 7zip share a number of techniques, and the models of text compression they use fall into three types:
• Static
• Semi-adaptive or semi-static
• Adaptive
A static model is a fixed model that is known by both the compressor and the decompressor and does not depend on the data being compressed. A semi-adaptive or semi-static model is a fixed model that is constructed from the data to be compressed. An adaptive model changes during the compression.
(c) Audio
In audio compression, the amount of data in a recorded waveform is reduced for transmission; this is used in CD and MP3 encoding, internet radio, and the like. (Audio level compression, by contrast, reduces the dynamic difference between the loud and quiet parts of an audio waveform.) Audio compression is thus a form of data compression designed to reduce the size of audio files. Let us look at the steps for compressing an audio file (a scripted alternative is sketched after these steps):
• To compress a .WAV file, load it into a sound recorder, then choose 'File' and 'Properties'. You will see a button labeled 'Convert Now'; clicking it gives you the option of changing the format and attributes of the sound.
• Format: this is the compression scheme used on the sound file. Each compression scheme acts on the sound file in a different manner.
• Attributes: here you choose the sound's frequency range (the larger the range, the better the sound quality) and the number of bits that make up each sound sample (the more bits, the higher the quality).
• After choosing the compression method, choose 'OK' to return to the main playing window. Play the sound; if the quality is good and the compression satisfactory, save the file.
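The same conversion can be scripted instead of done through a sound recorder. A minimal sketch, assuming the third-party pydub library and an ffmpeg installation are available (srajan.wav is a hypothetical input file):

    from pydub import AudioSegment   # requires pydub and ffmpeg

    sound = AudioSegment.from_wav("srajan.wav")  # load the uncompressed recording
    # Export as MP3; the bitrate sets the quality/size trade-off
    sound.export("srajan.mp3", format="mp3", bitrate="192k")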
Fig 1.2: Audio compression wave
(d) Video
Video compression is the process of reducing the quantity of data used to represent video images; it is generally lossy. It converts digital video into a format that takes up less storage space or transmission bandwidth, and compressed video can effectively reduce the bandwidth required to transmit digital video via cable or satellite services. One of the big advantages of digital video is that it can be compressed for reduced-bandwidth applications, including transmission over satellite, cable TV and Internet-based networks. Compressed video is particularly useful for reducing storage requirements, especially in the broadcast and government markets. Now, let us look at the steps required to compress a video file:
• The most common extensions for a video file are .mpeg/.mpg (Moving Picture Experts Group) and .avi (Audio Video Interleave). These extensions appear after the filename, separated by a dot (.), for example srajan.avi or srajan.mpg.
• Select the video file that you want to compress; here we take the file named 'Srajan'. When highlighted, the file will show an extension. If it does not, go to the 'Tools' menu, click 'Folder Options', select the 'View' tab and uncheck the option 'Hide extensions for known file types'.
• Now change the extension of the file you want to zip to txt. For example, if the file is 'Srajan.mpeg', change the extension so that the file becomes 'Srajan.txt'.
• Then use the zipper software to compress the file, after which you will see a noticeable decrease in the file size.
• To play the file, extract it and change the extension back to 'mpeg'.
• MPEG-1 compression is designed for bit rates up to 1.5 Mbit/sec. This is a popular standard for streaming videos as .mpg files over the internet.
• DV is a high-resolution digital video format which employs lossy compression: certain redundant information in a file is permanently deleted, so that even when the file is uncompressed, only a part of the original information is still there.
• DivX compression is an application that employs MPEG-4 compression standards to facilitate fast downloads over DSL/cable modem without compromising video quality.
• The most popular compression format for videos on the internet is .flv, or Flash Video. FLV and F4V are two formats used to play videos on the internet using Adobe Flash Player. This format compresses video to low bitrates for the web while maintaining its quality, which is why it is very popular for embedded video.
2. WinRAR
3. IZArc
IZArc is a full-featured archiving tool, compatible with Windows, that you can use to open and create compressed files in several formats. It offers many features, such as repairing broken archives, searching within archives, emailing archives, password protection and much more, and it supports a large number of archive formats. You can very easily compress and decompress files with IZArc. However, IZArc cannot create GZ archives.
Fig 1.6: IZArc Format
1.5 PATHWAYS
For the best results during digitization, it is always better to capture data as close to the original as possible: the amount of difference between the original and the digitized form directly affects the number of errors in the file. Depending on the nature of the source, this could mean capturing directly using, for example, a flatbed scanner to digitize a text document, a digital camera to capture an object, or a digital camcorder or audio recorder to capture moving images or sound. When planning digitization it is thought best to capture as close a representation of the analog object as possible. It may be that the source is one step removed from the original: say, capturing a slide or photograph of an object, or digitizing an interview stored on analog audio tape.
The decision about the method of capture is often self-evident and easy to make, based on the type of source material being digitized, the equipment and staff skills available, and the budget allocated for both equipment and staff time. For example, if we want to digitize 35mm slides, then a slide scanner is probably the best solution; likewise, if we want to scan a series of flat documents, an A4 flatbed scanner would be a good choice.
Naturally, decisions are not always as straightforward as this, and the choice of capture method is very much a project decision. For example, in a historical museum you may want to capture a mixture of things, such as flat paintings, three-dimensional objects and written documents, and there are several ways to go about digitizing such a collection.
1) Text Transcription
2) OCR
Digitization can occur through any of the following four methods adopted by OCR software (a usage sketch follows the list):
Neural network: each character is compared with characters the software has been trained to recognize, so these networks evolve and grow over time. Each character is given a confidence level, and this method performs better on texts of poor quality.
Feature network: characters are identified on the basis of their shape; high-quality prints benefit from this method.
Pattern recognition: documents with a uniform typeface are recognized based on pre-recorded images in a database.
Structural analysis: the structure of each character is analyzed, along with the number of horizontal and vertical lines. This method suits text of poor quality.
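In practice these methods sit inside an OCR engine that is called as a whole. A minimal usage sketch with the pytesseract wrapper (assumptions on our part: the Tesseract engine, pytesseract and Pillow are installed, and scan.png is a hypothetical scanned page):

    from PIL import Image
    import pytesseract   # thin wrapper around the Tesseract OCR engine

    # Recognize the characters in the scanned image and return them as text
    text = pytesseract.image_to_string(Image.open("scan.png"))
    print(text)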
Most images you see on your computer screen are raster graphics. Raster images are made up of pixels and are commonly referred to as bitmaps. A pixel is a packet of color: each pixel stores information about the color of one point of the image. For an RGB image there are commonly 8 bits per channel for red, green and blue respectively, making it a 24-bit image. A grayscale image uses 8 bits per pixel, going from white to black through shades of grey.
2) Vector image:
Vector images are different from raster images: where a raster image is made of pixels, a vector image is made up of lines and dots, also called paths. Each path contains a mathematical formula that describes how the path is shaped and which colors should be used for its borders and fill. Vector graphics are composed of paths, which are defined by a start and end point along with other points, curves and angles along the way. A path might be any shape: a line, a square, a triangle, or a curvy shape. These paths can be used to create simple drawings or complex diagrams. Vector graphics find application most often in virtual reality and 3-D modeling, as well as in Macromedia Flash applications.
Animation images are also usually created as vector files. The advantage is that vector images can be magnified to any extent without compromising picture quality; in other words, there is no pixelation. Other fields such as architecture, cartography and computer-aided design (CAD) also use vector graphics.
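The reason magnification is loss-free is that scaling a vector image only multiplies stored coordinates, rather than resampling pixels. A small illustration of the idea (our own sketch, not from the original text):

    # A 'path' stored as exact coordinate pairs instead of pixels
    triangle = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]

    def scale(path, factor):
        # Magnifying recomputes exact coordinates, so there is no pixelation
        return [(x * factor, y * factor) for x, y in path]

    print(scale(triangle, 10))   # [(0.0, 0.0), (40.0, 0.0), (20.0, 30.0)]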
3) Resolution:
Sampling refers to the process of converting a signal from analog to digital form. Sound is a continuous wave that travels through the air, made up of pressure differences, and it is detected by measuring the pressure level at a location. Sound waves have normal wave properties such as reflection, refraction and diffraction. The frequency of sampling is measured in Hertz, and the range of each sample is measured in bits. The minimum sampling rate for lossless digitization is usually given as 36 kHz, and the highest rate for most computers is 44.1 kHz. In terms of resolution, 16 bits per sample is considered good enough, which after compression gives an overall bit rate of around 192 kb/s.
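To make these figures concrete, the short sketch below generates one second of a 440 Hz tone sampled at 44.1 kHz with 16-bit values (our own illustration, not from the original text):

    import math

    SAMPLE_RATE = 44100   # samples per second (CD quality)
    FREQ = 440.0          # frequency of the tone being digitized, in Hz

    # Measure the wave 44,100 times in one second and round each
    # measurement to a 16-bit integer in the range -32768..32767
    samples = [round(32767 * math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE))
               for n in range(SAMPLE_RATE)]
    print(len(samples), min(samples), max(samples))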
MP3 is the most common compressed audio format. A sample rate of 44.1 kHz and a bit rate of 192 kbps or higher are recommended to preserve quality. Specialist codecs are used to compress other audio formats.
2) Moving Image
Usually accompanied by audio data played in tandem, a digital video file is a sequence of still images played in rapid succession. When played at a set rate, the image sequence creates the illusion of a moving object.
supports digital rights management, making it a popular format for
computer viewing. Apple's iPod players use a version of MPEG-4.
II. What is the purpose of the resources?
III. What are the intended user’s expectations and experience while using
resources of a similar nature?
1.7.2 List
1.7.3 Hierarchy
1.7.4 Sets
Sets are an effective method of storage, particularly for objects that have a clear relationship with one another. One popular example of the set data model is the relational database, where one object can have numerous related objects: a relational database keyed on an ID may have one main ID with further related information presenting different views. The main ID information is entered only once; thus a relational database avoids unnecessary duplication of the same information in the database.
Fig 1.9: Relational Database
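A minimal sketch of this idea, using Python's built-in sqlite3 module (the table and column names are our own illustration, not from the text):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE phone (person_id INTEGER REFERENCES person(id), number TEXT)")

    # The main record is entered only once...
    con.execute("INSERT INTO person VALUES (1, 'Srajan')")
    # ...while any number of related rows point back to it by id
    con.executemany("INSERT INTO phone VALUES (1, ?)", [("111-2222",), ("333-4444",)])

    query = ("SELECT person.name, phone.number FROM person "
             "JOIN phone ON phone.person_id = person.id")
    for name, number in con.execute(query):
        print(name, number)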
1.7.5 Geography/geometry
Modern and historic maps are plotted together with observations taken in the field or digitized from other sources. GIS (geographic information system), a common model for storing such data, combines many of the features of a relational database with image-processing tools, using geography as the primary key for the data. Without it, a question such as 'show me all the information within 1 km of where I am' would be impossible or very time-consuming to answer; it becomes possible when the different data sets share a common vocabulary of coordinates.
What is data modeling?
1.9 SUMMARY
Digitization is the process of converting information into a digital format.
Digitized content can be stored and delivered in a variety of ways, and can be copied limitless times without degradation of the original. Digital data can be compressed for storage, meaning that enormous amounts of analog content can be stored on a computer drive or on a CD-ROM. Digital content can be browsed easily, and can be searched, indexed or collated instantly.
In lossless compression, every single bit of data that was originally in the file remains after the file is uncompressed.
Lossy compression reduces a file by permanently eliminating certain information, especially redundant information.
10) What factors should be considered while choosing a data model?
Check your progress -4:
UNIT 2 CAPTURING VIDEO IN MOVIE MAKER 2
Program Name: BSc (MGA)
Written by: Mrs. Shailaja M. Pimputkar, Srajan
Structure:
2.0 Introduction
2.1 Unit Objectives
2.2 Choosing the Format
2.2.1 DV-AVI format
2.2.2 Windows Media Video 9
2.3 Improving Capture Performance in Movie Maker
2.3.1 Defragmenting Your Hard Drives
2.3.2 Install a Faster Hard Drive
2.3.3 Partition Your Drive as NTFS
2.3.4 Get a second hard drive
2.3.5 Use the Windows Media Codec
2.3.6 Turn Your Preview Monitor Off
2.3.7 Decrease Your Monitor Display Settings
2.4 Project Files in Movie Maker
2.5 Editing within Moviemaker 2
2.6 Summary
2.7 Key Terms
2.8 End Questions
2.0 INTRODUCTION
Earlier, video capturing was considered a very difficult job, as it involved a number of problems such as system crashes, dropped frames and hardware issues. Capturing video and then transferring it from a digital camcorder required a lot of hard work and experience. In this unit of the course we are going to learn about Movie Maker 2. Windows Movie Maker 2, developed by Microsoft, is video editing software with many features such as effects, transitions, titles, audio tracks and a timeline.
The vision of Windows Media 9 Series is to deliver compressed digital media content to any device over any network. Windows Movie Maker, also known as Windows Live Movie Maker in Windows 7, is video editing software by Microsoft and part of the Windows Essentials software suite. Windows Media 9 Series provides many audio and video codecs for different applications.
After studying this unit you will be able to:
Choose the format to use
Explain how to improve capture performance in Movie Maker
Explain the process of saving project files in Movie Maker
Explain how to edit in Movie Maker 2
gigabytes of hard drive space. The format is so huge that many older computers face problems while capturing and saving video. Whenever the computer's hard drive slows down below a critical level, the capture suffers 'dropped frames'. It is very common for advanced video users to back up their work, using a spare hard drive to save their video projects.
2.3 IMPROVING CAPTURE PERFORMANCE IN
MOVIE MAKER
As we learned in earlier sections, video captured from a camcorder is large in size and difficult to save on a hard drive. Transferring video from a digital camcorder and capturing it onto your hard drive is a demanding task and can be frustrating: not every system can sustain the transfer speed needed to move your video or movie onto a hard drive.
There is a solution for this problem. If you have Windows XP installed on your computer, then you can easily run Movie Maker 2. Even if your computer has a slow processor, Windows XP is powerful enough to capture. However, if you run into trouble while capturing, there are several helpful ways to speed up your system.
A hard drive is really a circular platter. Like a CD, data is written onto this platter in a circular pattern, and each platter can only hold a predetermined amount of data. Throughout the disk, any single file may be broken up into many little sectors. When you defragment your drives, all the broken fragments are placed together, so your hard drive gets a large contiguous 'physical area' of available space in which to write your video.
final video frames will be out of synchronization, and such dropped video frames are a real risk. Most hard drives run fast, but some run slow, and different drives slow down at different times. To avoid temporary hard drive slowdowns, close down any background programs and, if possible, equip the system with a faster 7200 rpm hard drive.
2.4 PROJECT FILES IN MOVIE MAKER
Project files
The project file is a 'linking file' that keeps track of every item in your home movie: every picture, music track, voice track and video clip. The project file knows how they are laid out on the movie timeline, which effects and transitions should be applied to each, and where each of these items is located on your computer.
These video objects are not actually embedded in the project file. You may notice that the project file itself is very small, perhaps around 1 megabyte, while your movie is much bigger and may comprise several gigabytes; this is because the multimedia files are only linked to the project file. If you ever want to re-edit your project at a later date, you need to keep all of those files organized.
So if you ever want to clean up or reorganize your computer, you should be very careful: if you unknowingly move or delete files, Movie Maker will be unable to find them, which can ultimately result in the loss of a valuable project.
To avoid this problem, always create a new folder for each of your video projects; this keeps your project intact. Save every movie element into this folder before importing it into Movie Maker: your video, pictures, images, background music and voice narration should be in this folder only. This method also lets you move your entire project easily to another computer without losing its contents.
How to save Movie Maker project files:
There is a particular procedure for saving your Movie Maker projects, and you should keep it in mind before you start editing. It is important to back up your video project to another computer if you want to re-edit it in the future. When you first save a project in Movie Maker 2, the program generates a 'Movie Maker project file' on your computer's hard drive. You have the option of renaming the file and saving it anywhere you want; by default, Movie Maker will try to place the file within your 'My Movies' folder.
2.5 EDITING WITH MOVIE MAKER 2
Almost 95 per cent of home movies are boring, mostly because you have to sit through an hour of footage to find the actually interesting material from the shoot. The useful aspect of computer editing is that all the 'junk video' can be combed out. It is also observed that keeping a movie under 5 minutes holds the audience's interest.
New videographers generally overuse the camcorder's zoom function. Zooming should only be used for framing shots, as it tends to bore the audience; editing is the remedy, letting you cut those zooms right out of the video and show only the interesting shots. Good video needs motion and action. For example, if you are filming a birthday and it takes your small child two minutes to open his birthday present, cut out the middle 1.5 minutes: your audience wants to see the main portion of the event, your child's delight at seeing the present. Also, while filming a family member, there are always those couple of seconds where they say 'OK. Are you recording?' You can cut that part out and start right with your interview.
A video editing program like Movie Maker 2 makes this easy, and there are several ways to get rid of junk video. Some of the important ones are discussed below:
1. Trimming the ends of clips:
While working on the timeline, simply drag the ends of each clip to the exact points where you would like it to start and stop. You can set the in and out points of each clip with very easy controls on the timeline. If you zoom in on a clip using the magnifying glass, you can achieve very fine control of its start and stop points while trimming.
clips in half. This is a nice way to get rid of large chunks of junk video: just find the location where you want to cut and click the 'cut' button located under the preview monitor in Movie Maker.
2.6 SUMMARY
Movie Maker is powerful and effective video editing software.
The video capturing process has been made relatively easy with the help of Movie Maker 2.
Windows Movie Maker is a video editing program included in Microsoft Windows. It has many features such as effects, transitions, titles, audio tracks and a timeline.
Movie Maker offers several ways to remove junk video from your footage.
2.8 END QUESTIONS
11) Explain Windows Media Video 9.
12) Write a note on project files. How do you save project files in Movie Maker 2?
13) Write a note on the DV-AVI format.
14) What are the advantages and disadvantages of the DV-AVI format?
15) How do we improve capture performance in Movie Maker?
16) How do we defragment our hard drives? Elaborate.
17) How do we apply effects to our files?
18) How is the NTFS file system useful for improving capture performance in Movie Maker?
19) How do you improve capturing performance in Movie Maker 2?
20) How do you edit videos using Movie Maker 2?
Written by: Mrs. Shailaja M. Pimputkar, Srajan
Structure:
3.0 Introduction
3.1 Unit Objectives
3.2 Sound
3.2.1 Units for sound measuring
3.2.2 Characteristics of Sound
3.2.3 Sound Pressure Level
3.3 Analog Audio
3.4 Digital Audio
3.4.1 Sampling
3.4.2 Resolution
3.4.3 Quantization
3.4.4 Dithering
3.4.5 Clipping
3.4.6 Bit-Rates
3.4.7 Dynamic Range
3.4.8 Signal-to-noise Ratio
3.4.9 Encoding
3.5 Advantages and disadvantages of Digital Audio
3.6 File size and bandwidth
3.7 Compression
3.8 Summary
3.9 Key Terms
3.10 End Questions
3.0 INTRODUCTION
In this third unit of the course we are going to learn about sound. Knowledge of sound will help you understand its importance in our lives. I have always been fascinated by sound, and I am sure you will feel the same fascination when you learn about it. Sound is present everywhere: sounds can be impossible to ignore, yet at the same time easy to overlook. We are going to learn about the advantages and disadvantages of digital sound, how sound interacts with the digital world, and the importance of sound in our lives.
Sound waves are created by vibration, and the human ear can hear most sounds. When moving air passes through an object, a vibration is created, and that vibration creates sound; this type of sound can be heard on a windy day. A vibrating object such as a guitar string produces rapidly varying air pressure, and thus sound reaches our ears: as the string moves in one direction, it presses on nearby air molecules and causes them to move closer together. A decibel is a unit used to measure a sound's volume, and frequency is the total number of vibrations per second.
3.2 SOUND
Sound is created when an object vibrates: any vibration that travels through the air or another medium and can be heard when it reaches a person's ear is called sound. For example, when you play a guitar, the strings vibrate up and down, and this vibration creates sound. When a string moves up, the air above it is compressed; when it moves down, the air moves with it and expands. This compression and expansion creates differences in air pressure, and the pressure differences move away from the string's surface as a sound wave. This is how we can hear the sound coming out of the guitar.
Sound can be characterized by the following three properties:
1. Pitch/frequency:
Pitch is the frequency of a sound as perceived by the human ear. Frequency is measured as the number of sound vibrations in one second, in a unit called Hertz (Hz). A low frequency produces a low-pitched note and a high frequency produces a high-pitched note.
2. Loudness/amplitude:
Loudness is the volume of a sound, and amplitude measures the force of the sound wave. Loudness is measured in decibels (dBA). Normal speaking voices are around 65 dBA; sounds of 85 dBA or above can permanently damage your ears.
3. Quality/timbre:
Tone is a measure of the quality of a sound wave. Timbre is the quality of a tone that distinguishes it from other tones of the same pitch: a violin has a different timbre than a piano.
The intensity of a sound is called the sound pressure level (SPL), measured in decibels. SPL is actually a ratio of the measured sound pressure to a fixed reference pressure, where the reference pressure is the lowest-intensity sound that can be heard by most people. SPL can be measured in decibels with a sound pressure level meter. The decibel is a logarithmic scale representing how much an audio signal or sound level varies from a reference level or another signal.
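In symbols (a standard definition, not spelled out in the original text): SPL in dB = 20 × log10(p / p0), where p is the measured sound pressure and p0 = 20 micropascals, the reference pressure roughly equal to the threshold of human hearing.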
Fig 3.3: Analog Signals
As the diagram above shows, an analog sound wave is an exact copy of the original sound wave. Analog audio recordings, such as tape, capture continuous changes of sound during recording: for example, sound pressure picked up by a microphone is converted to an electrical voltage, and the changes in voltage, representing changes in amplitude and frequency, are recorded onto a medium such as tape. The first machine used to capture analog sound was called the phonograph, invented by Thomas Edison in 1877.
3.4.1 Sampling:
To convert an analog signal to digital form, its value is sampled at regular intervals, thousands of times per second. The value of each sample is rounded to the nearest integer on a scale that varies with the resolution of the signal, and the integer is then converted to a binary number.
Sampling rate (or sampling period): the sampling rate refers to how many times the value of the analog signal is measured per second, expressed in Hz or kHz. The sampling rate of an audio CD is 44.1 kHz (44,100 Hz), as shown in fig 3.5; the human hearing range extends to roughly 20 kHz at the highest frequency. In the figure, each line represents a new sample, and the time between lines is the sampling period, which equals 1/44,100 of a second.
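The choice of 44.1 kHz follows from the Nyquist sampling theorem, a standard result worth adding here: to capture a signal without loss, the sampling rate must exceed twice the highest frequency present (fs > 2 × fmax). Since human hearing extends to about 20 kHz, any rate just above 40 kHz suffices, and 44.1 kHz leaves a small safety margin.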
3.4.2 Resolution:
The resolution of a digital signal is the range of numbers that can be assigned to each sample. Bit depth is the number of bits of information in each sample, and it directly corresponds to the resolution: for example, CDs use 16 bits per sample, while DVD audio and Blu-ray discs support up to 24 bits per sample. Higher resolution reduces quantization distortion and background noise and increases the dynamic range.
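As a rule of thumb (a standard result, not stated in the original), each bit of resolution contributes about 6 dB of dynamic range, DR ≈ 6.02 × N + 1.76 dB, so 16-bit audio yields roughly 98 dB and 24-bit audio roughly 146 dB.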
3.4.3 Quantization:
Quantization is the process of converting a continuous range of values into a finite range of discrete values; in simple words, values are "rounded" to a commonly agreed standard for simplicity. For example, our age is usually given as the number of whole years we have been alive as of our last birthday. Quantization is the function of the analog-to-digital converter, and it also forms the core of essentially all lossy compression algorithms. The difference between an input value and its quantized value (such as round-off error) is referred to as quantization error.
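A tiny sketch of quantization and the error it introduces (our own illustration):

    def quantize(value, step):
        # Round a continuous value to the nearest multiple of `step`
        return step * round(value / step)

    x = 0.7321
    q = quantize(x, 0.1)   # force the value onto a grid of 0.1
    print(q, x - q)        # about 0.7, leaving a quantization error of about 0.032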
3.4.4 Dithering:
Dither is a process of adding noise to a signal; it helps preserve information that would otherwise be lost. Basically, dithering adds broadband noise to a digital signal before its bit depth is reduced. Digital audio is highly advantageous and produces results that are much better than many analog systems, but quantization errors occur when bit resolution is reduced, a problem also known as truncation distortion.
To understand why dithering is important, consider an example. Most mastered audio files are 16-bit, although 24-bit audio has more detail and higher sound quality. If you play a 24-bit audio file through a 16-bit playback device without preparation, the truncation will make it sound bad; to avoid this, you use a dithering tool in your production chain.
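A minimal sketch of the idea, reducing samples to a smaller bit depth with and without dither (our own illustration; real dithering tools use carefully shaped noise rather than this plain random noise):

    import random

    def truncate(sample, bits_lost=8):
        # Plain bit-depth reduction: simply drop the lowest bits
        return (sample >> bits_lost) << bits_lost

    def dither_then_truncate(sample, bits_lost=8):
        # Add a little noise first so the rounding error is decorrelated
        # from the signal instead of appearing as harsh distortion
        half_step = 1 << (bits_lost - 1)
        noise = random.randint(-half_step, half_step)
        return truncate(sample + noise, bits_lost)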
Fig 3.7: Dithering Process
3.4.5 Clipping:
Clipping occurs when an amplifier is pushed to create a signal with more power than its power supply can produce; the amplitude of the electrical signal must not exceed the maximum the system can handle. A clipped sample often sounds quite different from the original.
For example, an audio format has a fixed ceiling: if you keep pushing the audio level up, at some point you reach the maximum level, beyond which clipping occurs and the audio signal craps out. In digital audio, there is a hard limit on how large an input sound can be represented.
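In digital audio that limit is simply the largest representable sample value. A one-line sketch for 16-bit samples (our own illustration):

    def clip16(sample):
        # Hard-limit a sample to the 16-bit range; louder input is flattened
        return max(-32768, min(32767, sample))

    print(clip16(40000))   # 32767 -- the top of the waveform is clipped off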
3.4.6 Bit-Rates:
The term 'bit rate' expresses how many bits are transferred in one second to represent the signal. For digital audio, the bit rate is expressed in thousands of bits per second (kbps) and directly relates to sound quality and file size: a higher bit rate gives better quality but a larger file.
To calculate the bit rate of uncompressed audio, you multiply the sampling rate by the resolution and the number of channels. For example, CD audio has a resolution of 16 bits, two channels, and a sampling rate of 44,100 samples per second; using this formula, the bit rate is approximately 1.4 million bits per second.
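The CD example as a worked calculation (a sketch of the formula given in the text):

    sampling_rate = 44100   # samples per second
    resolution = 16         # bits per sample
    channels = 2            # stereo

    bit_rate = sampling_rate * resolution * channels
    print(bit_rate)   # 1411200 bits/s, i.e. roughly 1.4 million bits per second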
3.4.9 Encoding:
Encoding is the process of converting uncompressed digital audio to a compressed format such as MP3. A codec is the algorithm used by the encoding software; for a particular format there is often more than one codec, and even for the same format different codecs can vary widely in quality and speed.
What is dynamic range?
In analog recording, information is lost with every copy, and each copy adds noise to the recording: even the best analog systems lose about 3 dB of signal-to-noise ratio when a copy is recorded. But digital audio can be copied from one digital device to another without losing any information, so the advantage is that perfect copies can be made. Digital copies can also be created much faster than analog copies, which usually must be made in real time. For example, copying 60 minutes of music from a CD with an analog device such as a cassette deck takes at least 60 minutes; copying the same 60 minutes of music digitally takes less than 5 minutes on a system with a fast CD-ROM drive.
The ability to make perfect copies also creates a problem, and for this reason the RIAA went to considerable trouble to introduce SCMS (Serial Copy Management System) for consumer audio equipment. SCMS prevents multiple generations of copies and is required by the Audio Home Recording Act of 1992 on all consumer digital audio recording devices sold in the US.
If you are making a master recording with digital equipment, it takes the same amount of time as with analog equipment; but once a digital recording is on your PC, you can make as many copies as you like in a fraction of the time.
signal back to an analog voice, so the signal goes from analog to digital and
back to analog.
Durability:
Digital media such as CDs and MiniDiscs are more durable than analog media; this is one reason people prefer CDs to vinyl records. Each time you play a vinyl record, some of the oxide coating is rubbed away, and vinyl records are particularly prone to warping and scratching, whereas you can play a CD hundreds of times without losing quality. Both digital and analog tapes can suffer degradation from magnetic fields, but some digital tapes, such as DAT, are stronger, have a thicker oxide coating, and are much more durable than analog tapes.
compensate for volume variations. A variety of previously tricky tasks are
being made easier or fully automated by advanced digital technology.
You should have enough hard disk space, CPU processing power and RAM; otherwise digital audio won't work properly.
Digital audio files are bigger than MIDI (musical instrument digital interface) files.
Digital systems can have poor multi-user interfaces.
For example, 60 seconds of CD-quality stereo audio occupies 44,100 × 16 × 2 × 60 / 8 = 10,584,000 bytes, roughly 10 MB.
3.7 COMPRESSION
As explained in Unit 1, data compression is the process that helps reduce the size of a data file; the inverse process is called decompression (decoding). Software and hardware that can both encode and decode are called codecs. Compression helps reduce resource usage, such as data storage space or transmission bandwidth, and can typically bring an audio file down to around 1/10th to 1/15th of its original size. The following table shows some file formats used for audio compression.
Audio Formats

File Format                          Compression     Result
WAV (.wav)                           Uncompressed    Full size, full quality
AIFF (.aif)                          Uncompressed    Full size, full quality
SDII (.sd2)                          Lossless        Reduced size, full quality
ALAC (Apple Lossless Audio Codec)    Lossless        Reduced size, full quality
FLAC (Free Lossless Audio Codec)     Lossless        Reduced size, full quality
MP3 (.mp3)                           Lossy           Reduced size, reduced quality
WMA (.wma)                           Lossy           Reduced size, reduced quality
AAC (.m4a)                           Lossy           Reduced size, reduced quality
MP4 (.mp4)                           Lossy           Reduced size, reduced quality
3.8 SUMMARY
In digitization, computers convert a sound wave first into an analog signal, and then convert that into digital form.
Sound waves are created by vibration.
Some of the benefits of digital representation of sound are higher-fidelity recording than was previously possible, synthesis of new sounds by mathematical procedures, application of digital signal processing techniques to audio signals, and so on.
Dithering is a process that adds broadband noise to a digital signal.
Data compression is a process that helps reduce the size of a data file; the inverse process is called decompression (decoding).
Dynamic range compression reduces the range in dB between the lowest and highest levels of a signal, but does not affect the file size or bandwidth requirement.
Any vibration that travels through the air or another medium and can be heard when it reaches a person's ear is called sound.
Pitch, loudness, quality.
Analog audio refers to a recording method that makes an exact copy of the original sound waves.
The intensity of a sound is called the sound pressure level.
Loudness is the volume of a sound; decibels (dBA) are the unit used to measure loudness.
Program Name: BSc (MGA)
Written by: Mrs. Shailaja M. Pimputkar, Srajan
Structure:
4.0 Introduction
4.1 Unit Objectives
4.2 Digital video recording
4.2.1 Digital Video Recorder (DVR)
4.2.1.1 Types of DVRs
4.3 Video Capture Device-Analog Video to PC
4.4 High-Definition (HD) Options for Digital Video recording
4.4.1 Satellite Alternatives
4.4.2 Cable Alternatives
4.4.3 Sony HD DVRs
4.5 Capture Cards
4.6 Summary
4.7 Key Terms
4.8 End Questions
4.0 INTRODUCTION
In this unit we are going to learn about digital video. To capture means to save: video capture means to store or save video images on a computer. In technical terms, video capture is the process of converting an analog video signal into digital form.
4.2 DIGITAL VIDEO RECORDING
Digital video recording is a technique of compressing a video signal using a video encoder such as MPEG-2; it is used for digitally recording TV and videos, and the same encoder can also be used for DVD movies. The following are the three primary methods of recording TV and video digitally.
ReplayTV and TiVo are two early consumer DVRs, launched at the 1998 Consumer Electronics Show in Las Vegas. In 1999, Dish Network's DISHPlayer receivers came with full DVR features. These digital set-top devices can record television programs without using a videotape.
Fig 4.2: ReplayTV
TiVo is one of the most popular brands of DVRs; it was launched in 1998 at the Consumer Electronics Show in Las Vegas. The television signal comes into the DVR through cable, satellite or antenna. The signal then goes into an MPEG-2 encoder, which compresses it and converts it from analog to digital. From the encoder the signal goes to a hard drive for storage and to an MPEG-2 decoder, which converts it back to analog and sends it to the television for viewing.
• Dual Tuners:
These devices have two independent tuners within the same receiver, functioning separately from one another. The main use of this feature is the ability to record a live program while watching another live program, or to record two programs simultaneously. Some dual-tuner DVRs can also serve two television sets at the same time.
Fig 4.4: Dual Tuner
• Standalone DVRs:
A standalone system looks a lot like an old VCR, with all its components encased in one cabinet, including the CPU, IC chips and power supply. The main drawback of this design is that everything needed to operate the unit is located on one motherboard: if one component fails, you have to replace the whole unit. It also has limited storage capacity.
What is digital video recording?
What is DVR?
How does a dual tuner work?
When was the TiVo DVR launched?
Who launched the first built-in DVR in a television, and when?
In this section we are going to see how to capture video from an analog video source to a Windows XP computer using an external video capture device. We need a source, a capture device and capture software. For editing the video we will need video editing software, and if you want to record your video to DVD you will also need DVD recording software, plus a DVD burner to physically record the disc.
To understand the process, we take a VCR as the source, an ADS Tech DVD Press as the capture device, and Pinnacle Studio Plus 9 as the capture software. The same process works with any other combination of capture hardware using a USB 2.0 cable, capture software, or analog source, for example Hi8, a VHS-C camcorder, or 8 mm.
Fig 4.6: PC with Video Capture Device
There are two varieties of satellite TV: DirecTV and Dish Network. Each company offers a high-definition digital video recorder that also works as a satellite receiver.
Dish Network: Dish Network offers customers the ViP722 DVR, a dual-tuner, two-TV HD DVR receiver that can record one show while you watch another. This is Dish Network's top-of-the-line receiver: it lets you watch and record both HD and SD broadcasts while also using the receiver as a DVR. Its hard drive holds up to 350 hours of SD recording or up to 55 hours of HD recording, and it provides an Electronic Programming Guide (EPG) for scheduling recordings in advance.
DirecTV: DirecTV provides an HD DVR that includes the TiVo service built into the receiver, so you not only obtain HD broadcasts for recording but also receive a fully operational TiVo DVR. It has a 250 GB hard drive.
4.4.2 Cable
Cable TV suppliers provide HD DVRs at very low cost, much less than satellite service providers. Most cable companies now provide HD DVR service for a low monthly fee, supplying their customers with either a Scientific Atlanta 8300HD DVR or a Motorola DCT6412 HD DVR, depending on the provider.
cards can attach externally via USB 2.0 or internally in a PCI slot. These cards can also record analog video from a camcorder, DVD player/recorder or VCR. Cards of this type do not accept digital signals or capture from coaxial cable, and they typically come bundled with TV and video capture software.
• Video-only capture cards:
These higher-end video capture cards are usually used by professionals, chiefly for editing video. They capture through IEEE 1394 (FireWire) inputs from digital camcorders and usually come bundled with high-end video editing software.
• TV tuner cards:
A TV tuner card is a kind of television tuner that records video or television programs onto a computer's hard disk; in other words, a TV tuner is a device that allows you to connect a TV signal to your computer. Most TV tuners also work as video capture cards. They provide an Electronic Programming Guide (EPG) to schedule recordings in advance, and they function as digital video recorders, so users can pause and rewind live TV.
4.6 SUMMARY
Video capturing is the conversion of the analog signal generated by a video camera into digital format, and the storage of that digital video on a computer's mass storage.
Video capture from analog devices requires a special video capture card that converts the analog signals into digital form and compresses the data.
Digital video recording doesn't need any extra storage medium to record programs onto: a DVR has an internal hard drive with a specific capacity. DVRs include software for personal computers, portable media players and complete set-top boxes that facilitate playback and video capture to and from disk.
High-definition digital video recording devices are easily available these days and cost the user less.
An HD DVR provides all the practicalities of a standard DVR such as TiVo, but also allows you to view and record HD broadcasts.
Capture cards are internal or external devices that convert the analog signal into digital form and compress the data; they record video onto the computer's hard drive.
LG in 2007
5.2.2 JPEG
5.2.3 DjVu
5.2.4 PDF
5.2.5 WAV
5.2.6 MP3
5.2.7 Real Audio
5.2.8 MPEG 21
5.3 Digitization file format
5.3.1 Images
5.3.2 Text
5.3.3 Data set
5.3.4 Audio
5.3.5 Video
5.4 Summary
5.5 Key Terms
5.6 End Questions
5.0 INTRODUCTION
In this unit we are going to learn about the file formats used in digitization.
Following are some popular file formats for different file types:
5.2.1 TIFF
TIFF is a file format for storing images, including line art and photographs. TIFF graphics can be any resolution, in black and white, grayscale or color. TIFF is a very suitable format for high-color-depth images, but it is not suitable for vector data: TIFF files contain only bitmap data. TIFF uses LZW compression, which compresses a file into a smaller file using a table-based lookup algorithm.
TIFF describes image data that typically comes from scanners and from paint and photo-retouching programs. It includes a number of compression schemes that allow developers to choose the best space or time trade-off for their applications. TIFF is portable: it does not favor particular operating systems, file systems, compilers or processors.
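The table-based lookup idea can be shown in a compact sketch of LZW compression (our own illustration of the general algorithm, not the exact variant used inside TIFF):

    def lzw_compress(data):
        # Start the table with every possible single byte (codes 0-255)
        table = {bytes([i]): i for i in range(256)}
        next_code = 256
        current = b""
        codes = []
        for byte in data:
            candidate = current + bytes([byte])
            if candidate in table:
                current = candidate           # keep extending the match
            else:
                codes.append(table[current])  # emit code for the longest match
                table[candidate] = next_code  # and remember the new string
                next_code += 1
                current = bytes([byte])
        if current:
            codes.append(table[current])
        return codes

    print(lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT"))

Repeated substrings like "TOBEOR" are replaced by single codes on their second appearance, which is where the size saving comes from.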
5.2.2 JPEG
5.2.3 DjVu
DjVu developers report that for color document images containing text and pictures, DjVu files are typically 5 to 10 times smaller than JPEG at similar quality, and for black-and-white document images, 10 to 20 times smaller. The main technology behind DjVu is that files are typically separated into three images: the background and foreground (around 100 dpi) and the mask image, which is higher resolution (e.g. 300 dpi). By separating the text from the backgrounds, DjVu can keep the text at high resolution while compressing the backgrounds and pictures at lower resolution with a wavelet-based compression technique. DjVu is used by many commercial and non-commercial web sites today. The file extension for the DjVu file type is .djvu.
5.2.4 PDF
5.2.5 WAV
WAV files are becoming less popular because of their large file size: the WAV format does not use lossy compression, so files are much bigger than MP3 files, and sending and downloading them takes much more time and space. WAV files are based on the Resource Interchange File Format (RIFF) method for storing data: data is stored in chunks, each containing a four-character tag and the number of bytes in the chunk. All the system sounds, such as the sound you hear when you log in, are uncompressed .wav files.
5.2.6 MP3
and sending. MP3 removes those sounds the human ear is incapable of hearing and processing; an MP3 file stores audio information only.
MP3 files are portable. A three-minute song that requires about 32 MB of disk space in its original form can be compressed using MP3 into a file of about 3 MB without a noticeable loss of sound quality. Using a 56K modem, the song can then be transmitted over the internet in a few minutes. This makes it possible to build virtual libraries by downloading from the internet, and users can also 'rip' MP3 files from their own CDs using free software easily available on the internet.
5.2.7 RealAudio
RealAudio files are played with RealNetworks' RealPlayer. You can also play a RealAudio file in the free Real Alternative or JetAudio players, but you need to install an additional free plug-in; that is why many users convert RA files to other, more popular audio formats such as MP3, AAC, WAV or WMA.
5.2.8 MPEG 21
The Digital Item can be conceived of as the center of the Multimedia Framework, with users interacting with Digital Items inside that framework.
5.4 SUMMARY
Different file formats are used for preservation files and for surrogate files, based on the type of content in the original resource.
Preservation master (PM) files are created for deep-storage purposes.
Compressed web files are created from PM files for use as surrogate files in the repository and on the information website.
PM files must be uncompressed in order to retain archival integrity.
Surrogate files use compressed file formats with little perceivable loss of quality.
• Tagged Image File Format (TIFF): originally created by Aldus for use in desktop publishing. Files of this type store images, including photographs and line art.
• Optical Character Recognition (OCR): the process of taking an image of letters or typed text and converting it into data the computer understands.
• Joint Photographic Experts Group (JPEG): a commonly used standard method of compression for photographic images. JPEG uses a lossy compression algorithm for images.
• DjVu: an alternative to PDF, since it gives smaller files than PDF for most scanned documents. It uses image-layer separation of text and background images, progressive loading, arithmetic coding, and lossy compression for monochrome images.
• Portable Document Format (PDF): files that preserve the original graphic appearance online for all types of documents, such as magazine articles, brochures, etc.
• WAV or WAVE: an audio format developed by Microsoft and IBM.
• MP3: an audio compression file format that employs an algorithm to compress music files, achieving significant data reduction while retaining near-CD-quality sound.
• RealAudio (.ram): a proprietary audio format developed by RealNetworks that uses a variety of audio codecs, ranging from low-bitrate formats to high-fidelity formats for music.
LZW compression is the compression of a file into a smaller
file using a table-based lookup algorithm.
Moving Picture Experts Group.
6.4.1 Imaging Issues
6.4.2 OCR Issues
6.5 Re-Keying
6.6 Summary
6.7 Key Terms
6.8 End Questions
6.0 INTRODUCTION
In this unit, we are going to learn in detail about digitization in terms of scanning and re-keying. As we learned earlier, digitization is the process of converting analog materials such as books, papers, films, and tapes into a digital format readable by an electronic device. In other words, creating a computerized representation of a printed analog source is known as digitization.
Each link attains a level of importance, such that the entire project would fail if one piece of the chain were to break. Though this is a very important and useful concept in project development, in this section we will learn more about Peter Robinson's concept of the digitization chain.
A project will flow more smoothly if it has very few links in the digitization chain. Firstly, the results will depend on the quality of the image being scanned, regardless of the technology used by the project. It is obvious that scanning an image directly from the journal itself, rather than scanning a copy of a microfilm of an illustration originally found in that journal, is going to make a huge difference in quality. This is the main reason for carefully choosing the hardware and software.
Fig 6.1: A Flatbed Scanner
A flatbed scanner sees an image and converts the printed text or picture into electronic codes that can be understood by the computer. The scanning unit moves across the image to be scanned, reads it as a series of dots, and then generates the digitized image. A scanner has a bit-depth rating, which determines how many colors it can capture; the resolution and color of the scanned image depend on the bit depth of the scanner. The digitized image is sent to the computer and stored as a file.
A flatbed scanner may have an ADF (Automatic Document Feeder). An ADF holds multiple pages and feeds them one at a time into the scanner, which helps the user scan many pages without manually replacing each sheet. With the help of an ADF, both sides of a page can be scanned.
There are different levels of flatbed scanner available in the market, and we can choose one as per our requirements. Different levels of flatbed scanner have different scanning capacities. An entry-level flatbed scanner can scan documents of 8.35 to 11.7 inches at 300-600 dpi resolution. A mid-level flatbed scanner can scan documents of 12-14 inches at 600-1200 dpi resolution. A high-end flatbed scanner can scan documents of 14-24 inches at more than 1200 dpi resolution. A flatbed scanner can scan approximately 11 color pages or 27 black and white pages in one minute.
The main advantage of a flatbed scanner is that it can scan any document irrespective of its quality, and it is very user friendly. One drawback of the flatbed scanner is that it is often very large and needs more space. Another disadvantage is that these scanners can be very expensive.
Digital cameras:
Digital cameras are very portable and easy to handle. Some large documents that won't fit on a flatbed scanner can be digitized with the help of a digital camera. On a flatbed scanner, the document or page should lie completely flat on the scanning bed, which poses a problem with books. Digitizing with a stand-alone digital camera can be a solution to this problem, as has been recognized by many digital archives and special collections departments. Today many digital cameras have a voice capture feature that records voice as well.
Fig 6.2: Digital Camera
Most digital cameras have an LCD, which helps the user view images in memory and acts as a viewfinder; with a digital camera we can see photos immediately. These stored photos or images can be uploaded to a computer. A digital camera's ability to digitize an image without changing the lighting is highly beneficial, as it does not harm the composition of the work. Images can be produced at great sizes as a result of these specifications. Sony, Canon, Nikon, Kodak, Olympus and many other companies make digital cameras.
6.3.2 Software
Making specific recommendations for software is a difficult task. In the digitization process, there are no specific rules to be followed; the method and process vary from one project to another depending upon the use, suitability and personal preferences. Irrespective of the method of digitization, all digitization projects use text scanning software and image scanning software. There is a wide range of text scanning software available, all with varying capabilities. Given the condition of the text being scanned, the primary consideration with any text scanning software is how well it works with old texts. It is important to find software that can work through more complicated fonts and low-quality pages, as most such software is optimized for laser-quality printouts.
There are more choices of software depending upon what needs to be done in terms of image manipulation. Adobe Photoshop is the most common software for image-by-image manipulation, including converting TIFFs to web-deliverable JPEGs and GIFs.
Check your progress-1
What is digitization?
Name two most important hardware devices required for image capture.
What is the use of flatbed scanners?
What is the use of digital cameras?
Name one software used for image manipulation.
• Are there preservation issues that must be considered, or are the images simply for Web delivery? The reason for this question is simple: the more demanding the purpose, the higher the scanning settings and the quality of the image need to be. Once this decision has been made, there are essential image settings that must be established.
• At what resolution?
1. Image types
Basically, there are four main types of images. They are as follows:
• 8-bit grayscale
• 1-bit black and white
• 24-bit color
• 8-bit color.
With a single bit being represented by either a '1' or a '0', the bit is the key unit of information read by the computer: '1' represents presence while '0' denotes absence, with more complex representations of information being built up from multiple, grouped bits.
In a 1-bit black and white image, each bit can be either black or white. This is completely unsuitable for almost all images and is a rarely used type. The only suitable content for this format would be line graphics or printed text for which poor resulting quality does not matter. Another drawback of this type is that saving it as a JPEG-compressed image, one of the most popular image formats on the Web, is not a feasible option.
As they encompass 256 shades of grey, 8-bit greyscale images are an improvement on 1-bit images. This type provides a clear image rather than the resulting fuzz of a 1-bit scan and is often used for non-colour images, for which greyscale is often regarded as more than adequate. There are times, however, when non-colour images should be scanned at a higher colour depth, because the fine detail of the hand will then come through distinctly. The uniform recommendation is that images intended as archival or preservation copies should be scanned as 24-bit colour.
An 8-bit colour image is similar to 8-bit grayscale, with the exception that each pixel can be one of 256 colours. The format is appropriate for web page images but can come out somewhat grainy, so the decision to use 8-bit colour is completely dependent on your project requirements. Another factor to be considered is the type of computer the viewer is using, as older machines cannot display an image above 8-bit, so a 24-bit image would need to be converted to the lower format. Storage space is also a key factor to be taken into consideration here: an 8-bit image will be markedly smaller, though it does not have the quality of a higher format.
In practice, the best scanning choice is a 24-bit colour image. With each pixel having the potential to contain one of 16.8 million colours, this option provides the highest quality image. The arguments against this image format are the cost, time and file size involved. Knowing the objectives of the project will assist in making this decision. If one is trying to create archival-quality images, this is taken as the default setting. Even if the original is greyscale, 24-bit colour makes the image look more photo-realistic. With archival-quality imaging, the thing to remember is that if you need to go back and manipulate the image in any way, it can simply be copied and adjusted. However, some kinds of retrospective adjustment will be impossible if you scan the image in a lesser format: an 8-bit greyscale image cannot be converted into millions of colours, whereas a 24-bit colour archived image can be made greyscale.
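To illustrate this one-way relationship, the following sketch uses the Pillow imaging library (the file names are hypothetical): a 24-bit colour master can always be reduced to greyscale, but the discarded colour can never be recovered.

    from PIL import Image

    # Open a 24-bit colour archival master (hypothetical file name).
    master = Image.open("master_24bit.tif")

    # Reduce to 8-bit greyscale; the colour information is discarded.
    grey = master.convert("L")
    grey.save("derivative_grey.tif")

    # Converting back to RGB only copies the grey value into all three
    # channels; the original colours cannot be recovered.
    rgb_again = grey.convert("RGB")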
2. Resolution
The second issue we need to consider is the resolution of the image. In simple language, resolution means the number of dots or pixels per inch, i.e., dpi or ppi. If there are more dots or pixels per inch, then the resolution of the image is higher and the image obviously looks clearer. Again, resolution depends on the purpose for which the image is being used. The resolution will need to be relatively high if the image is being archived or needs to be enlarged, but it can drop drastically if the image is simply being laid on a web page. File sizes are altered by the dpi chosen, as with the options in image type: the higher the dpi, the larger the file size. To explain the variations, the Electronic Text Centre has produced an informative table examining an uncompressed 1" x 1" image at different resolutions and image types.
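That table is not reproduced here, but the arithmetic behind it is simple: an uncompressed image occupies (width in inches x dpi) x (height in inches x dpi) x (bits per pixel / 8) bytes. A small Python sketch prints such figures for a 1" x 1" image:

    # Uncompressed size, in kilobytes, of a scanned image.
    def uncompressed_kb(width_in, height_in, dpi, bits_per_pixel):
        pixels = (width_in * dpi) * (height_in * dpi)
        return pixels * bits_per_pixel / 8 / 1024

    # A 1" x 1" image at different resolutions and image types.
    for dpi in (100, 300, 400):
        for name, bpp in (("1-bit", 1), ("8-bit", 8), ("24-bit", 24)):
            print(dpi, "dpi", name, round(uncompressed_kb(1, 1, dpi, bpp)), "KB")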
In addition to being one of the best choices for archival imaging, the 400 dpi scan of a 24-bit colour image makes up the largest file size. Because screen resolution rarely exceeds 100 dpi, a 100 dpi image is appealing for its small size. So the dpi choice relies upon the project objectives.
3. File Format
While finalizing the capture in an imaging software program, clicking on the 'save as' function shows that there are quite a few image formats to choose from. There are three key image formats in the text-creation process: JPEG, GIF and TIFF. These formats are the most common because they can be transferred to nearly any software system or platform.
For archival image creation and retention as a master copy, TIFF (Tagged Image File Format) files are the most widely accepted format. Almost all platforms can easily read TIFF files, making it one of the best choices for transmitting important images. Most digitization projects scan images in the TIFF format because it allows a person to capture as much information as possible from the original and then save the data. The only demerit of the TIFF format is the image size. Once the image is saved, however, it can be read by a computer with completely different hardware and software and can be returned to at any point. Also, if there is any need to modify the images later, they should be scanned as TIFFs.
For systems that have space restrictions, JPEG (Joint Photographic Experts Group) files are among the best data formats for Web viewing and transfer. JPEGs are popular formats with image creators not only for their compression capabilities but also for their quality. TIFF is a lossless compression format, whereas JPEG is a lossy compression format. It is a common phenomenon that an image loses bits of information when its file size is squeezed, but there need be no significant loss in image quality. At 24-bit scanning, every dot has a choice of 16.8 million colours, which is more than what the human eye can really distinguish on the screen. With the condensation of the file, the image loses some information, but to a degree unlikely to be detected by the human eye. The lossy compression is the main disadvantage of this popular format. Once an image is saved using the 'save as' option, the cast-away information is lost. The significance of this is that certain parts of the image, or the total image, cannot be magnified. Furthermore, re-working the image results in the loss of more information. Therefore, archiving in JPEG format is not recommended, as there is no way to retain all of the information scanned from the source. Nonetheless, in terms of storage size and viewing capabilities, JPEG formats are one of the best methods for online viewing.
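In practice, the master/surrogate split described above is easy to automate. A sketch using Pillow (the file names and the quality setting of 75 are illustrative assumptions, not fixed recommendations):

    from PIL import Image

    scan = Image.open("page_scan.tif")   # hypothetical scanned page

    # Master copy: TIFF is saved uncompressed by default, retaining
    # every bit of information captured from the original.
    scan.save("page_master.tif")

    # Web surrogate: JPEG quality 75 discards detail the eye is unlikely
    # to notice; the discarded information is gone for good.
    scan.convert("RGB").save("page_web.jpg", quality=75)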
There are some older formats that are limited to 256 colours. These are GIF (Graphics Interchange Format) files. GIFs use lossless compression without requiring as much storage space as TIFFs. Although GIFs have no compression capabilities like a JPEG, they are still solid for line drawings and graphic arts. They also have the potential to be converted to transparent GIFs, in which the background of the image can be made invisible, thus permitting it to blend in with the web page background. Although frequently used in Web design, this can also be put to good use in creating text. It is possible that a text character cannot be converted so that it can be rendered by a Web browser: it could be that ISOLAT1 or ISOLAT2 does not define the character, or that it can only be represented by an inline image (e.g., a headpiece). For instance, when an online version of the journal Studies in Bibliography was created by the UVA Electronic Text Centre, there were cases of inline special characters that simply could not be depicted through the available encoding. The journal being a searchable full-text database, furnishing a readable page image was not an option. Their solution was to make a transparent GIF image, one that did not break up the flow of the digitized text. These GIFs were made to match the size of the surrounding text and were afterwards introduced quite successfully into the digitized document.
Continuing the discussion of image types, the topic of file size arises frequently in the digitization process. It is the rare or lucky project that has limitless storage space; therefore, most creators must consider how to obtain quality images without taking up the 55 MB of space needed by a 400 dpi, archival-quality TIFF. A common assumption is that the lower the bit depth, the better the compression, but this is not true. The Electronic Text Centre has developed figures that show how 24-bit images, rather than 8-bit images, can give rise to a smaller JPEG in addition to a higher-quality image file:
• 300 dpi 24-bit colour image, 2.65 x 3.14 inches: Uncompressed TIFF: 2188 K; 'Moderate loss' JPEG: 59 K
• 300 dpi 8-bit colour image, 2.65 x 3.14 inches: Uncompressed TIFF: 729 K; 'Moderate loss' JPEG: 76 K
• 100 dpi 24-bit colour image, 2.65 x 3.14 inches: Uncompressed TIFF: 249 K; 'Moderate loss' JPEG: 9 K
• 100 dpi 8-bit colour image, 2.65 x 3.14 inches: Uncompressed TIFF: 85 K; 'Moderate loss' JPEG: 12 K
Although the image sizes might not seem significantly different, it should be kept in mind that these results were estimated with an image measuring roughly 3 x 3 inches. Storage space suddenly becomes problematic when these images are scaled up to page size. The compressed JPEG will take less space and, moreover, 24-bit scanning provides better image quality.
Having discussed these three image formats, a decision has to be made as to which one should be used for a given project. The best answer is to use a combination of all three. TIFFs are not suitable for online delivery; but if the images have any future use, either for printing or simply as a master copy for later enlarging, manipulation or archiving, then there is no other suitable format in which to stock the images. JPEGs and GIFs are the best formats for online presentation. JPEGs cannot be enlarged (or else they will pixelate), but they have a smaller file size and good quality; a JPEG almost matches the TIFF format in terms of viewing quality. How GIFs are used depends on the types of images associated with the project, but GIFs are a popular option for making thumbnail images that link to the JPEG version on a separate page.
There has been much debate about the creation of archival digital images. As per the Electronic Text Centre, there is a growing duality between archival imaging and preservation imaging. Preservation imaging can be specified as 'high-speed, 1-bit (simple black and white) page images shot at 600 dpi and stored as Group 4 fax-compressed files'. The results are comparable to microfilm imaging: it preserves the text for reading purposes but ignores the source as a physical object. Archiving often assumes that objects have been digitized in order to protect the source from constant handling (while providing an international means of accessibility). But any chance of presenting the object as an artifact is eliminated by this type of preservation. An entirely different set of requirements is needed for archiving an object. Film imaging is the only imaging that can be considered to have archival value; it is believed to last at least ten times as long as a digital image. However, the idea of archival imaging cannot be neglected, and it is still discussed among funding bodies and projects.
Since storage on writeable CD-ROMs is another option, the master copies do not have to be held online.
• 24-bit: As the example shows, the file size of the subsequently compressed image does not benefit from scanning at a lower bit depth, and there is really little ground for scanning an archival image at anything less. Whether the source is greyscale or colour, the images have a higher quality at this level and are more realistic.
• Standard typefaces (e.g., Times New Roman): fancy fonts may not be recognized.
• Single-column layout
OCR limitations
Following are some drawbacks of OCR (a minimal OCR usage sketch follows this list):
• During text scanning, most document formatting (italic, underline, bold) is lost, apart from paragraph marks and tab stops.
• A single-column editable text file is the output of a finished text scan. This text file always requires proofreading and spell-checking, in addition to reformatting to the desired final layout.
• Plain text files scanned from spreadsheet printouts must be imported into a spreadsheet and reformatted to match the original.
• Using text from a source with a font size of less than 12 points, or from a fuzzy copy, results in more errors.
• Handwritten text
• Mathematical expressions
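As a minimal illustration of an OCR pass (assuming the open-source Tesseract engine and its pytesseract Python wrapper are installed; the file name is hypothetical), the raw text of a scanned page can be pulled out in a few lines. The output still needs the proofreading and reformatting described above:

    from PIL import Image
    import pytesseract

    # Run OCR on one scanned page; the result is a plain, single-column
    # text string that loses most of the original formatting.
    page = Image.open("scanned_page.png")
    text = pytesseract.image_to_string(page)
    print(text)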
Compared with the computer-set texts of the late 20th century, the black letter and exotic typefaces found in the hand-press period contrast noticeably.
6.5 RE-KEYING
There are still many situations where the document or project prevents the use of OCR by the text creator. If the text is of degraded or poor quality, it may take more time to correct the OCR mistakes than to simply type in the text from scratch. There is also the issue of the amount of information to be digitized: there might not be enough time to sit down with 560 volumes of texts (as with the Early American Fiction project) and process them through OCR, even if the documents are of relatively good quality. Although this varies from study to study, the general rule of thumb is that a best-case scenario is three pages scanned per minute; this does not take into consideration the process of putting the document on the scanner, flipping pages, or the subsequent proofreading. When addressing these concerns, the viable solution becomes re-keying the text if OCR is found incapable of handling the project's digitization.
Whether to handle the document in-house or outsource the work becomes the next question to address. All the necessary elements, such as hardware, software, and time, are taken into consideration when deciding to digitize the material in-house. Moreover, a few issues that come into play with in-house digitization are to be kept in mind. The speed of re-keying is the primary concern. Research assistants working on the project, or graduate students from the text creator's local department, generally do the re-keying. Paying someone to re-key the text on an hourly basis often proves more expensive than outsourcing the material. Another problem is that a single person typing in material generally misses keyboarding errors. On the other hand, if the member is familiar with the source material, then there is a chance they will automatically 'correct' things that seem incorrect. Therefore, during in-house digitization, these concerns should be addressed from the beginning.
The most popular choice for many digitization projects is to outsource material to a professional keyboarding company. By hiring keyboarders who do not have a subject specialty in the text being digitized—many often do not speak the language being converted—these companies avoid the problem of keyboarders subconsciously altering the text. Keyboarding companies are also able to put a base-level encoding scheme, established by the project creator, into the documents, thereby getting rid of some of the more basic tagging tasks.
As with most steps in the text-creation procedure, the answers to these questions will depend on the project. For a project that plans to digitize a collection of works, there will be a marked difference between the decisions made and those made by an academician who creates an electronic edition. This reflects back on the significance of the document analysis stage. The requirements of the project must be recognized, in addition to identifying the external influences (such as equipment availability, project funding and staff size) that affect the decision-making process of the project.
6.6 SUMMARY
The digitization chain is based on the fundamental concept that, to achieve the best quality image, one should digitize from the original material.
A few methods of image capture exist today, using equipment ranging from high-end digital cameras to different types of scanners (flatbed, sheet-fed, drum, microfilm). For a project, we should choose the most available option and one that is also affordable. In this respect, the two most common accessible image capture solutions are high-resolution digital cameras and flatbed scanners.
Digital cameras are very portable and easy to handle. Some large documents that won't fit on a flatbed scanner can be digitized with the help of a digital camera.
The main goal of recognition technology is to re-create the text, in addition to elements of the page, including layout and tables.
Resolution means the number of dots or pixels per inch, i.e., dpi or ppi. If there are more dots or pixels per inch, then the resolution of the image is higher and the image obviously looks clearer.
Digital cameras are very portable and easy to handle. Some large documents that won't fit on a flatbed scanner can be digitized with the help of a digital camera.
Photoshop
BIBLIOGRAPHY
11. Robert F. Erbacher and John Mullholland. 2007. 'Identification and Localization of Data Types within Large-Scale File Systems', Proceedings of the 2nd International Workshop on Systematic Approaches to Digital Forensic Engineering, Seattle, WA.
12. Ryan M. Harris. 2007. 'Using Artificial Neural Networks for Forensic File Type Identification', Master's Thesis, Purdue University.
13. Douglas J. Hickok, Daine Richard Lesniak, Michael C. Rowe. 2005. 'File Type Detection Technology', Midwest Instruction and Computing Symposium.
14. Karresand Martin, and Shahmehri Nahid. 2006. 'File Type Identification of Data Fragments by their Binary Structure', Proceedings of the IEEE Workshop on Information Assurance, pp. 140-147.
15. Sarah J. Moody and Robert F. Erbacher. 2008. 'SADI: Statistical Analysis for Data Type Identification', 3rd International Workshop on Systematic Approaches to Digital Forensic Engineering.
16. Roussev, Vassil, and Garfinkel, Simson. 'File Fragment Classification: The Case for Specialized Approaches', Systematic Approaches to Digital Forensic Engineering (IEEE/SADFE 2009), Oakland, California.
17. Irfan Ahmed, Kyung-suk Lhee, Hyunjung Shin and ManPyo Hong. 'On Improving the Accuracy and Performance of Content-based File Type Identification', Proceedings of the 14th Australasian Conference on Information Security and Privacy (ACISP 2009), pp. 44-59, LNCS (Springer), Brisbane, Australia, July 2009.
18. Irfan Ahmed, Kyung-suk Lhee, Hyunjung Shin and ManPyo Hong. 'Fast File-type Identification', Proceedings of the 25th ACM Symposium on Applied Computing (ACM SAC 2010), ACM, Sierre, Switzerland, March 2010.
19. Robert F. Erbacher and John Mulholland, 'Identification and
Localization of Data Types within Large-Scale File Systems',
Proceedings of the 2nd International Workshop on Systematic
Approaches to Digital Forensic Engineering, Seattle, WA, April
2007.
UNIT 7 MICROFORM
Program Name: BSc (MGA)
Written by: Mrs. Shailaja M. Pimputkar, Srajan
Structure:
7.0 Introduction
7.1 Unit Objectives
7.2 History
7.3 Uses of Microfilm
7.4 Advantages and Disadvantages of Microfilm
7.5 Readers and Printers
7.6 Microfilms and Cards used in Media
7.7 Image creation
7.7.1 Film
7.7.2 Cameras
7.7.2.1 Microfiche camera
7.7.2.2 Roll film camera
7.7.2.3 Flow roll camera
7.7.2.4 Flat film
7.7.2.5 Computer output microfilm
7.8 Storage and preservation
7.9 Duplication
7.10 Digital conversion
7.11 Format conversion
7.12 Summary
7.13 Key Terms
7.14 End Questions
7.0 INTRODUCTION
In this unit we are going to learn about the term 'microform'. Microforms are any media, either paper or film, that contain micro-reproductions of documents for transmission, storage, reading and printing. Microform images are generally reduced approximately twenty-five times from their original document size.
7.2 HISTORY
In 1920, George McCarthy, a New York City banker, developed the first practical use of commercial microfilm. In 1925, he was granted a patent for his Checkograph machine, which was designed to make permanent film copies of all bank records.
The American Library Association endorsed microforms at its annual meeting held in 1936. Even before this official acceptance, microfilm was used in related fields: in the years between 1927 and 1935, the Library of Congress microfilmed more than three million pages of books and manuscripts in the British Library.
In 1934, the first microform print-on-demand service was implemented by the United States National Agriculture Library, which was subsequently followed by a similar commercial concern, Science Service. In 1938, University Microfilms was established and the Harvard Foreign Newspapers Microform Project was implemented.
Early cut-sheet microforms and microfilms were printed on nitrate film, which was risky for holding institutions because nitrate film is explosive and flammable. From the late 1930s to the 1970s, microfilms were usually printed on a cellulose acetate base, which is prone to tears, vinegar syndrome, and redox blemishes. Vinegar syndrome is the result of chemical decay and produces 'buckling and shrinking, embrittlement and bubbling'. Redox blemishes are red, orange and yellow spots, 15-150 micrometres in diameter, created by oxidative attacks on the film, and are largely due to poor storage conditions.
The 1970s also saw the development of computer output microform (COM) applications, in which computers are used directly to produce microforms. COM has been used to produce parts catalogs, hospital and insurance records, telephone listings, college catalogs, patent records, publishers' catalogs and library catalogs. Although this technique is widely used, the permanence of microfilm masters on film remains the standard for most libraries and for those applications where preservation is an issue. Microforms will have a future not only in the short term but probably in the more distant future as well.
• If you want to keep your documents for more than 7 years, then microfilm is probably the best medium to use; microfilm is used for long-term storage of documents.
• Microfilm enables libraries to greatly expand access to collections. Besides being compact, its storage cost is far less than that of paper documents. 98 document-size pages normally fit on a fiche, thereby reducing the original material to about 0.25 per cent.
• Microfilm can last up to 400 years and is readable by eye, which means you do not need any software to read these files.
• A roll of microfilm can hold up to 2,500 images.
• In the mid 20th century, libraries started using microfilm as a preservation strategy for deteriorating newspaper collections.
• Microfilm was also used to save space.
The choice of using microfilm will depend on the application and the length of time the document or image needs to be stored.
The following are the advantages of using microfilms:
• Strength and Stability
It enables libraries to greatly expand access to collections without putting rare, fragile or valuable items at risk of theft or damage. Microfilm rarely breaks.
• Storage Capacity
Besides being compact, its storage cost is far less than that of paper documents. Ninety-eight document-sized pages normally fit on one fiche, thereby reducing the original material to about 0.25 per cent. Microfilm can reduce storage space requirements by up to 95 per cent compared to filing paper.
• Cheaper Cost
Distribution of microfilm is cheaper than paper copy; it has lower reproduction and carriage costs than printed paper.
• Storage Condition
This film can have a life expectancy of 500 years if appropriate storage conditions are maintained. Microfilms are stronger than any traditional film: they are made of polyester instead of cellulose, and the polyester will not change with humidity or temperature.
• Data Retrieval
It is easy to view because it is analog. The format, unlike digital media, does not need any software to decode the data stored on it. A person who knows the language can instantly comprehend the content with nothing more than a simple magnifying glass. This eliminates the problem of software obsolescence.
The following are the disadvantages of using microfilms:
• Images produced through microforms are generally too small to be read with the naked eye. To make them reader-friendly, libraries must use special readers that project full-size images on a ground-glass or frosted acrylic screen.
• Photographic illustrations reproduce poorly in microform, with loss of clarity and halftones.
• Microfilm viewed on reader machines is often very difficult to use: it requires users to carefully wind and rewind until they have reached the point where the data being looked for is stored.
• As reader-printers are not always available, users' ability to make copies for their own use is limited. Also, users cannot use conventional photocopy machines.
• A user can easily misfile a fiche when it is stored in the highest-density drawers, after which it is effectively unavailable. To solve this problem, some services store microfiche in a restricted area, retrieving it only when needed, or use lower-density drawers with labelled pockets for each card.
• Colour microform is very expensive, which discourages most libraries from supplying colour films. Besides this, colour photographic dyes are likely to degrade over a long period, resulting in the loss of information; consequently, colour materials are usually photographed using black and white film.
• A user spending some time reading microfilms on a machine may be prone to headache and/or eye strain.
• Microfiche, like all analog media formats, lacks the convenient features found in digital media. While digital copies have much higher copying fidelity, analog copies degrade with each generation. A user can also index and easily search digital data.
Check your progress-1
Who is known as the 'father of microphotography'?
How was nitrate film harmful in the 1930s?
• Portable Reader
Portable readers are made of plastic and can be folded easily for carrying; when open, they project an image from microfiche onto a reflective screen. The following image is an example of a portable reader.
Fig 7.2 : Indus 456-HPR portable Reader
• Reader Printer
A reader printer was developed in the mid 20th century. It allowed the viewer not only to see the microfilm but also to print what was shown in the reader. Microform printers can accept positive or negative films and produce positive or negative images on paper. Using newer machines, a user can scan a microform image and save it as a digital file.
Check your progress-2
Flat Film
Microfilm
For roll films, the standard length is 30.48 m (100 ft). The user can store roll microfilm on open reels or put it into cassettes. One roll of 35 mm film can carry around 600 images of large engineering drawings, or 800 images of broadsheet newspaper pages. A 16 mm film may carry 2,400 letter-sized images as a single stream of micro-images along the film, set so that the lines of text are parallel to the sides of the film, or 10,000 small documents, perhaps cheques or betting slips, with both sides of the originals set side-by-side on the film.
Microfiche
Microfiche, an ISO A6 certified flat film, is 105 x 148 mm in size, as shown in Fig 7.5. It contains a matrix of micro-images. All microfiche are read with the text parallel to the long side of the fiche. In simple words, microfiche is a sheet of film that carries very small photographs of the pages of a newspaper, magazine, etc., which are viewed using a special machine. There may be two types of frames: landscape or portrait.
Ultra fiche
Aperture Cards
7.7 IMAGE CREATION
7.7.1 Film
7.7.2 Cameras
All microfiche cameras are planetary, with a step-and-repeat mechanism to advance the film after exposure. The film is processed individually by hand or by using a dental X-ray processor. For high output, cameras are loaded with a roll of 105 mm film, and the exposed film is developed as a roll. Sometimes this roll is cut into individual fiche after processing, or kept in roll form for duplication.
A flow (rotary) camera, with the help of the machine, takes one document after another automatically for advancement. The documents are seen by the camera lens as they pass a slot, and the film behind the lens advances exactly with the image. These cameras record cheques and betting slips.
The simplest microfilm camera still in use for flat film is a rail-mounted structure, at the top of which is a bellows camera for 105 x 148 mm film. The original drawing is held vertical by a frame or a copy board. The horizontal axis of the camera passes through the centre of the copy, and the structure is designed so that it may be moved horizontally on rails.
In the dark room, a dark slide may be inserted with a film, or the camera may be fitted with a roll film holder which, after an exposure, advances the film into a box and cuts the frame off the roll for processing as a single film.
COM devices are used when there are large amounts of data. Within the equipment, a light source creates images, which are the negative of text on paper. The main advantage of using computer output microfilm for document archival is that a single microfiche card can hold 230 images, and a 1-cubic-foot storage box containing 6,000 cards can hold 1,380,000 images. Another advantage of COM is that it gives the best image quality at a very reasonable cost compared to paper printing. A microfilm plotter, sometimes called an aperture card plotter, accepts a data stream of the kind that may be sent to a computer pen plotter and produces corresponding frames of microfilm, output as 35 mm or 16 mm film or aperture cards.
7.9 DUPLICATION
Diazo Duplication
Diazo duplication is an economical, convenient method of reproducing technical documents, blueprints, graphs, and textual materials of any format. Diazo material is responsive to ultraviolet light but can be handled in daylight. Special photosensitive papers, such as the SK-5, SSN-2 and MP types, which have high resolution, colouring and contrast, are used to make diazo duplications.
The copy film is exposed by placing it in contact with a master film and passing UV light through the master onto the copy film. Areas covered by dark parts of the master are sheltered from light, while those in contact with clear parts of the master are sensitized.
Vesicular Duplication
Vesicular film and diazo film have similar characteristics. The vesicular duplicating process employs diazo compounds, but the light-sensitive content is incorporated in a thermoplastic colloid.
The diazonium salt releases nitrogen gas when exposed to UV light and then heated to about 130 °C, forming minute bubbles within the emulsion layer. Vesicular film is sensitized with a diazo dye which, after exposure, is developed by heating. The diazo compound remaining in the dark areas is then destroyed by further exposure to light. The dissociation or breakdown of the diazo compounds results in the generation of millions of minute bubbles of nitrogen within the film, which helps produce an image that diffuses light. It produces a good black appearance in a reader; however, it cannot be used to create further copies.
Following are some reasons to convert from microfilm to digital:
Transparent jackets are made in A5 size, each with six pockets into which strips of 16 mm film may be inserted. The equipment lets an operator insert strips from a roll of film, which is particularly useful as frames may be added to a fiche at any time. Pockets are created using a thin film, so that duplicates may be made from the assembled fiche.
7.12 SUMMARY
Microfilm enables libraries to greatly expand access to collections. Besides being compact, its storage cost is far less than that of paper documents. 98 document-size pages normally fit on a fiche, thereby reducing the original material to about 0.25 per cent.
Colour microform is very expensive, which discourages most libraries from supplying colour films. Besides this, colour photographic dyes are likely to degrade over a long period, resulting in the loss of information; consequently, colour materials are usually photographed using black and white film.
Microfiche is a sheet of film that has very small photographs of
the pages of a newspaper, magazine, etc., which are viewed by
using a special machine.
Ultra fiche is generally used to store the data collected from highly data-intensive operations, such as remote sensing.
An aperture card is punched with machine-readable metadata associated with the microfilm image, which is also printed across the top of the card for visual identification.
Diazo duplication is an economical, convenient method of reproducing technical documents, blueprints, graphs, and textual materials of any format.
Vesicular duplicating process employs diazo compounds, but the
content which is light sensitive is incorporated in a thermoplastic
colloid.
Digital conversion is the process of converting microform into digital form. It is possible using an optical scanner, which projects the film onto a CCD array and captures it in a raw digital format.
• Microform: A generic term that can refer to any medium, whether
it is transparent or opaque, bearing micro images.
Diazo duplication is an economical, convenient method of reproducing technical documents, blueprints, graphs, and textual materials of any format.
BIBLIOGRAPHY
20. Meckler, Alan Marshall. 1982. Micropublishing. Westport, CT: Greenwood.
21. Baker, Nicholson. 2001. Double Fold: Libraries and the Assault on Paper. New York: Random House.
22. Dictionary.com Unabridged.
8.0 INTRODUCTION
As we learned in the previous units, digitization is the conversion of non-digital data to digital format, in which information is arranged into units. We also know that a microform can be either on paper or on film, and can contain micro-reproductions of documents for transmission, storage, reading and printing. Images on microform are generally reduced approximately twenty-five times from their original document size. Microfilm is one of the formats of microform.
For bitonal digitization, the following formula is used to calculate the quality index:

qi = (a × 0.039h) / 3

where h is the height of the small 'e' in millimetres and the resolution a is expressed in dpi. For digitization with grayscale, the formula is:

qi = (a × 0.039h) / 2

Given the good quality reserves of the microfilm, it will be sufficient for most purposes to aim for a digital conversion form of medium quality. The user can then calculate the required resolution from the quality index. For medium quality (qi = 5), the required resolution for bitonal digitization is a = (3 × 5) / (0.039h); where the height of the small 'e' is 1 mm, this yields a value of 384 dpi. For digitization with grayscale, the formula is a = (2 × 5) / (0.039h), which gives a value of 256 dpi for an 'e' of the same height. Letters of this size (about 7 pt) are often used in footnotes.
As an indication, the aim should be 350-400 dpi for bitonal digitization and 300 dpi for grayscale. Test runs with typical films should be used to decide the quality required for each purpose.
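The calculation is easy to script. The following Python sketch is a direct transcription of the formulas above; the truncation to whole dpi matches the figures quoted in the text:

    # Required scanning resolution (dpi) from the quality index formulas.
    def required_dpi(e_height_mm, qi=5, grayscale=False):
        factor = 2 if grayscale else 3    # bitonal uses 3, grayscale uses 2
        return factor * qi / (0.039 * e_height_mm)

    print(int(required_dpi(1.0)))                   # bitonal:   384 dpi
    print(int(required_dpi(1.0, grayscale=True)))   # grayscale: 256 dpi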
Another format, known as the Joint Photographic Experts Group or JPEG format, is often used to transfer half-tone and colour pictures. It has variable compression ratios that are all lossy and hence is not recommended for archival storage. JPEG is a commonly used standard method of compression for photographic images and uses lossy compression algorithms. It specifies how an image is transformed into a stream of bytes, not how those bytes are encapsulated in any particular storage medium. Another standard developed by the Independent JPEG Group, called JFIF (JPEG File Interchange Format), specifies the method of producing a file suitable for computer storage and transmission from a JPEG stream.
Lossy compression is not a method for long-term storage, as it could result in irretrievable loss of data while decompressing or while migrating from one lossy compression to another. Lossy compression means that when an image is compressed and then decompressed, the decompressed image differs from the original scanned image.
The advantage of storing digital information in compressed form is that it occupies little space, thereby considerably minimizing the storage cost. Distortions can be particularly severe at high compression ratios; the degree of loss can be controlled by adjusting the compression parameters.
Because image data can be organized differently, it is wise to agree with the service provider on the organization of the material appropriate to each application. Conventionally, each picture is stored in a separate file; collecting related pictures in one file (multi-page TIFF) is sensible only for documents containing no more than a few pages.
To make the data additionally usable on the Internet, it is recommended to convert it into platform-independent formats, which allow the inclusion of the widest variety of documents. Today, such conversions are part of the service offered by most specialist companies. Depending upon the requirement, this format should be added to the contract.
A comparison can be made showing the relative quality of various JPEG settings, and also comparing saving a file as a JPEG normally with using a 'save for web' technique.
Provided standard formats are used (e.g., ISO 9660 for CD-R), readability independent of hardware is guaranteed for both media. In the near future, the current storage capacity of 650 MB per CD-R and 2 GB per DAT tape will increase.
The digital conversion form is dependably secured when loss-free compressed or uncompressed image data has been written to at least two data carriers and it has been verified that their contents are identical and readable without difficulty. In the simplest case, the two data carriers (the 'primary data carrier' and the 'working duplicate') with the same content are created by repeated successive transfers of the image data.
It is essential to reach a binding agreement with the company undertaking the digitization, under which it stores the transferred material for at least as long as it takes the customer to check and secure the data. Multiple working duplicates should be created to guard against the failure of any single medium. Performing a decompression test on each stored digital copy further enhances data security.
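Checking that the primary data carrier and the working duplicate really are identical can be done with checksums. A minimal Python sketch (the file paths are hypothetical):

    import hashlib

    def sha256_of(path):
        # Compute the SHA-256 digest of a file, reading it in chunks.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # The conversion form is dependably secured only if the digests match.
    primary = sha256_of("/media/primary/image_0001.tif")
    duplicate = sha256_of("/media/duplicate/image_0001.tif")
    print("identical" if primary == duplicate else "MISMATCH: copy again")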
• Reduction of the whole image
• Use of whole screen for display
• Option of return to the original image
• Image inversion
• Image rotation
• Display of technical information from the headers, such as picture size, format, resolution, bit depth and print options
Another important requirement is that the software should be capable of converting images into other formats, as well as compressing images. For instance, xv is available as shareware in the UNIX world. Depending upon the hardware installed, suitable viewers are included in the supply range of the operating systems, e.g., hp-ux image view. For PCs, Imaging for Windows is available without any extra charge with Windows 95. Other examples of suitable software are PixView 2.1 from Pixel Translation, Scan Mos uvp from ms Electronic Service, or, with limits, Hijaak Pro 2.0 from North American Software.
8.7 LONG TERM PRESERVATION OF THE
DIGITAL CONVERSION FORM
(MIGRATION)
The choice between digitization with a general raising of the resolution on the one hand, and with grayscale on the other, has a direct bearing on the cost involved in conversion. Higher densities of data mean higher costs in the supply of data, storage, and handling. It is also important to take account of the consequential costs of any planned migration. If needed, it may be cheaper to digitize a second time from the microfilm rather than to repeatedly migrate the data.
The cost factors mentioned take account only of digitization itself. According to experience, further costs are incurred by manual turning, splicing images out of the general frame, and marking. Programming costs, as well as the initial cost of programming the film scanner, must also be considered based on customer requirements. Finally, there are the costs of downloading data, the carrier medium, operating the CD recorder, and packing and transport. The cost of production increases where individual work and image enhancement using special software are necessary to improve quality.
Incomplete or irregularly printed letters result in disruption of text recognition. Reliability also depends on the density of image information: the greater the amount of image information being processed, the higher the recognition rate. Therefore, higher resolution in digitization can improve the recognition rate, as can digitization in grayscale.
In essence, the quality criteria that have been mentioned also apply to
microfilm. To achieve high resolution and adequate contrast, the correct
standard background density and minimal ground shade are important.
Digitizing negative film avoids the disruption caused by dirt and scratches. In
practice, there has not yet been enough experience with machine text
recognition in conjunction with microfilm to allow the formulation of reliable
views.
8.10 SUMMARY
Digitization of microfilm should not aim at the best possible
result in the way that is mandatory for direct digitization of
endangered original material.
The reproduction quality of the digital conversion form will be
determined by the purpose for which it is to be applied where
good-quality microfilm is available as a long-term storage
medium.
To achieve high resolution and adequate contrast, the correct
standard background density and minimal ground shade are
important.
Digitizing negative film avoids the disruption caused by dirt and
scratches.
In practice, there has not yet been enough experience with
machine text recognition in conjunction with microfilm to allow
the formulation of reliable views.
• Data Compression: A process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use, through use of specific encoding schemes.
• Joint Photographic Experts Group (JPEG): A commonly used
standard method of compression for photographic images. JPEG
uses lossy compression algorithms for images. It specifies how an
image is transformed into a stream of bytes, and not how those
bytes are encapsulated in any particular storage medium.
• Lossy Compression: When an image is compressed and then
uncompressed, the decompressed image is usually not quite the
same as the original scanned image. This is called lossy
compression.
• Microform: A generic term that can refer to any medium,
transparent or opaque, bearing microimages.
• Tagged Image File Format (TIFF): This type of file format stores images, including photographs and line art. Originally created by Aldus for use in desktop publishing, the TIFF format is widely supported by image-manipulation applications, by publishing and page layout applications, and by scanning, faxing, word processing, optical character recognition (OCR) and other applications.
Lossy compression means that when an image is compressed and then uncompressed, the decompressed image differs from the original scanned image.
Image compression can be defined as the use of data
compression on digital images.
This process reduces the redundancy of the image data so
that data can be stored or transmitted efficiently.
BIBLIOGRAPHY
23. Dictionary.com Unabridged (v 1.1). New York: Random House, 2006; Meckler, Alan Marshall. 1982. Micropublishing. Westport, CT: Greenwood.
24. Bourke, Thomas A. 'The Curse of Acetate; or a Base Conundrum Confronted', Microform Review 23 (1994): 15-17.
25. Saffady, William. 2000. Micrographics: Technology for the 21st Century. Prairie Village, Kansas: ARMA International.
26. 'Seidell Microfilm Viewer in Production'. American Documentation 1.2 (April 1950): 118.
27. Arlitsch, Kenning, and John Herbert. 'Microfilm, Paper, and OCR: Issues in Newspaper Digitization', Microform and Imaging Review 33 (Spring 2004): 59-67.
28. Baker, Nicholson. 2001. Double Fold: Libraries and the Assault on Paper. New York: Random House.
29. Rider, F. 1944. The Scholar and the Future of the Research Library. New York: Hadham Press.
Written by: Srajan
Structure:
9.0 Introduction
9.1 Unit Objectives
9.2 Working of a Scanner
9.3 Types of Scanner
9.4 General Features Of a Scanner
9.5 Types of Scanning
9.6 Processing of Scanned Document
9.7 Choice of Scanning or Digitization
9.8 Accuracy of Scanned Images
9.9 Scanning Products
9.10 Summary
9.11 Key Terms
9.12 End Questions
9.0 INTRODUCTION
In this unit we are going to learn about scanners. Nowadays scanners are used everywhere: in schools, offices, institutes, shops, and also in GIS. Scanning is basically a process that converts paper maps into digital format.
The technology used for this conversion is called scanning, and the device used for the operation is called a scanner. We will learn the whole process of scanning a document in this unit. We are also going to learn about the different types of scanner that are available, such as the flatbed scanner, transparency scanner, handheld scanner, etc.
Describe the different types of scanner
Explain the general features of a scanner
Explain the processing of a scanned document
Explain the difference between scanning and digitization
Describe the accuracy of scanned images
Describe the scanning products
9.2 WORKING OF A SCANNER
In this section we are going to learn how a scanner works. As we know, scanning is the process by which we copy content on paper into our computer. Inside a scanner there is a beam of bright white light. The scanner passes this light over the image, and the light is reflected back onto the photosensitive surface of the sensor in the scanner head. Each pixel transfers a grey-tone value to the scan-board software: values are given to the different shades in the image ranging from 0 (black) to 255 (white), i.e., 256 values. For bitonal output, the software interprets black as 0 and white as 1. Thus, a monochrome image of the scanned portion is obtained. The complete image is scanned in tiny strips by the scan head as it moves forward, while the relevant information is continuously stored by the sensor. The software running the scanner assembles the information from the sensor into a digital image; this is known as one-pass scanning.
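The grey-value model just described can be sketched in a few lines of Python. Here one scanned strip of 0-255 sensor readings is reduced to the 0/1 monochrome interpretation (the threshold of 128 is an illustrative assumption):

    # One scanned strip as grey-tone values (0 = black, 255 = white).
    strip = [12, 40, 200, 255, 180, 30, 0, 250]

    # Bitonal interpretation: dark pixels become 0, light pixels become 1.
    THRESHOLD = 128
    bitonal = [0 if value < THRESHOLD else 1 for value in strip]
    print(bitonal)   # [0, 0, 1, 1, 1, 0, 0, 1]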
The most essential component of a scanner is its scanner head, which can move along the length of the scanner. The scanner head incorporates either a charge-coupled device (CCD) sensor or a contact image sensor (CIS). A CCD is composed of numerous photosensitive cells, or pixels, packed together on a chip. To ensure the best image quality, the most advanced large-format scanners employ CCDs with 8,000 pixels per chip.
Scanning a colour image is different, as the scanner head needs to scan the same image for three different colours: red, green and blue. Older colour scanners had to scan the same area three times over for these three colours; this type of scanner is known as a three-pass scanner. Most colour scanners now, however, use colour filters to scan all three colours at once in a single pass. In theory, a colour CCD works like a monochrome CCD, but each colour is created by blending red, green and blue, so a 24-bit RGB CCD delivers 24 bits of information for each pixel. A scanner using the three different colours (in full 24-bit RGB mode) can normally produce up to 16.8 million colours.
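The 24 bits per pixel are simply three 8-bit channel values packed together, which is where the figure of 16.8 million (2 to the power 24) colours comes from:

    # Pack 8-bit red, green and blue samples into one 24-bit pixel value.
    def pack_rgb(r, g, b):
        return (r << 16) | (g << 8) | b

    print(hex(pack_rgb(255, 128, 0)))   # 0xff8000, an orange pixel
    print(2 ** 24)                      # 16777216 possible colours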
A new technology has now emerged, full-width single-line contact sensor array scanning, which enables the scanner to operate at previously unattainable speeds. With this new technology, the document to be scanned passes under a line of LEDs that capture the image.
Check your progress-1
What is the most essential component of a scanner?
What is the value range of a grey tone?
Second, as they are meant for the professional market, they rarely come bundled with "value-added" software such as Photoshop. Third, and most importantly, they have significantly better specifications. For example, a typical mid-level flatbed scans at 600x1200 spi and 10 bits per color, which results in scans of significantly higher quality. Some mid-level scanners may also provide a larger scanning area.
• High-end flatbed scanners have more features that are useful to professionals, such as a noise-free design, large scanning area, high dynamic range, and high resolution. These scanners, however, command a premium price.
2. Film Scanners
produced by a digital CCD array. Using specialized hardware and software, the analog video signal can be digitized. Video capture software is very similar to traditional scanning software, while the hardware is usually a board that fits inside your computer.
4. Drum Scanners
Drum scans have the highest quality, but drum scanners are very expensive. Drum scanners are used by professional color trade shops for producing color separations for high-end printing. For greater dynamic range and color accuracy, drum scanners use PMT (photomultiplier tube) technology instead of CCD technology. In a drum scanner, the document to be scanned is mounted on a glass cylinder. At the center of the cylinder is a sensor that splits light bounced from the document into three beams. Each beam is sent through a color filter into a photomultiplier tube, where the light is changed into an electric signal.
Even though they are costly, drum scanners offer features unavailable in desktop scanners, including direct conversion to CMYK, auto sharpening, batch scanning, greater dynamic range, and huge image scanning areas. Drum scanners also differ from other scanners in their productivity: they can produce more scans per hour than a desktop unit because the process of scanning to CMYK is done automatically.
5. Handheld Scanners
Hand scanners are portable and cost less than flatbed scanners. These
scanners generally plug into a computer's printing port, as opposed to a
SCSI port, which allows them to be conveniently moved from workstation
to workstation. Many people use them with a notebook or laptop. The
major drawback of hand scanners is that they are less accurate than
flatbeds, because they have weaker light sources and often produce
uneven scans. Most hand scanners now provide an alignment template that
helps guide users through the scanning process. To help stabilize its
scanner, one manufacturer ships a motorized "self-propelled" unit.
Using a standard fax-modem and proprietary Trio software, a pocket-
sized device from Trio Information Systems lets you convert any fax
machine into a 1-bit scanner or printer.
Pacific Crest offers a business card scanner. As its name
suggests, this scanner is helpful for those who need to input and file
tons of business cards.
7. Digital cameras
Check your progress-2
What is the most popular type of desktop scanner?
What are the features of a high-end flatbed scanner?
What is Leaf's Lumina?
What does Pacific Crest offer?
What is the full form of PMT as used in drum scanners?
an electronic apparatus such as a computer. The images that a user can scan
and convert include graphics, coloured or black-and-white text, and pictures.
1. Black and white raster scanning
This is the simplest way to convert a document and can be performed on
line drawings, reduced media, text or any single-colour document. This
method provides the best solution for archiving and storage projects, in
which documents are viewed and printed but never changed. It is
therefore an ideal solution as the first stage in a planned document
conversion project.
Applications
• Archival drawing libraries
• Electronic document distribution
• Vectorization templates
A user can convert drawings into files for quick, low-cost library
access. Original drawings of poor quality, however, create a problem.
When such a document is scanned, imperfections such as background
noise, dirt, residue or stray markings on the original source document
are introduced and stored along with the original drawing content.
These imperfections, besides reducing legibility, can enlarge the file
size, often by a factor of two or three. Much of the background 'noise'
and 'dirt' contained in poor-quality source documents can be removed
electronically by a raster clean-up process. This results in files that
are smaller, cheaper to store and easier to retrieve.
2. Grayscale and color raster scanning
Grayscale and colour images can be quite large. A user must make sure
that the system can handle files whose size is often measured in
megabytes, because virtually every pixel is populated with a value.
Compressing such files typically yields little or no reduction in file
size.
3. Grayscale or color scanning
Grayscale or colour scanning is most commonly used to:
• Load background images into high-end drawing or mapping
software as an information base for advanced project work.
• Capture images for use in desktop publishing applications.
• Analyse the frequency of the colour ranges, mainly for infrared
and vegetation photos.
Applications
• Navigation charts (air and nautical)
• Full colour maps
• Aerial photography
• Brochures and artwork
• Toposheets
• Cartographic base data for high-end mapping systems
Sometimes, a user needs to collect only selected information from
source documents such as toposheets and other colour originals. This
information may include components such as hydrology, oil and gas fields,
contours and transportation networks. Instead of scanning a separate
black-and-white printing plate for each component, separate images of map
features are created that can be differentiated by colour. For example, you
can extract elevation contours from a colour image of a toposheet. This
extraction process is much faster than capturing the data manually, and the
production cost is therefore lower.
In addition, the colour image can be preserved for use as a visual
background reference or simply as archived information. The resulting file is
much smaller and more manageable than an image containing all the colours
found on the source document.
Scanning can be done for both raster and vector images. For raster
images, scanning converts an image into an array of pixels, thus creating an
image in raster format. When an image is created by a series of pixels
arranged in rows and columns, the image is called a raster file. The scanner
captures the image by assigning a row, a column, and a colour value (a
grayscale, black or white, or a colour) to each pixel. A continuous image is
'painted' one pixel at a time and one row at a time. A key concept in raster
scanning is 'resolution': the number of pixels per inch in the image.
Large-format documents are generally scanned at resolutions between 200 and
500 ppi.
Higher image quality requires higher resolution, which increases file
size. An increase in resolution from 200 to 300 dpi increases the size of
the file not by 50 per cent but by 125 per cent, from 40,000 to 90,000
pixels per square inch. A black-and-white scan needs less storage than a
grayscale scan at the same resolution, and a colour image needs even more.
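The arithmetic above follows from the fact that pixel count grows with the square of the resolution. The sketch below (an illustrative Python snippet; uncompressed storage is assumed, and real file sizes vary with format and compression) works through the numbers.

```python
# Sketch of the resolution/file-size arithmetic: pixel count grows with the
# square of the resolution, and storage grows with bit depth. Uncompressed
# raster sizes are assumed; real files vary with format and compression.

def pixels_per_sq_inch(ppi: int) -> int:
    return ppi * ppi

def raw_size_mb(width_in: float, height_in: float, ppi: int,
                bits_per_pixel: int) -> float:
    """Uncompressed raster size, in megabytes, for a scanned page."""
    total_pixels = width_in * height_in * pixels_per_sq_inch(ppi)
    return total_pixels * bits_per_pixel / 8 / 1e6

print(pixels_per_sq_inch(200))        # 40,000 pixels per square inch
print(pixels_per_sq_inch(300))        # 90,000 -> a 125 per cent increase

# An 8.5 x 11 inch page at 300 ppi, at three bit depths:
print(raw_size_mb(8.5, 11, 300, 1))   # ~1 MB   (1-bit black and white)
print(raw_size_mb(8.5, 11, 300, 8))   # ~8.4 MB (8-bit grayscale)
print(raw_size_mb(8.5, 11, 300, 24))  # ~25 MB  (24-bit colour)
```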
Though scanning is fast and easy, the resulting raster images lack the
intelligence required for vector-based GIS. A fair degree of operator
expertise is also required to apply compression techniques that keep files
to a manageable size. An operator can apply vectorization automatically or
interactively to produce intelligent vector files.
Table digitizing has the advantage of employing low-cost digitizing
equipment, although operator training is required to obtain good results. On
the other hand, the procedure is laborious and time-consuming, and therefore
costs more.
Other options, such as raster-to-vector conversion and pattern
recognition, fall within the same trade-off between cost, quality,
productivity and usability. Digitization may require bulk conversion of
anything from text documents to line art and even video images, and
advanced techniques have been developed to enter material from other
sources. These techniques range from simple programs facilitating the
keyboard entry of survey coordinates to techniques for reconciling aerial
photographs with base maps. Remotely sensed, photogrammetric and
CAD-generated data represent further potential input sources.
Scanned images are the main source of input data for GIS, which has
increased the use of scanners in the GIS environment. This widespread use
has in turn prompted a second look at the limitations of scanners in
producing accurate scanned images. Since most GIS software needs very
specific accuracy, the accuracy of input data has to be quantified before
use. Generally, the average GIS database requires input data to be accurate
to at least 0.018"; that is, the location of an input feature must fall
within 0.018" of its actual geographic location at the scale of the map. A
scanner therefore must not introduce more positional error than the maximum
permissible in the GIS. A user can readily quantify standard accuracy issues
such as source availability, media stability and differences in data
collection mechanisms, and can decide whether the resultant data is
acceptable for their GIS before integration. With the recent influx of
scanned data, a new issue has to be dealt with: the accuracy of the input
scanner itself. Scanning remains quite expensive, so the effect of scanning
large amounts of data that do not meet the accuracy requirements of the GIS
can be damaging. Given this, users must be capable of measuring the accuracy
of their own scanners, and service providers must be capable of proving the
accuracy of their scanners to their customers. The accuracy of a scanner is
defined as its capacity to produce an image with output dimensions exactly
proportional to the input document. A user can dimensionally correct the
scanned image within specified tolerances; however, the accuracy of the data
within the body of the image cannot be predicted in the same way. Even
though the image may have exactly the right number of pixels, features
inside the image may be three- or four-tenths of an inch from their correct
location at the scale of the map, despite the fact that the scanner is
operating within its stated accuracy specifications. Depending on the scale
of the source map, three-tenths of an inch can translate to several hundred
metres of error on the ground, which is usually unacceptable for any GIS.
Hence, it is necessary to have an idea of the accuracy of the scanned image,
so that corrective measures can be incorporated in the analysis.
5. Widecom scanners
Widecom is one of the leading providers of high-performance wide-format
scanners. The main features of its scanners are as follows: they can scan
documents up to half an inch thick at high speed, and can digitally capture
items such as foam boards, artwork and other unusually thick documents.
Advanced filters such as deskew, diffuse, despeckle and sharpen, individual
RGB adjustment, smoothing and gamma correction help produce better scanned
documents.
9.10 SUMMARY
A scanner is an electronic input device that converts analog information
of a document such as a map, a photograph or an overlay into a
computer-usable digital format.
Scanning automatically captures map features, text, and symbols as
individual cells or pixels, and produces an automated image.
The essential component of a scanner is its scanner head, which moves
along the length of the scanner. The scanner head incorporates either a
charge-coupled device (CCD) sensor or a contact image sensor (CIS).
A CCD is composed of a number of photosensitive cells or pixels
packed together on a chip.
To ensure the best image quality, the most advanced large-format
scanners use CCDs with 8,000 pixels per chip.
Colour scanners use colour filters to scan all three colours (red,
green and blue) in one pass.
A new technology has emerged, full-width single-line contact sensor
array scanning, which enables the scanner to operate at previously
unattainable speeds. With this technology, the document to be scanned
passes under a line of LEDs that captures the image.
Different types of scanners are available for performing similar jobs;
however, they handle the job differently using different technologies and
the results they produce depend on their varying capabilities.
The most popular type of desktop scanner is the flatbed scanner.
Reflective art is mostly scanned by using flatbeds.
9.11 KEY TERMS
• Digital cameras: Used to take photographs of three-dimensional
objects much like a regular camera, but without requiring users to
wait for film developing and processing.
• Drum scanner: Used by professional colour trade shops for
producing colour separations for high-end printing. Drum scanners
use PMT (photomultiplier tube) technology instead of CCD technology
for greater dynamic range and colour accuracy.
• High-end flatbed scanners: These substitute for drum scanners and
provide features that meet professional demands, such as a noise-free
design, high dynamic range, a large scanning area and high resolution.
• Leaf's Lumina: The Lumina is actually a scanner, but it resembles
a digital camera. It uses standard Nikon bayonet lenses and is
therefore extremely flexible.
• Transparency scanners: Multi-format transparency scanners enable
you to scan everything from 35mm slides to 4x5-inch transparencies.
• Widecom scanners: One of the leading providers of high
performance wide format scanners.
Leaf's Lumina camera-cum-scanner is a recently developed piece of
equipment. The Lumina is actually a scanner, but it resembles a
digital camera.
This scanner is helpful for those who need to input and file
tons of business cards.
Photo Multiplier Tube
10.0 INTRODUCTION
In the earlier unit we discussed what digitization is and other things
related to it. In this unit we are going to learn how to manage a
digitization project. As we know, digitization is the conversion of
non-digital data to digital format; in digitization, information is
arranged into units.
factors that affect digitization project management and identifies methods
for addressing these issues successfully.
a collection management policy, it should also have a policy for generating
and maintaining digital assets, which form a new type of valuable 'collection'.
The policy should define at least the following:
• Copyright and legal policies for staff
• The method of managing digital images after they are
created
• The method of documenting image content and technical
information
• Plans for safe conservation, storage and preservation of
surrogate images and master images to ensure their longevity
• Making plans for migration to new formats and technologies
as needed
• Making plans for digitization and documentation of new
objects
The policy should be reviewed periodically to determine whether project
plans or policies need any adjustment.
Decisions regarding these requirements should be made well before the
start of the process. This is because the method of using images will determine
their quality and the resolution required, which will later affect both the choice
of scanning technology and overall system requirements.
Though future use determines the choice of quality and resolution,
images digitized at the highest possible resolution serve the greatest
number of purposes.
Evaluating assets
A careful assessment is required of what images the organization currently
has, considering the following questions:
• In what formats are those images?
• What objects have already been photographed?
• How are the images stored?
• Are digitized images from a previous project available?
• What is the quality of the images?
• At what resolution have the digital images been stored?
To determine what images are held in all parts of an organization, and in
what formats these images are currently available, a survey of all the
photographic holdings should be carried out. In a large organization,
generally all the departments have images for their own use, whereas a
smaller organization possesses fewer existing digital or photographic
resources.
Next is an assessment of the currently available images. Digitizing
already available images, such as colour transparencies, costs less and
consumes less time than beginning 'from scratch'. If you are scanning from
photographs or transparencies, it is recommended to use only good-quality
images. Photographs of some objects may need to be retaken if the images
are not in good condition or do not represent the original object properly.
Ideally, good, professionally photographed images created with a colour bar
or greyscale should be digitized.
Even if digital images already exist, you have to make sure that their
resolution is appropriate for current needs and that the related
documentation is sufficient. New photography adds significantly to the time
and money required for a digitization project, especially when the objects to
be photographed require considerable time for preparation. For instance, large
objects, such as canoes, may need to be transported from storage to a suitable
place to be photographed; complex objects, such as costumes, may require a
great deal of preparation.
Appropriate information (metadata) relating to the object in an image,
technical capture information, and attribution must be provided or created
at the same time as the images are produced. The documentation procedures
require significant amounts of staff time, but they are important for the
long-term success of an imaging project and for the future management and
repurposing of the digital assets created in the project.
The following are the other important aspects of an evaluation of assets:
• Think about the quality of documentation available for each image.
• Ensure that the institution has copyright to both the photograph and
the object.
• Survey the current software and equipment.
• The requirements for physical space (both physical space for staff
and equipment and disk storage spaces) should be considered.
• Examine the existing staff resources to help define needs.
Understanding the importance of planning
If we are to use digitization as a tool to provide worthwhile, enduring
access to treasured cultural and historical resources, we must take time at
the beginning to become informed, to establish guidelines, and to proceed in
rational, measured steps, so that such reformatting of visual matter is
accomplished as well and as cost-effectively as possible.
Once the current image assets of the organization are determined, the
scope of the project must be defined.
Only a few organizations systematically digitize all or very large parts
of their collections; most take a 'project' approach to digitization.
Whether the aim of the project is to digitize all or only part of the
collection, a plan outlining what is to be digitized, and in what order,
should be formulated before work starts.
Successful digitizing projects require sufficient resources, including the
following:
• Trained personnel
• Digitization technology and equipment (hardware and software)
• Adequate physical space for the process
• Funding
Consider also the following issues:
• Does the material to be digitized have enough intrinsic value to
warrant digitization?
• What institutional or project goals (institutional process, internal
or external visibility) might be secured by digitizing?
• Will the process of digitization significantly facilitate or
increase use by an identifiable constituency?
• What are the benefits/costs of digitizing images vs digitizing an
entire collection for which there is a particular requirement?
• Does the existing product meet the identified needs?
• Are rights and permissions for electronic distribution secured or
securable?
• Does the current technology produce images of sufficient quality
to meet the stated requirements and uses?
• Does the technology permit digital capture from a photo
intermediate, or does the project need to begin 'from scratch',
with either new photography or direct digital image capture?
• Does the institution have capability in the necessary
technology?
• Will all or part of the collection be digitized to support effectual
collection management practices or public access to collections
information?
• How will the objects to be digitized be selected?
• Will the ongoing activities or exhibit development help
determine what objects are digitized?
• Will digitization take place in-house or be outsourced?
• What quality of digitization is needed? Is the cost reasonably
priced? What compromises might be needed between cost and
quality?
• How will digital objects be categorized and stored? What
metadata (or information) about each one will be inserted?
• How will they be linked to the original object? How will digital
objects be searched for and located once they have been
created?
• How will digital assets thus created be managed on an ongoing
basis?
the amount of documentation can dictate how the images themselves are used.
Poor initial choices of technology or documentation may mean that the images
have to be rescanned within a few years for the project to remain successful.
The following broadly defined tasks or phases should be part of the overall
plan:
i. Planning
• Define the purpose, goals, scale and scope of the project.
• Survey the current images to assess the strengths of the collection.
• Evaluate the current documentation and standards used for creating
it.
• Analyze technical standards.
• Take an inventory of the available equipment.
• Set priorities.
• Develop and document a plan, including the workflow strategy.
• Identify the staffing needs.
• Assess the costs and implications of implementing projects in-house
vs contracting out the work.
• Secure funding.
• Select/hire/recruit and train staff to form a working group or project
team.
• Where photographs already exist, scan the photographs of objects
(or send them to an outside source, with explicit instructions about
requirements).
• Store high-resolution images securely.
• Perform quality control and evaluation.
Make a realistic timeframe for the project, realizing that the time
allocated to each stage depends upon the size of the collection, the
preparation time required, the staff available for the project, and the
current state of the documentation and collections management system. It is
also necessary to decide whether all the material needs to be digitized or
only parts of the collection.
Prioritizing the work
Although the long-term goal may be to digitize the entire collection, the
project can probably only be accomplished over time, in accordance with
financial and staff constraints. The work to be carried out should be
prioritized in accordance with the project plan defined earlier. Usually,
priority should be given to the following:
• Images for which copyright clearance of both the object and
the image is available
• Iconic images closely associated with the institution
• Images for which good documentation is available
• Objects used in exhibits or in current or upcoming projects
• New objects
• Images of the museum that could be developed into a
promotional publication or virtual tour
• Well-maintained collections of particular significance or special
public and/or educational appeal
• Images depicting a certain theme or subject area
• Natural groupings in the collection
Documenting the plan
It is essential to document the plan and process. Normally, a project plan
includes a timeline, which indicates the start and end dates for the major
activities as well as milestones or major deliverables. The documentation
may also identify the staff members or departments responsible for each
activity. Moreover, this documentation is of much help in identifying the
appropriate staff if some members leave the organization.
While devising a long-term strategy, the key plan should include periods
of assessment for determining whether strategies should be altered. A well-
defined project uses resources optimally, thereby yielding good results.
Defining the resources required
A digitization project has an impact on staffing, budget, workload,
available space and equipment. It is essential to hire or train staff with
the necessary skills (at the least, to document and manipulate images, if
the work is outsourced). If existing staff are trained, it is important to
assess how the ongoing workload will be affected. Consider how the
digitization project will affect the overall plans of the institution, and
whether the institution has other major plans that will need to draw on the
same resources.
Skills required
The following are the skills required in a digitization project:
Collections management/subject specialists
• Knowledge of cultural material documentation practices
• Descriptive information about objects and information about images
• Familiarity with requirements for reproducing cultural materials
• Cataloguing and properly documenting digital objects
Administration
• Project leadership
• Project management
• Supervision of production
Preparation
• Preparation of detailed instructions for digitization, whether the
work is accomplished in-house or outsourced
• Preparation of objects for digitization
• Preservation, archiving and disposal of digital objects
Systems support
• Technical expertise in the operation of digitization hardware and
software
• Experience with image scanning, processing and quality control
Reproduction services
• Performing quality reviews and upgrading the procedure of
monitoring digitization
In small organizations, the same people perform many of these tasks, and
some of them may be volunteers. In other cases, many of these tasks may be
outsourced.
Securing preservation through proper storage
Digitization helps preserve original materials, since it becomes
unnecessary to expose objects to handling and light very frequently. From a
high-quality digital image, other image formats can be derived.
However, high-resolution digital assets have preservation and storage
needs of their own. Although it may still be possible to retrieve the images
in the future, it is better to plan for the preservation of the image
collection. Retrieving images in the future becomes costly if the software
or hardware used to store and retrieve them becomes obsolete.
Digital images require considerable storage space, and this cost should
be added to the project budget. High-resolution archival copies of the
images should be kept even if the most important aim is to add images to
the collections management system at low resolution. A storage area
separate from that of the working collection management system is required
for high-resolution archival copies. These archival copies should be stored
on a fixed medium such as CD-ROM, Digital Versatile Disc (DVD), tape backup
or a related device. Even though such storage mechanisms cost much to
implement, they eventually turn out to be profitable in the long run. A
prioritized plan is needed, with built-in review periods to assess potential
changes to technology and storage media.
Establishing responsibility
For any digitization project to be successful, it needs strong support
from the management.
The capabilities of the current staff and their interest in learning new
technologies must be realistically assessed. The project leader can survey
various divisions of the institution to make sure that staff members
understand the goals of the project. Managerial and departmental tasks will
change as new priorities are set and new skills are acquired. Instead of
forcing staff members to take on new assignments that they had not
anticipated, it is much better to stress the positive opportunities for
professional development that the digitization project makes available.
After determining the responsibilities for these tasks, it is important
to ensure that all staff members understand that these responsibilities have
been assigned. Proper communication among staff members is the key to a
successful project.
Ensuring access
Technology keeps changing. Refreshing and migrating data are the two
recommended strategies for avoiding obsolescence. They are described as
follows:
• Copying digital files from one storage device to another of the
same type is known as refreshing; it is like creating a duplicate
CD-ROM. This method is viable when the digital files are in a
non-proprietary format and are independent of particular hardware
and software, although hardware and software will still be
required to read them. When files in a proprietary format are
refreshed, problems may arise because the specifications of the
file format may have changed, making the files difficult to access.
• Changing or converting data into newer or non-proprietary
standard formats and then transferring it onto a newer type of
storage media is known as data migration.
The above strategies protect valuable data. However, they also entail
costs in time and equipment.
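As a minimal illustration of refreshing (a sketch under assumed file paths and conventions, not a prescribed procedure), the Python snippet below copies files to new media of the same type and verifies each copy with a checksum before the new copy is trusted.

```python
# Minimal sketch of 'refreshing': copy files to fresh storage of the same
# type and verify each copy bit-for-bit with a checksum.
# The directory paths and the .tif extension are hypothetical examples.
import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    """Checksum used to confirm the copy is identical to the original."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def refresh(source_dir: Path, target_dir: Path) -> None:
    target_dir.mkdir(parents=True, exist_ok=True)
    for src in source_dir.glob("*.tif"):
        dst = target_dir / src.name
        shutil.copy2(src, dst)            # copy the file and its timestamps
        if sha256(src) != sha256(dst):    # verify before trusting new media
            raise IOError(f"Refresh of {src.name} failed verification")

refresh(Path("masters_2001"), Path("masters_refreshed"))
```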
the rights can be obtained. First, ensure that the organization
holds the copyright on the photograph through an agreement with the
photographer. Second, negotiate these rights when the photograph is
digitized at a later phase. If the photograph being digitized falls
into the public domain, these authorizations are no longer needed.
• When the digitized image will be modified: If, in the course
of digitization, the image is modified, cropped or discoloured,
then rights associated with copyright, such as moral rights, may
become an issue. Moral rights are always held by the artist or
author of the original work which is the subject matter of the
image. Photographers also hold moral rights in their photographs
even when the copyright has been assigned to another party. Moral
rights can never be transferred, but they can be waived; they run
for the length of the copyright.
In the above cases, if the image is manipulated by discolouring,
cropping or modifying it in any way that may prejudice the artist,
creator or photographer, the organization should ensure that it
obtains a waiver of moral rights from the artist or creator and/or
photographer. The moral rights of the artist, creator or
photographer are no longer an issue if the work (the image or the
photograph being digitized) falls into the public domain.
• Visible and invisible watermarking
• Digital fingerprint
• Various rights management systems along with the secure
container technology
• Encryption technology
• Quality control
• Ongoing maintenance
It may be helpful to consider sharing costs with another institution,
pooling resources for equipment and/or staff.
The largest expense will not be the actual scanning or photography but
the subject expertise required for documentation; locating, reviewing and
assembling source material; preparing and tracking it; and quality control.
Doing the project in-house incurs the costs of training current staff,
hiring new staff and purchasing new equipment. For image manipulation,
investigate possibilities such as hiring interns or students from a
community or technical college. For short-term projects, the salary of each
team member can be estimated on an hourly basis. Loading the work onto
particular staff members may lead to stress, so redistribution of tasks is
essential for these types of projects. Even if a project is contracted out,
it may still require some staff training to carry out the work.
Expert staff are required for project preparation: transporting heavy
objects, unbinding manuscripts, having a conservator check objects for
damage, and setting up photography if photographs suitable for scanning are
not available for all the objects.
Digitizing Images In-House vs Contracting Out
Advantages:
In-House
i. Retain control over all aspects of imaging
ii. Some flexibility in defined requirements
iii. Learn by doing and developing in-house expertise
iv. Build production capability
v. Security of source material
Contracting Out
i. Lower labour cost.
ii. Costs of technological obsolescence are absorbed by the digital service
provider.
iii. Expertise and training of the digital service provider.
iv. Set-up cost per image; prices can be negotiated based on volume,
which facilitates budget and project planning.
v. Limited risk.
vi. Variety of options and services.
Disadvantages:
In-House
i. Limits on production capabilities and facilities.
ii. Institution incurs costs of technological obsolescence.
iii. Need to set up technical infrastructure: space,
digitization equipment, and computers.
iv. Larger investment.
v. No set-up price per image.
vi. Impact on other activities.
vii. Need for trained staff, training.
viii. Institution pays for equipment, maintenance and
personnel rather than for product.
ix. Equipment support.
Contracting Out
i. Quality control is not on site.
ii. Images will still need to be manipulated by museum staff;
random samples of the images produced should be inspected.
iii. Possible inexperience with organization needs.
iv. Transporting material—security and handling issues,
especially with 3-D objects.
v. Needs must be clearly defined in contract or there will
be communications problems.
vi. Vulnerability due to instability of digital service
providers (companies in business for over 2 years are
considered viable).
Museums become more capable of managing their collections when they use
proper database management technologies and documentation in conjunction
with digital imaging projects. The type of data stored along with the
digitized material determines how it can be searched, sorted and displayed.
10.5.1 Metadata
Metadata is an indispensable part of any responsible digitization
program. Considerable attention has been paid to defining high-quality
metadata standards for various purposes. In many instances, institutions
will already have substantial metadata about the analog object (for
instance, catalogue records), much of which can be applied to the digital
object. The availability of accurate metadata, which supports accessibility,
usability and effective asset management, is as important as the digital
surrogates themselves. The cost of creating metadata is reduced by building
on existing metadata. When selecting material for digitization, you may wish
to give priority to material for which partial metadata already exists.
Low-resolution formats are required for digital images used for visual
reference in an electronic database, such as on the World Wide Web. Many
surrogate images or working copies can be produced from a master image for
a variety of purposes without having to repeat the image capture process.
For thumbnail access, a lower image resolution may be sufficient, while a
substantially higher resolution may be required for digital images intended
for high-quality printing. Each type of surrogate image may require
different image editing and enhancements.
First, the intended uses for the images must be determined in order to
ascertain the quality required for digital imaging. Larger or more detailed
reproductions require higher-quality images. The most common use of digital
images is to make them available over the World Wide Web as low-quality
thumbnail images via a collections management system. Digital reproduction
or printing is less common, but is increasing in importance. Detailed
analysis of works of art, especially for conservation work, requires
substantially higher-quality images.
The recommended rule of thumb is to capture images at the highest
quality feasible, given the requirements previously determined, the
resources available, and the size and scope of the project.
10.5.3 Preservation and Storage Standards and Guidelines
The preservation and storage of digital assets must be an integral
component of the overall digitization project, and long-term provision
should be made for continued access to digital resources.
A significant factor in determining image quality and image storage
requirements is the resolution of the images, along with the colour depth.
High-quality images such as digital master images require substantial
amounts of computer storage: the higher the quality of image selected, the
greater the storage requirements. Surrogate images created from master
images generally require much less storage space.
Master images are generally stored offline or near-line, since they are
accessed rarely. Although CD-ROM is a common storage medium, it has limited
space for storing digitized data. Nowadays, Digital Versatile Discs (DVDs)
have become quite popular, as their storage capacity far exceeds that of
CD-ROM.
Digital tape, although it has the drawback of relatively slow access, is
another format used primarily for large storage requirements. Digital Audio
Tape (DAT) and Digital Linear Tape (DLT) are the common digital tape
formats. Large-capacity jukeboxes (similar to large CD changers) are
available for each of these formats, allowing near-line access.
Magnetic tape can also be used to store digitized images. However, it is
relatively impermanent owing to its inherent instability, which leads to
chemical deterioration and physical wear from use. Optical discs may fail
because of warping, corrosion or cracking in the reflective layer, dye
deterioration, or delamination.
Storage conditions are also important in preserving digital images:
cooler and drier conditions extend life expectancy. The recommended
conditions for storing digital media are temperatures in the range of
10-20° C and a relative humidity between 20 per cent and 50 per cent. As a
security measure, a backup copy of all masters should be generated and
stored offsite.
The key factors governing the transmission of digital images are the
size of the image files and the speed of the network: the smaller the image
file, the faster the access. Display monitors are mostly low-resolution
devices, and the primary reason for transmitting images is to display them.
Therefore, for both internal and external networks such as the World Wide
Web, a low-resolution surrogate image should be created for display.
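The file-size/network-speed relationship reduces to a simple division. The sketch below (illustrative figures only; protocol overhead is ignored) shows why low-resolution surrogates are preferred for display over a network.

```python
# Sketch of the transmission trade-off: transfer time is file size divided
# by network throughput. The link speed and file sizes are illustrative.

def transfer_seconds(file_size_mb: float, link_mbit_per_s: float) -> float:
    """Approximate time to move a file across a link (overhead ignored)."""
    return (file_size_mb * 8) / link_mbit_per_s

print(transfer_seconds(0.05, 1.0))  # ~0.4 s : a 50 KB thumbnail, 1 Mbit/s link
print(transfer_seconds(25.0, 1.0))  # ~200 s : a 25 MB master on the same link
```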
which must be considered while choosing a PC-equivalent imaging
workstation are as follows (a sizing sketch follows this list):
• CPU: A Pentium 400 MHz or better processor is recommended for
intensive image editing, since digital images make heavy demands
on the central processing unit and can slow the system.
• RAM (Random Access Memory): Advanced imaging software normally
requires memory of about three times the size of the image file;
a 30-MB image file therefore requires 90 MB of memory. More
memory may be required if additional software is used
simultaneously.
• Disc storage: Storage is at a premium when working with large
image files, so auxiliary storage such as high-density removable
drives (for example, Zip drives) and a CD-ROM writer is
recommended.
• Display monitor: The monitor is a major part of the system for
image processing and verification. It should be as large as
possible, be capable of displaying 24-bit colour (16.7 million
colours), support a 72 Hz refresh rate, and be driven by a video
board with sufficient memory.
• Image software: To optimize images, high-end imaging software
such as Adobe Photoshop should be used. Several freeware and
shareware products are also available on the Web.
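As a rough illustration of the memory rule of thumb above, the sketch below estimates the RAM an editing session might need; the 3x multiplier comes from the text, while the allowance for other software is an assumed example figure.

```python
# Sketch of the workstation memory rule of thumb: imaging software wants
# roughly three times the image file size in RAM. The allowance for other
# concurrently running software is an assumed example figure.

def ram_needed_mb(image_file_mb: float,
                  multiplier: float = 3.0,
                  other_software_mb: float = 0.0) -> float:
    """Estimated memory needed to edit an image of the given file size."""
    return image_file_mb * multiplier + other_software_mb

print(ram_needed_mb(30))            # 90 MB, matching the example in the text
print(ram_needed_mb(30, 3.0, 64))   # with other applications open as well
```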
As mentioned earlier, both time and expertise are required to move large
objects from storage to the photographic set-up, and some objects (such as
costumes) require installation with other objects. Time and skill are thus
the two major requirements of imaging projects that entail either
traditional or digital photography of three-dimensional objects. When
photographing three-dimensional objects, different views of the same object
may be required. To avoid unnecessary delays, all the equipment, including
supports and accessories, should be on hand before photographing begins.
When capturing two-dimensional objects, the materials should be reviewed
before an imaging plan is decided upon. Historical photographs, as opposed
to photographs of objects in the collection, may be scanned directly.
Medieval manuscripts are more delicate and may require expert curatorial and
conservation help before the image capture technique is decided.
Pre-scanning quality control is most important in achieving the highest-
quality digital images. Projects that involve scanning images already on
hand will require some staff to check the images for quality and to ensure
that the images are not blemished and that the accession numbers are
correct.
Benchmarking emerged from pilot studies of the workflow process and from
interviews with projects. These were undertaken for a variety of reasons:
• Technical forecasting
• Technical feasibility
• Training needs
• Workflow analysis
When considering technical forecasting or prototyping, it must be
remembered that the benefit will vary for different types of content,
particularly in relation to costs, and that there may be no corresponding
benefit at all. The cost-benefit may simply be realized through the
project's ability to pay off the equipment in small, regular instalments;
few projects charge users for the digital deliverables. A device that
enables the digitization of material that previously could not be captured,
such as a 3D modeller, may not make financial sense if the project has to be
justified on a profit or depreciation margin. Similarly, a new
high-resolution camera may pay dividends for fine textual or line-art
material, but not for colour images. Even so, the public access benefit may
outweigh the financial costs if the device makes an important collection
more widely available.
Any pilot study should be built into the project design and development
cycle.
If you are considering using a cost model, it is important to include
all the relevant costs, not just the obvious items such as equipment and
staff time. A checklist of these factors should be built into the cost
model. While digitizing an image collection, for instance, one may well
generate a number of different digital objects (archival masters, delivery
masters, thumbnails and other deliverables), which in turn will require
storage, tracking, documentation and upkeep. Digital asset management is an
area where one must pay particular attention to cost estimates: it requires
a significant commitment of resources and needs to be planned carefully.
Typical imaging projects involve around 50,000 images or more. With this
quantity of data, planning for managing the digital assets must become an
integral part of the overall digitization project. Digital imaging projects must
include a policy for managing the digital assets as mentioned in the planning
process.
A formal review process should be designed for metadata, just as for
image quality. Issues such as who will review the metadata, the scope of the
review, and how great a tolerance is allowed for errors should be settled.
Automated techniques are less likely to be effective in assessing the
accuracy, completeness and utility of metadata content (depending on its
complexity), which will require some level of manual analysis. Practical
approaches to metadata review may depend on how and where the metadata is
stored and the extent of the metadata recorded. Skilled human evaluation,
rather than machine evaluation, is required to assess metadata quality.
However, some aspects of managing metadata stored within a system can be
monitored using automated system tools.
Although there are no clearly defined metrics for evaluating metadata
quality, the following areas can serve as a starting point for metadata
review. In general, it is good practice to review metadata at the time of
image quality review.
• Adherence to standards set by institutional policy or by the
requirements of the imaging project
• Procedures for accommodating images with incomplete metadata
• Relevancy and accuracy of metadata
• Consistency in the creation of metadata and in interpretation of
metadata
• Consistency and completeness in the level at which metadata is
applied
• Evaluation of the usefulness of the metadata being collected
• Synchronization of metadata stored in more than one location
• Representation of different types of metadata
• Mechanics of the metadata review process
Specifically, we consider the following (a small verification sketch follows this list):
• Verifying accuracy of file identifier
• Verifying accuracy and completeness of information in image
header tags
• Verifying the correct sequence and completeness of multi page
items
• Adherence to agreed-upon conventions and terminology
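Some of these mechanical checks can be automated. The sketch below, a minimal illustration using only the Python standard library, verifies file identifiers and required fields against a metadata spreadsheet; the field names, the identifier convention and the CSV filename are hypothetical assumptions.

```python
# Sketch of an automated metadata review: check that each record in a
# metadata CSV has a well-formed identifier and no empty required fields.
# Field names, the identifier pattern and the filename are hypothetical.
import csv
import re

REQUIRED_FIELDS = ["identifier", "title", "date_captured", "resolution_ppi"]
IDENTIFIER = re.compile(r"^[A-Z]{3}\d{6}$")     # e.g. ABC001234

def review_metadata(csv_path: str) -> list:
    """Return a list of human-readable problems found in the metadata file."""
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            ident = (row.get("identifier") or "").strip()
            if not IDENTIFIER.match(ident):
                problems.append(f"{ident or '<blank>'}: bad file identifier")
            for field in REQUIRED_FIELDS:
                if not (row.get(field) or "").strip():
                    problems.append(f"{ident or '<blank>'}: missing {field}")
    return problems

for issue in review_metadata("batch_01_metadata.csv"):
    print(issue)
```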
4. Documentation
Quality control data such as logs, reports and decisions should become an
integral part of the image metadata at the file or project level and should
be captured in a formal system. This data may have long-term value that
could affect future preservation decisions.
5. Testing results and acceptance/rejection
If more than 1 per cent of the images and associated metadata in a
randomly selected sample of a batch are found to be defective for any of
the reasons listed above, the entire batch should be re-inspected. Any
specific errors found in the random sampling, and any additional errors
found during the re-inspection, should be corrected. If less than 1 per
cent of the batch is found to be defective, only the specific defective
images and metadata that are found need be redone.
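The acceptance rule above can be expressed as a simple decision procedure. The sketch below illustrates the logic; the sample size and the stand-in defect test are assumptions for demonstration, not prescribed values.

```python
# Sketch of the 1 per cent batch acceptance rule: inspect a random sample;
# if the defect rate exceeds the threshold, re-inspect the entire batch,
# otherwise redo only the defective items found. Sample size is assumed.
import random

DEFECT_THRESHOLD = 0.01   # 1 per cent, per the guideline above

def items_to_redo(batch, sample_size, is_defective):
    """Return the items that must be redone under the acceptance rule."""
    sample = random.sample(batch, min(sample_size, len(batch)))
    sample_defects = [item for item in sample if is_defective(item)]
    if len(sample_defects) / len(sample) > DEFECT_THRESHOLD:
        # Defect rate too high: re-inspect the whole batch for errors.
        return [item for item in batch if is_defective(item)]
    return sample_defects   # acceptable batch: fix only what was found

# Usage with a stand-in defect test (every 40th image flagged, ~2.5%):
batch = [f"IMG{i:05d}.tif" for i in range(5000)]
redo = items_to_redo(batch, 200, lambda name: int(name[3:8]) % 40 == 0)
print(len(redo))   # likely all 125 defects, found via full re-inspection
```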
In order to preserve the integrity of digital objects and to retain the ability
to retrieve, display and use them, the data must be transferred to new media
types and formats.
The storage media should be inspected at regular intervals to detect any
deterioration. Moreover, the latest technology should be reviewed
continuously, so that images at risk of becoming obsolete can be migrated to
new media or formats.
Protecting the integrity of digital images must be a top priority of any
digital preservation strategy. Content, defined in terms of structure and
format, poses integrity problems for digital archives. Planning a migration
strategy is difficult, as it can be very hard to anticipate when migration
will be necessary, how much reformatting will be required, and how much the
entire process will cost. The migration process itself can degrade data
quality, and this fact has implications for the overall integrity of the
data.
Both the master images and the surrogate images must be considered for
storage, and secure off-site storage is essential. A backup strategy
covering all image formats must be put in place for all data created,
including all work in progress during the image creation and image
enhancement phases.
10.11 SUMMARY
• In the technology domain, change and unpredictability are facts
of life, and often represent opportunities rather than disasters for
a well-planned project.
• Before an institution embarks on a digitization project, it should
allocate adequate resources of time and money. In addition,
future requirements should be taken into consideration, so that
future options are not limited by rapid technological change.
• Implementation of a digitization project in several stages can
provide the flexibility to accommodate possible alternatives
along the way.
• Establishment of a policy for the management of digital assets
should be part of the planning process. The policy should be
reviewed periodically to determine whether project plans or
policies need any adjustment.
• Prior to digitization, the targeted users of images, both inside and
outside the institution, should be determined. Also, the users
should be involved in the development of the project, if possible.
• Identification of potential internal uses will help carve out the
digitization strategies of the institution.
• Images may connect to collections management systems for the
illustration of artifacts and collection records for loans, insurance
and other collections management functions.
• Whether the aim of the project is to digitize all or only part of
the collection, a plan outlining what is to be digitized, and in
what order, is needed before work starts.
• Successful digitizing projects require sufficient resources. With
the latest technologies, digitized images can not only be made
available and accessed via the Internet, but can also be reproduced
quickly and with astonishing clarity.
• There are no published standards or guidelines for determining the
level of image quality required when creating digital images.
• The choice of computers for imaging projects should be based on
the requirements identified, with adequate power to handle
high-resolution images.
• If you are considering using a cost model, it is important to
include all the relevant costs, not just the obvious items such as
equipment and staff time.
• A checklist of these factors should be built into the cost model.
• In order to preserve the integrity of digital objects and to retain
the ability to retrieve, display and use them, the data must be
transferred to new media types and formats. Both the master
images and the surrogate images must be considered for storage.