
BMG 204

CONTENT DIGITIZATION

The draft instructional material is being made
available “as received” from the authors. Editing
and various other quality checks are in progress.
Users are advised to consult Study Centres for
any missing content, updates and instructions
according to the course syllabi.
UNIT 1 THE DIGITIZATION
Program Name: BSc (MGA)
Written by: Mrs. Shailaja M. Pimputkar, Srajan
Structure:
1.0 Introduction
1.1 Unit Objectives
1.2 What is Digitization?
1.2.1 Advantages of digitization
1.2.2 Disadvantages of digitization
1.2.3 Storage for digitization
1.3 Fit for Purpose
1.4 Compression
1.4.1 Advantages of compression
1.4.2 Disadvantages of compression
1.4.3 Lossless and lossy compression
1.4.4 Compressing Different file types
1.4.5 Popular Compression Software
1.5 Pathways
1.6 Digital Objects
1.6.1 Text Based
1.6.2 Image Based
1.6.3 Time Based
1.7 Data Models
1.7.1 Choosing a data model
1.7.2 List
1.7.3 Hierarchy
1.7.4 Sets
1.7.5 Geography/geometry
1.8 Choosing software
1.9 Summary
1.10 Key Terms
1.11 End Questions

1.0 INTRODUCTION
As the first unit of the course, we are going to learn what digitization is,
along with other things related to it. We will see the various advantages and
disadvantages of digitization. You will need to know the proper way to store
data while digitizing. You will also find detailed descriptions of compression,
the various file formats, data models and compression software.
Digitization is the conversion of non-digital data to digital format. In
digitization, information is arranged into units called bits. Digitizing
information makes it easier to preserve, access, and share. Text and images can
be digitized in a similar way: for example, a scanner is used to digitize text
and images. The scanner converts the text or image into an image file, such as
a bitmap. Optical character recognition (OCR) software then scans the text image
for light and dark areas in order to identify each alphabetic letter or numeric
digit, and converts each character into an ASCII code.

1.1 UNIT OBJECTIVES:


After studying this unit you will be able to:
• Explain the concept of digitization
• Describe the purpose of digitization
• Explain data compression
• Explain digitization pathways
• Define digital objects
• Explain the concept of a data model
• Choose software to digitize data

1.2 WHAT IS DIGITIZATION?

Digitization involves the conversion of information into digital format.

In other words, digitization is the process of converting non-digital data
into a digital format. In this format, information is arranged into units of
data, each called a bit. Digitization may be used to convert a variety of data;
for example, text and images can be digitized using a scanner. The scanner
captures the image and converts it into an image file, such as a bitmap.
1.2.1 Advantages of digitization
The main advantage of digitizing data is that users can easily store,
access, and share it. It allows users to get easy access to the required
information. For example, if a person wants information about some historical
place, the original historical document may only be accessible to people who
visit its physical location; but if the document's content is digitized, it may
be available to people worldwide. Thus, information can be easily located,
retrieved, examined and used.
1.2.2 Disadvantages of digitization
One disadvantage of digitization is that digitized material may become
outdated as technology changes. Financial costs are extremely high, and many
times users also prefer to use the original material in place of its digitized
form.
Another disadvantage is that digitization requires expert staff, and the
additional resources needed often increase the costs of digitization projects.
Furthermore, only a part of the analog object can be represented in digital
format, so digitization can never completely capture the original.
Digitization policies vary between organizations. During material
selection, materials which are in huge demand benefit from the accessibility
which digitization offers. Material which has been used little may be stored on
magnetic tape, while the improved accessibility of disk storage may be
selected for high-demand and strategic online material; the method of storage
of digital information depends on the type of retrieval method used.
1.2.3 Storage for digitization
It is important to choose the right physical medium for the storage of
information, and it should be maintained under stable environmental conditions
so that there is no failure during storage. Storing similar material in
multiple locations and taking regular backups provide protection against loss
due to media failure, human error, or both. Using a stable, widely supported
standard format makes it easier to maintain the data and to convert it later
when required.
Magnetic tapes are often used to store less-used material, but sometimes
the improved accessibility of disk storage may be selected for high-demand and
strategic online material. The storage method also depends on the type of
retrieval method used: on that basis, digital information can be stored online,
near-line or offline.
Lossy compression is not a suitable method for storing digital data over
long periods of time, because the discarded data may be lost forever during
decompression or migration. When an image is compressed and then decompressed,
the result is often different from the original scanned image; this is what
makes the compression "lossy".

1.3 FIT FOR PURPOSE


To make decisions on any technical issue, the digitizer must have a clear
idea of digitization and should also be aware of how the digital information
will be used. The notion of 'fit for purpose' is central to all digitization
processes. Purposes include improving access; supporting new research;
conservation; enhancing value by including the objects in a learning
environment; or comparing and searching objects alongside electronic resources
of a similar nature. The digitizer can make effective and proper technical
decisions while keeping the following questions in mind:
Who will be the user of the resource?
Why will the user want to use the resource?
How will the user put the resource to use?

Check your progress-1
What is digitization?
How is digitization useful?
What are the two compression methods?
What is lossy compression?

1.4 COMPRESSION
Compression is the process of decreasing the number of bits needed to
represent data. Compressing data is useful because it can save storage
capacity, speed up file transfer, and decrease costs for storage hardware and
network bandwidth. For example, text compression can be as simple as removing
all unwanted characters, or inserting a single repeat character to indicate a
string of repeated characters. Compression can reduce a text file to 50% of
its original size, or often significantly less. Compressed data can be
understood only if the receiver knows the decoding method; thus, compressed
data communication is beneficial only if both the sender and the recipient of
the information know the encoding method adopted.
Data compression means bit-rate reduction: encoding information using
fewer bits than the original data. Thus, compression is the process of
decreasing the size of a data file. Compression can be lossy or lossless.
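The repeat-character idea described above can be sketched as a simple run-length encoder. This is an illustrative sketch, not the scheme used by any particular compressor, and the function names are invented for the example:

```python
def rle_encode(text):
    """Encode runs of repeated characters as (count, char) pairs."""
    if not text:
        return []
    runs = []
    current, count = text[0], 1
    for ch in text[1:]:
        if ch == current:
            count += 1
        else:
            runs.append((count, current))
            current, count = ch, 1
    runs.append((count, current))
    return runs

def rle_decode(runs):
    """Rebuild the original string from (count, char) pairs."""
    return "".join(ch * count for count, ch in runs)

encoded = rle_encode("aaaabbbcc")
print(encoded)                              # [(4, 'a'), (3, 'b'), (2, 'c')]
print(rle_decode(encoded) == "aaaabbbcc")   # True
```

Note that run-length encoding only helps when the data actually contains long runs; on text with few repeats it can even make the file larger.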
1.4.1 Advantages of compression
• Faster Transfers
Compressed data can be transferred faster to and from the disk. File
compression is a mathematically intense operation; compressing hundreds of
megabytes of files can take up to an hour, depending on your computer's speed.
It is easier to compress small files, while large files take more time to
compress and decompress.
• Disk Space Savings
Data compression increases effective disk bandwidth and reduces resource
usage. If you have many small files on your hard drive, it is advisable to
compress them into one or more files having a smaller total size than the
originals.
• Easy Downloads
Compression is useful if you need to send several files as an email
attachment. It is easier and more convenient to compress many files into a
single file and attach that one file to the email.
1.4.2 Disadvantages of compression
• Data compression can only be used if both the transmitting and
receiving modems support the same compression procedure.
• Data damage can occur while decompressing the compressed data.
• A compression mechanism for video may require expensive hardware
for the video to be decompressed.

1.4.3 Lossless and lossy compression

Compression is divided into two types: lossless and lossy.
In lossless compression, every single bit of data remains the same and no
information is lost; all of the information can be completely restored. Because
compression is done without any loss of information within the file, it is
called lossless compression. The Graphics Interchange Format (GIF) is an image
format used on the Web that provides lossless compression.
Like lossless compression, lossy compression also reduces the file size;
however, this is achieved by permanently deleting certain information,
especially redundant information. When the file is uncompressed, only part of
the original information remains. Lossy compression is generally used for
video and sound. The JPEG image file format, generally used for photographs
and other complex still images on the Web, is an example.

Fig 1.1: Lossless vs. lossy compression
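The lossless case is easy to verify in code. A quick sketch using Python's standard zlib module, which implements the widely used DEFLATE algorithm:

```python
import zlib

original = b"digitization " * 1000           # highly repetitive sample data
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed))  # far fewer bytes after compression
print(restored == original)                  # True: every single bit is restored
```

No such round-trip check is possible for a lossy codec such as JPEG or MP3, because the discarded information cannot be recovered.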


1.4.4 Compressing different file types
Compression can be lossy or lossless. Compressed files are used to save
disk space, and compressed archive formats can also be used to combine
multiple files into a single archive. Common compressed file extensions
include .ZIP, .SITX, .RAR, and .GZ.
The ZIP format is generally used for compressing files; it uses lossless
compression when converting a folder to a zip archive. Converting uncompressed
audio files to a compressed audio format like WMA uses lossy compression.
(a) Image
Image compression is different from compressing raw binary data.
General-purpose compression programs can be used to compress images, but the
results may differ from the original. JPEG is an image file format that
supports lossy image compression, while formats such as GIF and PNG use
lossless compression.
Image compression means reducing the size in bytes of a graphics file
without degrading the quality of the image. It also reduces the time required
for images to be sent over the Internet or downloaded from Web pages. There
are many different ways in which image files can be compressed. The most
common formats are:
• JPEG
• GIF
• PNG

The JPEG format works well with photographs, but it is not so good with
high-contrast pictures like screenshots or computer art. It relies on the fact
that the human eye cannot detect small changes in color. This format
recompresses each time it is saved, and repeated saving may lose quality; you
should therefore always work with uncompressed formats before saving in the
target format.
The Graphics Interchange Format is based on limiting the colors used
in the image. Typically up to 256 colors are used to make the palette, a table
assigning up to 256 colors to the numbers 0 to 255. The pixel data for the
image is then stored using the 8-bit number that represents the color's
position in the table.
Portable Network Graphics (PNG) uses lossless data compression. PNG is an
open-source format that was created to improve upon the GIF: GIF was not open
source and there were licensing costs for developers using the format, which
was also a motivating factor in the uptake of PNG.

Fig 1.2: Various image formats
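The palette idea behind GIF can be illustrated in a few lines of Python. This is only a sketch of the principle, not the GIF file format itself, and the pixel values are made up for the example:

```python
# Illustrative sketch of palette-based (GIF-style) encoding:
# collect the distinct colors, then store one small index per pixel.
pixels = [
    (255, 0, 0), (255, 0, 0), (0, 0, 255),
    (255, 0, 0), (0, 255, 0), (0, 0, 255),
]

palette = []                 # table assigning colors to the numbers 0..255
indices = []                 # one 8-bit index per pixel
for color in pixels:
    if color not in palette:
        palette.append(color)
    indices.append(palette.index(color))

print(palette)   # [(255, 0, 0), (0, 0, 255), (0, 255, 0)]
print(indices)   # [0, 0, 1, 0, 2, 1]
assert len(palette) <= 256   # a GIF palette holds at most 256 entries
```

Storing one byte per pixel instead of three is where the saving comes from, which is also why GIF cannot faithfully represent images with more than 256 distinct colors.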

(b) Text
Text is a very big part of most files that digital technology users
create, so being able to compress text for storage or transmission is
extremely important; there are many advantages to compressing a Microsoft Word
file, for example. It is advisable to use a lossless method for text
compression, meaning no data is lost while compressing the data.

You can use a zip file to compress text. To create a .zip file, right-click
a file, such as srajan.doc, and then click a command that appears on the
shortcut menu, such as 'Add to zip file'; srajan.zip is created.
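The same right-click step can be done programmatically. A sketch using Python's standard zipfile module; the file names follow the srajan example above, and the paths are temporary ones created just for the demonstration:

```python
import zipfile, tempfile, os

# Create a sample text file, then compress it into a .zip archive
# (equivalent to the right-click "add to zip" step described above).
workdir = tempfile.mkdtemp()
doc_path = os.path.join(workdir, "srajan.txt")
with open(doc_path, "w") as f:
    f.write("sample text " * 500)

zip_path = os.path.join(workdir, "srajan.zip")
with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write(doc_path, arcname="srajan.txt")

# Repetitive text compresses well, so the archive is much smaller.
print(os.path.getsize(doc_path) > os.path.getsize(zip_path))  # True
```

Passing ZIP_DEFLATED selects lossless DEFLATE compression; omitting it would store the file uncompressed inside the archive.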
Some techniques are used by general-purpose compressors such as zip, gzip,
bzip2, 7zip, etc. The main types of text compression models are:
• Static
• Semi-adaptive or semi-static
• Adaptive

A static model is a fixed model that is known by both the compressor and
the decompressor and does not depend on the data that is being compressed.
A semi-adaptive or semi-static model is a fixed model that is constructed from
the data to be compressed. An adaptive model changes during compression.
(c) Audio
In audio compression, the amount of data in a recorded waveform is reduced
for transmission. This is used in CD and MP3 encoding, internet radio, and the
like. (In audio level compression, by contrast, the dynamic difference between
the loud and quiet parts of an audio waveform is reduced.) Audio compression
is a form of data compression designed to reduce the size of audio files. Let
us look at some of the steps for compressing audio files.
• In order to compress a .WAV file, you simply have to load it into a
sound recorder. Choose 'File' and 'Properties'. You will see a button
labeled 'Convert Now'. Upon clicking it, you will get the option of
changing the format and attributes of the sound.
• Format: this is the compression scheme that is used on the sound file.
Each compression scheme acts on the sound file in a different manner.
• Attributes: here you choose the sound's frequency range (the larger the
range, the better the sound quality) and the number of bits that
comprise each sound sample (the more bits, the higher the quality).
• After having chosen the compression method, choose 'OK', after which
you will get back to the main playing window. Play the sound; if the
quality is good and the compression satisfactory, save the file.
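Outside a sound recorder GUI, the same format attributes can be inspected in code. A sketch using Python's standard wave module, which writes a short uncompressed .WAV tone and then reads its attributes back; the tone parameters (8 kHz, 440 Hz, one second) are arbitrary choices for the example:

```python
import wave, struct, math, tempfile, os

# Write a one-second 8 kHz, 16-bit mono sine tone as an uncompressed .WAV,
# then read back the attributes a sound recorder's dialog would show.
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
sample_rate, n_samples = 8000, 8000

frames = b"".join(
    struct.pack("<h", int(20000 * math.sin(2 * math.pi * 440 * i / sample_rate)))
    for i in range(n_samples)
)
with wave.open(path, "wb") as wav:
    wav.setnchannels(1)        # mono
    wav.setsampwidth(2)        # 16 bits = 2 bytes per sample
    wav.setframerate(sample_rate)
    wav.writeframes(frames)

with wave.open(path, "rb") as wav:
    print(wav.getframerate(), wav.getsampwidth() * 8, wav.getnframes())
    # 8000 16 8000
```

The wave module handles only uncompressed PCM; converting to a compressed format such as MP3 or WMA requires an external codec.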

Audio formats fall into three categories:

• Uncompressed audio formats: a WAV audio file is an example of an
uncompressed audio file.
• Lossless compression: formats such as FLAC and WMA Lossless use
lossless compression.
• Lossy compression: MP3 and RealAudio files use lossy compression.

Fig 1.3: Audio compression wave
(d) Video
Video compression is the process of reducing the quantity of data used to
represent video images. Video compression is generally lossy. Compressed
video can effectively reduce the bandwidth required to transmit digital video
via cable or via satellite services. It is the process of converting digital
video into a format that takes up less storage space or transmission
bandwidth. One of the big advantages of digital video is that it can be
compressed for reduced-bandwidth applications, including transmission over
satellite, cable TV and Internet-based networks.
Compressed video is particularly useful for reducing storage requirements,
especially in the broadcast and government markets. Now, let us look at some
steps for compressing video.
• The most common extensions for video files are MPEG (Moving Picture
Experts Group), AVI (Audio Video Interleave), etc. These extensions
appear after the filename, separated by a dot (.), for example
srajan.avi or srajan.mpg.
• Select the video file that you want to compress; here we take a file
named 'Srajan'. When highlighted, the file will show an extension. If
it does not, go to the 'Tools' menu, click 'Folder Options', select the
'View' tab and uncheck the option 'Hide extensions for known file
types'.
• Use zipper software to compress the file, after which you will see a
noticeable decrease in the file size. (Renaming the extension first,
for example from 'Srajan.mpeg' to 'Srajan.txt', is sometimes suggested,
but zip tools compress a file the same way regardless of its
extension.)
• To play the file, extract it again, restoring the original extension
if you changed it.

The most popular internet video compression standards are:

• MPEG (Moving Picture Experts Group) is the most common format for
video files. It is used for audio and moving pictures, with support for
bit rates up to 1.5 Mbit/sec. This is a popular standard for streaming
videos as .mpg files over the internet.
• DV is a high-resolution digital video format which employs lossy
compression, where certain redundant information in a file is
permanently deleted, so that even when the file is uncompressed, only a
part of the original information is still there.
• DivX is an application that employs MPEG-4 compression standards to
facilitate fast downloads over DSL/cable modem without compromising on
video quality.
• Flash Video (.flv) is the most popular compression format for videos
on the internet. FLV and F4V are two formats used to play videos on the
internet using Adobe Flash Player. This format compresses video to low
bitrates for the web while maintaining its quality, which is why it is
very popular for embedded video.

Fig 1.4: Various video formats


1.4.5 Popular Compression Software
There are several pieces of software that make the process of digitization
a lot easier. File compression software compresses a larger file into a
smaller size and volume; thus, by compressing, you can keep more files in less
space. File compression is especially helpful when you are trying to send a
file by email: emails have size restrictions, so if your file is large you may
not be able to send it. Compression software such as WinRAR compresses larger
files to a smaller size, making them easier to save and transfer.
Let us now look at some popular compression software available these
days.
1. 7-Zip

7-Zip is a free, open-source tool and one of the best and most popular
compression/decompression tools for Windows. It is compatible with Windows and
Linux. It can very easily create .tar and .gz archives and supports a majority
of compression and archive formats, such as .zip, .tar and .rar. The interface
of 7-Zip has come in for some criticism, but users are generally happy with
its fast operation.
Fig 1.5: 7-Zip

2. WinRAR

WinRAR is among the most popular compression utilities for Windows and
supports a wide range of formats. This software has gained so much popularity
because of its neat and simple interface: you can compress your files in a few
steps, and it does everything very quickly.

Fig 1.6: WinRAR

3. IZArc
IZArc is a full-featured archiving tool, compatible with Windows, that you
can use to open and create compressed files in several formats. It offers many
features, such as repairing broken archives, searching within archives,
emailing archives, password protection and much more, and it works with a
large number of archive formats. You can very easily compress and decompress
files with IZArc. However, IZArc cannot create GZ archives.
Fig 1.7: IZArc formats

4. The Unarchiver (Mac OS X): The Unarchiver is a popular file decompression
utility for Mac OS X, capable of handling most major formats. However, there
is a drawback to this software: it is a read-only application, and to create
archive types other than ZIP, a supplementary tool may be needed.

Check your progress-2

What is data compression?
Which formats are used by Flash Player?
What are the disadvantages of compression?
What is WinRAR?
What are the full forms of PNG and GIF?

1.5 PATHWAYS
For the best result during digitization, it is always better to capture
data as close to the original as possible: the amount of difference between
the original and the digitized form directly affects the number of errors in
the file. Depending on the nature of the source, this could mean capturing
directly using, for example, a flatbed scanner to digitize a text document, a
digital camera to capture an object, or a digital camcorder or audio recorder
to capture moving images or sound. It may also be that the source is one step
removed from the original, as in capturing a slide or photograph of an object,
or digitizing an interview stored on analogue audio tape.
Decisions about the method of capture are often self-evident and easy to
make, based on the type of source material being digitized, the equipment and
staff skills available, and the budget allocated for both equipment and staff
time. For example, if we want to digitize 35mm slides, then a slide scanner is
probably the best solution; similarly, if we want to scan a series of flat
documents, an A4 flatbed scanner would be a good choice.
Of course, decisions are not always as straightforward as this, and
choosing a capture method is very much a project decision. For example, if a
historical museum wants to capture different things, such as flat paintings,
three-dimensional objects and written documents, there are several ways to go
about digitizing the collection.

1.6 DIGITAL OBJECTS

The three major categories of digital objects are text based, image based
and time based. The process of capturing digital objects involves various
issues, one of which is the characteristics of the object. The following is a
brief analysis of these issues.

1.6.1 Text Based

The early text format used in computing was the American Standard Code for
Information Interchange, more commonly known as ASCII text and sometimes
referred to as plain text.
Unicode, the second and later text format, solves the limitations that are
evident in the use of ASCII. It is certain that Unicode will become the
standard for character encoding in the future, and it is already supported by
the latest versions of the major operating systems. The aim of Unicode is for
all characters in all of the world's languages, including some languages of
the past, to be mapped onto a distinct numerical code.
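The mapping of characters to numeric codes can be seen directly in Python, where ord() returns a character's code point:

```python
# ASCII maps each character to a small numeric code; Unicode extends this
# to a distinct code point for characters in (nearly) every language.
for ch in "ABC":
    print(ch, ord(ch))        # A 65, B 66, C 67: the ASCII codes

print(ord("é"))               # 233: a Unicode code point beyond ASCII
print("é".encode("utf-8"))    # the same character stored as two bytes
```

UTF-8 is the usual way Unicode code points are serialized to bytes on disk and on the web; plain ASCII characters keep their original one-byte encoding under it.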
There are two main methods used to digitize existing texts: transcription
and OCR.

1) Text Transcription

The simplest method of digitization is text transcription. This method can
be relied upon for accurate transcription of documents which have complex
layouts and difficult-to-read passages.
Following are some of the disadvantages of the method:
It is time consuming, especially if the work is outsourced.
It can be a cumbersome task to identify and correct the spelling and
typographical errors committed by the transcriber. This problem can be
taken care of by assigning two people to transcription: while one
person transcribes the text, the other proofreads it simultaneously.

2) OCR

The full form of OCR is optical character recognition. OCR is another
method of digitizing text, which relies on OCR software: the text is scanned,
and the resultant digital image is then read by a computer programme.
Following are some advantages and disadvantages of OCR:
Advantages:
• It is a faster data-entry system than manually keying in data.
• The number of errors is reduced.
Disadvantages:
• It usually has difficulty reading handwriting.
• It is not a very accurate technique.

Digitization can occur by any of the following four methods adopted by
OCR software:
Neural network: each character is compared with characters the software
has been trained to recognize, so these networks evolve and grow over
time. Each character has a confidence level, and this method works
better with texts of poor quality.
Feature network: characters are identified on the basis of their shape;
high-quality prints benefit from this method.
Pattern recognition: documents with a uniform typeface are recognized
based on pre-recorded images in a database.
Structural analysis: the structure of each character is analyzed, along
with the number of horizontal and vertical lines. This method suits
texts of poor quality.

1.6.2 Image Based

1) Raster (or bit-mapped) image:

Most images you see on your computer screen are raster graphics. Raster
images are made up of pixels and are commonly referred to as bitmaps. A pixel
is a packet of color: each pixel stores information about the color of the
image at that point. For an RGB image, there are commonly 8 bits per channel
of red, green and blue respectively, making it a 24-bit image. A grayscale
image uses 8 bits per pixel, going from white to black through shades of grey.

Fig 1.8: Raster image

More disk space is required for large images. JPEG and GIF are the most
commonly used image types on the Web. Raster data models are often used in
GIS (geographic information systems) to represent continuous surfaces, such
as satellite imagery or historic maps. Several other types of image
compression are also available.

2) Vector image:

Vector images are different from raster images: where raster images are
made of pixels, a vector image is made up of lines and dots forming paths.
Each path contains a mathematical formula that tells the path how it is shaped
and which color should be used for its border or to fill in the shape. Vector
graphics are composed of paths, which are defined by a start and end point,
along with other points, curves, and angles along the way. A path might be any
shape, such as a line, a square, a triangle, or a curvy shape, and these paths
can be used to create simple drawings or complex diagrams. Vector graphics
find application most often in virtual reality and 3-D modeling, as well as in
Macromedia Flash applications.

Animation images are also usually created as vector files. The advantage
of this is that vector images can be magnified to any extent without
compromising picture quality; in other words, there will be no pixelation.
Other fields like architecture, cartography, and computer-aided design (CAD)
also use vector graphics.
3) Resolution:

In simple words, resolution is the quality of the image. As the resolution
goes up, the image becomes clearer: sharper, more defined, and more detailed
as well. The resolution of an image depends on the number of pixels held
within the digital file, measured in pixels per inch (ppi); this is known as
the 'scan' resolution. Resolution also depends on the size of the pixels:
small pixels give high resolution and a clearer image, but more of them are
needed for an image.
Pixel resolution is given as a pair of positive integers, where the first
number is the number of pixel columns (width) and the second is the number of
pixel rows (height). It is usually quoted as width × height, with the units in
pixels: for example, "1024 × 768" means the width is 1024 pixels and the
height is 768 pixels. In a GIS (Geographic Information System) image, each
pixel represents a known area on the surface of the earth.
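Resolution and bit depth together determine the raw, uncompressed file size, since a raster image stores one fixed-size value per pixel. A quick calculation, using the 1024 × 768 example and the 24-bit RGB depth described above:

```python
def uncompressed_size_bytes(width, height, bits_per_pixel=24):
    """Raw size of a raster image: one fixed-size value per pixel."""
    return width * height * bits_per_pixel // 8

# "1024 x 768" means 1024 pixel columns and 768 pixel rows.
print(uncompressed_size_bytes(1024, 768))      # 2359296 bytes (2.25 MB), 24-bit RGB
print(uncompressed_size_bytes(1024, 768, 8))   # 786432 bytes for 8-bit grayscale
```

Figures like these explain why large raster images need compression before they are sent over the Internet.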
1.6.3 Time Based
The digitizer faces several issues while digitizing time-based media,
such as sound and video.
I. The most prominent issue is the huge size of the digital files produced,
compared to all other methods of digitization.
II. The second issue is the dissemination of sound and video for seamless
display over the web; as with large raster images, some file compression
is required for this.
III. Yet another issue is that while other types of digital objects can be
readily viewed on screen, a plug-in or viewer is needed to view most
compressed sound and video.
1) Sound

Sound waves are often simplified to a description in terms of sinusoidal
plane waves, which are characterized by these generic properties:
• Frequency, or its inverse, the period.
• Wavelength.
• Wave number.
• Amplitude.
• Sound pressure.
• Sound intensity.
• Speed of sound.
• Direction.

Sampling refers to the process of converting a signal from analog to
digital sound. Sound is a continuous wave that travels through the air; the
wave is made up of pressure differences, and sound is detected by measuring
the pressure level at a location. Sound waves have normal wave properties
like reflection, refraction, diffraction, etc. The frequency of sampling is
measured in Hertz, and the range of each sample is measured in bits. The
minimum sampling rate for lossless digitization is about 36 kHz, and the
standard rate on most computers is 44.1 kHz. A depth of 16 bits per sample is
considered good enough; for compressed formats, an overall bit rate of
192 kb/s gives good quality.
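The uncompressed bit rate follows directly from the sampling rate, sample depth and channel count. A quick check, using the CD-quality values (44.1 kHz, 16 bits) for illustration:

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

print(pcm_bit_rate(44100, 16))      # 705600 b/s: 16-bit mono at 44.1 kHz
print(pcm_bit_rate(44100, 16, 2))   # 1411200 b/s: CD-quality stereo
```

Comparing these figures with a typical 192 kb/s MP3 stream shows why lossy audio compression reduces file sizes so dramatically.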

Common uncompressed sound file formats are Microsoft's Waveform PCM
encoding (.wav) and the Audio Interchange File Format (.aiff), the default
format for Apple MacOS. These formats provide accurate, lossless, high-quality
sound files; however, one disadvantage is the huge file size.

MP3 is the most common compressed format. A sample rate of 44.1 kHz and a
bit rate of 192 kbps or higher is recommended to preserve quality. Specialist
codecs are used to compress audio formats.

2) Moving Image

A digital video file is a sequence of still images played in rapid
succession, usually accompanied by audio data played in tandem. When played
at a set rate, the image sequence creates the illusion of a moving object.

Following are some common video formats:

Windows Media Video (.wmv): WMV was developed by Microsoft, and a player (Windows Media Player) ships with Microsoft's Windows operating system. Because of the near ubiquity of Windows, WMV is often a very good format for making your video readable by a large number of people.

QuickTime (.mov): This format was developed by Apple, and a player began shipping with OS 7. A free version is available for Microsoft Windows.

DivX (.avi, .divx): DivX is a very popular codec for compressing MPEG-4 videos. The size difference between DivX and an MPEG-2 encoded DVD is a factor of about ten, which makes it a very popular way of encoding large videos that need to be transferred over the internet. The codec can work as a plug-in for existing players such as Windows Media Player.

Moving Picture Experts Group (.mpeg, .mpg, .m4p): MPEG was designed for compressing video frames along with audio. It's able to get excellent compression by grouping frames together. DVDs use a form of MPEG compression called MPEG-2. Another format developed by MPEG (MPEG-4) is capable of producing better compression and supports digital rights management, making it a popular format for computer viewing. Apple's iPod players use a version of MPEG-4.

Audio Video Interleave (.avi): Developed by Microsoft in the early 1990s as part of the Windows Video format, AVI is still very popular today, especially when partnered with a compression codec like DivX.
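The container formats above can often be told apart by their first few bytes. The sketch below checks the standard RIFF signature used by AVI and WAV files and the `ftyp` marker used by the MP4/QuickTime family; the helper name and the synthetic header bytes are our own invention, not part of any of these specifications.

```python
# A minimal sketch of identifying a container format by its "magic bytes".
# AVI files are RIFF containers: bytes 0-3 are b"RIFF" and bytes 8-11 are
# b"AVI "; WAV files use b"WAVE" in the same position; MP4/QuickTime files
# carry b"ftyp" at offset 4.

def sniff_container(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    if header[4:8] == b"ftyp":          # MP4/QuickTime family
        return "MP4/MOV"
    return "unknown"

# Example with a synthetic AVI header (not a playable file):
fake_avi = b"RIFF" + bytes(4) + b"AVI LIST"
print(sniff_container(fake_avi))   # AVI
```

Real players inspect far more of the file, of course; this only shows why a program can recognize a format instantly without reading the whole file.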

Check your progress-3

What are the categories of digital objects?
What is the full form of OCR?
What is resolution?
What are raster images?
What is a path?

1.7 DATA MODELS

Any digitization process may result in the generation of hundreds or even thousands of objects. Data modeling refers to the process of storing individual digital objects, organized and managed together. Data modelers often use multiple models to view the same data and ensure that all processes, entities, relationships and data flows have been identified. There are several different approaches to data modeling, including:

Conceptual Data Modeling - identifies the highest-level relationships between different entities.
Enterprise Data Modeling - similar to conceptual data modeling, but addresses the unique requirements of a specific business.
Logical Data Modeling - illustrates the specific entities, attributes and relationships involved in a business function. Serves as the basis for the creation of the physical data model.
Physical Data Modeling - represents an application- and database-specific implementation of a logical data model.

There are usually four conceptual methods of organizing data: lists, hierarchies, sets, and geometric or coordinate-based systems.

1.7.1 Choosing a data model

The following three questions need to be answered before selecting a data model for digital output:

I. How is the data organized in its natural form?
II. What is the purpose of the resources?
III. What are the intended users' expectations and experience when using resources of a similar nature?

1.7.2 List

The best way to model a list is in tabular format: either in a spreadsheet or, if slightly more complex data manipulation and searching is required, in one table of a database. It would be wrong to organize this data by storing it in a text file, say an MS Word document, as this prohibits the effective use of the digital resources and makes finding contacts more difficult.
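As a small illustration of the list model, contact data of this kind can be held in a flat table and queried directly, something a word-processor file cannot offer. The records and field names below are invented; only Python's standard csv module is used.

```python
# A contact list modeled as a flat table (the list data model), written to
# CSV. Names and fields here are made up for illustration.
import csv, io

rows = [
    {"name": "A. Sharma", "city": "Pune",   "phone": "111"},
    {"name": "B. Rao",    "city": "Mumbai", "phone": "222"},
]

buf = io.StringIO()                      # stands in for a file on disk
writer = csv.DictWriter(buf, fieldnames=["name", "city", "phone"])
writer.writeheader()
writer.writerows(rows)

# Because the data is tabular, searching it is trivial -- unlike free text
# buried in a word-processor document.
buf.seek(0)
pune = [r for r in csv.DictReader(buf) if r["city"] == "Pune"]
print(len(pune))   # 1
```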

1.7.3 Hierarchy

Another method of modeling data is the hierarchy. This is how files are commonly stored inside folders on a computer, with the classic tree structure. Much text is organized in a hierarchical fashion: for example, a book, inside which chapters consist of pages, sentences and so on, or a poem, which has stanzas, lines and words.

It is also common for archival resources to be organized in a hierarchy, as they are logically ordered in this way in the analog world. Most electronic archives are stored in some form of hierarchical database.

Fig 1.8: Hierarchy of company organization
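A hierarchy like the book example can be sketched as nested dictionaries in Python. The chapter names are invented; the point is that a simple tree walk can answer questions about the whole structure.

```python
# A book modeled as a hierarchy (nested dictionaries), mirroring the
# folder-inside-folder structure described above. Titles are invented.

book = {
    "Chapter 1": {"pages": ["page 1", "page 2"]},
    "Chapter 2": {"pages": ["page 3"]},
}

def count_pages(node):
    """Walk the tree recursively and count the leaf pages."""
    if isinstance(node, list):
        return len(node)
    return sum(count_pages(child) for child in node.values())

print(count_pages(book))   # 3
```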

1.7.4 Sets

Sets are an effective method of storage, particularly for objects that have a clear relationship with one another. One popular example of the set data model is the relational database, where one object can have numerous related objects. For example, relational databases that contain an ID may have one main ID with more related information showing different views. The main ID information is entered only once. Thus, a relational database avoids unnecessary duplication of the same information in a database.

Fig 1.9: Relational Database
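The relational idea, where the main ID information is entered only once, can be sketched with sqlite3 from Python's standard library. Table and column names are invented for illustration.

```python
# A sketch of the "sets" model using a relational database (sqlite3 from
# the standard library). The main record is entered once; related rows
# refer to it by ID, so the name is never duplicated.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE phone  (person_id INTEGER, number TEXT)")

con.execute("INSERT INTO person VALUES (1, 'A. Sharma')")   # entered once
con.execute("INSERT INTO phone  VALUES (1, '111')")         # related rows
con.execute("INSERT INTO phone  VALUES (1, '222')")

# One person, many phones -- the join reassembles the related view.
rows = con.execute(
    "SELECT p.name, ph.number FROM person p "
    "JOIN phone ph ON ph.person_id = p.id ORDER BY ph.number"
).fetchall()
print(rows)   # [('A. Sharma', '111'), ('A. Sharma', '222')]
```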

1.7.5 Geography/geometry

Modern and historic maps are plotted together with observations taken in the field or digitized from other sources. GIS (geographic information system) is a common model for storing such data; it combines many of the features of a relational database with image processing tools. It uses geography as the primary key for data. Without it, answering a question such as 'show me all the information within 1 km of where I am' would be impossible or very time-consuming. This becomes possible when the different data sets share a common vocabulary of coordinates.

Fig 1.12: Geographic information system
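The "show me everything within 1 km" query becomes straightforward once records share a common vocabulary of coordinates. A minimal sketch, using the standard haversine formula and invented points:

```python
# A toy version of the GIS query "show me everything within 1 km of here",
# assuming each record carries latitude/longitude coordinates.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    r = 6371000.0                         # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

here = (18.5204, 73.8567)                 # an arbitrary reference point
records = {
    "near": (18.5210, 73.8570),           # a few hundred metres away
    "far":  (19.0760, 72.8777),           # roughly 120 km away
}
within_1km = [name for name, (lat, lon) in records.items()
              if haversine_m(*here, lat, lon) <= 1000]
print(within_1km)   # ['near']
```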

1.8 CHOOSING SOFTWARE

Rather than starting from a particular piece of software, it is more important to get the data model right. Thus, if you choose to organize your data in a relational database, you can choose among MS Access, FileMaker Pro, MySQL or any other relational database software. Depending on use, there are caveats to this; for example, if the database is searchable over the Web, or if a large amount of data must be stored and retrieved at speed, then MySQL is preferred over MS Access.

There are other important considerations to pay attention to while choosing software. These are as follows:
• Ensuring that the software performs the intended tasks as required
• Selecting well-used software having good support
• Choosing software supporting good import and export functions
• Choosing software that supports recognized international standards
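The import/export criterion can be illustrated in a few lines: data held in one representation (JSON here, standing in for a tool's native format) is exported to CSV, an open format that almost any other program can import. All names are invented.

```python
# Good software lets you get your data back out in an open format. Here,
# records "exported" from one tool as JSON are re-exported as CSV so any
# other program can import them.
import csv, io, json

exported_json = json.dumps([{"id": 1, "title": "Map of Pune"},
                            {"id": 2, "title": "Old photograph"}])

records = json.loads(exported_json)       # "import" step
out = io.StringIO()
w = csv.DictWriter(out, fieldnames=["id", "title"])
w.writeheader()
w.writerows(records)                      # "export" step

print(out.getvalue().splitlines()[0])   # id,title
```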

Check your progress-4

What are sets?
What is data modeling?

1.9 SUMMARY

Digitization is the process of converting information into a digital format.
Digitized content can be stored and delivered in a variety of ways, and can be copied limitless times without degradation of the original. Digital data can be compressed for storage, meaning that enormous amounts of analogue content can be stored on a computer drive or on a CD-ROM. Digital content can be browsed easily, and can be searched, indexed or collated instantly.
In lossless compression, every single bit of data that was originally in the file remains after the file is uncompressed.
Lossy compression reduces a file by permanently eliminating certain information, especially redundant information.
Compressed files use file compression in order to save disk space. Compressed archive formats can also be used to compress multiple files into a single archive. Several open and proprietary compression algorithms can be used to compress files, which is why many different compressed file types exist.
Data modeling refers to the process of storing individual digital objects, organized and managed together. Data modelers often use multiple models to view the same data and ensure that all processes, entities, relationships and data flows have been identified.

1.10 KEY TERMS

• Data compression: The process of encoding information using fewer bits than are used by an unencoded representation, through use of specific encoding schemes.
• Transcription: The simplest method of digitization, which requires only a person, keyboard, and monitor.
• OCR: Works by scanning a document and using a computer programme to read the resultant digital image.

1.11 END QUESTIONS


1) What are the advantages of compression?
2) What is the difference between lossless and lossy compression?
3) Explain the concept of digitization.
4) Write a note on pathways.
5) Write a note on the following:
A. Raster image
B. Resolution
6) What are data models? Explain in detail.
7) Explain different types of digital objects.
8) What is compression? Explain methods of compression.
9) What is OCR? What are the advantages and disadvantages of OCR?
10) What factors should be considered while choosing a data model?

Answer to check your progress questions

Check your progress -1:
Digitization is the process of converting non-digital information into a digital format.
The advantage of digitizing data is that users can easily store, access, and share the data. It allows users to get easy access to the required information.
Lossy and lossless compression.
When an image is compressed and then decompressed, the result is often different from the original, i.e. the decompressed image is often different from the original scanned image. This is known as lossy compression.
Check your progress -2:
Compression is a process of decreasing the number of bits used to represent data.
FLV and F4V are two formats used to play videos on the internet using Adobe Flash Player.
(a) Data compression can only be used if both the transmitting and receiving modems support the same compression procedure.
(b) Data damage can occur while decompressing the compressed data.
(c) Expensive hardware may be required for the video to be decompressed.
WinRAR is the most popular compression utility for Windows.
PNG: Portable Network Graphics; GIF: Graphics Interchange Format.
Check your progress -3:
The three major categories of digital objects are text based, image based and time based.
The full form of OCR is optical character recognition.
Resolution is the quality of the image.
Raster images are made up of pixels and are commonly referred to as bitmaps.
A vector image is made up of lines and dots; it is also called a path.

Check your progress -4:
Sets are an effective method of storage, particularly for objects that have a clear relationship with one another.
Data modeling refers to the process of storing individual digital objects, organized and managed together.

BIBLIOGRAPHY

Korn, D.G., and K.P. Vo. 1995. Vdelta: Differencing and Compression. In Practical Reusable UNIX Software, edited by B. Krishnamurthy. John Wiley & Sons.
UNIT 2 CAPTURING VIDEO IN MOVIE
MAKER 2
Program Name: BSc (MGA)
Written by: Mrs.Shailaja M. Pimputkar, Srajan
Structure:
2.0 Introduction
2.1 Unit Objectives
2.2 Choosing the Format
2.2.1 DV-AVI format
2.2.2 Windows Media Video 9
2.3 Improving Capture Performance in Movie Maker
2.3.1 Defragmenting Your Hard Drives
2.3.2 Install a Faster Hard Drive
2.3.3 Partition Your Drive as NTFS
2.3.4 Get a second hard drive
2.3.5 Use the Windows Media Codec
2.3.6 Turn Your Preview Monitor Off
2.3.7 Decrease Your Monitor Display Settings
2.4 Project Files in Movie Maker
2.5 Editing with Movie Maker 2
2.6 Summary
2.7 Key Terms
2.8 End Questions

2.0 INTRODUCTION

Earlier, video capturing was considered to be a very difficult job, as it involved a number of problems such as system crashes, dropped frames and hardware issues. Capturing video and transferring it from a digital camcorder required a lot of hard work and experience. In this unit of the course we are going to learn about Movie Maker 2. Windows Movie Maker 2, developed by Microsoft, is video editing software. It has many features such as effects, transitions, titles, audio track, and timeline.

The vision of Windows Media 9 Series is to deliver compressed digital media content to any device over any network. Windows Movie Maker is also known as Windows Live Movie Maker in Windows 7. It is video editing software by Microsoft and is a part of the Windows Essentials software suite. Windows Media 9 Series provides many audio and video codecs for different applications.

2.1 UNIT OBJECTIVES:

After studying this unit you will be able to:
Choose the format to use
Explain how to improve capture performance in Movie Maker
Explain the process of saving project files in Movie Maker
Explain how to edit in Movie Maker 2

2.2 CHOOSING THE FORMAT

Windows Movie Maker is video editing software. Many important features, such as effects, transitions, titles/credits, audio track, timeline narration, and AutoMovie, are covered by Windows Movie Maker. Windows Movie Maker is also an audio track editing program.

Movie Maker allows capturing video in many formats, and the user can import footage in many ways; for example, the user can capture video from a camera, scanner or other device. You can choose both the traditional DV-AVI format and the WMV format for capturing video in Movie Maker. Other formats accepted by Movie Maker for import are .MPG (MPEG-1), .WMA, .WAV, and .MP3.

Each format has its advantages and disadvantages. In the following sections we will go through further details about these formats and discuss their importance in Movie Maker.
2.2.1 DV-AVI Format

DV-AVI is a type of AVI file where the video has been compressed; it is also known as DV or digital video. You get video that has potentially higher quality than a commercial DVD. A basic AVI file usually consists of one video stream and one audio stream. On the other hand, native DV format as stored on your DV camcorder will mix audio and video into a single stream.

DV-AVI is the video compression format that a camcorder captures as its output. Without losing quality, we can then easily edit and save any movie as a DV-AVI file. Most DVD authoring applications will only accept DV-AVI files as their source input file.

This format is the capture and editing format of choice of all other video software programs; therefore any video-related software can easily recognize and work with the format. The video is saved at an outstanding resolution of 720x480 pixels, running at 30 frames per second. DV was originally designed for recording onto magnetic tape. Thus, when you film a video in the DV-AVI format, the camcorder saves the video data onto magnetic tape as a series of 0s and 1s.

As every coin has two sides, the DV-AVI format has its advantages as well as disadvantages. The major disadvantage is that uncompressed DV-AVI files are very large in size as compared to other file formats. A lot of space is consumed by DV-AVI video: as each minute of video eats up around 200 Megs of space on the hard drive, an hour of tape will take up 13 gigabytes of hard drive space. The format is so huge that many old computers face problems while capturing and saving the video. Whenever the computer's hard drive slows down below a critical level, it leads to 'dropped frames'. It is therefore very common for advanced video users to use a spare hard drive to save their video projects.
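The storage figures quoted above can be checked with a quick calculation, assuming DV's nominal stream rate of about 3.6 MB per second (roughly 25 Mbit/s of video plus audio and overhead):

```python
# Back-of-the-envelope check of the DV-AVI storage figures above.
BYTES_PER_SECOND = 3.6e6          # approximate DV stream rate, an assumption

per_minute_mb = BYTES_PER_SECOND * 60 / 1e6      # megabytes per minute
per_hour_gb   = BYTES_PER_SECOND * 3600 / 1e9    # gigabytes per hour

print(round(per_minute_mb))   # 216  -- close to the "around 200 Megs" above
print(round(per_hour_gb))     # 13
```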

2.2.2 Windows Media Video 9

Windows Media 9 Series is the latest generation of digital media technologies developed by Microsoft. Although the main purpose of Windows Media was originally streaming compressed audio and video over the Internet to personal computers, it has now advanced to enable effective delivery of digital media through any network to any device. It should be noted that Movie Maker 2 captures video into its own 'WMV9' format.

Windows Media 9 Series provides a variety of state-of-the-art audio and video codecs for different applications. The Windows Media Video 9 (WMV-9) codec is a block-based, hybrid codec that incorporates advanced compression technology in each of its components. It saves high-quality video that uses up to one-tenth the space of DV. The compression level of WMV9 is great and allows you to take backups and to assemble video on the computer.

To save a movie in the WMV9 format, the camcorder first captures the movie through a FireWire cable; after that your computer has to 're-encode' the video into the WMV9 format. However, through this encoding process, even if you set the compression level to the highest quality, the file loses some of the desired image quality.

WMV9 is Microsoft's proprietary format, so hardly any other program uses WMV9; therefore you are bound to use Movie Maker or Movie Maker 2 for editing.

Thus, comparing the DV-AVI format with WMV9: if you are going to capture a short section of video from your camcorder, stick with the DV-AVI format. On the other hand, the WMV9 format is simply fine if you do not have huge hard drive space or are capturing a long amount of tape.

Check your progress-1

What is DV-AVI?
Explain the disadvantages of the DV-AVI format.
What is the full form of WMV?

2.3 IMPROVING CAPTURE PERFORMANCE IN MOVIE MAKER

As we learned in the earlier sections, video captured from a camcorder is large in size and difficult to save on the hard drive. Transferring video from a digital camcorder and capturing it onto your hard drive is a very difficult and somewhat frustrating task. Not every system can handle the sustained speed needed to transfer your video or movie onto a hard drive.

For this problem there is a solution. If you have installed Windows XP on your computer, then you can easily run Movie Maker 2. Even if your computer has a slow processor, Windows XP is powerful enough to capture.

However, if you get into trouble while capturing, there are several helpful ways to speed up your system.

2.3.1 Defragmenting your hard drives

Defragmenting means rearranging the files on a hard disk for faster data access. After files are removed from a disk, the operating system tries to fill the empty space with new files. If a new file is too big to fit in one location, the operating system stores the excess data at another location.

A hard drive is really a circular platter. Like on a CD, data is written onto this platter in a circular pattern, and each hard drive platter can only hold a predetermined amount of data. Any single file may be broken up into many little sectors throughout the disk. All the broken fragments are placed together when you defragment your drives. Thus, your hard drive gets a large "physical area" of available space to write your video.

Fig 2.1: Fragmented and Defragmented Files
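Why defragmenting helps can be sketched with a toy disk model: the amount of free space stays the same, but the largest contiguous run of free blocks, which is what a fast sequential video write needs, grows after compaction. The block layout below is invented.

```python
# A toy model of fragmentation: None marks a free block, letters mark file
# fragments. Compaction does not add free space, it only joins it up.
def largest_free_run(disk):
    """Length of the longest run of free (None) blocks."""
    best = run = 0
    for block in disk:
        run = run + 1 if block is None else 0
        best = max(best, run)
    return best

fragmented = ["a", None, "b", None, None, "c", None, "a", None, None]
defragged  = sorted(fragmented, key=lambda b: b is None)  # files first

print(largest_free_run(fragmented))   # 2
print(largest_free_run(defragged))    # 6
```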

2.3.2 Install a faster hard drive

The major problem with the hard drive is storing big video files. When you stream video onto your computer through a FireWire cable, digital video is written onto your hard drive as a large DV-AVI file. This stream runs at a constant 200 Megs per minute. If your hard drive slows down while this stream runs, some of your video data will be lost, resulting in "dropped frames". The final video frames will be out of synchronization, and these dropped video frames are risky.

Most hard drives run fast, but some of them run slow, and different hard drives slow down at different times. To avoid temporary hard drive problems, close down any background programs and empower the system with a faster 7200 rpm hard drive.

2.3.3 Partition your drive as NTFS

NTFS (New Technology File System) is the file system that the Windows NT operating system uses for retrieving and storing files on a hard disk. If your hard drive originally came with Windows XP, it should be formatted as NTFS. To check this, right-click on your drive with your mouse inside "My Computer". If your drive is partitioned in an older format like FAT16 or FAT32, this is the main reason for the capture problems. The older partition structure was not designed for video capture, and thus it will not allow capturing video files over a certain size, like 2 or 4 gigs.
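The "2 or 4 gigs" ceiling can be made concrete: FAT32 cannot store a file of 4 GiB or more, while an hour of DV-AVI runs to roughly 13 GB, so capturing an hour to a single file must fail on such a partition. A quick check, using the figures from the text:

```python
# Why older partitions break video capture: FAT32's maximum file size is
# one byte under 4 GiB, while an hour of DV runs to roughly 13 GB.
FAT32_MAX_FILE = 4 * 1024**3 - 1     # largest file FAT32 can hold, in bytes
hour_of_dv     = 13 * 10**9          # approximate, from the text

print(hour_of_dv > FAT32_MAX_FILE)   # True -- capture would fail on FAT32
```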

2.3.4 Get a second hard drive

If you do a lot of editing, you will need another drive, as your videos need a lot of storage space. It is always advisable to add another hard drive for video capture; this will help to improve your editing system. Most professional editors have systems with many drives. This way, your computer can use the main drive to handle running programs, and the second drive strictly for video capture.

2.3.5 Use the Windows Media Codec

In Movie Maker, the Windows Media codec is able to avoid dropped frames. Another advantage of this format is that it generates small file sizes, so you are not going to carry dropped frames from an underperforming hard drive. However, because encoding video takes a lot of processing power, the compression itself might be taxing on your CPU. Most of the time, you can capture video at the highest setting without any problem.

2.3.6 Turn your preview monitor off

You can watch the video capture on your computer inside Movie Maker's preview monitor while capturing. However, generating this "preview video" is taxing on your system. It is not really essential, as you can watch your captured video directly on your camcorder's LCD screen. Inside the capture wizard, you can set the preview mode off.

2.3.7 Decrease your monitor display settings

You should try to set your display to a lower resolution. 1024 x 768 is Movie Maker's minimum recommendation, but you can go lower if you need to, and decrease your color depth to 16-bit (or "high color").

2.4 PROJECT FILES IN MOVIE MAKER

Project files

The project file is a 'linking file' that keeps track of every item in your home movie. This includes every picture, music song, voice track and video clip. The project file knows how they are laid out on the movie timeline, what effects and transitions should be applied to each, and where each of these items is located on your computer.

Fig 2.2: Linking to files

These video objects are not actually attached to or fixed inside the project file. You may notice that the project file itself is very small, perhaps 1 megabyte, while your movie is quite a bit bigger and may comprise several gigabytes; this is because the multimedia files are only linked to the project file. If at any point in time you want to re-edit your project, you will need all of these files organized.

If you ever want to clean up or reorganize your computer, you should be very careful. If you unknowingly move files or delete them, it is difficult for Movie Maker to detect your files. This will ultimately result in the loss of a valuable project.

To avoid this problem, always create a new folder for each of your video projects. This will keep your project intact. Then save every movie element into this folder prior to importing it into Movie Maker. Your video, pictures, images, background music and voice narration should be in this folder only. This method helps you move your entire project easily to another computer without losing its contents.
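The per-project folder advice can even be scripted. The sketch below creates one folder per project with subfolders for each kind of asset; the folder names are just an example, not anything Movie Maker requires.

```python
# Create one folder per video project, with subfolders for each asset type,
# so every linked file travels with the project. Names are illustrative.
import os, tempfile

root = tempfile.mkdtemp()                       # stand-in for "My Videos"
project = os.path.join(root, "Birthday 2004")
for sub in ("video", "pictures", "music", "narration"):
    os.makedirs(os.path.join(project, sub))

print(sorted(os.listdir(project)))   # ['music', 'narration', 'pictures', 'video']
```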
The proper way to save Movie Maker project files:

There is a particular procedure for saving your Movie Maker projects, and it should be kept in mind before you start editing. It is important to back up your video project to another computer so that you can re-edit the project in the future. When you first save a project in Movie Maker 2, the program generates a "Movie Maker project file" on your computer's hard drive. You have the option of renaming the file and saving it anywhere you want; by default, Movie Maker will try to place the file within your "My Movies" folder.

2.5 EDITING WITH MOVIE MAKER 2

Almost 95 per cent of home movies are boring, mostly because you have to sit for an hour to find the actually interesting material from the shoot. The useful aspect of computer editing is that all the 'junk video' can be combed out. It is also observed that keeping a movie under 5 minutes holds the audience's interest.

Generally, a new videographer overuses the camcorder's zoom function. Zooming should only be used for framing shots, as it tends to make the audience bored. Editing is the remedy: you can edit these zooms right out of the videos and show only the interesting shots. Good video needs motion and action. For example, suppose you are filming a birthday and it takes your small child two minutes to open his birthday present. Cut out the middle 1.5 minutes: your audience wants to see the main portion of the event, your child's delight at seeing the present. Also, while filming a family member, there are always those couple of seconds where they say "Ok. Are you recording?" Now you can cut that part out and start right with your interview.

A video editing program like Movie Maker 2 makes this easy, and there are several ways to get rid of junk video. Some of the important ways are discussed as follows:
1. Trimming the ends of clips:
While working on the timeline, simply 'drag the ends' of each clip to the exact points where you would like it to start and stop. You can set the in and out points of each clip with the help of very easy controls on the timeline. If you zoom in on each clip using the magnifying glass, you can accomplish very fine control of each clip's start and stop points by trimming.

Fig 2.3: Trimming the end clips


2. 'Manual Capture' only the video that you actually want:
Movie Maker has the option of 'manually capturing'. While transferring digital video from the camcorder to your computer, Movie Maker gives you this option, which lets you decide exactly which sections of your tape you want to transfer. This allows you to capture only the parts of your videotape that you want in your final movie, saving a lot of valuable space on the hard drive, as an alternative to Movie Maker's option of capturing an entire videotape.
3. Cutting clips in half:
Movie Maker allows you to cut your video clips into two parts while working on the timeline. This is a nice way to get rid of large chunks of junk video. Just find the location you want to cut and click the 'cut button' located under the preview monitor in Movie Maker.

Fig 2.4: Cut Button

While using this option you must be well organized; otherwise it will create a mess. If you cut around twenty separate video clips, at the end you might have as many as forty video clips in your collection, and sorting them would consume a lot of time. Deleting unwanted clips is very easy to accomplish within an editing program like Movie Maker 2.
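The trim and cut operations described above can be pictured as simple arithmetic on a clip's frame range: trimming moves the in and out points, while cutting splits one range into two. The frame numbers below are invented.

```python
# Trimming and cutting modeled as operations on a (start, end) frame range.

def trim(clip, in_point, out_point):
    """Keep only the frames between the in and out points."""
    start, end = clip
    return (start + in_point, start + out_point)

def cut(clip, at):
    """Split one clip into two at the given offset."""
    start, end = clip
    return (start, start + at), (start + at, end)

clip = (0, 3600)                     # a two-minute clip at 30 fps
keep = trim(clip, 0, 450)            # keep only the first 15 seconds
first, second = cut(clip, 1800)      # split down the middle

print(keep)            # (0, 450)
print(first, second)   # (0, 1800) (1800, 3600)
```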

Check your progress-2

What is a project file?
How do you trim the ends of your video clips?
What is the full form of NTFS?

2.6 SUMMARY

Movie Maker is very powerful and effective video editing software. The video capturing process has been made relatively easy with the help of Movie Maker 2.
Windows Movie Maker is a video editing program included in Microsoft Windows. It has many features such as effects, transitions, titles, audio track, and timeline.
Movie Maker has several ways to remove junk video from your movie.

2.7 KEY TERMS

• DV-AVI: Also known as digital video or DV, this format is the video compression format that a camcorder captures onto tape.
• Movie Maker: A video editing program.

2.8 END QUESTIONS

11) Explain Windows Media Video 9.
12) Write a note on project files. How do you save project files in Movie Maker 2?
13) Write a note on the DV-AVI format.
14) What are the advantages and disadvantages of the DV-AVI format?
15) How do we improve capture performance in Movie Maker?
16) How do we defragment our hard drives? Elaborate.
17) How do we apply effects in our files?
18) How is the NTFS file system useful in improving capture performance in Movie Maker?
19) How do we improve capturing performance in Movie Maker 2?
20) How do we edit videos using Movie Maker 2?

Answer to check your progress questions

Check your progress -1:
DV-AVI is the video compression format that a camcorder captures onto tape.
Disadvantages: Uncompressed DV-AVI files are very large in size as compared to other file formats. A lot of space is consumed by DV-AVI video.
WMV stands for Windows Media Video.
Check your progress -2:
The project file is a 'linking file' that keeps track of every item in your home movie.
While working on the timeline, simply 'drag the ends' of each clip to the exact points where you would like it to start and stop. You can set the in and out points of each clip with the help of very easy controls on the timeline. If you zoom in on each clip using the magnifying glass, you can accomplish very fine control of each clip's start and stop points by trimming.
NTFS stands for New Technology File System.

BIBLIOGRAPHY

http://www.microsoft.com/windowsxp/using/moviemaker/default.mspx
http://www.atomiclearning.com/k12/moviemaker2?from_legacy=1

UNIT 3 DIGITIZING SOUND

Program Name: BSc (MGA)
Written by: Mrs. Shailaja M. Pimputkar, Srajan
Structure:
3.0 Introduction
3.1 Unit Objectives
3.2 Sound
3.2.1 Units for sound measuring
3.2.2 Characteristics of Sound
3.2.3 Sound Pressure Level
3.3 Analog Audio
3.4 Digital Audio
3.4.1 Sampling
3.4.2 Resolution
3.4.3 Quantization
3.4.4 Dithering
3.4.5 Clipping
3.4.6 Bit-Rates
3.4.7 Dynamic Range
3.4.8 Signal-to-noise Ratio
3.4.9 Encoding
3.5 Advantages and disadvantages of Digital Audio
3.6 File size and bandwidth
3.7 Compression
3.8 Summary
3.9 Key Terms
3.10 End Questions

3.0 INTRODUCTION

In this third unit of the course we are going to learn about sound. Knowledge about sound will help you understand its importance in our lives. I have always been fascinated by sound, and I am sure you will feel the same fascination when you learn about it. Sound is present everywhere. Sounds can be impossible to ignore, yet at the same time easy to overlook. We are going to learn about the advantages and disadvantages of sound, how sound interacts with the digital world, and the importance of sound in our lives.

Sound waves are created by vibration, and the human ear can hear most sounds. When moving air passes through an object, the object vibrates and creates sound; this type of sound can be heard on a windy day. A vibrating object such as a guitar string produces rapidly varying air pressure, and thus sound reaches our ears. When the string moves in one direction, it presses on nearby air molecules and causes them to move closer together. A decibel is a unit to measure a sound's volume. Frequency means the total number of vibrations per second.

3.1 UNIT OBJECTIVES:

After studying this unit you will be able to:
Explain the units for measuring sound
Describe analog audio
Describe digital audio
Explain file size and bandwidth
Describe compression

3.2 SOUND

Sound is created when an object vibrates: any vibration that travels through the air or another medium and can be heard when it reaches a person's ear is called sound. For example, when you play a guitar, the strings of the guitar vibrate up and down, and this vibration creates sound. When the string moves up, the air above it is compressed, and when the string moves down, the air moves with it and expands. This compression and expansion creates differences in air pressure. The pressure differences move away from the vibrating surface, creating a sound wave. This is how we can hear the sound that comes out of the guitar.

Fig 3.1: Conversion of sound wave to Analog Signal

3.2.1 Units for sound measuring:

Decibel (dB): A decibel is used to measure the intensity of a sound. One tenth of a bel is called a decibel (dB), named after Alexander Graham Bell (this is why the letter B in dB is capital).
Sone: The sone is a unit of how loud a sound is perceived.
Phon: The phon is a unit of loudness level for pure tones.
Hertz: The hertz (Hz) is the unit of frequency.

3.2.2 Characteristics of sound

Sound can be characterized by the following three properties:

1. Pitch/frequency:
Pitch is the frequency of a sound as perceived by the human ear. Frequency is measured as the number of sound vibrations in one second, in a unit called Hertz (Hz). A low frequency produces a low-pitched note and a high frequency produces a high-pitched note.
2. Loudness/amplitude:
Loudness means the volume of a sound. Amplitude measures the force of the sound wave. Decibels (dBA) are the unit used to measure loudness. Normal speaking voices are around 65 dBA. Sounds that are 85 dBA or above can permanently damage your ears.
3. Quality/timbre:
Tone is a measure of the quality of a sound wave. Timbre means the quality of a tone that distinguishes it from other tones of the same pitch. A violin has a different timbre than a piano.

3.2.3 Sound Pressure Level

The intensity of a sound is called the sound pressure level (SPL), and the decibel is the unit for measuring it. SPL is actually a ratio of the actual sound pressure to a fixed reference pressure, where the reference pressure is the lowest-intensity sound that can be heard by most people. SPL can be measured in decibels with a sound pressure level meter. The decibel is a logarithmic scale representing how much an audio signal or sound level varies from a reference level or another signal.

Fig 3.2: Relationship between Sound Pressure Level and frequency
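The logarithmic nature of the decibel scale can be seen in a two-line calculation, using the standard SPL reference pressure of 20 micropascals:

```python
# SPL in dB compares a measured sound pressure with the standard reference
# pressure of 20 micropascals (the approximate threshold of hearing).
import math

P_REF = 20e-6   # pascals

def spl_db(pressure_pa):
    """Sound pressure level in decibels relative to P_REF."""
    return 20 * math.log10(pressure_pa / P_REF)

print(round(spl_db(20e-6)))   # 0   -- the threshold of hearing
print(round(spl_db(2e-2)))    # 60  -- a 1000x pressure ratio, roughly speech
```

Each factor of ten in pressure adds 20 dB, which is why the scale compresses an enormous range of intensities into manageable numbers.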

3.3 ANALOG AUDIO


There are two ways in which sound is recorded and stored: analog and
digital. Analog audio refers to a recording method that makes an exact
copy of the original sound waves; vinyl records and cassette tapes, for
example, are analog media.

Fig 3.3: Analog Signals

As the diagram above shows, an analog recording traces an exact copy of
the original sound wave. Analog recordings, such as tape, capture the
continuous changes of the sound during recording: sound pressure picked up
by a microphone is converted to an electrical voltage, and the changes in
that voltage, which represent changes in amplitude and frequency, are
recorded onto a medium such as tape. The first machine used to capture
analog sound was the phonograph, invented by Thomas Edison in 1877.

Check your progress-1


What is sound?
What are three properties of sound?
What is analog audio?
What is sound pressure level?
What is loudness? What is a unit to measure it?

3.4 DIGITAL AUDIO


Digital data is a method of storing values in binary form, the language
that computers work in; digital file formats of every type store their
information in binary. A digital audio signal is represented by a stream
of numbers, and it can be stored as a computer file and transmitted across
a network. In other words, digital audio is the reproduction and
transmission of sound stored in digital format: a digital representation
of the audio waveform for processing, storage or transmission.

3.4.1 Sampling:

To convert an analog signal to digital form, its value is sampled at
regular intervals, thousands of times per second. The value of each sample
is rounded to the nearest integer on a scale that depends on the
resolution of the signal, and the integer is then converted to a binary
number.
Sampling rate and sampling period: the sampling rate is the number of
times per second that the value of the analog signal is measured, and it
is expressed in Hz or kHz. The sampling rate of an audio CD is 44.1 kHz
(44,100 Hz), as shown in Fig 3.5. For comparison, the highest frequency
humans can hear is roughly 20 kHz.

Fig 3.4: Sampling rate

In the figure above, each line represents a new sample. The time between
successive lines is the sampling period, which for CD audio equals
1/44,100 of a second.

Fig 3.5: Sampling rate of CD
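The sampling process above can be sketched in a few lines of Python. This is an illustrative toy, not production audio code: the 440 Hz sine wave stands in for a continuous analog signal, and the helper names are invented for the example:

```python
import math

SAMPLE_RATE = 44_100  # CD-quality sampling rate in Hz

def analog_signal(t):
    """A stand-in for a continuous analog signal: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)

def sample(signal, seconds, rate=SAMPLE_RATE):
    """Measure the signal at regular intervals; the sampling period is 1/rate."""
    return [signal(n / rate) for n in range(int(seconds * rate))]

samples = sample(analog_signal, 0.01)   # 10 milliseconds of audio
print(len(samples))                     # 441 samples
```

Ten milliseconds at 44,100 samples per second yields 441 discrete values, each of which would then be quantized and stored in binary.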

3.4.2 Resolution:
The range of numbers that can be assigned to each sample is the
resolution of a digital signal. Bit depth is the number of bits of information in
each sample, and it directly corresponds to the resolution of each sample.
For example, CDs use 16 bits per sample, while DVD-Audio and Blu-ray discs
support up to 24 bits per sample. Higher resolution reduces quantization
distortion and background noise and increases the dynamic range.

3.4.3 Quantization:

The process of converting a continuous range of values into a finite range
of discrete values is called quantization. In simple terms, values are
"rounded" to a commonly agreed standard for simplicity; for example, a
person's age is usually given as the number of whole years they have been
alive as of their last birthday. Quantization is the core function of an
analog-to-digital converter, and it also forms the core of essentially all
lossy compression algorithms. The difference between an input value and
its quantized value (such as round-off error) is referred to as
quantization error.

Fig 3.6: Quantization Errors

At lower signal levels, quantization distortion increases because the
signal uses a smaller portion of the available dynamic range, so any error
is a greater percentage of the signal. A key advantage of some
audio-encoding schemes is that more bits can be allocated to low-level
signals to reduce quantization errors. The analogous process for images,
reducing the number of colors required to represent an image, is called
color quantization; for example, converting a photograph to GIF format
requires reducing it to no more than 256 colors.
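The rounding step described above can be sketched in Python. This is a minimal illustration, assuming samples normalized to the range [-1.0, 1.0] (a convention added here, not stated in the text):

```python
def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest of the available signed levels."""
    scale = 2 ** (bits - 1) - 1        # 32767 for 16-bit audio
    return round(sample * scale) / scale

x = 0.300001                           # an "analog" value between two 16-bit levels
q = quantize(x, 16)
error = abs(x - q)                     # the quantization (round-off) error
print(error < 1 / 2 ** 15)             # True: the error is smaller than one step
```

With fewer bits the steps are coarser and the same input produces a larger error, which is exactly why higher resolution reduces quantization distortion.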

3.4.4 Dithering:
Dither is a small amount of noise deliberately added to a signal; it helps
to preserve information that would otherwise be lost. Basically, dithering
is a process that adds broadband noise to a digital signal. Digital audio
is highly advantageous and produces results that are much better than many
analog systems, but quantization errors occur whenever the bit resolution
is reduced; this artefact is also known as truncation distortion.
To understand why dithering is important, consider an example. Most
mastered audio files are 16-bit, although 24-bit audio carries more detail
and higher-quality sound. If you simply play a 24-bit audio file through a
16-bit playback device, the truncation will create horrible sound; to
avoid it, you use a dithering tool in your production chain.

Fig 3.7: Dithering Process

Dithering is routinely used in digital audio and video processing, where
it is applied at bit-depth transitions, and in many other fields that rely
on digital processing and analysis, especially waveform analysis. This
includes digital photography, radar, weather forecasting systems and so
on.
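A rough sketch of dithering before bit-depth reduction, in Python. Real dithering tools typically use shaped or triangular (TPDF) noise; this toy version uses simple uniform noise of half a quantization step, purely for illustration, and the function name is invented:

```python
import random

def dither_and_reduce(sample, bits=16):
    """Add a little random noise (dither) before rounding to the target bit depth."""
    step = 1 / 2 ** (bits - 1)                    # size of one quantization step
    noisy = sample + random.uniform(-step / 2, step / 2)
    return round(noisy / step) * step             # quantize to the lower bit depth

out = dither_and_reduce(0.25, bits=16)
print(abs(out - 0.25) < 2 / 2 ** 15)              # True: within two 16-bit steps
```

The added noise randomizes the rounding decision, so repeating patterns of truncation distortion are replaced by a gentle, less audible noise floor.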

3.4.5 Clipping:
Clipping occurs when an amplifier is pushed to create a signal with more
power than its power supply can produce: the amplitude of the electrical
signal cannot exceed that maximum, and a clipped sample often sounds quite
different from the original.
For example, an audio system has a fixed headroom. If you keep turning the
audio level up, at some point you reach the maximum level; that is where
clipping occurs and the audio signal distorts. In digital audio, there is
likewise a hard limit on how large an input sample can be represented.

Fig 3.8: Clipping

Every audio device, whether a speaker, a transistor or a tube, has a
maximum level of signal that it can pass through cleanly; when the level
of the signal going through it exceeds that maximum, the tops of the waves
get clipped off.
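Digital hard clipping is easy to illustrate: any sample beyond the representable limit is simply pinned to that limit. A minimal sketch, where the function name and the ±1.0 limit are illustrative conventions:

```python
def clip(sample, limit=1.0):
    """Hard-clip a sample to [-limit, limit], as a device at its maximum would."""
    return max(-limit, min(limit, sample))

wave = [0.5, 0.9, 1.3, 1.7, 0.8]       # the middle samples exceed the limit
print([clip(s) for s in wave])         # [0.5, 0.9, 1.0, 1.0, 0.8]
```

Notice that the two over-limit samples both come out as exactly 1.0: the top of the wave is flattened, which is what gives clipped audio its harsh, distorted character.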

3.4.6 Bit-Rates:

The term 'bit rate' describes how many bits are transferred in one second
to represent the signal. For digital audio, the bit rate is expressed in
thousands of bits per second (kbps) and relates directly to sound quality
and file size: a higher bit rate gives better quality but a larger file.
To calculate the bit rate of uncompressed audio, multiply the sampling
rate by the resolution and the number of channels. For example, CD audio
has a resolution of 16 bits, two channels and a sampling rate of 44,100
samples per second, so by this formula its bit rate is approximately 1.4
million bits per second.

Sampling Rate x Resolution x No. of Channels = Bit Rate
44,100 x 16 x 2 = 1,411,200

Table 3.1: Calculating Bit-Rates

3.4.7 Dynamic range:

The difference between the quietest and the loudest usable signal that a
medium or transmission system can carry is called its dynamic range. Human
hearing spans everything from a faint sound in a soundproofed room to the
blast of a loudspeaker, a difference that can exceed 100 dB, whereas
digital audio at 16-bit resolution has a theoretical dynamic range of 96
dB. The usable dynamic range also varies with the quality of the recording
and playback equipment: vinyl records and cassette tapes have a much lower
dynamic range than CDs, and the dynamic range of cassette tapes varies
further depending on the type of tape.
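The bit-rate formula from Table 3.1 can be checked with a few lines of Python (the function name is invented for illustration):

```python
def bit_rate(sampling_rate, resolution_bits, channels):
    """Uncompressed audio bit rate in bits per second."""
    return sampling_rate * resolution_bits * channels

cd = bit_rate(44_100, 16, 2)
print(cd)  # 1411200, i.e. roughly 1.4 million bits per second
```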

3.4.8 Signal-to-Noise Ratio:


Signal-to-noise ratio (SNR) is defined as the ratio of the power of the
meaningful information to the power of the background noise, i.e. the
unwanted signal. Signal-to-noise ratio specifications are quoted for audio
components; an average-quality stereo tape deck will usually have a
signal-to-noise ratio of about 60 dB to 70 dB. In digital audio, each
additional bit of resolution adds roughly 6 dB to the signal-to-noise
ratio.
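The '6 dB per additional bit' figure quoted above comes from the logarithm of 2: each extra bit doubles the number of levels, and 20·log10(2) is about 6.02 dB. A quick check in Python, which also reproduces the 96 dB theoretical dynamic range of 16-bit audio mentioned earlier (the function name is invented for illustration):

```python
import math

def theoretical_dynamic_range_db(bits):
    """Approximate dynamic range of ideal PCM audio: about 6.02 dB per bit."""
    return bits * 20 * math.log10(2)

print(round(theoretical_dynamic_range_db(16)))  # 96
print(round(theoretical_dynamic_range_db(24)))  # 144
```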

3.4.9 Encoding:
The process of converting uncompressed digital audio to a compressed
format such as MP3 is called encoding. A codec is the algorithm used by
the encoding software. For a particular format there is often more than
one codec, and even for the same format, different codecs can vary widely
in quality and speed.

Check your progress-2


What is Quantization?
What is dithering?
What is encoding?

What is dynamic range?

3.5 ADVANTAGES AND DISADVANTAGES OF DIGITAL AUDIO
3.5.1 Digital Audio Advantages:

Better copying:

In analog recording, information is lost and noise is added with every
copy; even the best analog systems lose about 3 dB of signal-to-noise
ratio when a copy is recorded. Digital audio, by contrast, can be copied
from one digital device to another without losing any information, so the
advantage is that perfect copies can be made with digital recording.

Digital copies can also be created much faster than analog copies, which
usually must be made in real time. For example, copying 60 minutes of
music from a CD with an analog device such as a cassette deck takes at
least 60 minutes, but copying the same 60 minutes of music digitally takes
less than 5 minutes on a system with a fast CD-ROM drive.

The ability to make perfect copies also creates a problem, and for this
reason the RIAA went to considerable trouble to introduce SCMS (Serial
Copy Management System) for consumer audio equipment. SCMS prevents
multiple generations of copies and is required by the Audio Home Recording
Act of 1992 on all consumer digital audio recording devices sold in the
US.

Making a master recording with digital equipment takes the same amount of
time as with analog equipment, but once a digital recording is on your PC,
you can make as many copies as you like in a fraction of the time.

Quality and control benefits

In general, the most important advantage of digital audio is its
consistent playback quality. Improved audio quality is one of the best
reasons to use digital two-way radios (walkie-talkies) commercially. With
analog two-way radios, the natural human voice is carried directly by the
radio signal: you hear the person's voice exactly as it sounds, which
means you also hear background noise, interference and the effects of
obstacles, all of which degrade the sound. Digital radios, in contrast,
create an electronic version of the voice. A coder inside the radio
converts the traditional analog voice signal into binary data and
transmits this digital information to the other radio; the decoder on the
other end then translates the signal back into an analog voice, so the
signal goes from analog to digital and back to analog.

Bit Error Correction

Most digital audio media, such as CDs and DATs, have a built-in error
correction system. On an audio CD, approximately 25% of the disc is used
for error-correction data; if a bad scratch causes an error that cannot be
corrected, the player will attempt to reconstruct the missing data by
interpolation.
For example, if you are driving a car down a slope and your car radio uses
an analog audio system, the signal steadily degrades and fades, and noise
creeps in as more obstructions and distance come between the receiver and
the transmitter. Digital audio media carry bit-error correction, which
helps to re-assemble the voice signals; in the same situation with digital
audio, bit-error correction preserves the quality of the voice. Hence,
unlike analog platforms, the audio quality stays clear right to the very
edge of the coverage range.

Wider Dynamic Range:


Compared to less than 80 dB for the best analog systems, digital audio at
16 bits can achieve a dynamic range of 96 dB. Levels within the same
composition can therefore range from the relative quiet of a flute solo to
the loudness of many instruments playing simultaneously.

Durability:
Digital media such as CDs and MiniDiscs are more durable than analog
media, which is one reason people came to prefer CDs over vinyl records.
Each time a vinyl record is played, a little of its surface is worn away,
and records are particularly prone to warping and scratching; a CD, by
contrast, can be played hundreds of times without losing quality. Both
digital and analog tapes can suffer degradation from magnetic fields, but
some digital tapes, such as DAT, use a stronger base and thicker oxide
coating and are much more durable than analog tapes.

Increased resistance to noise:


In an analog system, crackling and hum from electromagnetic interference
(EMI) are picked up as the signal passes through analog circuits, and
thermal noise from analog components generates background hiss. Digital
signals are virtually immune to these types of noise, although any noise
that enters the signal before it is converted to digital will be
reproduced along with the rest of the signal.

Easy operation and Automation:


A digital system can memorize and recall settings whenever they are
needed. For example, if a speaker does not maintain a constant distance
from the microphone, a digital system can automatically suppress feedback
and compensate for volume variations. A variety of previously tricky
tasks are being made easier or fully automated by advanced digital
technology.

3.5.2 Digital Audio Disadvantages:

You need enough hard disk space, CPU processing power and RAM;
otherwise digital audio won't work properly.
Digital audio files are bigger than MIDI (Musical Instrument Digital
Interface) files.
Digital systems can have poor multi-user interfaces.

3.6 FILE SIZE AND BANDWIDTH


Bandwidth is like a pipe that carries a stream of bits in a given time
period; it is also called the data transfer rate and is usually measured
in bits per second (bps). File size is measured in bytes, and one kilobyte
(K or KB) equals 1,024 bytes. To calculate the size of an uncompressed
audio file, multiply the sampling rate by the resolution, the number of
channels and the time in seconds, then divide by eight to convert bits to
bytes. The file size is directly proportional to the bit rate: if you
change the bit rate, the file size changes proportionally. The following
table shows the formula for calculating the file size of uncompressed
audio.

Sampling Rate x Resolution x No. of Channels x Time in Seconds / Bits per
Byte = File Size (in bytes)

44,100 x 16 x 2 x 60 / 8 = 10,584,000

Table 3.2: Calculating File Size
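The formula in Table 3.2 can be expressed as a small Python function (the name is ours, for illustration); it reproduces the 10,584,000-byte figure for one minute of CD audio:

```python
def file_size_bytes(sampling_rate, resolution_bits, channels, seconds):
    """Size of an uncompressed audio recording in bytes (8 bits per byte)."""
    return sampling_rate * resolution_bits * channels * seconds // 8

one_minute_cd = file_size_bytes(44_100, 16, 2, 60)
print(one_minute_cd)  # 10584000 bytes, matching Table 3.2
```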

3.7 COMPRESSION
As explained in Unit 1, data compression is the process of reducing the
size of a data file; the inverse process is called decompression
(decoding). Software and hardware that can encode and decode are called
codecs.
Compression helps reduce resource usage, such as data storage space or
transmission capacity. In digital audio, reducing the data storage
requirements allows us to fit more songs onto our iPods and download them
faster. When recording video and audio to a digital format, quality, size
and bit rate are all affected: most formats use compression to reduce file
size and bit rate by reducing quality. Compression also reduces the size
of movies, so that we can store and play back the same movie. As we
discussed, there are two basic categories of compression, lossless and
lossy. A lossless codec can only reduce an audio file to about half of its
original size, while a lossy codec can get the file down to around 1/10th
to 1/15th of the original size. The following table shows some file
formats used for audio compression.

Audio Formats

File Format                          Compression    Result
WAV (.wav)                           Uncompressed   Full size, full quality
AIFF (.aif)                          Uncompressed   Full size, full quality
SDII (.sd2)                          Lossless       Reduced size, full quality
ALAC (Apple Lossless Audio Codec)    Lossless       Reduced size, full quality
FLAC (Free Lossless Audio Codec)     Lossless       Reduced size, full quality
MP3 (.mp3)                           Lossy          Reduced size, reduced quality
WMA (.wma)                           Lossy          Reduced size, reduced quality
AAC (.m4a)                           Lossy          Reduced size, reduced quality
MP4 (.mp4)                           Lossy          Reduced size, reduced quality

For example, a four-minute song encoded as MP3 at 128 kbps takes up less
than 4 MB of space and could be downloaded in less than 20 minutes over a
28.8 kbps modem; at that size, about 500 songs could be held on a 2 GB
hard disk. Newer generations of MPEG audio, such as AAC (Advanced Audio
Coding), offer better sound quality at higher levels of compression.
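The storage and download figures above are simple arithmetic, sketched below in Python. Note the sketch uses decimal megabytes (1 MB = 1,000,000 bytes), an assumption made here for round numbers, and the function names are invented for illustration:

```python
def mp3_size_mb(bit_rate_kbps, seconds):
    """Approximate MP3 file size: bit rate times duration, converted to megabytes."""
    return bit_rate_kbps * 1000 * seconds / 8 / 1_000_000

def download_minutes(size_mb, link_kbps):
    """Time in minutes to transfer size_mb over a link of link_kbps."""
    return size_mb * 1_000_000 * 8 / (link_kbps * 1000) / 60

song = mp3_size_mb(128, 4 * 60)                 # a four-minute song at 128 kbps
print(round(song, 2))                           # 3.84 MB, i.e. less than 4 MB
print(round(download_minutes(song, 28.8), 1))   # 17.8 minutes on a 28.8 kbps modem
```

The same arithmetic shows why compression matters: the uncompressed CD version of that song would be roughly eleven times larger.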

Dynamic range compression:


Dynamic range compression reduces the range in dB between the lowest and
highest levels of the signal, but does not affect the file size or
bandwidth requirement. Recording engineers often use it to make songs
sound louder without clipping.

Check your progress-3


What is bandwidth?
How to measure bandwidth and file size?
What is the use of dynamic range compression?
Which formats used lossless compression?
Why compression is useful?

3.8 SUMMARY
In digitization, computers first convert a sound wave into an analog
electrical signal, and then convert that signal into digital form.
Sound waves are created by vibration.

Some of the benefits of digital representation of sound are higher
fidelity recording than was previously possible, synthesis of new
sounds by mathematical procedures, application of digital signal
processing techniques to audio signals, and so on.
Dithering is a process that adds broadband noise to a digital signal.
Data compression is the process that helps to reduce the size of a data
file. The inverse process is called decompression (decoding).
Dynamic range compression reduces the range in dB between the
lowest and highest levels of signal, but does not affect the file size or
bandwidth requirement.

3.9 KEY TERMS


• Sound: Any vibrations that travel through the air or another
medium and can be heard when they reach a person's or animal's ear
is called sound.
• Resolution: The range of numbers that can be assigned to each
sample is the resolution of a digital signal.
• Encoding: The process of converting uncompressed digital audio to
a compressed format such as MP3 is called encoding.
• Quantization: The process of converting a continuous range of
values into a finite range of discrete values is called quantization.
• Dithering: Dithering is a process that adds broadband noise to a
digital signal.
• Dynamic range compression: Dynamic range compression
reduces the range in dB between the lowest and highest levels of
signal, but does not affect the file size or bandwidth requirement.

3.10 END QUESTIONS


21) What is a sound? What are the units to measure the sound?
22) Explain the characteristics of sound.
23) What is bit rate?
24) What is bandwidth?
25) What is analog audio? How it is different from digital audio?
26) What is dithering?
27) What is dynamic range?
28) What is clipping?
29) What are the advantages of digital audio?
30) Write a note on compression.

Answer to check your progress questions


Check your progress -1:

Any vibrations that travel through the air or another medium
and can be heard when they reach a person's or animal's ear
are called sound.
Pitch, Loudness, Quality.
Analog audio refers as the method used for recording audio
that make exact copy of original sound waves.
The intensity of a sound is called sound pressure level.
Loudness means the volume of a sound. Decibels or dBA is
a unit to measure loudness.

Check your progress -2:


The process of converting a continuous range of values into
a finite range of discreet values is called Quantization.
Dither is a process of adding noise to the signal.
The process of converting uncompressed digital audio to a
compressed format such as MP3 is called encoding
The difference between the lowest and the highest usable
signal create through a strong medium or transmission is
called dynamic range
Check your progress-3:
Bandwidth is like a pipe that carries a stream of bits in given
time period.
Bandwidth is usually measured in bits per second (bps); file
size is measured in bytes.
Dynamic range compression reduces the range in dB
between the lowest and highest levels of signal, but does not
affect the file size or bandwidth requirement.
SDII files, ALAC files, FLAC files.
Data compression is the process that helps to reduce the size
of a data file.


UNIT 4 DIGITAL VIDEO CAPTURING

Program Name:BSc(MGA)
Written by: Mrs.Shailaja M. Pimputkar,Srajan
Structure:
4.0 Introduction
4.1 Unit Objectives
4.2 Digital video recording
4.2.1 Digital Video Recorder (DVR)
4.2.1.1 Types of DVRs
4.3 Video Capture Device-Analog Video to PC
4.4 High-Definition (HD) Options for Digital Video recording
4.4.1 Satellite Alternatives
4.4.2 Cable Alternatives
4.4.3 Sony HD DVRs
4.5 Capture Cards
4.6 Summary
4.7 Key Terms
4.8 End Questions

4.0 INTRODUCTION
In this unit we are going to learn about digital video. To capture means
to save: video capture means to store or save video images on a computer.
In technical terms, video capture is the process of converting an analog
video signal into a digital format.

Today everything is digitized, and video capture techniques change from
day to day. Where previously we used the VCR (Video Cassette Recorder),
the DVR has now taken its place because of its great flexibility and
simple operation. We will learn more about the Digital Video Recorder and
its advantages in this unit.

4.1 UNIT OBJECTIVES:


After studying this unit you will be able to
Describe the video recording
Describe the video recorder
Explain the high definition options for digital video recording
Describe high definition TV and video capture cards
Explain TV tuner/Video capture cards

4.2 DIGITAL VIDEO RECORDING
Digital video recording is a technique for compressing a video signal
using a video encoder such as MPEG-2; it is used for digitally recording
TV and video, and the same encoder can also be used for DVD movies.

The following are the three primary methods of recording TV and video
digitally:

i. A computer fitted with a video capture or TV tuner card, which
records TV and video to the computer's hard drive.
ii. A stand-alone set-top digital video recorder (DVR, such as TiVo),
which records to a built-in hard drive.
iii. A stand-alone set-top digital recorder, which records to a built-in
DVD drive.

4.2.1 Digital Video Recorder (DVR)

A Digital Video Recorder is an electronic device connected to a
television that allows you to record various programs; a DVR is also
called a personal video recorder (PVR). The DVR does the same work as a
standard video recording device, but it does not need any extra storage
medium to record programs onto: a DVR has an internal hard drive with a
specific memory capacity. DVR functionality also exists as software for
personal computers, portable media players (PMPs) and complete set-top
boxes that facilitate playback and video capture to and from disk.

Fig4.10 : Digital Video Recorder

Some consumer electronics manufacturers provide televisions with built-in
DVR hardware and software; LG was the first company to launch this type of
television, with the DVR built into the set, in 2007. Camcorders and
mobile phones are now also available with both a camera and digital video
recording capability.

4.2.1.1 Types of DVRs

• Replay TV and TiVo DVR:

ReplayTV and TiVo were the two early consumer DVRs, launched at the 1998
Consumer Electronics Show in Las Vegas. In 1999, Dish Network's DISH
Player receivers arrived with full DVR features: these digital set-top
devices can record television programs without using a videotape.

Fig 4.2: Replay TV

TiVo is one of the most popular brands of DVR. The television signal
comes into the DVR through cable, satellite or antenna and goes into an
MPEG-2 encoder, which compresses it and converts the analog signal to
digital. From the encoder the signal goes to the hard drive for storage
and to an MPEG-2 decoder, which converts the signal back to analog and
sends it to the television for viewing.

Fig 4.3: Rear view of TiVo


• Dual Tuners:

These devices have two independent tuners within the same receiver, each
functioning separately from the other. The main use of this feature is
the ability to record one live program while watching another, or to
record two programs simultaneously. Some dual-tuner DVRs can also serve
two television sets at the same time.

Fig 4.4: Dual Tuner

• PC based DVRs:

PC-based digital video recorders connect security cameras to computers
and store the video on hard disks. A PC-based system looks similar to
your PC: the hard drive, LAN board, motherboard and video card are all
located in the computer tower. Personal computers running Linux, Mac OS X
or Microsoft Windows can be turned into DVRs. Such a system also has a
card that captures images and a DVD writer to burn video to disc.
Computer-based digital video recorders are more expandable and flexible
than standalone DVRs: they offer advanced video-analytics options, bigger
storage capacity, more power and greater memory, so the user can store
larger amounts of footage. PC-based recorders are also more
user-oriented.
• Standalone DVR:

A standalone system looks a lot like an old VCR, with all its components,
including the CPU, IC chips and power supplies, encased in one cabinet.
The main drawback of this design is that everything needed to operate the
unit sits on one motherboard: if one component fails, the whole unit has
to be replaced. It also has limited storage capacity.

Fig 4.5: Standalone DVR

Check your progress-1

What is digital video recording?
What is DVR?
How does a dual tuner work?
When was the TiVo DVR launch?
Who launched the first inbuilt DVR in television and when?

4.3 VIDEO CAPTURE DEVICE: ANALOG VIDEO TO PC

To understand video capture devices and video capture cards, we should
first know what video capturing is. Converting the analog video signal
generated by a video camera into digital format and storing that digital
video on a computer's mass storage is called video capturing. A video
capture device is a device capable of transferring an audio and video
signal from an electronic device, such as a VCR, television or DVD
player, to a computer.

Video capture from analog devices requires a special video capture card
that converts the analog signals into digital form and compresses the
data. The video capture software converts the captured analog signal into
a video format such as MPEG-1, MPEG-2, AVI or WMV, and typically allows
users to edit their video and burn it to DVD or Video CD.

In this section we are going to see how to capture video from an analog
video source to a Windows XP computer using an external video capture
device. We need a source, a capture device and capture software; for
editing the video we will also need video editing software, and to record
the video to DVD we will need DVD recording software and a DVD burner to
physically record the disc.
To illustrate the process, we take a VCR as the source, the ADS Tech DVD
Xpress as the capture device and Pinnacle Studio Plus 9 as the capture
software. The same steps work with any other combination of capture
hardware (connected over a USB 2.0 cable), capture software and analog
source, e.g. Hi8, a VHS-C camcorder or 8 mm.

The following are the steps to capture analog video to a PC using a video
capture device:

1. First, connect your video capture device to a USB port on your PC
using the USB 2.0 cable. Switch on the capture device by plugging it into
an electrical socket, as shown in Fig 4.6.

Fig 4.6: PC with Video Capture Device

2. Next, switch on your PC. The PC should recognize the capture device.
3. Plug in the source by connecting the device's video and audio output
cables to the video and audio inputs on the capture device. For a VHS
VCR, connect the RCA video output (yellow cable) and RCA audio outputs
(white and red cables) to the RCA inputs on the DVD Xpress capture
device. (The name RCA derives from the Radio Corporation of America.)
4. Start the video capture software: double-click its icon on your
desktop, or choose Start > Programs > Pinnacle Studio Plus (or whichever
software program you are using).
5. Tell the software which format you want for the converted video: if
you plan to record to CD, use the MPEG-1 format; for DVD, use MPEG-2.
6. To capture your video, click the Start Capture button; a dialog box
opens asking for a file name. Give the file a name and click the Start
Capture button.
7. Once your video has been captured to your hard drive, it can be
imported into a video editing application for editing, or recorded to CD
or DVD using CD/DVD recording software and a CD/DVD writer.

4.4 HIGH-DEFINITION (HD) OPTIONS FOR DIGITAL VIDEO RECORDING (DVR,
SONY'S HD DVRS)
High-definition (HD) digital video recording devices are available in the
market these days at very affordable cost. These DVRs provide all the
functionality of a standard DVR such as TiVo, but also allow you to view
and record HD broadcasts. If you are a cable subscriber, HD DVRs are
available for rent from the providers for a monthly fee, and you can buy
HD DVRs from satellite providers. The following are some high-definition
digital video recording options:
4.4.1 Satellite

There are two varieties of satellite TV: DirecTV and Dish Network. Each
company offers a high-definition digital video recorder that also works
as a satellite receiver.

Dish Network: Dish Network offers customers the ViP722 DVR, a two-TV HD
DVR receiver with a dual tuner, so it can record one show while you watch
another. As Dish Network's top-of-the-line receiver, it lets you watch
and record both HD and SD broadcasts while also serving as a DVR. Its
hard drive holds up to 350 hours of SD recording or up to 55 hours of HD
recording, and it provides an Electronic Programming Guide (EPG) for
scheduling recordings in advance.
DirecTV: DirecTV provides an HD DVR with the TiVo service built into the
receiver, so you not only obtain HD broadcasts for recording but also get
a fully operational TiVo DVR. It has a 250 GB hard drive.

4.4.2 Cable
Cable TV suppliers provide HD DVRs at a much lower price than satellite
service suppliers. Most cable providers now offer HD DVR service for a
low monthly fee and supply their customers with either a Scientific
Atlanta 8300HD DVR or a Motorola DCT6412 HD DVR, depending on the
provider.

Other High-Definition Digital Video Recording options:


4.4.3 Sony’s HD DVRs
Sony builds two HD DVR models: the DHG-HDD250 and the DHG-HDD500. Both
models work with existing analog cable arrangements, include a free
Electronic Programming Guide (EPG), and have an aerial input (antenna)
for recording free over-the-air HDTV.
The DHG-HDD500 can record and store at least 60 hours of high-definition
video or up to 400 hours of standard-definition video; the DHG-HDD250 can
record up to 200 hours of standard-definition video or at least 30 hours
of high-definition video. Both include several analog inputs and outputs,
in addition to HDMI, digital audio and component outputs. These are
costly, high-end DVRs, ideal for analog cable subscribers who want the
power to record HD signals free over the air.

4.5 VIDEO CAPTURE CARDS


Capture cards are internal or external devices that convert an analog
video signal into digital form, compress the data, and record the video
onto the computer's hard drive. Internal cards fit into a PCI slot on the
PC's motherboard; external units attach via a USB 2.0 cable. There are
also cards that record digital video through IEEE 1394 (FireWire) or
analog video through S-video and composite inputs.
The following are some of the various kinds of capture cards:
• TV and Video Capture Card:
These cards capture video from an analog signal and accept only S-video
and composite inputs for recording video and audio. These
The Digitization
51
cards attach externally via USB 2.0 or internally in a PCI slot. They can
also record analog video from a camcorder, DVD player/recorder or VCR.
These types of cards do not accept digital signals or capture from coaxial
cable, and they typically come bundled with TV and video capture
software.
• Video-only capture cards:
These higher-end video capture cards are usually used by professionals,
basically for editing video. They capture through IEEE 1394 (FireWire)
inputs from digital camcorders and are usually bundled with high-end
video editing software.

• TV tuner card:
A TV tuner card is a kind of television tuner that records video or
television programs onto a computer's hard disk. In other words, a TV
tuner is a device that allows you to connect a TV signal to your computer.
Most TV tuners also work as video capture cards. They provide an
Electronic Programming Guide (EPG) for scheduling recordings in advance,
and they also function as a digital video recorder, so users can pause and
rewind live TV.

Check your progress-2


What is a TV tuner card?
What is video capturing?
What is a video capture device?
What are capture cards?

4.6 SUMMARY
Video capturing is the process of converting the analog video signal
generated by a video camera into digital format and storing the
digital video on a computer's mass storage.
Capturing video from analog devices requires a special video capture
card that converts the analog signals into digital form and compresses
the data.
Digital video recording does not need any extra storage to record
programs onto: a DVR has an internal hard drive with a specific
capacity. DVR devices include software for personal computers,
portable media players and complete set-top boxes that facilitate
playback and video capture to and from disk.
High-definition digital video recording devices are easily available
these days and cost the user relatively little.

An HD DVR provides all the functionality of a standard DVR such as
TiVo, but also allows you to view and record HD broadcasts.
Capture cards are internal or external devices that convert the analog
signal into digital form and compress the data, recording the video onto
the computer's hard drive.

4.7 KEY TERMS


• Digital video recording: A method of recording video digitally to a
disk drive or other storage medium within the device. Such devices
include software for personal computers, portable media players
and complete set-top boxes that facilitate playback and video
capture to and from disk.
• Video capturing: The process of converting the analog video signal
generated by a video camera into digital format and storing the
digital video on a computer's mass storage.
• TV tuner card: A kind of television tuner that records video or
television programs onto a computer's hard disk; in other words, a
device that makes it possible to connect a TV signal to your
computer.

4.8 END QUESTIONS


31) What is a digital video recorder?
32) Explain the types of DVRs.
33) What are the high-definition options for digital video recording?
34) What are the three primary methods of recording TV and video
digitally?
35) What is a capture card? Explain in detail.
36) What is the difference between PC-based and standalone DVRs?
37) Write a note on satellite DVRs and their types.
38) How do you capture analog video to a PC using an external video
capture device?
39) Write a note on ReplayTV and TiVo.

Answer to check your progress questions


Check your progress -1:
Digital video recording is a technique of compressing the video
signal using a video encoder such as MPEG-2.
A digital video recorder is an electronic device connected to a
television that allows you to record various programs.
These devices have two independent tuners within the same
receiver; the tuners function separately from one another.
In 1998.
LG in 2007

Check your progress -2:


A TV tuner card is a kind of television tuner that records video
or television programs onto a computer's hard disk.
Video capturing is the process of converting the analog video
signal generated by a video camera into digital format and
storing the digital video on a computer's mass storage.
A video capture device is a device capable of transferring an
audio and video signal from an electronic device, such as a
VCR, television or DVD player, to a computer.
Video capture cards are cards that convert the analog signal
into digital form and compress the data.

BIBLIOGRAPHY
1. Jenkins, Henry, 2006. Convergence Culture: Where Old and New
Media Collide ('Buying into American Idol'). New York and London:
New York University Press.
2. Maxine K. Sitts (ed.), 2000. A Management Tool for Preservation
and Access. Andover, Massachusetts: Northeast Document
Conservation Center.
3. Digital Projects Guidelines. Arizona State Library, Archives and
Public Records, http://www.lib.az.us/digital/
4. Green, David, 2003. The NINCH Guide to Good Practice in the
Digital Representation and Management of Cultural Heritage
Materials. New York: NINCH.


UNIT 5 DIGITIZING FILE FORMATS


Program Name:BSc(MGA)
Written by: Mrs.Shailaja M. Pimputkar,Srajan
Structure:
5.0 Introduction
5.1 Unit Objectives
5.2 File Format Glossary
5.2.1 TIFF

5.2.2 JPEG
5.2.3 DjVu
5.2.4 PDF
5.2.5 WAV
5.2.6 MP3
5.2.7 Real Audio
5.2.8 MPEG 21
5.3 Digitization file format
5.3.1 Images
5.3.2 Text
5.3.3 Data set
5.3.4 Audio
5.3.5 Video
5.4 Summary
5.5 Key Terms
5.6 End Questions

5.0 INTRODUCTION
In this unit we will study the file formats most commonly used in
digitization: image formats such as TIFF and JPEG, document formats such
as DjVu and PDF, audio formats such as WAV, MP3 and RealAudio, and the
MPEG-21 multimedia framework.

5.1 UNIT OBJECTIVES:


After studying this unit you will be able to:
Explain the Tagged Image File Format (TIFF)
Describe the Joint Photographic Experts Group (JPEG) format
Explain DjVu
Describe the Portable Document Format (PDF)
Explain the Waveform Audio Format (WAV)
Explain MPEG-1 Audio Layer 3 (MP3)
Explain the RealAudio format
Describe MPEG-21
Describe different file formats

5.2 FILE FORMAT GLOSSARY


There are a great number of different computer file formats available.
In this section we describe some of the most popular file formats used for
different types of files: documents, images, music, video, e-books, CAD
drawings and so on. A particular file format is often indicated as part of a
file's name by a file name extension.

Following are some popular file formats for different file types:

5.2.1 TIFF

TIFF (Tagged Image File Format) was originally developed by a
company called Aldus in 1986 and is now owned by Adobe Systems. The .tiff
file extension is used for one of the most common graphics formats. TIFF is a
lossless file format for storing bit-mapped images, which means it does not
lose information during compression. TIFF is very popular in the printing and
publishing industry, and the format is widely supported by image-
manipulation applications.

It is a file format for storing images, including line art and photographs.
TIFF graphics can be any resolution, in black and white, grayscale or
colour, which makes TIFF very suitable for high colour-depth images. TIFF is
not suitable for vector data; TIFF files contain only bitmap data. TIFF
commonly uses LZW compression, which compresses a file into a smaller
file using a table-based lookup algorithm.

TIFF describes image data that typically comes from scanners and from
paint and photo-retouching programs. TIFF includes a number of compression
schemes that allow developers to choose the best space or time trade-off for
their applications. TIFF is also portable: it does not favour particular
operating systems, file systems, compilers or processors.
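The table-based lookup idea behind LZW can be illustrated with a short Python sketch. This is a simplified illustration of the algorithm only, not TIFF's actual codec, which packs codes into variable-width bit fields:

```python
def lzw_compress(data: bytes) -> list:
    """Classic LZW: emit codes for the longest sequences already in the table."""
    table = {bytes([i]): i for i in range(256)}  # all single bytes pre-loaded
    next_code, w, out = 256, b"", []
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc                      # keep extending the current match
        else:
            out.append(table[w])        # emit the code for the known prefix
            table[wc] = next_code       # learn the new sequence
            next_code += 1
            w = bytes([b])
    if w:
        out.append(table[w])
    return out

codes = lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT")
print(len(codes))  # fewer codes than the 24 input bytes
```

Because repeated sequences are replaced by single codes from the growing table, repetitive data compresses well, which is why LZW suits scanned pages with large uniform areas.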

5.2.2 JPEG

JPEG (Joint Photographic Experts Group) is a graphical file
format for still images. JPEG files are widely used in digital cameras and on
web pages, because JPEG compresses the image data into a much smaller
file.

JPEG uses lossy compression algorithms. JPEG is used when small file
size is more important than maximum image quality, for example in images
for web pages, email and memory cards. The JPEG format often manages
to compress files to 1/10 of the size of the original, which is especially
good for saving bandwidth. The file extension for the JPEG file type is
.jpeg (or .jpg).

5.2.3 DjVu

DjVu is a popular document file format and image compression
technology, initially developed at AT&T Labs from 1996 to 2001. It is an
open-source alternative to PDF. DjVu primarily stores scanned documents,
especially those containing a combination of text, line drawings and
photographs. DjVu allows the distribution on the Internet of very
high-resolution images of scanned documents, digital documents and
photographs: content developers can scan high-resolution colour pages of
books, magazines, catalogues, manuals, newspapers, and historical or
ancient documents, and make them available on the Web.

DjVu's developers report that for colour document images containing text
and pictures, DjVu files are typically 5 to 10 times smaller than JPEG at
similar quality, and for black-and-white document images, 10 to 20
times smaller. The main technology behind DjVu is layer separation: each
file is typically separated into three images, the background and foreground
(around 100 dpi) and a higher-resolution mask image (e.g. 300 dpi). By
separating the text from the background, DjVu can keep the text at high
resolution while compressing the backgrounds and pictures at
lower resolution with a wavelet-based compression technique. DjVu is used by
many commercial and non-commercial web sites today. The file extension for
the DjVu file type is .djvu.
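The layer-separation idea can be sketched in a few lines of Python. This is a toy illustration only: real DjVu uses JB2 coding for the mask and wavelet coding for the background, and the grid values and 2x downsampling factor here are invented for the example:

```python
def separate_layers(image, mask):
    """Split a grayscale page into a full-resolution text layer and a
    2x-downsampled background layer, mimicking DjVu's mask/background split."""
    h, w = len(image), len(image[0])
    # Text layer: keep only the pixels the mask marks as text, at full resolution.
    text = [[image[y][x] if mask[y][x] else None for x in range(w)]
            for y in range(h)]
    # Background: average each 2x2 block, ignoring pixels covered by text.
    bg = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            vals = [image[yy][xx]
                    for yy in (y, y + 1) for xx in (x, x + 1)
                    if yy < h and xx < w and not mask[yy][xx]]
            row.append(sum(vals) // len(vals) if vals else 255)  # 255 = blank paper
        bg.append(row)
    return text, bg

page = [[10, 10, 200, 200],
        [10, 10, 200, 200],
        [50, 50, 50, 50],
        [50, 50, 50, 50]]
no_text = [[0] * 4 for _ in range(4)]
text, bg = separate_layers(page, no_text)
print(bg)  # [[10, 200], [50, 50]]
```

The background layer has a quarter of the pixels of the original, which is where much of DjVu's size advantage comes from, while the text mask keeps its full sharpness.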

5.2.4 PDF

PDF (Portable Document Format) is another popular document file
format, developed by Adobe. It allows the user to combine various images,
fonts and text formats in a single document that is print-ready and easy to
share from any device. A PDF file captures a document in a fixed layout, like
an image, that renders the same across programs, hardware and operating
systems. PDF files are created using Adobe Acrobat, Acrobat Capture or
similar products, which means that any computer with Adobe Acrobat
Reader or a similar product can open a PDF file. A PDF file contains one or
more page images which you can zoom in or out, and you can page forward
and backward.

5.2.5 WAV

WAV (WAVE) is an audio file format developed by IBM and
Microsoft. WAV files can be played with multimedia playback software such
as Windows Media Player and other software available for your operating
system. They contain sounds such as effects, music and voice recordings.

WAV files are becoming less popular because of their large file size:
the WAV format does not use lossy compression, so files are much bigger
than MP3 files. Another drawback is that sending and downloading WAV
files takes much more time and space. WAV files are based on the Resource
Interchange File Format (RIFF) method of storing data: data is stored in
chunks, each containing a four-character tag and the number of bytes in the
chunk. All the system sounds, such as the sound played when you log in, are
uncompressed files in the .wav format.
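The chunk layout described above can be seen by writing a tiny WAV file in memory with Python's standard wave module and inspecting its first twelve bytes (the sample values here are arbitrary):

```python
import io
import struct
import wave

# Write a tiny mono, 8 kHz, 16-bit WAV file into memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)                        # 16-bit samples
    w.setframerate(8000)
    w.writeframes(struct.pack("<4h", 0, 1000, 0, -1000))

# A WAV file is a RIFF container: a four-character tag, a little-endian
# 4-byte chunk size, then the form type "WAVE".
data = buf.getvalue()
tag, size, form = struct.unpack("<4sI4s", data[:12])
print(tag, form)              # b'RIFF' b'WAVE'
print(size == len(data) - 8)  # the size field excludes the 8-byte RIFF header
```

Immediately after the `WAVE` form type come further tagged chunks (`fmt `, then `data`), each again announcing its own byte count, which is exactly the tag-plus-length structure the text describes.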

5.2.6 MP3

An MPEG-1 or MPEG-2 Audio Layer III file, referred to as MP3, is the
standard audio storage file type, and most music players play music from
MP3 files. MP3 uses a lossy compression format, which helps to reduce file
size. Using lossy data compression, audio files are compressed for easy storage
and sending. MP3 compression removes those sounds the human ear is
incapable of hearing and processing; an MP3 file stores audio information
only.

MP3 files are portable. A three-minute song that requires about 32 MB of
disk space in its original form can be compressed using MP3 into a file of
about 3 MB with little perceptible loss of sound quality. Using a 56K modem,
the song can then be transmitted over the Internet in a few minutes. This
makes it possible to create virtual music libraries by downloading from the
Internet, and users can also 'rip' MP3 files from their own CDs using free
software easily available on the Internet.
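The figures above follow directly from the bit rates involved. A quick sketch, assuming CD-quality source audio and a typical 128 kbps MP3 encoding:

```python
def pcm_size_bytes(seconds, sample_rate=44100, channels=2, bytes_per_sample=2):
    # Uncompressed CD-quality audio: samples/sec x channels x bytes per sample.
    return seconds * sample_rate * channels * bytes_per_sample

def mp3_size_bytes(seconds, bitrate_kbps=128):
    # MP3 size depends only on the bit rate: (bits/sec) / 8 = bytes/sec.
    return seconds * bitrate_kbps * 1000 // 8

song = 3 * 60                                   # a three-minute song
print(pcm_size_bytes(song))   # 31752000 bytes, roughly 32 MB
print(mp3_size_bytes(song))   # 2880000 bytes, roughly 3 MB
```

The ratio works out at about 11:1, which matches the "1/10 of the size" figure commonly quoted for MP3.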

5.2.7 RealAudio

RealAudio is a proprietary audio format developed by RealNetworks
and first released in April 1995. It uses a variety of audio codecs, ranging
from low-bitrate formats to high-fidelity formats for music. Because it can
conform to low bandwidths, it can also be used as a streaming audio format,
played at the same time as it is downloaded.

RealAudio files have the extension .ra. In 1997 RealNetworks
started a video format called RealVideo. The combination of the audio and
video formats is called RealMedia and uses the file extension .rm. However,
the latest version of RealProducer, Real's flagship encoder, reverted to using
.ra for audio-only files and .rv for video files.

RealAudio files are played with RealNetworks' RealPlayer. You can
play a RealAudio file in the free Real Alternative or JetAudio players, but for
that you need to install an additional free plug-in, which is why many users
convert RA files to other more popular audio formats such as MP3, AAC,
WAV or WMA.

RealAudio was developed as a streaming media format, played while it
is being downloaded. It is possible to stream RealAudio over HTTP: the
RealAudio file is retrieved like a normal web page, but playback begins as
soon as the first part is received and continues while the rest of the file
downloads. HTTP streaming works best with prerecorded files, though
alternative protocols have been developed that work better for live
broadcasts.

5.2.8 MPEG 21

The Moving Picture Experts Group designed MPEG-21, a comprehensive
standard framework for networked digital multimedia. MPEG-21 includes
a Rights Expression Language (REL) and a Rights Data Dictionary, and
provides a truly interoperable multimedia framework.

MPEG-21 uses an XML-based standard designed to transmit machine-
readable licence information. MPEG-21 is established on the basis of two
key concepts: (i) the Digital Item, the fundamental unit of transaction and
distribution; and (ii) the concept of the users who interact with Digital
Items.

The Digital Item can be conceived as the centre of the Multimedia
Framework, with the users who interact with Digital Items inside that
framework.

At the most basic level, MPEG-21 provides a framework in
which one user interacts with another via a Digital Item. The main
objective of MPEG-21 is therefore to define the technology required to
support users in interchanging, accessing, consuming, manipulating or
trading Digital Items in transparent and effective ways.

Check your progress-1


What are the full forms of PDF and JPEG?
What is the use of DjVu file format?
What is LZW compression?
Who designed MPEG-21 file format?

5.3 DIGITIZATION OF FILE FORMATS


The Irish Virtual Research Library and Archive team produced a body
of digitized content as part of a pilot project. The files created by the
project, together with their functions and contents, are described below.

5.4 SUMMARY
Different file formats are used for preservation files and surrogate files,
based on the type of content in the original resource.
Preservation Master (PM) files are created for deep-storage purposes.
Compressed web files are created from PM files for use as surrogate
files in the repository and on the information website.
PM files must be uncompressed in order to retain archival integrity.
Surrogate files use compressed file formats with little perceivable loss
of quality.

5.5 KEY TERMS

• Tagged Image File Format (TIFF): Originally created by Aldus
for use in desktop publishing. This type of file stores images,
including photographs and line art.
• Optical Character Recognition (OCR): The process of taking an
image of letters or typed text and converting it into data the
computer understands.
• Joint Photographic Experts Group (JPEG): A commonly used
standard method of compression for photographic images. JPEG
uses a lossy compression algorithm.
• DjVu: An alternative to PDF that gives smaller files than PDF
for most scanned documents. It uses image-layer separation of text
and background images, progressive loading, arithmetic coding, and
lossy compression for monochrome images.
• Portable Document Format (PDF): Files that preserve the original
graphic appearance online for all types of documents, such as
magazine articles and brochures.
• WAV or WAVE: An audio format developed by Microsoft and
IBM.
• MP3: An audio compression file format that employs an algorithm
to compress music files, achieving significant data reduction
while retaining near-CD-quality sound.
• RealAudio: A proprietary audio format developed by RealNetworks
that uses a variety of audio codecs, ranging from low-bitrate
formats to high-fidelity formats for music.

5.6 END QUESTIONS


40) Describe the Tagged Image File Format.
41) What are the limitations of the WAV format?
42) Write a note on the RealAudio format.
43) Write a note on the JPEG file format.
44) Why is the WAV file format becoming less popular?
45) Why is DjVu a popular document file format?
46) What are the advantages of the MP3 file format?
47) Explain Optical Character Recognition.
48) How do you digitize audio and video?
49) Define PM files. Explain in detail.

Answer to check your progress questions


Check your progress -1:
PDF: Portable Document Format; JPEG: Joint Photographic
Experts Group.
DjVu primarily stores scanned documents, especially those
containing a combination of text, line drawings and
photographs.

LZW compression is the compression of a file into a smaller
file using a table-based lookup algorithm.
Moving Picture Experts Group.

BIBLIOGRAPHY
5. Douglas J. Hickok, Daine Richard Lensniak and Michael C. Rowe,
2005. 'File Type Detection Technology', Midwest Instruction and
Computing Symposium.
6. Karresand, Martin, and Shahmehri, Nahid, 2006. 'File Type
Identification of Data Fragments by their Binary Structure',
Proceedings of the IEEE Workshop on Information Assurance.
7. Ryan M. Harris, 2007. 'Using Artificial Neural Networks for
Forensic File Type Identification', Master's Thesis, Purdue
University.
8. Roussev, Vassil, and Garfinkel, Simson. 'File Fragment
Classification: the Case for Specialized Approaches', Systematic
Approaches to Digital Forensic Engineering, Oakland, California.
9. Sarah J. Mood and Robert F. Erbacher, 2008. 'SADI: Statistical
Analysis for Data Type Identification', 3rd International Workshop on
Systematic Approaches to Digital Forensic Engineering.
10. Robert F. Erbacher and John Mullholland, 2007. 'Identification and
Localization of Data Types within Large-Scale File Systems',
Proceedings of the 2nd International Workshop on Systematic
Approaches to Digital Forensic Engineering, Seattle, WA, April
2007.

UNIT 6 DIGITIZATION-SCANNING, OCR


AND RE-KEYING
Program Name: BSc(MGA)
Written by: Srajan
Structure:
6.0 Introduction
6.1 Unit Objectives
6.2 The digitization chain
6.3 Scanning and Image Capture
6.3.1 Hardware-Types of Scanner and Digital Cameras
6.3.2 Software
6.4 Image Capture and Optical Character Recognition

6.4.1 Imaging Issues
6.4.2 OCR Issues
6.5 Re-Keying
6.6 Summary
6.7 Key Terms
6.8 End Questions

6.0 INTRODUCTION
In this unit we will learn in detail about digitization in terms of
scanning and re-keying. As we learned earlier, digitization is the process of
converting analog materials such as books, papers, film and tapes into a
digital format readable by an electronic device; in other words, creating a
computerized representation of a printed analog original.

There are various methods of digitization. In this unit we are going to
focus on the digitization of text and images. We will also examine issues
such as the necessary hardware and software, resolution, and scanning and
image capture. Further, we will learn about OCR (Optical Character
Recognition), with particular focus on file types and image resolution.

6.1 UNIT OBJECTIVES:


After studying this unit you will be able to
Describe the digitization chain
Explain scanning and image capture
Describe the different types of scanners and digital cameras
Describe the applications of software in image capture
Explain the OCR

6.2 THE DIGITIZATION CHAIN


The concept of the digitization chain was introduced by Peter Robinson. It
rests on the fundamental idea that the best-quality image is achieved by
digitizing from the original, with as few steps removed from the original as
possible. The more intermediates there are, the more links in the chain: the
chain is composed of the number of intermediates that come between the
original object and the digital image. Dr. S. D. Lee later extended this idea
so that the digitization chain developed into a circle and every step of the
project became a separate link.

Each link attains a level of importance, so that the entire project would
fail if one piece of the chain were to break. Though this is a very important
and useful concept in project development, in this section we concentrate on
Peter Robinson's formulation of the digitization chain.

A project will flow more smoothly if it has very few links in the digitization
chain. The results depend first on the quality of the image being scanned,
regardless of the technology used by the project. It is obvious that scanning
an illustration directly from the journal itself will give much better quality
than scanning a copy of a microfilm of that illustration. This is the main
reason for choosing the hardware and software carefully.

6.3 SCANNING AND IMAGE CAPTURE


As we discussed earlier, digitization is the process of converting an analog
image into a digital format: making an exact copy of the material in a
different format. For both text and images, getting an exact copy of the page
is the first step in digitization, and a combination of hardware and software
imaging tools is needed to accomplish it. The following are some of the
hardware and software tools often used by digital project creators.
6.3.1 Hardware: Flatbed Scanners and Digital Cameras
Several methods of image capture are in use today, from high-end digital
cameras to different types of scanner (flatbed, sheet-fed, drum, microfilm).
For a project we should choose the most readily available option that is also
affordable. In this respect, the two most commonly accessible image-capture
solutions are high-resolution digital cameras and flatbed scanners.
Flatbed scanners:
An optical scanner is a device that can read text printed on paper and
translate the information into a form that the computer can understand and
use. The flatbed scanner is a very popular optical scanner. It has a flat glass
surface on which we place the document for scanning: the image to be
scanned is placed face down and covered. When the machine starts, light-
sensitive sensors pass over the illuminated page to produce the scanned
image. The scanner works by digitizing the image, dividing it into a grid of
boxes and representing each box with a zero or a one, depending on whether
the pixel is empty or filled.
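That filled-or-empty decision is essentially a threshold test. A minimal Python sketch (the grid values and the threshold of 128 are arbitrary example figures):

```python
def to_one_bit(gray_rows, threshold=128):
    # Dark pixels (below the threshold) become 1 (ink present);
    # light pixels become 0 (blank paper), as in a 1-bit scan.
    return [[1 if px < threshold else 0 for px in row] for row in gray_rows]

page = [[250, 250, 30, 250],
        [250, 40, 40, 250]]
print(to_one_bit(page))  # [[0, 0, 1, 0], [0, 1, 1, 0]]
```

Real scanners capture many more levels per pixel (the bit depth discussed below), but every pixel ultimately reduces to numbers in exactly this way.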

Fig 6.1: A Flatbed Scanner

A flatbed scanner sees the image and converts the printed text or
image into electronic codes that can be understood by the computer. The
scanning unit moves across the image to be scanned, reads it as a series of
dots, and then generates the digitized image. Scanners have a bit-depth
feature: bit depth determines how many different colours the scanner can
capture, and the resolution and colour of the scanned image depend on it.
The digitized image is sent to the computer and stored as a file.
A flatbed scanner may have an ADF (Automatic Document Feeder).
An ADF holds a stack of pages and feeds them one at a time into the
scanner, which lets the user scan many pages without manually replacing
each sheet. With the help of an ADF we can also scan both sides of a page.
Different levels of flatbed scanner are available in the market, each
with a different scanning capacity, and we can choose one according to our
requirements. An entry-level flatbed scanner can scan documents up to
8.35 x 11.7 inches at 300-600 dpi. A mid-level flatbed scanner can scan
12-14 inch documents at 600-1200 dpi. A high-end flatbed scanner can scan
14-24 inch documents at more than 1200 dpi. A flatbed scanner can scan
approximately 11 colour pages or 27 black-and-white pages in one minute.
The main advantage of the flatbed scanner is that it can scan any
document irrespective of its quality, and it is very user friendly. Its
drawbacks are that it is often very large and needs more space, and that
high-end models are very expensive.
Digital cameras:
Digital cameras are very portable and easy to handle. Some large documents
that will not fit on a flatbed scanner can be digitized with a digital camera.
On a flatbed scanner the document or page must lie completely flat on the
scanning bed, which poses a problem with bound books. Digitizing with a
stand-alone digital camera solves this problem, an approach taken up by
many digital archives and special-collections departments. Many digital
cameras today also have a voice-capture feature that records voice as well.

Fig 6.2: Digital Camera

Most digital cameras have an LCD, which helps to view images in
memory and serves as a viewfinder, so photos can be seen immediately.
Stored photos or images can then be uploaded to a computer. The digital
camera's ability to digitize an image under changing lighting is highly
beneficial, as it does not harm the composition of the work, and images can
be produced at great sizes as a result. Sony, Canon, Nikon, Kodak, Olympus
and many other companies make digital cameras.
6.3.2 Software
Making specific recommendations for software is a difficult task. In
the digitization process there are no fixed rules to follow: the method varies
from one project to another depending on use, suitability and personal
preference. Irrespective of the method of digitization, all digitization
projects use text-scanning software and image-scanning software. There is a
wide range of text-scanning software available, with varying capabilities.
Given the condition of the text being scanned, the primary consideration
with any text-scanning software is how well it works with old texts: most
such software is optimized for laser-quality printouts, so it is important to
find software that can work through more complicated fonts and low-quality
pages.
There are more choices of software depending on what needs
to be done in terms of image manipulation. Adobe Photoshop is the most
common software for image-by-image manipulation, including converting
TIFF masters into web-deliverable JPEGs and GIFs.

Check your progress-1

What is digitization?
Name two most important hardware devices required for image capture.
What is the use of flatbed scanners?
What is the use of digital cameras?
Name one software used for image manipulation.

6.4 IMAGE CAPTURE AND OPTICAL CHARACTER RECOGNITION
OCR (Optical Character Recognition):
OCR is the process of taking an image of letters or typed text and
converting it into data the computer understands. Suppose, for example, that
you wanted to digitize a magazine article, brochure or PDF document. You
could spend hours retyping the text and then correcting misprints, or you
could use a scanner to get the material into digital form. Obviously, a
scanner alone is not enough to make this information available for editing:
the scanner creates only an image, a snapshot of the document, which is a
raster image that you cannot edit.

Fig 6.3: OCR Process

OCR is currently the best method of digitizing typed pages of text.
OCR converts written or printed text into a form that can be understood by a
computer: using OCR, your computer can take text from a scanned page and
insert it into a text file or word-processing document. In other words, OCR
works by scanning the text character by character, analyzing the resulting
image, and translating each character image into a character code, such as
ASCII, which is commonly used in data processing.
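The character-by-character matching step can be illustrated with a toy example: glyphs as 3x3 bitmaps matched against templates by counting differing pixels. Real OCR engines use far richer features and classifiers; the glyph shapes here are invented for the illustration:

```python
# Hypothetical 3x3 templates for three characters, stored row by row.
TEMPLATES = {
    "I": (0, 1, 0,  0, 1, 0,  0, 1, 0),
    "L": (1, 0, 0,  1, 0, 0,  1, 1, 1),
    "T": (1, 1, 1,  0, 1, 0,  0, 1, 0),
}

def recognize(glyph):
    """Return the character whose template differs from the scanned glyph
    in the fewest pixels (Hamming distance), plus its character code."""
    best = min(TEMPLATES,
               key=lambda ch: sum(a != b for a, b in zip(glyph, TEMPLATES[ch])))
    return best, ord(best)   # the character code (e.g. ASCII) is what gets stored

# A noisy "T" with one flipped pixel is still recognized correctly.
noisy_t = (1, 1, 1,  0, 1, 0,  0, 0, 0)
print(recognize(noisy_t))  # ('T', 84)
```

The point of the example is the final step: what the OCR program stores is not the picture of the letter but its character code, which is why the result is editable text.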

6.4.1 Imaging Issues


Before going further, we need to consider the images and the purpose for
which they are being created. The following questions need to be
answered:

• Are there preservation issues that must be considered, or are the
images simply for Web delivery?
The reason for asking is simple: the more demanding the purpose,
the higher the scanner settings and the higher the quality of the
image need to be. Once this decision has been made, the essential
image settings can be established.

• What sort of image will it be (black and white, colour, grayscale)?

• At what resolution?
1. Image types
Basically, there are four main types of images:
• 8-bit grayscale
• 1-bit black and white
• 24-bit colour
• 8-bit colour
A bit, represented by either a '1' or a '0', is the key unit of
information read by the computer: '1' represents presence and '0'
absence, with more complex information represented by multiple,
grouped bits.
In a 1-bit black-and-white image each pixel can only be black or white.
This is completely unsuitable for almost all images and is rarely used; the
only suitable material for this format is line graphics or printed text for
which poor resulting quality does not matter. A further drawback is that
saving such an image as a JPEG-compressed file, one of the most popular
image formats on the Web, is not a feasible option.
With 256 shades of grey, 8-bit grayscale images are an improvement
on 1-bit images: they provide a clear image rather than the fuzz of a 1-bit
scan and are often used for non-colour images. Grayscale is usually regarded
as more than adequate for non-colour originals, though there are times when
such originals should be scanned in colour, because the fine detail of the
hand then comes through more distinctly. The uniform recommendation is
that images intended as archival or preservation copies should be scanned as
24-bit colour.
An 8-bit colour image is similar to 8-bit grayscale except that each
pixel can be one of 256 colours. The format is appropriate for web-page
images but can come out somewhat grainy, so the decision to use 8-bit
colour depends entirely on your project's requirements. Another factor is the
type of computer the viewer is using: older displays cannot show an image
above 8-bit, so a 24-bit image will need to be converted to the lower format.
Storage space is a further key factor: an 8-bit image is markedly smaller,
though it does not have the quality of a higher format.

The Digitization
68
In practice, the best scanning choice is a 24-bit colour image. With each
pixel having the potential to contain one of 16.8 million colours, this
option provides the highest quality image. The arguments against this
format are the cost, time and file size involved. Knowing the objectives
of the project will assist in making this decision. If one is trying to
create archival quality images, this is taken as the default setting. Even
if the original is greyscale, 24-bit colour makes the image look more
photo-realistic. The thing to remember with archival quality imaging is
that if you need to go back and manipulate the image in any way, the
archival copy can simply be duplicated and adjusted. If you scan the image
in a lesser format, however, such retrospective adjustment becomes
impossible: an 8-bit greyscale image cannot be converted into millions of
colours, whereas a 24-bit colour archived image can be made greyscale.
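The asymmetry described above can be seen in a small sketch using the Pillow imaging library (an assumption for illustration; the text does not name a tool). Converting a 24-bit ('RGB') image to 8-bit greyscale ('L') works, but converting back only yields a grey RGB image; the discarded colour information is gone:

```python
# Sketch using the Pillow library (pip install Pillow); illustrative only.
from PIL import Image

# A 24-bit (RGB) image filled with a saturated red.
rgb = Image.new("RGB", (4, 4), (200, 30, 30))

# Downconverting to 8-bit greyscale keeps only luminance.
grey = rgb.convert("L")

# "Upconverting" back to RGB gives three identical channels,
# not the original red: the colour information is lost for good.
restored = grey.convert("RGB")
r, g, b = restored.getpixel((0, 0))
print(r == g == b)                                        # True: pure grey
print(restored.getpixel((0, 0)) == rgb.getpixel((0, 0)))  # False: red not recovered
```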

2. Resolution
The second issue we need to consider is the resolution of the image. In
simple language, resolution means the number of dots or pixels per inch
(dpi or ppi). The more dots or pixels per inch, the higher the resolution
of the image, and the clearer it looks. Again, resolution depends on the
purpose for which the image is being used. The resolution will need to be
relatively high if the image is being archived or needs to be enlarged,
but it can drop drastically if the image is simply being placed on a web
page. As with the options in image type, file sizes vary with the dpi:
the higher the dpi, the larger the file. To illustrate the differences,
an informative table (created by the Electronic Text Centre) is
replicated below, examining an uncompressed 1" x 1" image at different
resolutions and types.

Resolution (dpi)             400*400   300*300   200*200   100*100

1-bit black and white        20K       11K       5K        1K

8-bit greyscale or colour    158K      89K       39K       9K

24-bit colour                475K      267K      118K      29K

The 400 dpi scan of a 24-bit colour image is one of the best choices for
archival imaging, but it also makes up the largest file size. Because
screen resolution rarely exceeds 100 dpi, the 100 dpi image is appealing
for its small size. So, the dpi choice relies upon the project
objectives.
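The sizes in the table follow directly from the arithmetic of resolution and bit depth. A minimal sketch of the uncompressed-size calculation (not from the source; the table's slightly different figures presumably reflect rounding and format overhead):

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed raster size: pixels across * pixels down * bytes per pixel."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel // 8

# A 1" x 1" scan, as in the table above:
print(uncompressed_size_bytes(1, 1, 400, 24))  # 480000 bytes, roughly the 475K listed
print(uncompressed_size_bytes(1, 1, 400, 8))   # 160000 bytes, roughly the 158K listed
print(uncompressed_size_bytes(1, 1, 400, 1))   # 20000 bytes, the 20K listed

# The same formula explains why archival page scans are so large: a
# letter-size page (8.5" x 11") at 400 dpi, 24-bit comes to about 45 MB
# before any TIFF overhead.
print(uncompressed_size_bytes(8.5, 11, 400, 24) / 1_000_000)  # 44.88 (MB)
```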

3. File Format
While finalizing the capture in an imaging software program, clicking on
the 'save as' function shows that there are quite a few image formats to
choose from. In terms of text creation, there are three key image
formats: JPEG, GIF and TIFF. These are the most common, as they can be
transferred to nearly any software system or platform.
For archival image creation and retention as a master copy, TIFF
(Tagged Image File Format) files are the most widely accepted format.
Almost all platforms can easily read TIFF files, making it one of the
best choices for transmitting important images. Most digitization
projects scan images in the TIFF format because it allows a person to
gather as much information as possible from the original and then save
that data. The only demerit of the TIFF format is image size. But once
the image is saved, it can be read by a computer with a completely
different hardware and software system and can be brought forward at
any point. If there is any necessity to modify the images later, they
should be scanned as TIFFs.
For systems that have space restrictions, JPEG (Joint Photographic
Experts Group) files are the safest formats for Web viewing and transfer.
JPEGs are popular formats with image creators not only for their
compression capabilities but also for their quality. TIFF is a lossless
compression format, whereas JPEG is a lossy one: the image loses bits of
information when the file size is squeezed, but with no significant loss
in image quality. At 24-bit scanning, every dot has the alternative of
16.8 million colours, which is more than the human eye can really
distinguish on the screen. When the file is condensed, the image loses
some information, to a degree unlikely to be detected by the human eye.
The lossy compression is the main disadvantage of this popular format.
Once an image is preserved using the 'save as' option, the cast-away
information is lost. The significance of this is that certain parts of
the image, or the total image, cannot be magnified. Furthermore,
re-working the image results in the loss of more information. Therefore,
archiving in JPEG format is not recommended, as there is no way to retain
all of the information scanned from the source. Nonetheless, in terms of
storage size and viewing capabilities, JPEG formats are one of the best
methods for online viewing.
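The lossless/lossy distinction can be demonstrated with a short round-trip sketch, again assuming the Pillow library (not named in the source). PNG stands in here as a lossless format: it returns the exact pixels after a save/load cycle, while JPEG does not:

```python
# Sketch with the Pillow library; PNG as a lossless format, JPEG as the
# lossy one discussed in the text.
import io
import random
from PIL import Image

rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(64 * 64 * 3))
img = Image.frombytes("RGB", (64, 64), noise)

def roundtrip(image, fmt, **opts):
    """Save the image to an in-memory file and load it back."""
    buf = io.BytesIO()
    image.save(buf, format=fmt, **opts)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

png_copy = roundtrip(img, "PNG")
jpeg_copy = roundtrip(img, "JPEG", quality=50)

# Lossless: every pixel survives the save/load cycle.
print(list(png_copy.getdata()) == list(img.getdata()))   # True
# Lossy: the cast-away information is lost for good.
print(list(jpeg_copy.getdata()) == list(img.getdata()))  # False
```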
GIF (Graphic Interchange Format) files are an older format limited to
256 colours. GIFs use lossless compression without requiring as much
storage space as TIFFs. Although GIFs lack the compression capabilities
of a JPEG, they are still solid for line drawings and graphic arts. They
can also be turned into transparent GIFs, in which the background of the
image is made invisible, permitting it to blend in with the web page
background. Although frequently used in Web design, this also has a good
use in creating text. A text character may exist that cannot be converted
so that a Web browser can render it: the character may not be defined by
ISOLAT1 or ISOLAT2, or it may exist only as an inline image (e.g., a
headpiece). For instance, when the UVA Electronic Text Centre created an
online version of the journal Studies in Bibliography, there were cases
of inline special characters that simply could not be depicted through
the available encoding. The journal being a searchable full-text
database, furnishing a readable page image was not an option. Their
solution was to make a transparent GIF, one that did not break up the
flow of the digitized text. These GIFs were made to match the size of the
surrounding text and were afterwards introduced quite successfully into
the digitized document.
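The transparent-GIF technique described above can be sketched with the Pillow library (an assumption; the Centre's actual tools are not named). A palette image is saved with one palette index flagged as transparent:

```python
# Hypothetical illustration with Pillow: save a GIF whose background
# palette index is flagged transparent, as used for inline glyph images.
import io
from PIL import Image

# A tiny "glyph": white background with a black mark in the middle.
glyph = Image.new("RGB", (8, 8), (255, 255, 255))
glyph.putpixel((4, 4), (0, 0, 0))

# GIF is palette-based, so convert to 'P' (indexed) mode first.
pal = glyph.convert("P")

# Whichever palette index holds the white background becomes transparent.
white_index = pal.getpixel((0, 0))

buf = io.BytesIO()
pal.save(buf, format="GIF", transparency=white_index)
buf.seek(0)

reloaded = Image.open(buf)
# The reloaded GIF carries a transparency index, and it is the index of
# the background pixels (Pillow may renumber the palette when saving).
print("transparency" in reloaded.info)
print(reloaded.info["transparency"] == reloaded.getpixel((0, 0)))
```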
Continuing the discussion of image types, the topic of file size arises
frequently in the digitization process. It is the lucky archive or
project that has limitless storage space; most creators must therefore
consider how to obtain quality images without taking up the 55 MB of
space needed by a 400 dpi, archival quality TIFF. A common assumption is
that the lower the bit depth, the better the compression, but this is not
true. The Electronic Text Centre has developed figures showing how 24-bit
images, rather than 8-bit images, give rise to a smaller JPEG in addition
to a higher quality image file.
• 300 dpi, 24-bit colour image (2.65 x 3.14 inches):
Uncompressed TIFF: 2188K; 'Moderate loss' JPEG: 59K
• 300 dpi, 8-bit colour image (2.65 x 3.14 inches):
Uncompressed TIFF: 729K; 'Moderate loss' JPEG: 76K
• 100 dpi, 24-bit colour image (2.65 x 3.14 inches):
Uncompressed TIFF: 249K; 'Moderate loss' JPEG: 9K
• 100 dpi, 8-bit colour image (2.65 x 3.14 inches):
Uncompressed TIFF: 85K; 'Moderate loss' JPEG: 12K
Although the image sizes might not seem significantly different, it
should be kept in mind that these results were estimated for an image
measuring roughly 3 x 3 inches. Storage space suddenly becomes
problematic when these images are page size. The compressed JPEG takes
less space, and 24-bit scanning provides a better image quality.

Having discussed these three image formats, a decision has to be made as
to which one should be used for a given project. The best answer is to
use a combination of all three. TIFFs are not suitable for online
delivery, but if the images have any future use, whether for printing,
later enlarging, manipulation, archiving or simply as a master copy,
there is no other suitable format in which to store them. JPEGs and GIFs
are the best formats for online presentation. JPEGs cannot be enlarged
(or else they will pixelate), but they have a smaller file size and good
quality; a JPEG almost matches the TIFF format in terms of viewing
quality. How GIFs are used depends on the project, but they are a
popular option for making thumbnail images that link to a separate page
exhibiting the JPEG version.

There has been much debate over the creation of archival digital images.
As per the Electronic Text Centre, there is an emerging distinction
between archival imaging and preservation imaging. Preservation imaging
can be specified as 'high-speed, 1-bit (simple black and white) page
images shot at 600 dpi and stored as Group 4 fax-compressed files'. The
results are comparable to microfilm imaging: the text is preserved for
reading purposes, but the source is ignored as a physical object.
Archiving often supposes that objects have been digitized in order to
protect the source from constant handling, but this type of preservation
eliminates any chance of presenting the object as an artifact. Archiving
an object calls for an entirely different set of requirements. Film
imaging is the only imaging that can be considered to have archival
value; it is believed to last at least ten times as long as a digital
image. However, the idea of archival imaging cannot be neglected, and it
is still discussed amidst funding bodies and projects.

For archiving there is no prescribed standard; different projects and
places recommend different models. Keeping this in mind, the following
resolution, type and format are advocated:
• TIFF: Given the discussion of the format above, this should not be
any surprise. TIFF files are the only choice for archival imaging due
to their cross-platform capabilities and complete retention of scanned
information. The images are the closest digital replication available,
maintaining all of the scanned information from the source. The file
size, especially when scanned at 24-bit, 600 dpi, will be quite large,
but well worth the storage space. Rather than placing the TIFF image
online, it is simple to make a JPEG from the TIFF as a viewing copy.
• 600 dpi: This is a demanding recommendation; many projects maintain
that scanning at 300 or 400 dpi provides ample quality to be considered
archival. As an archival standard, however, 600 dpi provides excellent
detail and allows quite large JPEG images to be produced, which is why
many top international digitization centers, such as Oxford, Virginia
and Cornell, recommend it. The only restrictive aspect is the file
size; for archival images, one needs to try to get as much storage
space as possible. As offline storage on writeable CD-ROMs is an
option, the master copies do not have to be held online.
• 24-bit: As the example shows, the file size of the subsequently
compressed image does not benefit from scanning at a lower bit depth,
so there is really little ground to scan an archival image at anything
less. Whether the source is greyscale or colour, images at this level
have a higher quality and are more realistic.

6.4.2 OCR Issues

Optical Character Recognition (OCR) is the process of scanning printed
pages as images on a flatbed scanner and then using OCR software to
recognize the letters as ASCII text. OCR software provides tools both
for acquiring a scanned image and for recognizing the text in it. OCR
technology recognizes patterns of dots and converts them into
characters. Depending on the type of scanning software you are using,
the resulting text can be piped into many different word processing or
spreadsheet programs.

Ideal source material for OCR

In addition to very clear copies and originals, mono-spaced fonts like
Courier suit OCR best. The following source material characteristics
are also desirable:

• Black text on a white background

• A clean copy, not a fuzzy multigenerational copy from a copy machine

• A standard type font (e.g., Times New Roman); fancy fonts may not be
recognized

• A single-column layout

• A font size of 12 points or greater

OCR limitations
Following are some drawbacks of OCR:

• During text scanning, most document formatting (italic, underline,
bold) is lost, apart from paragraph marks and tab stops.

• The output of a finished text scan is a single-column editable text
file. This file always requires proofreading and spellchecking, in
addition to reformatting to the desired final layout.

• After scanning printouts from a spreadsheet as plain text, the text
must be imported into a spreadsheet and reformatted to match the
original.

• Using text from a source with a font size of less than 12 points, or
from a fuzzy copy, results in more errors.

Some source materials that are not suitable for OCR

• Very small-sized text

• Forms (especially with check boxes)

• Blurry or multigenerational fuzzy copies from a copy machine

• Handwritten text

• Mathematical expressions

• Handwritten revisions in draft copies of documents

• Unusual fonts and fancy text

The main goal of recognition technology is to re-create the text in
addition to elements of the page, including layout and tables. Recall
the concept of the scanner in this context: it takes a copy of the image
by replicating it with patterns of bits (dots that are either filled or
unfilled). OCR technology evaluates these patterns of dots and turns
them into characters. The resulting text can be piped into many
different word-processing or spreadsheet programs, depending on the type
of scanning software being used.
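The core idea, matching patterns of filled and unfilled dots against known character shapes, can be sketched in miniature. This toy recognizer is purely illustrative (real OCR engines also handle noise, fonts and layout); it compares 3x3 bitmaps of '1's and '0's against a tiny template dictionary:

```python
# Toy illustration of the dot-pattern idea behind OCR: each "glyph" is a
# 3x3 grid of filled (1) and unfilled (0) dots, matched against known
# character templates.
TEMPLATES = {
    "T": ("111",
          "010",
          "010"),
    "L": ("100",
          "100",
          "111"),
    "O": ("111",
          "101",
          "111"),
}

def recognize(glyph):
    """Return the character whose template matches this dot pattern."""
    for char, template in TEMPLATES.items():
        if tuple(glyph) == template:
            return char
    return "?"  # unrecognized pattern, as with damaged or exotic type

scanned = [("100", "100", "111"),   # looks like an L
           ("111", "010", "010")]   # looks like a T
print("".join(recognize(g) for g in scanned))  # LT
```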
However, OCR software technology (including OmniPage) is optimized for
laser-printer-quality text. The reasoning behind this is simple:
scanning software attempts to examine every pixel in the object and
convert it into a filled or empty space, and a laser-quality printout
has very clear and distinct characters on a crisp white background, with
nothing interfering with the letters' clarity. The software's
capabilities begin to degrade once books become the object type.
Therefore, if you decide to use OCR for a text source, the first thing
to consider is the condition of the document to be scanned. If there are
instances of broken type or damaged plates, or the characters in the
text are not fully formed, the software will have difficulty reading the
material. The significance of this is that late 19th and 20th century
texts have a much better chance of being read well by the scanning
software. The further you move back from the present, the less reliable
OCR becomes amid the variations in printing. Alterations in paper, from
a bleached white to a yellowed, sometimes foxed background, produce
noise that the software must screen out. Font differences then further
disturb the recognition capabilities: the black letter and exotic
typefaces found in the hand-press period contrast noticeably with the
computer-set texts of the late 20th century.

If you are scanning a document to print, or to compensate for an
accidentally deleted file, the ability to export text in different word
processing formats is quite useful. However, a few issues should take
priority for the text creator. When using a software program such as
OmniPage, aim for a scan that retains some formatting but not a complete
replication of page elements. When text is saved with formatting that
relates to a specific program (WordPerfect, Word, even RTF), it is
infused with a level of hidden markup (markup that tells the software
program what the layout of the page should look like). For the long-term
preservation of the digital object and text creation, you want to be
able to control this markup. The best option is scanning at a setting
that retains paragraph format and font; this preserves the basic format
of the text. If instead you opt for the choice that eliminates all
formatting, the result will be text that includes nothing more than word
spacing (no paragraph breaks, no page breaks, no font differentiation,
no accurate line breaks, etc.). If you have decided to use your own
encoding, scanning at a mid-level of formatting will assist you: the
structural markup chosen for the project can be added while proofreading
the text. Once this has been completed, the text can be saved out in a
text-only format. This way, you will have a basic level of user-dictated
encoding, and the digitized text will be saved in a way that eliminates
program-added markup.

6.5 RE-KEYING
There are still many situations where the document or project prevents
the text creator from using OCR. If the text is of degraded or poor
quality, correcting the OCR mistakes can take a good amount of time
compared with simply typing in the text from scratch. There is also the
issue of the amount of information to be digitized: even if the
documents are of relatively good quality, there might not be enough time
to sit down with 560 volumes of texts (as with the Early American
Fiction project) and process them through OCR. Although this varies from
study to study, the general rule of thumb is that a best-case scenario
is three pages scanned per minute; this does not take into consideration
putting the document on the scanner, flipping pages, or the subsequent
proofreading. When addressing these concerns, re-keying the text becomes
the viable solution if OCR is found incapable of handling the project's
digitization.
The next question to address is whether to handle the document in-house
or to outsource the work. All the necessary elements, such as hardware,
software and time, are taken into consideration when deciding to
digitize the material in-house. Moreover, a few issues that come into
play with in-house digitization should be kept in mind. The speed of
re-keying is the primary concern. Research assistants working on the
project, or graduate students from the text creator's local department,
generally do the re-keying, and paying someone to re-key the text on an
hourly basis often proves more expensive than outsourcing the material.
Another problem is that a single person typing in material generally
misses keyboarding errors. On the other hand, if the member is familiar
with the source material, there is a chance they will automatically
'correct' things that seem incorrect. During in-house digitization,
these concerns should therefore be addressed from the beginning.
The most popular choice for many digitization projects is to outsource
the material to a professional keyboarding company. By hiring
keyboarders who do not have a subject specialty in the text being
digitized, many often do not even speak the language being converted,
such companies avoid the problem of keyboarders subconsciously altering
the text. Keyboarding companies are also able to put a base-level
encoding scheme, established by the project creator, into the documents,
thereby getting rid of some of the more basic tagging tasks.
As with most steps in the text-creation procedure, the answers to these
questions will depend on the project. For a project that plans to
digitize a collection of works, the decisions made will differ markedly
from those made by an academician creating an electronic edition. This
reflects back on the significance of the document analysis stage: the
requirements of the project must be recognized, in addition to
identifying the external influences (such as equipment availability,
project funding and staff size) that affect the decision-making process
of the project.

Check your progress-2

What is OCR?
What is the full form of OCR?

6.6 SUMMARY
The digitization chain is based on the fundamental principle that the
best quality image is achieved by digitizing from the original data.
Several methods of image capture exist today, from high-end digital
cameras to different types of scanners (flatbed, sheet-fed, drum,
microfilm). For a project we should choose the most available and
affordable option. In this respect, the two most common accessible
image capture solutions are high-resolution digital cameras and
flatbed scanners.
Digital cameras are very portable and easy to handle. Some large
documents that won't fit on a flatbed scanner can be digitized with
the help of a digital camera.
The main goal of recognition technology is to re-create the text in
addition to elements of the page, including layout and tables.
Resolution means the number of dots or pixels per inch (dpi or ppi).
The more dots or pixels per inch, the higher the resolution of the
image, and the clearer it looks.

6.7 KEY TERMS


• Flatbed scanner: A flat glass bed, quite similar to a copy machine,
on which the image is placed face down and covered. The scanner
then passes light-sensitive sensors over the illuminated pages.
• Digital camera: The most dependable means of capturing high-
quality digital images. A digital camera digitizes directly from the
original and can work with objects of any size or shape, under many
different lights.
• Optical Character Recognition (OCR): The process of scanning
printed pages as images on a flatbed scanner and then using OCR
software to recognize letters as ASCII text.

6.8 END QUESTIONS


50) What is Digitization chain? Explain in detail.
51) Explain the advantages and disadvantages of flatbed scanner?
52) Explain two most important hardware devices required for image
capture in detail.
53) What is a digital camera?
54) What are the important software programs used in data capture?
55) What is resolution? Explain in detail.
56) What are the ideal source materials for OCR?
57) What are the materials that are not suited to OCR?
58) What are the limitations of OCR?
59) Write a note on Re-keying.

Answer to check your progress questions


Check your progress -1:

Digitization is the process of converting information into a digital
format.
High-resolution digital cameras and flatbed scanners.
It can scan any document irrespective of its quality. It is very user
friendly.
Digital cameras are very portable and easy to handle. Some large
documents that won't fit on a flatbed scanner can be digitized with the
help of a digital camera.
Photoshop

Check your progress -2:

The process of scanning printed pages as images on a flatbed scanner
and then using OCR software to recognize letters as ASCII text.
Optical Character Recognition.


UNIT 7 MICROFORM
Program Name:BSc(MGA)
Written by: Mrs.Shailaja M. Pimputkar,Srajan
Structure:
7.0 Introduction
7.1 Unit Objectives
7.2 History
7.3 Uses of Microfilm
7.4 Advantages and Disadvantages of Microfilm
7.5 Readers and Printers
7.6 Microfilms and Cards used in Media
7.7 Image creation
7.7.1 Film
7.7.2 Cameras
7.7.2.1 Microfiche camera
7.7.2.2 Roll film camera
7.7.2.3 Flow roll camera
7.7.2.4 Flat film
7.7.2.5 Computer output microfilm
7.8 Storage and preservation
7.9 Duplication
7.10 Digital conversion
7.11 Format conversion
7.12 Summary
7.13 Key Terms
7.14 End Questions

7.0 INTRODUCTION

In this unit we are going to learn about the term 'microform'.
Microforms are any media, either paper or film, that contain
micro-reproductions of documents for transmission, storage, reading and
printing. Microform images are generally reduced approximately
twenty-five times from their original document size.

In this unit we are going to learn about three microform formats:
1) microfilm, 2) aperture cards and 3) microfiche. As every coin has
two sides, we will also learn about the advantages and disadvantages
of microfilm.

7.1 UNIT OBJECTIVES:

After studying this unit you will be able to:
• Explain the different uses of microfilm
• Describe the advantages and disadvantages of microfilm
• Explain readers and printers
• Describe the different types of microfilms and cards used in media
• Describe the process of image creation
• Explain the procedure of duplication
• Explain the process of digital conversion
• Explain the need for format conversion and its uses

7.2 HISTORY

In 1839, an English scientist named John Benjamin Dancer, known as the
'Father of Microphotography', began to experiment and produced
micro-photographs using the daguerreotype process. In 1853 he
successfully sold microphotographs as slides to be viewed with a
microscope. In 1851, James Glaisher, an astronomer, first suggested
microphotography as a document preservation method, followed by John
Herschel in 1853. Both attended the 1851 Great Exhibition in London,
where the exhibit on photography greatly influenced Glaisher. In 1859,
Rene Dagron, a French optician, used Dancer's techniques and was granted
the first patent for microfilm. Glaisher termed it 'the most remarkable
discovery of modern times' and recommended in his official report that
microphotography be used for preserving documents.

In 1896, Reginald A. Fessenden, a Canadian engineer, suggested
microforms as a compact solution to engineers' unwieldy but frequently
consulted materials. He proposed that up to 150,000,000 words could be
made to fit in a square inch, and that a one-foot cube could contain
1.5 million volumes.

In 1920, George McCarthy, a New York City banker, developed the first
practical commercial use of microfilm. In 1925 he was granted a patent
for his Checkograph machine, designed to make permanent film copies of
all bank records.
The American Library Association endorsed microforms at its annual
meeting in 1936, although microfilms were already in use in related
fields before they were officially accepted. In the years between 1927
and 1935, the Library of Congress microfilmed more than three million
pages of books and manuscripts in the British Library.
In 1934, the first microform print-on-demand service was implemented by
the United States National Agriculture Library, which was followed by a
similar commercial concern, Science Service. In 1938, University
Microfilms was established and the Harvard Foreign Newspapers Microform
Project was implemented.
Early cut-sheet microforms and microfilms were printed on nitrate film,
which was risky for holding institutions because nitrate film is
explosive and flammable. From the late 1930s to 1970, microfilms were
usually printed on a cellulose acetate base, which is prone to tears,
vinegar syndrome and redox blemishes. Vinegar syndrome, the result of
chemical decay, produces 'buckling and shrinking, embrittlement and
bubbling'. Redox blemishes are red, orange and yellow spots, 15-150
micrometres in diameter, created by oxidative attacks on the film, and
are largely due to poor storage conditions.
The 1970s also saw the development of computer output microform
applications, in which computers directly produce microforms; these have
been used to produce parts catalogs, hospital and insurance records,
telephone listings, college catalogs, patent records, publishers'
catalogs and library catalogs. Although this technique is widely used,
the permanence of microfilm masters on film remains the standard for
most libraries and for applications where preservation is an issue.
Microforms will have a future not only in the short term but probably
in the more distant future as well.

7.3 USES OF MICROFILM

Microfilm was first used by the military during the Franco-Prussian War
of 1870-71. At that time, pigeon post was the only mode of communication
between the provincial government in Tours and Paris. As pigeons were
not able to carry paper dispatches, the Tours government turned to
microfilm.
During World War II, the Victory mail and British 'Airgraph' systems
were used to deliver mail. Communication through these systems was made
possible by photographing large amounts of censored mail, reduced to
thumbnail size, onto reels of microfilm that weighed much less than the
originals.
One more benefit of using microfilm was that the small, lightweight
reels could be transported by air, much more quickly than by any other
transport mode.
• If you want to keep your documents for more than 7 years, then
microfilm is probably the best medium to use. Microfilm is used for
long-term storage of documents.
• Microfilm enables libraries to greatly expand access to collections.
Besides being compact, its storage cost is far less than that of paper
documents. Ninety-eight document-sized pages normally fit on one fiche,
reducing the material to about 0.25 per cent of its original bulk.
• Microfilm can last up to 400 years and is readable by eye. This means
you do not need any software to read these files.
• A roll of microfilm can hold up to 2,500 images.
• In the mid 20th century, libraries started using microfilm as a
preservation strategy for deteriorating newspaper collections.
• Microfilm was also used to save space.
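The space-saving figures quoted above follow from simple area arithmetic. As a minimal sketch (the exact reduction ratio here is an assumption for illustration): at a 20x linear reduction, each page occupies (1/20)^2 = 0.25 per cent of its original area:

```python
def area_fraction(linear_reduction):
    """Fraction of the original area remaining after a linear reduction."""
    return 1.0 / (linear_reduction ** 2)

# A 20x linear reduction shrinks each page to 0.25% of its original area,
# matching the 'about 0.25 per cent' figure quoted for a fiche.
print(round(area_fraction(20) * 100, 2))  # 0.25 (per cent)

# At the roughly 25x reduction typical of microforms the figure is smaller:
print(round(area_fraction(25) * 100, 2))  # 0.16 (per cent)
```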

7.4 ADVANTAGES AND DISADVANTAGES OF MICROFILM

The choice of using microfilm will depend on the application and the
length of time the document or image needs to be stored.
The following are the advantages of using microfilms:
• Strength and stability
Without putting rare, fragile or valuable items at risk of theft or
damage, it enables libraries to great expand access to collection.
Microfilms are breaks rarely.
• Storage Capacity
Besides, being compact, its storage cost is far less than paper
documents. Ninety eight documents sized pages normally fit on one
fiche, thereby reducing to about 0.25 per cent original material.
Microfilms can reduce space storage requirements up to 95 per cent,
when compared to filling paper.
• Cheaper Cost
Distribution of microfilm is cheaper than paper copy. It has lower
reproduction and carriage cost than that of printed paper.
• Storage Condition
This film can have a life expectancy of 500 years; if appropriate storage
conditions are maintained. Microfilms are stronger than any traditional

The Digitization
82
film. Instead of cellulose microfilms are made of polyester. The
polyester will not change with humidity or temperature.
• Data Retrieval
It is easy to view because it is analog CC. The format, unlike digital
media, does not need any software to decode the data stored thereon. A
person having knowledge in language can instantly comprehend with
the only need of a simple magnifying glass. The problem of software
obsolesce is eliminated by it.
The following are the disadvantages of using microfilms:
• Images produced through microforms are generally too small to be read with the naked eye. To make them readable, libraries must use special readers that project full-size images on a ground-glass or frosted acrylic screen.
• Photographic illustrations reproduce poorly in microform format, with loss of clarity and halftones.
• Microfilm viewed on reader machines is often very difficult to use: the user must carefully wind and rewind the film until reaching the point where the data being looked for is stored.
• As reader-printers are not always available, users are limited in their ability to make copies for their own use. Conventional photocopy machines cannot be used.
• A fiche stored in the highest-density drawers can easily be misfiled and become unavailable thereafter. To solve this problem, some services store microfiche in a restricted area from which it is retrieved on demand, or use lower-density drawers with labeled pockets for each card.
• Color microform is very expensive, which discourages most libraries from supplying color films. Besides this, color photographic dyes are likely to degrade over a long period; since color materials are usually photographed using black-and-white film, this results in the loss of information.
• A user who spends much time reading microfilm on a machine may be prone to headache and/or eye strain.
• Microfiche, like all analog media formats, lacks the features found in digital media. While digital copies have much higher copying fidelity, analog copies degrade with each generation. Digital data can also be indexed and easily searched.

Check your progress-1

Who is known as the 'father of microphotography'?
How was nitrate film harmful in the 1930s?

7.5 READERS AND PRINTERS


• Desktop Readers
Desktop readers are boxes with a translucent screen at the front, onto which an image from a microform is projected. They have appropriate fittings for any microform in use and may offer a choice of magnifications. In addition, they normally contain motors used to advance and rewind the film. When coding blips are recorded on the film, the reader reads them to find any required image.

Fig 7.1 : Desktop Reader

• Portable Reader
Portable readers are made of plastic and can be folded easily for carrying; when open, they project an image from microfiche onto a reflective screen. The following image is an example of a portable reader.

Fig 7.2 : Indus 456-HPR portable Reader
• Reader Printer
The reader printer was developed in the mid-20th century. It allows the viewer not only to see the microfilm but also to print what is shown in the reader. Microform printers can accept positive or negative film and produce positive or negative images on paper. Using newer machines, a user can also scan a microform image and save it as a digital file.

Fig 7.3 : Reader Printer

Check your progress-2

What is a desktop reader?
What is the advantage of a portable reader?
What is the use of a reader printer?

7.6 MICROFILMS AND CARDS USED IN MEDIA

Flat Film

Flat film of 105 × 148 mm is used to take micro images of very large engineering drawings. The film may carry a title, photographed or written along one edge. The typical reduction is about 20, representing a drawing of 2.00 × 2.80 meters (79 × 110 in). These films are stored as microfiche.

Microfilm

For roll films, the standard length is 30.48 m (100 ft). The user can store roll microfilm on open reels or in cassettes. One roll of 35 mm film can carry around 600 images of large engineering drawings, or 800 images of broadsheet newspaper pages. A 16 mm film may carry 2,400 letter-sized images as a single stream of micro images along the film, set so that lines of text are parallel to the sides of the film, or 10,000 small documents, perhaps cheques or betting slips, with both sides of the originals set side by side on the film.

Fig 7.4 : Microfilm

Microfiche
Microfiche, an ISO A6 certified flat film, is 105 × 148 mm in size, as shown in Fig 7.5. It contains a matrix of micro images. All microfiche are read with the text parallel to the long side of the fiche. In simple words, microfiche is a sheet of film bearing very small photographs of the pages of a newspaper, magazine, etc., which are viewed using a special machine. Frames may be of two types: landscape or portrait.

Fig 7.5 : Microfiche Film

A portrait image of about 10 × 14 mm is the most commonly used format. Magazine pages or office-size papers need a reduction of 24 or 25. The user stores microfiche in open-top envelopes that are put in drawers or boxes as file cards, or fitted into pockets in purpose-made books.

Ultra fiche

Ultra fiche is an extremely compact version of microfiche or microfilm, used to store analog data at much higher densities. Using suitable peripherals, ultra fiche can be created directly from computers. Ultra fiche is generally used to store data collected from highly data-intensive operations, such as remote sensing.

Aperture Cards

Fig 7.6: An Aperture card

An aperture card is a type of punched card with a hole (the aperture). The user mounts a 35 mm microfilm chip in the hole inside a clear plastic sleeve, or secures it over the aperture with adhesive tape.
The card is typically punched with machine-readable metadata associated with the microfilm image, and printed across the top of the card for visual identification. These cards are used for engineering drawings in all engineering disciplines. Over 3 million cards are available in libraries. The user may store aperture cards in drawers or in freestanding rotary units.

7.7 IMAGE CREATION
7.7.1 Film

Generally, high-resolution panchromatic monochrome stock is used for microfilming. A positive color film that yields good reproduction and high resolution can be used as an alternative. Roll film is generally 16, 35 or 105 mm wide and 30 meters or more in length, and is non-perforated. A continuous processor is used to develop, fix and wash the roll film.

7.7.2 Cameras

To create microform media, a terrestrial camera is mounted with its vertical axis above the copy, which remains stationary during exposure. A flow camera instead moves the copy smoothly through the camera, exposing film that moves with the reduced image. Microform media may also be produced by computer output microfilm (COM).

7.7.2.1 Microfiche Camera

All microfiche cameras are terrestrial, with a step-and-repeat mechanism to advance the film after each exposure. The film is processed individually, by hand or by using a dental X-ray processor. For high output, cameras are loaded with a roll of 105 mm film. The exposed film is developed as a roll, which is then either cut into individual fiche after processing or kept in roll form for duplication.

7.7.2.2 Roll film cameras

Fig 7.7: Roll film cameras

For engineering drawings, a freestanding open steel structure is very often provided, on which the camera moves on a vertical track. For filming, drawings are placed on a large table, centered under the lens.

7.7.2.3 Flow roll film cameras

The camera is contained in a box. Some versions of these cameras are transportable and some are used on a bench top. The operator maintains a stack of material to be filmed in a tray; the machine automatically feeds one document after another to the camera. The documents pass a slot where they are seen by the camera lens, and the film behind the lens advances exactly with the image. These cameras are used to record cheques and betting slips.

7.7.2.4 Flat film

The simplest microfilm camera still in use for flat film is a rail-mounted structure, at the top of which is a bellows camera for 105 × 148 mm film. The original drawing is held vertical by a frame or copy board. The horizontal axis of the camera passes through the center of the copy, and the structure is designed so that it may be moved horizontally on rails.
In the dark room, a dark slide may be inserted with a film, or the camera may be fitted with a roll film holder which, after an exposure, advances the film into a box and cuts the frame off the roll for processing as a single film.

7.7.2.5 Computer output microfilm

Computer output microfilm (COM) is a technology for copying information from electronic media onto microfilm. Equipment is available that accepts the data stream from a mainframe computer.

Fig 7.8: Computer output Microfilm

COM devices are used when there are large amounts of data. Within the equipment, a light source forms images as the negative of text on paper. The main advantage of using computer output microfilm for document archival is density: a single microfiche card can hold 230 images, and a 1-cubic-foot storage box containing 6,000 cards can hold 1,380,000 images. Another advantage of COM is that it gives the best image quality at a very reasonable cost compared with paper printing. A microfilm plotter, sometimes called an aperture card plotter, accepts a stream that might otherwise be sent to a computer pen plotter and produces the corresponding frames of microfilm, as 35 mm or 16 mm film or as aperture cards.
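The storage arithmetic quoted for COM is easy to verify; a quick sketch (variable names are my own, figures taken from the text):

```python
images_per_card = 230    # images held by a single microfiche card
cards_per_box = 6_000    # cards in a 1-cubic-foot storage box

# Total images stored per box of COM output.
images_per_box = images_per_card * cards_per_box
print(images_per_box)  # 1380000
```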

7.8 STORAGE AND PRESERVATION

• Microforms should be handled with care.
• The preservation master copy of any fiche or film should be kept in a different location from the duplicating master and reference copies.
• Clean, lint-free cotton gloves should be used at all times when handling silver halide film.
• Silver halide master films should not be used for reference purposes, as the film rolling mechanisms on reader and printer equipment can severely scratch the gelatin emulsion.
• Reference films should not be left in viewing equipment, as prolonged light exposure will affect image quality.
• Films and fiche should be returned to their protective packaging immediately after use. Do not leave microform material loose on a work surface.
• Viewing equipment should be maintained and the work environment kept clean.
• Chemical stability is greatly improved by low temperature and low relative humidity. The recommended temperature for storing microfilms is less than 21 °C, with relative humidity of less than 60 per cent.
• Black-and-white microfilms should be kept at 8 to 12 °C and 30 to 40 per cent relative humidity, and color microfilms at less than 5 °C and 30 to 40 per cent relative humidity.
• Storage furniture should be made of coated metal.
• The storage area should be fire-resistant and free of contamination.
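The numeric limits in the bullets above can be encoded as a quick check; a minimal sketch (the function name and data layout are my own illustrative choices, limits taken from the text):

```python
# Black-and-white microfilm storage limits from the text.
BW_TEMP_C = (8.0, 12.0)   # allowed temperature range, °C
BW_RH_PCT = (30.0, 40.0)  # allowed relative humidity range, %

def bw_storage_ok(temp_c: float, rh_pct: float) -> bool:
    """True if black-and-white microfilm storage conditions are within limits."""
    return (BW_TEMP_C[0] <= temp_c <= BW_TEMP_C[1]
            and BW_RH_PCT[0] <= rh_pct <= BW_RH_PCT[1])

print(bw_storage_ok(10, 35))  # True
print(bw_storage_ok(21, 55))  # False: acceptable for general storage, too warm and humid for B&W masters
```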

7.9 DUPLICATION

Diazo Duplication
Diazo duplication is an economical, convenient method of reproducing technical documents, blueprints, graphs and textual materials of any format. Diazo material is sensitive to ultraviolet light but can be handled in daylight. Special photosensitive papers, such as the SK-5, SSN-2 and MP types, which have high resolution, coloring and contrast, are used to make diazo duplicates.
The copy film is exposed by placing it in contact with a master film and passing UV light through the master onto the copy film. Areas covered by dark parts of the master are sheltered from the light, while those in contact with clear parts of the master are sensitized.
Vesicular Duplication
Vesicular film and diazo film have similar characteristics. The vesicular duplicating process also employs diazo compounds, but the light-sensitive content is incorporated in a thermoplastic colloid.
The diazonium salt releases nitrogen gas when exposed to UV light and then heated to about 130 °C, forming minute bubbles within the emulsion layer. Vesicular film is sensitized with a diazo dye which, after exposure, is developed by heating. The diazo compound remaining in the unexposed areas is then destroyed by a further overall exposure to light, fixing the image. The dissociation or breakdown of the diazo compounds generates millions of minute bubbles of nitrogen in the film, producing an image that diffuses light. The image has a good black appearance in a reader; however, it cannot be used to create further copies.

7.10 DIGITAL CONVERSION


Digital conversion is the process of converting microform into digital form. It is carried out using an optical scanner, which projects the film onto a CCD array and captures it in a raw digital format.

Fig 7.9: Equipment for digital conversion


Different types of microform differ in shape and size; therefore, scanners can usually handle only one type, although some offer swappable modules for the different types of microform. In either case, software is used to convert the raw capture into a standard image format for archival.

Following are some reasons to convert from microfilm to digital:

• Microfilm condition – Over time, microfilm tends to deteriorate, especially if it is not stored in an optimal storage environment. There have been cases where essential records were completely lost due to complete deterioration of the microfilm. In less extreme cases, the image quality can simply decline to the point where it is no longer legible.

• Ease of access – The roll needs to be pulled, loaded onto the microfilm reader, and then the appropriate image must be located. If the microfilm records need to be accessed frequently, substantial time savings will be gained from the conversion.

• Lack of hardware support – As new microfilm creation is phased out, the number of companies supporting microfilm readers is declining. In some cases, the cost of supporting or purchasing a new microfilm reader will pay for the cost of conversion.

The resulting files must be organized so that they are useful. This can be done in different ways, depending upon the source media and the required usage. If the scanner is capable of capturing and processing them, a similar arrangement can be made for the image files. Optical character recognition (OCR) is also frequently employed to provide automated, full-text-searchable files.

Check your progress-3

What is the use of flat film?


What is microfiche film?
What is diazo duplication?

7.11 FORMAT CONVERSION


The user may apply these conversions to camera output or to release copies. Single microfiche are cut from rolls of 105 mm film. Using an available bench-top device, an operator can cut the exposed frames of roll film and fit them into ready-made aperture cards.

Transparent jackets are A5 size, each with six pockets into which strips of 16 mm film may be inserted. The equipment lets an operator insert strips from a roll of film, which is particularly useful as frames may be added to a fiche at any time. The pockets are made of thin film, so that duplicates may be made from the assembled fiche.

7.12 SUMMARY
• Microfilm enables libraries to greatly expand access to collections. Besides being compact, its storage cost is far less than that of paper documents. Ninety-eight document-sized pages normally fit on a fiche, reducing the material to about 0.25 per cent of its original bulk.
• Color microform is very expensive, which discourages most libraries from supplying color films. Color photographic dyes are also likely to degrade over a long period, resulting in the loss of information, since color materials are usually photographed using black-and-white film.
• Microfiche is a sheet of film bearing very small photographs of the pages of a newspaper, magazine, etc., which are viewed using a special machine.
• Ultra fiche is generally used to store data collected from highly data-intensive operations, such as remote sensing.
• An aperture card is punched with machine-readable metadata associated with the microfilm image, and printed across the top of the card for visual identification.
• Diazo duplication is an economical, convenient method of reproducing technical documents, blueprints, graphs and textual materials of any format.
• The vesicular duplicating process employs diazo compounds, but the light-sensitive content is incorporated in a thermoplastic colloid.
• Digital conversion is the process of converting microform into digital form, using an optical scanner which projects the film onto a CCD array and captures it in a raw digital format.

7.13 KEY TERMS


• Aperture cards: Cards used for engineering drawings. They are Hollerith cards into which a hole has been cut. The user mounts a 35 mm microfilm chip in the hole inside a clear plastic sleeve, or secures it over the aperture with adhesive tape.
• Micro card: A card similar to microfiche, but printed on cardboard rather than photographic film.
• Microform: A generic term for any medium, transparent or opaque, bearing micro images.

7.14 END QUESTIONS


60) Explain the uses of microfilm.
61) What are the advantages of microfilm?
62) What are the disadvantages of microfilm?
63) What are readers and printers?
64) How should microforms be cared for?
65) Explain the different types of microfilms and cards used in media.
66) What are the advantages of computer output microfilm (COM)?
67) Explain the process of digital conversion.
68) Why do we need to convert microfilm into digital form?
69) What is format conversion?

Answer to check your progress questions


Check your progress -1:
John Benjamin Dancer
It was risky for holding institutions because nitrate film is
explosive and flammable.

Check your progress -2:


Desktop readers are boxes with translucent screen at the
front on to which is projected an image from a
microform.
Portable readers are made of plastic and can be folded easily for carrying; when open, they project an image from microfiche onto a reflective screen.
The reader printer allows the viewer to see the microfilm, and also to print what is shown in the reader.

Check your progress -3:


Flat film of 105 × 148 mm is used to take micro images of very large engineering drawings. It may carry a title photographed or written along one edge.
Microfiche is a sheet of film that has very small photographs of the pages of a newspaper, magazine, etc., which are viewed by using a special machine.

Diazo duplication is an economical, convenient method of reproducing technical documents, blueprints, graphs and textual materials of any format.

BIBLIOGRAPHY
20. Meckler, Alan Marshall. 1982. Micropublishing. Westport, CT: Greenwood.
21. Baker, Nicholson. 2001. Double Fold: Libraries and the Assault on Paper. New York: Random House.
22. Dictionary.com Unabridged.

UNIT 8 RECOMMENDATION FOR MICROFILM DIGITIZATION
Program Name:BSc(MGA)
Written by: Srajan
Structure:
8.0 Introduction
8.1 Unit Objectives
8.2 Picture Quality
8.3 Format and Compression
8.3.1 Formats
8.3.2 Data Compression and Decompression
8.4 Storage Form
8.5 Software Requirements for Image Viewing
8.6 Hardware Requirements for Image Viewing
8.7 Long-term Preservation of the Digital Conversion form (Migration)
8.8 Financial Viability
8.9 Digitization and Optical Character Recognition
8.10 Summary
8.11 Key Terms
8.12 End Questions

8.0 INTRODUCTION
As we learned in the previous units, digitization is the conversion of non-digital data to digital format. In digitization, information is arranged into units. We also know that a microform can be on either paper or film. Microforms can contain micro reproductions of documents for transmission, storage, reading and printing. Images on microform are generally reduced approximately twenty-five times from their original document size. Microfilm is one of the formats of microform.

In this unit we are going to learn about the digitization of microfilm. As a general rule, where good-quality microfilm is available as a long-term storage medium, the reproduction quality of the digital conversion form will be determined by the purpose for which it is to be applied. We will also learn about the software and hardware requirements for image viewing.

8.1 UNIT OBJECTIVES

After studying this unit, you will be able to:
• Describe picture quality
• Describe the procedure of format and compression
• Explain storage form
• Describe software requirements for image viewing
• Explain long-term preservation of the digital conversion form
• Explain hardware requirements for image viewing
• Describe the procedure of digitization and optical character recognition

8.2 PICTURE QUALITY


Microfilm can be used as a long-term storage medium if its quality is good. Bitonal digitization of panchromatic AHU microfilm is sufficient for the reproduction of printed text, including line drawings, and for modern non-impact typescript (plastic carbon band, and inkjet and laser printers).
It is mandatory to use grayscale for digitizing manuscripts, pencil and crayon drawings, typescript produced with silk ribbon, illustrations and drawings, other material with varying shades of grey, and black-and-white and color photographs. A sixteen-level grayscale (4-bit) is generally sufficient to digitize from contrast-enhancing AHU film; a 256-level grayscale (8-bit) should be used for digitization from halftone film. Digitization with grayscale requires considerable storage and has cost implications at all stages of the process; thus, it should be undertaken only where such reproduction quality is necessary.
Through bitonal digitization, a resolution of 615 dpi (414 dpi for 256 grayscale) is required to reproduce a small 'e' with a height of 1 mm at high quality. With 384 dpi (256 dpi for 256 grayscale), medium quality is achieved. Lower quality is achieved from 277 dpi (185 dpi for 256 grayscale).
In digitizing from film, the required resolution is determined by the size of the smallest element that is to be clearly reproduced. With printed texts, this is the height of the small 'e'; with manuscripts, it is the doubled letter width. To apply the quality index in a suitable form, resolution requirements are determined with respect to the size of these elements. For bitonal digitization, the following formula is used to calculate the quality index:
qi = (a × 0.039 × h)/3, where h is the height of the small 'e' in millimeters and a is the resolution in dpi.
For digitization with grayscale, the formula is:
qi = (a × 0.039 × h)/2
Given the good quality reserves of the microfilm, it will be sufficient for most purposes to aim for a digital conversion form of medium quality (qi = 5). The user is then able to calculate the required resolution from the quality index: for bitonal digitization, the resolution in dpi is a = 3 × 5/(0.039 × h). Where the height of the small 'e' is 1 mm, this yields a value of 384 dpi. For digitization with grayscale, the formula is a = 2 × 5/(0.039 × h), which gives a value of 256 dpi for an 'e' of the same height. Letters of this size (about 7 pt) are often used in footnotes.
As an indication, the aim should be 350-400 dpi for bitonal digitization and 300 dpi for grayscale. Test runs with typical films should be used to decide the quality required for each purpose.
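The resolution calculation can be sketched in a few lines of Python (the function name and the truncation to whole dpi are my own choices; the formula is the quality-index rearrangement given above):

```python
def required_dpi(e_height_mm: float, quality_index: float, grayscale: bool = False) -> int:
    """Resolution (dpi) needed to reach a given quality index.

    Rearranges qi = (a * 0.039 * h) / d, where d = 3 for bitonal
    digitization and d = 2 for grayscale, to solve for the resolution a.
    """
    divisor = 2 if grayscale else 3
    return int(divisor * quality_index / (0.039 * e_height_mm))

# Medium quality (qi = 5) for a small 'e' of 1 mm height:
print(required_dpi(1.0, 5))                  # 384 (bitonal)
print(required_dpi(1.0, 5, grayscale=True))  # 256
```

Plugging in qi = 8 for the same 1 mm 'e' reproduces the high-quality bitonal figure of 615 dpi quoted in the text.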

8.3 FORMAT AND COMPRESSION


The image data should be supplied, readable without rotation, in a consistent format that is suitable for the largest possible number of applications.
8.3.1 Formats
• TIFF (Tagged Image File Format)
The Tagged Image File Format (TIFF) is a very popular and ideal format for image data. TIFF is largely platform-independent: the user can read TIFF files and process them further on differing equipment with differing systems and programs.
Even so, systematic standardization is important when working with TIFF files, because some of the variations allowed by the TIFF format may not be compatible with the installed software. It is therefore advisable to hold careful discussions, and probably experimental runs with test data, before the final work. TIFF provides for both uncompressed and compressed data supply; TIFF G4 is available for lossless compression of black-and-white material. Where loss-free compression is possible, it should be used for the delivery of data to save storage space. As not all programs can work with compressed TIFF data, compatibility with the application must be established in advance. If in doubt, uncompressed supply is to be recommended.
• JPEG (Joint Photographic Experts Group)
The Joint Photographic Experts Group (JPEG) format is often used to transfer halftone and colour pictures. Its variable compression ratios are all lossy, and hence it cannot be recommended for this purpose. JPEG is a commonly used standard method of compression for photographic images. It uses lossy compression algorithms and specifies how an image is transformed into a stream of bytes, not how those bytes are encapsulated in any particular storage medium. A related standard developed by the Independent JPEG Group, called JFIF (JPEG File Interchange Format), specifies how to build a file suitable for computer storage and transmission from a JPEG stream.
Lossy compression is not a method for long-term storage, as it can result in irretrievable loss of data during decompression or when migrating from one lossy compression to another. 'Lossy' means that when an image is compressed and then uncompressed, the decompressed image differs from the original scanned image.
The advantage of storing digital information in compressed form is that it occupies little space, thereby considerably reducing the storage cost. Distortions can be particularly severe at high compression ratios; the degree of loss can be controlled by adjusting the compression parameters.
Because image data can be organized in different ways, it is wise to agree with the service provider on an organization of the material appropriate to each application. Conventionally, each picture is stored in a separate file. Collecting related pictures in one file (multi-page TIFF) is practical only for documents containing no more than a few pages.
For additional use of the data on the Internet, it is recommended to convert the data into platform-independent formats, which allow the inclusion of the widest variety of documents. Today, such conversions are part of the service offered by most specialist companies. Depending upon the requirement, this conversion should be added to the contract.

8.3.2 Data Compression and Decompression

Compression of data helps curtail the use of costly resources, including hard disk space and transmission bandwidth. However, compressed data must be decompressed before it can be used, which can adversely affect certain applications. For instance, video compression at times needs costly hardware to ensure that the video can be viewed while it is being decompressed. The design of a data compression scheme thus involves trade-offs among several factors: the degree of compression, the amount of distortion introduced (in the case of a lossy compression scheme) and the computational resources needed to compress and decompress the data.
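The lossless/lossy distinction can be demonstrated with nothing but the standard library; in this sketch, zlib stands in for a real image codec, and the 16-level quantization step is an illustrative stand-in for the irreversible part of JPEG-style compression:

```python
import zlib

data = bytes(range(256)) * 64          # stand-in for raw 8-bit image samples

# Lossless compression: decompression restores the data exactly.
packed = zlib.compress(data, 9)
assert zlib.decompress(packed) == data

# Illustrating "lossy": quantizing to 16 grey levels before compressing
# lowers the entropy, but the original values can no longer be recovered.
quantized = bytes((b // 16) * 16 for b in data)
lossy_packed = zlib.compress(quantized, 9)
print(len(data), len(packed), len(lossy_packed))
```

Decompressing `lossy_packed` yields `quantized`, not `data` — exactly the irretrievable loss the text warns against for long-term storage.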
Image compression
Image compression can be defined as the use of data compression on digital images. This process reduces the redundancy of the image data so that the data can be stored or transmitted efficiently. Figure 8.1 displays a chart showing the relative quality of various JPG settings and also compares saving a file as a JPG normally with using a 'save for web' technique.

Fig. 8.1 Relative Quality of Various jpg Settings


Video compression involves reducing the amount of data used to represent digital video images; it is a blend of spatial image compression and temporal motion compensation. Video compression is an example of the concept of source coding in information theory. Table 8.1 presents a history of international video compression standards.
Table 8.1 History of Video Compression Standards

Year   Standard              Publisher        DRM-free   Popular Implementations
1984   H.120                 ITU-T            yes
1990   H.261                 ITU-T            yes        Videoconferencing, videotelephony
1993   MPEG-1 Part 2         ISO, IEC         yes        Video CD
1995   H.262/MPEG-2 Part 2   ISO, IEC, ITU-T  no         DVD Video, Blu-ray, Digital Video Broadcasting, SVCD
1996   H.263                 ITU-T                       Videoconferencing, videotelephony, video on the Internet
1999   MPEG-4 Part 2         ISO, IEC         no         (DivX, Xvid, ...)
2003   H.264/MPEG-4 AVC      ISO, IEC, ITU-T  no         Blu-ray, Digital Video Broadcasting, iPod Video, HD DVD

8.4 STORAGE FORM


Digital audio tapes (DAT) or CD-Rs (recordable) should be used to transfer the digitized image data. Through standardization (DIN 66211 for DAT, ISO 9660 for CD-R), readability independent of the hardware is guaranteed for both media. The current storage capacities of 650 MB per CD-R and 2 GB per DAT tape will increase in the near future.
The digital conversion form is dependably secured when loss-free compressed or uncompressed image data has been written to at least two data carriers and it has been verified that their contents are identical and readable without difficulty. In the simplest case, the two data carriers with the same content (the 'primary data carrier' and the 'working duplicate') are created by repeated successive transfer of the image data.
It is essential to reach a binding agreement with the company undertaking the digitization under which it stores the transferred material for at least as long as it takes the customer to check and secure the data. Multiple working duplicates should be created from the primary data carrier. Performing a decompression test for each stored digital copy further enhances data security.
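Verifying that two data carriers hold identical content is commonly done with checksums; a minimal sketch using Python's hashlib (the file paths in the comment are illustrative):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file in chunks and return its SHA-256 digest as hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Equal digests mean the two copies are bit-for-bit identical, e.g.:
# sha256_of("/media/primary/scan_0001.tif") == sha256_of("/media/duplicate/scan_0001.tif")
```

Reading in chunks keeps memory use constant even for image files far larger than RAM.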

Check your progress-1


What are the full forms of TIFF and JPEG?
What is lossy compression?
What is image compression?
What is the use of image compression?

8.5 SOFTWARE REQUIREMENTS FOR IMAGE VIEWING

To access digitized images, several programs for viewing and manipulation are available for PC and UNIX environments. These are 'viewer' programs, obtainable as public-domain software or shareware. It is recommended that each institution install only one specific, standardized program, whose compatibility with the supplied digitized conversion formats can be tested in advance.
Software used for controlling and displaying digitized images, as well as for rapid access, should be chosen with its specific applications in view. The requirements outlined here serve as the performance criteria for the viewer components of this application software.
Each viewer software should have the following features:
• Page-turning forward and backward
• Magnification of the whole image and of selected parts of the image

• Reduction of the whole image
• Use of whole screen for display
• Option of return to the original image
• Image inversion
• Image rotation
• Display of technical information from the headers, such as picture
size, format, resolution, bit depth and print
The software should also be capable of converting images into other
formats and of compressing them. For instance, xv is available as shareware
in the UNIX world. Depending on the hardware installed, suitable viewers are
included with the operating systems; e.g., hp-ux image view. For PCs,
Imaging for Windows is available without extra charge with Windows 95. Other
examples of suitable software are PixView 2.1 from Pixel Translation, Scan
Mos uvp from ms Electronic Service, or, with limits, Hijaak Pro 2.0 from
North American Software.

8.6 HARDWARE REQUIREMENTS FOR IMAGE VIEWING

As we saw earlier, software is important for image viewing, and hardware
is just as important. Hardware must be installed carefully so that it fulfils
the requirements for inspection and use of digitized images at each
institution. Because digitized images contain relatively large quantities of
data compared with text files, a wider data bus and more RAM are needed if
picture retrieval time is to be kept within acceptable limits. The minimum
requirements are fulfilled by PC systems based on 486 processors at 66 MHz
or a Pentium, with Windows 3.11 or higher, 16 MB of RAM and a hard disk in
the gigabyte range.

For ergonomic design of the workstation, particular importance attaches
to the size of the screen (at least 17 inches diagonally), its speed, the
graphics card and an appropriate drive. Normal 14-inch PC screens are not
suitable for image display, quite apart from the question of resolution.
Since the resolution of normal PC colour screens is about 75 dpi, the image
resolution needs to be reduced to reproduce the image on screen. Large
screens manufactured especially for image work can reach higher resolutions,
up to 120 dpi. In principle, the digital conversion form provides a higher
resolution, but this becomes apparent only on magnification of selected
parts of the screen.
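The reduction mentioned above can be made concrete with a small sketch. The 75 dpi screen figure comes from the text; the 300 dpi scan resolution and the A4 page size are assumed example values:

```python
# Pixels captured by the scan vs. pixels a ~75 dpi screen devotes to the page.
def size_px(width_in, height_in, dpi):
    """Pixel dimensions of a page at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)

page_w, page_h = 8.27, 11.69             # assumed A4 page, in inches
scan_px = size_px(page_w, page_h, 300)   # what a 300 dpi scan contains
screen_px = size_px(page_w, page_h, 75)  # what the screen can actually show
factor = 300 / 75                        # linear reduction needed for display

print(scan_px, screen_px, factor)
```

The factor of four in each direction is why the higher resolution of the conversion form only becomes visible when parts of the image are magnified.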

8.7 LONG TERM PRESERVATION OF THE
DIGITAL CONVERSION FORM
(MIGRATION)

Even where a high-quality microform is available alongside the digital
conversion form, and thus allows, if necessary, for repeated digitization,
the converted format must be preserved in the long term. On financial
grounds alone, repeated digitization is not a workable option. Given the
growing significance of electronic information systems in research and
teaching, the digitized images should remain usable for many possible future
applications. Therefore, the data should be preserved long term with as much
information as possible retained, i.e., with loss-free compression or
uncompressed, in a format that permits every conceivable use. Storing data
that have been compressed and formatted for only one specific application is
not sufficient.
The loss-free compressed or uncompressed image data must therefore be
transferred to new systems in a TIFF format or in a platform-independent
TIFF-derived format. A planned concept for this adaptation must be followed
in line with technical progress, and must not omit any development step. The
regular adaptation must consider not only the expected durability of the
storage medium, but also the currency of the format and the availability of
the hardware and software required for reading. The rapid succession of
innovations in hardware and software, which hardly respects the efforts
towards standardization (scarce in this area anyway), can create problems of
compatibility. Sufficient care should be taken while performing migration,
and the result for each image should be checked: even if a single bit of
data is lost from a graphic file, it can have dire consequences, such as the
loss of a complete image. Thus, prior to replacement of systems, suitable
measures, both organizational and technical, should be taken. One objective
of migration is to store the information on at least two long-term storage
media, secure against interference, in a platform-independent format
compatible with the EDP system in use. The complete contents of the
transferred image data can then be checked against the data source of the
earlier generation, as long as the EDP system that produced it is available.
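The per-image check described above can be automated by hashing each generation and comparing digests. This is a generic sketch; the file names and the choice of SHA-256 are assumptions, not from the text:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def migration_ok(old_generation, new_generation):
    """True only if the migrated image is bit-identical to its source."""
    return sha256_of(old_generation) == sha256_of(new_generation)
```

The loss of even a single bit changes the digest, which is exactly the failure mode the text warns about.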

8.8 FINANCIAL VIABILITY

Microforms should generally be digitized by a service bureau. According
to earlier recommendations, the cost of digitizing a uniformly produced
35 mm microfilm depends not only on the size of the task, the mode (bitonal
or grayscale) and the resolution, but also on the quality of the film and
the type and readability of the filmed material. As the costs of
digitization also depend on the market situation, it is impossible to give
any general indication of prices with long-term validity.

The choice between digitization with a general raising of the resolution
on the one hand and with grayscale on the other has a bearing on the cost
involved in conversion. Higher densities of data mean higher costs for data
supply, storage and handling. It is also important to take account of the
consequential costs of any planned migration. If need be, it may be cheaper
to digitize a second time from the microfilm rather than to migrate the
data repeatedly.
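The trade-off just mentioned can be sketched with a toy cost comparison; every monetary figure here is an invented placeholder, not a price from the text:

```python
def redigitization_is_cheaper(redigitize_cost, migration_cost, migrations):
    """True when scanning the microfilm again beats repeated migration."""
    return redigitize_cost < migration_cost * migrations

# Example: four planned migrations at 3,000 apiece vs. one 10,000 re-scan.
print(redigitization_is_cheaper(10_000, 3_000, 4))   # True: 10,000 < 12,000
```

In practice the comparison also has to weigh the consequential costs of handling and storage noted above, which a one-line model ignores.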
The cost factors mentioned cover only digitization itself. Experience
shows that further costs are incurred by manual page-turning, splicing
images out of the general frame, and marking. Programming costs, including
the initial cost of programming the film scanner, must also be considered on
the basis of customer requirements. Finally, there are the costs of
downloading the data, the carrier medium, operating the CD-R recorder, and
packing and transport. The cost of production increases where individual
work and image enhancement using special software are necessary to improve
quality.

8.9 DIGITIZATION AND OPTICAL CHARACTER RECOGNITION

Optical character recognition (OCR) is a machine process that converts
visible alphanumeric signs into coded data (codes with respect to
alphanumeric signs and their context) according to a more or less standard
pattern of recognition. A fundamental distinction lies between fully
automatic text recognition and trainable recognition, which supports pattern
recognition with dictionaries, linguistic methods and features of
'artificial intelligence'. Text recognition programs increasingly integrate
dictionaries and substitution lists that can be adjusted according to
degrees of certainty. Systems work with fuzzy logic and probabilities to
prevent the substitution of inaccurate characters that were wrongly
recognized as accurate. Some systems contain an interesting feature known as
'mixed mode': signs or groups of signs that are not recognized with
certainty are retained as images and remain in that uncoded form within the
otherwise correctly recognized text.
Besides reliable text recognition, page segmenting is an essential
performance feature of text recognition systems; that is, the interpretation
of contextual information such as columns, graphics and blocks of text.
Additional features include deskewing, segmenting of individual units, and
recognition of types of handwriting, signatures, or more than one language
in the same document.
If there are more than four or five mistakes per 1,000 units, entering
the text by hand is more economical; otherwise stated, the economic cut-off
point for machine text recognition lies at 99.95 per cent.
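The rule of thumb above (no more than about four or five errors per 1,000 units) can be expressed as a simple check. The threshold of five is taken from the text; the sample counts are invented:

```python
def errors_per_thousand(errors, units):
    """Error density of an OCR run, per 1,000 recognized units."""
    return errors / units * 1000

def machine_ocr_is_economical(errors, units, max_per_thousand=5):
    """Machine recognition pays off only at or below the error threshold."""
    return errors_per_thousand(errors, units) <= max_per_thousand

print(machine_ocr_is_economical(4, 1000))    # True: within the threshold
print(machine_ocr_is_economical(12, 2000))   # False: 6 errors per 1,000
```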
The reliability of text recognition depends basically on the background,
the kind and size of the writing, and the contrast between text and
background. Dirt on the material, and omissions from the image information
caused by incomplete or irregularly printed letters, disrupt text
recognition. Reliability also depends on the density of image information:
the more image information is processed, the higher the recognition rate.
Therefore, higher resolution in digitization can improve the recognition
rate, as can digitization in grayscale.
In essence, the quality criteria mentioned also apply to microfilm. To
achieve high resolution and adequate contrast, the correct standard
background density and minimal ground shade are important. Digitizing
negative film avoids the disruption caused by dirt and scratches. In
practice, there has not yet been enough experience with machine text
recognition in conjunction with microfilm to allow the formulation of
reliable views.

Check your progress-2


What is microform?
What is the full form of OCR?
What is the use of OCR?

8.10 SUMMARY
• Digitization of microfilm should not aim at the best possible result in
the way that is mandatory for direct digitization of endangered original
material.
• The reproduction quality of the digital conversion form will be
determined by the purpose for which it is to be applied where good-quality
microfilm is available as a long-term storage medium.
• To achieve high resolution and adequate contrast, the correct standard
background density and minimal ground shade are important.
• Digitizing negative film avoids the disruption caused by dirt and
scratches.
• In practice, there has not yet been enough experience with machine text
recognition in conjunction with microfilm to allow the formulation of
reliable views.

8.11 KEY TERMS

• Data Compression: The process of encoding information using fewer
bits (or other information-bearing units) than an unencoded
representation would use, through use of specific encoding
schemes.
• Joint Photographic Experts Group (JPEG): A commonly used
standard method of compression for photographic images. JPEG
uses lossy compression algorithms for images. It specifies how an
image is transformed into a stream of bytes, not how those
bytes are encapsulated in any particular storage medium.
• Lossy Compression: When an image is compressed and then
uncompressed, the decompressed image is usually not quite the
same as the original scanned image. This is called lossy
compression.
• Microform: A generic term that can refer to any medium,
transparent or opaque, bearing microimages.
• Tagged Image File Format (TIFF): A file format that stores
images, including photographs and line art. Originally created
by Aldus for use in desktop publishing, the TIFF format is widely
supported by image-manipulation, publishing and page-layout,
scanning, faxing, word-processing, optical character recognition
(OCR) and other applications.

8.12 END QUESTIONS


70) Explain picture quality.
71) Write a note on the TIFF format.
72) Write a note on the JPEG file format.
73) What is data compression?
74) What are the features of image viewing software?
75) Explain the hardware requirements for image viewing.
76) What is the financial viability of digitized microforms?
77) Write a note on long-term preservation of the digital conversion
form.
78) Describe the procedure of digitization and optical character
recognition.

Answer to check your progress questions


Check your progress -1:
TIFF (Tagged Image File Format)
JPEG (Joint Photographic Experts Group)

Lossy compression means that when an image is compressed
and then uncompressed, the decompressed image differs
from the original scanned image.
Image compression is the use of data compression on
digital images.
This process reduces the redundancy of the image data so
that the data can be stored or transmitted efficiently.

Check your progress -2:


A generic term that can refer to any medium, transparent
or opaque, bearing microimages.
Optical character recognition.
OCR is a machine process that converts visible
alphanumeric signs into coded data (codes with respect to
alphanumeric signs and their context) according to a
more or less standard pattern of recognition.


UNIT 9 GIS AND SCANNING TECHNOLOGY


Program Name:BSc(MGA)

Written by: Srajan
Structure:
9.0 Introduction
9.1 Unit Objectives
9.2 Working of a Scanner
9.3 Types of Scanner
9.4 General Features Of a Scanner
9.5 Types of Scanning
9.6 Processing of Scanned Document
9.7 Choice of Scanning or Digitization
9.8 Accuracy of Scanned Images
9.9 Scanning Products
9.10 Summary
9.11 Key Terms
9.12 End Questions

9.0 INTRODUCTION
In this unit we are going to learn about scanners. Nowadays scanners are
used everywhere: in schools, offices, institutes and shops, and also in GIS.
Scanning is basically a process that converts paper maps into digital
format.

Since we are discussing GIS, i.e., the geographic information system,
maps are an important part of the system, and we are going to see how the
scanner is useful in GIS. Earlier, most maps were prepared conventionally.
New maps could be produced that way, but it was a time-consuming process
that also increased the amount of work. These paper maps first have to be
converted into a computer-usable digital format.

The technology used for this conversion is called scanning, and the
device used for the operation is called a scanner. We will learn the whole
process of scanning a document in this unit. We are also going to learn
about the different types of scanners that are available, such as the
flatbed scanner, transparency scanner and handheld scanner.

9.1 UNIT OBJECTIVES:


After studying this unit, you will be able to:
Describe the working of a scanner
Describe the different types of scanners
Explain the general features of a scanner
Explain the processing of a scanned document
Explain the difference between scanning and digitization
Describe the accuracy of scanned images
Describe scanning products

9.2 WORKING OF A SCANNER
In this section we are going to learn how a scanner works. As we know,
scanning is the process by which content on paper is copied into the
computer. Inside the scanner is a beam of bright white light. The scanner
passes this light over the image, and the light is reflected back to the
photosensitive surface of the sensor in the scanner head. Each pixel
transfers a grey-tone value. Values are given to the different shades in the
image ranging from 0 (black) to 255 (white), i.e., 256 values, and passed to
the scan board (software); in bitonal mode the software interprets black as
0 and white as 1. Thus, a monochrome image of the scanned portion is
obtained. The complete image is scanned in tiny strips by the scan head as
it moves forward, while the relevant information is continuously stored by
the sensor. The software running the scanner assembles the information from
the sensor into a digital image; this is known as one-pass scanning.
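The mapping from 0-255 grey values to a bitonal 0/1 image can be sketched as a simple threshold; the midpoint of 128 is an assumed choice, not a value specified in the text:

```python
def to_bitonal(grey_row, threshold=128):
    """Map one scanned row of grey values (0=black .. 255=white) to bits."""
    return [0 if value < threshold else 1 for value in grey_row]

row = [12, 240, 130, 90, 255]     # one strip of grey-tone samples
print(to_bitonal(row))            # [0, 1, 1, 0, 1]
```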
The most essential component of a scanner is its scanner head, which
moves along the length of the scanner. The scanner head incorporates either
a charge-coupled device (CCD) sensor or a contact image sensor (CIS). A CCD
is composed of numerous photosensitive cells, or pixels, packed together on
a chip. To ensure the best image quality, the most advanced large-format
scanners employ CCDs with 8,000 pixels per chip.
Scanning a colour image is different, as the scanner head needs to scan
the same image for three different colours: red, green and blue. Older
colour scanners had to scan the same area three times over for these three
colours; this type is known as a three-pass scanner. Most colour scanners
now, however, use colour filters to scan all three colours at once in a
single pass. In principle, a colour CCD works like a monochrome CCD, but
each colour is created by blending red, green and blue. Therefore, a 24-bit
RGB CCD provides 24 bits of information for each pixel. A scanner using the
three colours (in full 24-bit RGB mode) can normally reproduce up to
16.8 million colours.
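The 16.8 million figure follows directly from the bit depth and can be checked in one line:

```python
bits_per_channel = 8
channels = 3                                  # red, green, blue
colours = 2 ** (bits_per_channel * channels)  # 24-bit RGB combinations

print(colours)   # 16777216, i.e. roughly 16.8 million colours
```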
A new technology has now emerged, full-width single-line contact sensor
array scanning, which enables the scanner to operate at previously
unattainable speeds. With this technology, the document to be scanned passes
under a line of LEDs that capture the image.

Check your progress-1
What is the most essential component of a scanner?
What is the value range of grey tones?

9.3 TYPES OF SCANNERS


A scanner is a device that converts images into a digital file you can use
with your computer. Many different types of scanners are available for
performing similar jobs; however, they handle the job differently, using
different technologies, and the results they produce depend on their varying
capabilities. Common types include flatbed scanners, film scanners, video
capture devices, handheld scanners and drum scanners.
1. Flatbed Scanner:
The flatbed scanner is also called a reflective scanner and is the most
popular type of desktop scanner. It is used for scanning prints or other
flat materials, and reflective art is mostly scanned using flatbeds. These
scanners work by shining white light onto the object and reading the
intensity and colour of the light reflected from it.

Fig 9.1: Flatbed Scanner

The following are the types of flatbed scanners.


• Entry-level flatbed scanners usually have the following
specifications: 8-1/2" x 11" scanning area, 300 to 400 spi scanning
ability (often interpolated to 800, 1200 or 1600 "spi"), 8 bits per colour
channel, and low cost. The scanner often comes bundled with highly
effective value-added software such as Adobe Photoshop. These machines
frequently offer an excellent price/performance ratio.
• Mid-level flatbed scanners differ from entry-level flatbed scanners.
First, they are more costly. Second, being meant for the professional
market, they rarely come bundled with "value-added" software such as
Photoshop. Third, and most importantly, they have significantly better
specifications. For example, a typical mid-level flatbed scans at 600x1200
spi and 10 bits per colour, which results in scans of significantly higher
quality. Some mid-level scanners may also provide a larger scanning area.
• High-end flatbed scanners have further features that are useful to
professionals, such as a noise-free design, large scanning area, high
dynamic range and high resolution. They also command a premium price.

2. Film Scanners

Fig 9.2: Film Scanner

This type of scanner is also called a slide or transparency scanner. It
lets you scan everything from 35mm slides all the way up to 4x5-in.
transparencies. These scanners work by passing a tiny beam of light through
the film and reading the intensity and colour of the light that emerges. The
Photo CD scanning process works with a high-quality film scanner.
These scanners are a bit costly, as they are targeted at professionals.
Users requiring only an occasional transparency scan can opt for a flatbed
with a transparency adapter.
3. Video digitizers
Video digitizers are used for multimedia purposes, especially for
creating QuickTime movies.
Like flatbed scanners, they use the same digital CCD arrays found in
video cameras. The CCD array produces an analog signal (at 50 or 60 Hz)
that drives other analog devices such as VCRs and television sets, or is
captured onto videotape. Using specialized hardware and software, the
analog video signal can be digitized. Video capture software is very
similar to traditional scanning software, while the hardware is usually a
board that fits inside your computer.
4. Drum Scanners
Drum scans have the highest quality, but drum scanners are very
expensive. They are used by professional colour trade shops for producing
colour separations for high-end printing. For greater dynamic range and
colour accuracy, drum scanners use PMT (photomultiplier tube) technology
instead of CCD technology. The document to be scanned is mounted on a glass
cylinder; at the centre of the cylinder is a sensor that splits light
bounced from the document into three beams. Each beam is sent through a
colour filter into a photomultiplier tube, where the light is changed into
an electric signal.

Fig 9.3: Drum Scanner

Even though they are costly, drum scanners offer features unavailable in
desktop scanners, including direct conversion to CMYK, auto-sharpening,
batch scanning, greater dynamic range and huge image scanning areas. Drum
scanners also stand apart in productivity: they can produce more scans per
hour than a desktop unit because the process of scanning to CMYK is done
automatically.

5. Handheld Scanners

Hand scanners are portable and lower priced than flatbed scanners. They
generally plug into a computer's printing port, as opposed to a SCSI port,
allowing them to be conveniently moved from workstation to workstation;
many people use them with a notebook or laptop. The major drawback of hand
scanners is that they are less accurate than flatbeds, because they have
weaker light sources and often produce uneven scans. Most hand scanners now
provide an alignment template that guides users through the process of
scanning, and to help stabilize its scanner, one manufacturer ships a
motorized "self-propelled" unit.

Fig 9.4: Handheld Scanner

To achieve high-quality results, hand scanners offer 400 spi resolution
and 24-bit colour. The 4" to 5" wide scan head compels a user to make
multiple passes to scan even average-sized documents, and the supplied
stitching software must then be used to merge these partial scans back
together, which is time consuming. Nonetheless, hand scanners remain very
popular because they can produce good-quality scans quickly, easily and at
low cost.
6. Miscellaneous
Leaf's Lumina camera-cum-scanner is recently developed equipment. The
Lumina is actually a scanner, but it resembles a digital camera. It uses
standard Nikon bayonet lenses and is therefore extremely flexible. For
razor-sharp images and greater dynamic range, it scans at 2700 x 3400 and
36 bits deep.

Fig 9.5: Leaf’s Lumina

Using a standard fax-modem and proprietary Trio software, a pocket-sized
device from Trio Information Systems lets you convert any fax machine into
a 1-bit scanner or printer.
Pacific Crest offers a business card scanner. As its name suggests, this
scanner is helpful for those who need to input and file tons of business
cards.
7. Digital cameras

Digital cameras are used to take photographs of three-dimensional
objects, much like a regular camera, but with the advantage that you do not
have to wait for film developing and processing. Studio-only units provide
larger image size and dynamic range; however, they need to be attached to a
host computer, which is hardly a portable solution. Portable units with
high-resolution, high-quality features are expected on the market in the
near future.
8. Stand-Alone Oversize Digitizers

Many manufacturers provide sheet-fed and oversize digitizers for large
originals up to 40" wide, which are mainly used for engineering drawings.
These digitizers are somewhat akin to the automatic document feeders for
flatbed scanners.

Fig 9.6: Sheet-fed scanner


In such digitizers the original is pulled through the scanning
mechanism; they differ from other scanners in having a stationary scanner
head, and generally bear a striking similarity to CAD pen plotters. Because
of the large image area involved, and the consequent large file size,
roll-fed digitizers can usually scan only in line-art and grayscale modes.
These devices are also quite expensive because of their uniqueness and
specialization.

Check your progress-2
Which is the most popular type of desktop scanner?
What are the features of high-end flatbed scanners?
What is Leaf's Lumina?
What is the importance of the Pacific Crest scanner?
What is the full form of PMT used in drum scanners?

9.4 GENERAL FEATURES OF A SCANNER

The performance of a scanner depends on its features. The general
features of a scanner are as follows:
• Speed:
The size of a scanned document governs the speed of a scanner; for
large documents, scanning takes relatively more time. Scanning speed
is also affected by resolution: at high resolution the scanning
process becomes slow and takes much more time than scanning at low
resolution.
• Resolution:
Resolution is the degree of sharpness of a displayed character or
image, and is an important property of a scanner as well as of a
scanned image. For a scanner, resolution is generally expressed in
dots per linear inch; 300 dpi therefore means 90,000 dots per square
inch. An increase in scan resolution implies an increase in the size
of the image, and larger images mean larger memory consumption. As a
result, a compromise has to be struck between image size and image
resolution.
• Type of interface software:
Communication between the application and the scanner hardware is
conducted through the scanner interface software. Scanners are
normally available with two interfaces:
Small Computer System Interface (SCSI) scanners
Parallel interface scanners
SCSI scanners scan fast; this interface generally exists in the
technically advanced scanners. Some low-end scanners, which are slower than
the SCSI interface scanners, use the parallel interface.
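The interplay of area, resolution and bit depth described above can be estimated with a short function; the page size and bit depths used below are illustrative assumptions:

```python
def scan_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of a scan: pixels times bits, converted to bytes."""
    pixels = width_in * height_in * dpi * dpi
    return pixels * bits_per_pixel / 8

# An 8.5" x 11" page scanned at 300 dpi:
print(scan_size_bytes(8.5, 11, 300, 1))    # bitonal
print(scan_size_bytes(8.5, 11, 300, 8))    # grayscale
print(scan_size_bytes(8.5, 11, 300, 24))   # 24-bit colour
```

Because pixel count grows with the square of the dpi, doubling the resolution quadruples the file size, which is why the compromise mentioned above is unavoidable.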

9.5 TYPES OF SCANNING


Scanning devices use a magnetic or photoelectric source for scanning
and converting images into electric signals, which can then be processed by
an electronic apparatus such as a computer. The images a user can scan and
convert include graphics, coloured or black-and-white text, and pictures.
1. Black and white raster scanning
This is the simplest way to convert a document and can be performed on
line drawings, reduced media, text or any single-colour document. It
provides the best solution for archiving and storage projects, in which
documents are viewed and printed but never changed. It is therefore an
ideal first stage in a planned document conversion project.
Applications
• Archival drawing libraries
• Electronic document distribution
• Vectorization templates
A user can convert drawings into files for quick and low-cost
library access, but original drawings of poor quality create a problem.
When a document is scanned, imperfections such as background, dirt,
residue or stray markings on the original source documents are
introduced and stored along with the original drawing content. Besides
reducing legibility, these imperfections can enlarge file size, often by
a factor of two or three. Much of the background 'noise' and 'dirt'
contained in poor-quality source documents is electronically removed by
a raster clean-up process, resulting in files that are smaller and
easier to store and retrieve, with lower media storage costs.
2. Grayscale and color raster scanning
Grayscale and colour images can be quite large. Because virtually every
pixel is populated with a value, a user must make sure that the system can
handle files whose size is often measured in megabytes; attempting to
compress such a file yields little or no reduction in size.
3. Grayscale or color scanning
Grayscale or color scanning is most commonly used to:
• Load background images into high-end drawing or mapping
software as an information base for advanced project work.
• Capture images for use in desktop publishing applications.
• Analyse the frequency of the colour ranges, mainly for infrared
and vegetation photos.
Applications
• Navigation Charts (air and nautical)
• Full color maps
• Aerial photography

• Brochures and artwork
• Toposheets
• Cartographic base data for high-end mapping systems
Sometimes a user needs to collect only selected information from
source documents such as toposheets and other colour originals. This
information may include components such as hydrology, oil and gas fields,
contours and transportation networks. Instead of using a separate
black-and-white printing plate for each, separate images of map features are
created that can be differentiated by colour. For example, you can extract
elevation contours from a colour image of a toposheet. This extraction
process is much faster, and therefore cheaper, than capturing the data
directly from the colour image.
In addition, the colour image can be preserved for use as a visual
background reference or simply as archived information. The resulting file
is more manageable and much smaller than an image containing all the colours
found on the source document.

Check your progress-3


What is the use of scanning?
What is a use of black and white raster scanning?
Write two interfaces of the scanner?

9.6 PROCESSING OF SCANNED DOCUMENTS

Scanning can be done for both the raster as well as vector images. For
raster images scanning converts an image into an array of pixels, thus creating
an image in raster format. When an image is created by a series of pixels and
also arranged in rows and columns, the image is called raster file. The image
is captured by a scanner by assigning a row, a column, and a colour value (a
grayscale, black or white, or a colour) to each pixel. A continuous image is
'painted', one pixel at a time and also one row at a time. The concept
associated in raster scanning is 'resolution'. Resolution means number of
pixels per each in the image. Documents are generally done at resolutions
between 200 and 500 ppi for scanning of large formats.
The Digitization
116

If the user wants higher quality, the image requires a higher resolution,
which increases the file size. An increase in resolution from 200 to 300 dpi
increases the size of the file not by 50 per cent but by 125 per cent, from
40,000 to 90,000 pixels per square inch. A black-and-white scan needs less
storage than a grayscale scan at the same resolution, and a colour image
needs even more.
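The arithmetic above can be sketched in a few lines of Python (the helper names are illustrative, not part of any scanning package):

```python
# Sketch: how scan resolution and bit depth drive pixel counts and storage.

def pixels_per_sq_inch(ppi):
    """Pixels per square inch at a given resolution."""
    return ppi * ppi

def file_size_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Uncompressed size of a scan in bytes."""
    pixels = width_in * height_in * pixels_per_sq_inch(ppi)
    return pixels * bits_per_pixel // 8

# Raising resolution from 200 to 300 ppi:
low, high = pixels_per_sq_inch(200), pixels_per_sq_inch(300)
print(low, high)                     # 40000 90000
print((high - low) / low * 100)      # 125.0 per cent increase

# Bit depth multiplies storage again: 1-bit (B/W), 8-bit (grayscale), 24-bit (colour)
for bpp in (1, 8, 24):
    print(bpp, file_size_bytes(8.5, 11, 300, bpp))
```

Note how the 200-to-300 dpi jump squares rather than adds: 300² = 90,000 against 200² = 40,000, the 125 per cent increase cited above.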

Most GIS (Geographic Information System) applications are, however, based on
vector technology; therefore, vector formats are the most common.
In vector format, the coordinates of the starting and ending points of a line
represent its position. Manual digitization is one of the simplest methods of
obtaining a vector format of a map or an image. Digitization involves tracing
the features of a source map using a pointing device called a digitizing
cursor. The position of the cursor is converted into a digital signal that can
be indexed to give the actual coordinates of the point. However, this takes
time, patience and labour. Alternatively, the operator can convert a scanned
image into vector format by heads-up digitization, in which a mouse is used
for interactive editing, for cleaning the raster image, and for removing stray
marks or line gaps introduced during scanning. In addition, tools let the user
select individual raster features for vector conversion, invoke automatic
line-following and thinning vector conversion, key in attribute data directly,
and otherwise hasten the process of vector conversion (Figure 9.3).

Fig. 9.3: Digitizing Table
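As a rough illustration of the vector model just described, a digitized line can be stored simply as a list of vertex coordinates; the coordinates below are hypothetical:

```python
# Sketch of the vector data model: a line is stored as the coordinates of
# its vertices rather than as a grid of pixels.
from math import hypot

def polyline_length(points):
    """Total length of a digitized polyline given (x, y) vertex coordinates."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# A contour traced with a digitizing cursor (hypothetical coordinates):
contour = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(polyline_length(contour))   # 10.0
```

Because only vertices are stored, such a file stays small and carries the "intelligence" (connectivity, measurable geometry) that a raster scan lacks.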

9.7 CHOICE OF SCANNING OR DIGITIZATION


As we discussed earlier, scanners support both raster and vector images.
The user needs to decide which type of data model to use, raster or vector.
The choice of data structure for any particular application is, however, often
an arbitrary decision, as GIS software generally supports both structures. A
data structure is the logical arrangement of data in a format that helps the
system manage it. No matter which model and structure an operator chooses,
data needs to be converted into a format that can be used by the GIS.
Converting data into digital format is labour-intensive, and can account for
up to 80 per cent of the total system cost.

Though scanning is fast and easy, the resulting raster images lack the
intelligence required for vector-based GIS. To keep the file at a manageable
size, a fair degree of operator expertise is also required to apply
compression techniques. An operator can apply vectorization automatically or
interactively to produce intelligent vector files.
Table digitizing has the advantage of employing low-cost digitizing
equipment. However, operator training is required to obtain good results, and
the procedure is laborious and time-consuming; therefore, it costs more.
Further options, such as raster-to-vector conversion and pattern
recognition, involve trade-offs between cost, quality, productivity and
usability. Scanning and table digitization must handle bulk conversion of
everything from text documents to line art and even video images. Advanced
techniques have been developed to enter material from other sources. These
techniques range from simple programs facilitating the keyboard entry of
survey coordinates to techniques reconciling aerial photographs with base
maps. Remotely sensed, photogrammetric and CAD-generated data represent
further potential input sources.

9.8 ACCURACY OF SCANNED IMAGES

Scanned images, being a main source of input data for GIS, have
increased the use of scanners in the GIS environment. This heavy use has
forced a second look at the limitations of scanners in producing accurate
scanned images. Since most GIS software needs very specific accuracy, the
accuracy of input data has to be quantified before use. Generally, the average
GIS database requires the input data to be accurate to at least 0.018". This
means that the location of an input feature must fall within 0.018" of its
actual geographic location at the scale of the map. Therefore, a scanner must
not introduce more positional error than the maximum permissible in the GIS.
A user can easily quantify standard accuracy issues such as source
availability, media stability and differences in data collection mechanisms,
and can decide whether the resultant data is acceptable for the GIS before
integration. With the growth of scanned data, a new issue has arisen: the
accuracy of the input scanner itself. Scanners can still be quite expensive;
therefore, the effect of scanning large amounts of data that do not meet the
accuracy requirements of the GIS can be damaging. Given this, users must be
able to measure the accuracy of their own scanners, and service providers
must be able to prove the accuracy of their scanners to their customers. The
accuracy of a scanner is defined as its capacity to produce an image with
output dimensions exactly proportional to the input document. A user can
dimensionally correct the scanned image within specified tolerances; however,
nothing can be predicted about the data within the body of the image. Even
though the image may have exactly the right number of pixels, features inside
the image may be three or four tenths of an inch from their correct location
at the scale of the map, despite the fact that the scanner is operating within
stated accuracy specifications. Depending on the scale of the source map,
three-tenths of an inch can translate to several hundred metres of error on
the ground. This is usually unacceptable for any GIS. Hence, it is necessary
to have an idea of the accuracy of the scanned image, so that corrective
measures can be incorporated in the analysis.
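The relationship between error on the map sheet and error on the ground can be sketched as follows; the 1:50,000 scale is an assumed example, not a value from the text:

```python
# Sketch: translating positional error on the map to error on the ground,
# assuming a simple representative-fraction scale (e.g. 1:50,000).

INCHES_PER_METER = 39.3701

def ground_error_m(map_error_in, scale_denominator):
    """Ground distance (metres) represented by a map error in inches."""
    return map_error_in * scale_denominator / INCHES_PER_METER

# The 0.018" GIS tolerance vs a 0.3" feature displacement, at 1:50,000:
print(round(ground_error_m(0.018, 50000), 1))   # ~22.9 m
print(round(ground_error_m(0.3, 50000), 1))     # ~381.0 m ("several hundred metres")
```

The same 0.3" displacement on a larger-scale sheet (say 1:10,000) would mean far less ground error, which is why acceptable scanner accuracy depends on the scale of the source map.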

9.9 SCANNING PRODUCTS

The central objective of several major international companies is to produce
wide format scanners primarily for use in the GIS industry. The following is
a discussion of the various scanning products offered by these companies.
1. Contex scanning technologies
Contex Scanning Technologies offers a large range of wide format colour
scanners. These scanners come along with unique CAD Image/SCAN +
FEATURE software that provides a host of features such as Colour Feature
Extraction, Rotation, Alignment, Cropping, Despeckling, Hole filling,
Reversing, Mirroring, and so on. It provides users the flexibility to scan at any
specific resolution to suit the needs of the software or printer that is used.
2. Abakos digital images
Abakos offers the Deskan range of colour scanning systems, which can be
used by users requiring a highly accurate large format colour scanning
capability. These scanners include the following important features: powerful
in-built Raster Editing Capability (cost-effective raster to vector conversion)
and OCR (text recognition), manipulation and editing with Auto scan function
for faster scanning. Additionally, the system is compact, light and easily
transportable.

3. VIDAR systems corporation


VIDAR designed a full line of high-quality, large-format scanners to fulfil
the needs of GIS professionals and the reprographics industry. All scanners
include VIDAR's TruInfo, scanner control and archiving software that allows
capturing quality images while quickly and efficiently indexing, organizing
and sharing scanned documents.
4. Anatech scanners

Action Imaging Solutions introduced the Anatech scanners, which can be used
for scanning maps, colour schematics, drawings, photographs and other opaque
colour media. Patented pinch-roller document transport technology protects
documents from damage, and minimizing skew effects ensures accuracy. The
features of the scanners include an all-digital, low-noise design for colour
fidelity along with an onboard, real-time colour classification scheme.

5. Widecom scanners
Widecom is one of the leading providers of high-performance wide format
scanners. The main features of these scanners are: they can scan documents up
to 1/2-inch thick at high speeds and can digitally save items such as foam
boards, artwork, and other unusually thick documents. Advanced filters such
as deskew, diffuse, despeckle and sharpen, individual RGB adjustment,
smoothing, and gamma correction help produce better scanned documents.

Check your progress-4


What is a digitizing cursor?
What is Widecom scanner?
What is the full form of GIS?

9.10 SUMMARY
A scanner is an electronic input device that converts analog information
of a document such as a map, a photograph or an overlay into a
computer usable digital format.
Scanning automatically captures map features, text, and symbols as
individual cells or pixels, and produces an automated image.
The essential component of a scanner is its scanner head, which moves
along the length of the scanner. The scanner head incorporates either a
charge-coupled device (CCD) sensor or a contact image sensor (CIS).
A CCD is composed of a number of photosensitive cells or pixels
packed together on a chip.
Providing better image quality, the most advanced large format
scanners use CCDs with 8000 pixels per chip.
Colour scanners use colour filters to scan all three colours, red,
green and blue, in a single pass.
A new technology has emerged, full-width, single-line contact sensor
array scanning, which enables the scanner to operate at previously
unattainable speeds. With this technology, the document to be scanned
passes under a line of LEDs that capture the image.
Different types of scanners are available for performing similar jobs;
however, they handle the job differently using different technologies and
the results they produce depend on their varying capabilities.
The most popular type of desktop scanner is the flatbed scanner.
Reflective art is mostly scanned by using flatbeds.

9.11 KEY TERMS
• Digital cameras: Used to take photographs of three-dimensional
objects; much like a regular camera, users are not required to wait
for film developing and processing.
• Drum scanner: Used by professional colour trade shops for high
accuracy. Drum scanners use PMT (Photo Multiplier Tube)
technology instead of CCD technology.
• High-end flatbed scanners: These substitute for drum scanners and
provide features that meet user demands, such as a noise-free design,
high dynamic range, large scanning area, and high resolution.
• Leaf's Lumina: The Lumina is actually a scanner, but it resembles a
digital camera. It uses standard Nikon bayonet lenses and is
therefore extremely flexible.
• Transparency scanners: Multi-format transparency scanners
enable you to scan everything from 35mm slides to 4" × 5"
transparencies.
• Widecom scanners: One of the leading providers of
high-performance wide format scanners.

9.12 END QUESTIONS


80) How does the scanner work?
81) Write a note on the flatbed scanner.
82) What is a video digitizer?
83) What is the use of a drum scanner?
84) What are the features of a scanner?
85) Where are grayscale and colour scanning mostly used?
86) What are the applications of black-and-white raster scanning and
colour scanning?
87) Give a brief description of the different types of scanning products.

Answer to check your progress questions


Check your progress -1:
Scanner Head
ranging from 0 (black) to 255 (white),
Check your progress -2:
Flatbed Scanner
Noise-free design, large scanning area, high dynamic range,
and high resolution.

Leaf's Lumina camera-cum-scanner is recently developed
equipment. The Lumina is actually a scanner, but it
resembles a digital camera.
This scanner is helpful for those who need to input and file
tons of business cards.
Photo Multiplier Tube

Check your progress -3:


Scanning devices use a magnetic or photoelectric source
for scanning and converting images into electric signals.
These signals can be processed by an electronic apparatus
such as a computer. The image that a user can scan and
convert is graphics, colored or black and white texts, and
pictures.
It is the simplest way to convert a document and can be
performed on line drawings, reduced media, text or any
single-colour document.
SCSI and Parallel interface

Check your progress -4:


It is a pointing device used in digitization process to trace the
features of a source map.
One of the leading providers of high performance wide format
scanners.
Geographic Information System


UNIT 10 DIGITIZATION PROJECT MANAGEMENT
Program Name:BSc(MGA)
Written by: Srajan
Structure:
10.0 Introduction
10.1 Unit Objectives
10.2 Project Planning
10.3 Copyright Issues Associated with Digitizing Images
10.4 Determining the Costs of a digitization Project
10.5 Standards and Guidelines to Consider
10.5.1 Metadata
10.5.2 Image Standard and Guidelines
10.5.3 Preservation and Storage standards and Guidelines
10.5.4 Presentation Device
10.5.5 Transmission Issues
10.6 Selecting the Equipment and software
10.6.1 Preparing Materials for Digitization
10.7 Workflow Process
10.8 Maintenance/Management and quality Control
10.8.1 Quality Control
10.9 Migration of Data to New Formats
10.10 Storage, backup and preservation
10.11 Summary
10.12 Key Terms
10.13 End Questions

10.0 INTRODUCTION
In the earlier unit, we discussed what digitization is and the concepts
related to it. In this unit we are going to learn how to manage a digitization
project. As we know, digitization is the conversion of non-digital data to
digital format; in digitization, information is organized into discrete units.

The process of digitization requires careful management. The goal of
management should be to establish a flexible, adaptable system whose staff
and procedures can accommodate change. There are many aspects to consider
in project management; expenses, for instance, are decided at budget time.
We will also learn about the copyright issues that arise while digitizing
images. This unit therefore throws light on the numerous factors that affect
digitization project management and identifies methods for addressing these
issues successfully.

10.1 UNIT OBJECTIVES:


After studying this unit you will be able to
Explain the process of project planning
Explain the concept of copyright issues associated with digitizing images
Explain the cost determination of a digitization project
Explain the process of selecting equipment and software for the project
Describe workflow process
Explain the importance of maintenance/management
Explain the process of quality control
Describe migration of data to new formats
Describe the process of storage, backup and preservation

10.2 PROJECT PLANNING


Before starting a digitization project, you need to visualize the
project from start to end. Before embarking on such a project, an institution
should allocate sufficient time and other resources for the following:

• Assessing the needs of the institution, and deciding where
digitization is appropriate and where it is not
• Researching technological options
• Defining the project
• Choosing standards
• Developing requirements statements
• Making plan for implementing the project, including milestones and
a timetable
• Monitoring, evaluating, and adjusting the project as per the
requirement

It is also important to think about how your digital material will be
preserved for longevity and sustainability, to predict the cost of the
project, and to be knowledgeable about best practices, standards, and
overall digitization policies for your institution. Implementing a
digitization project in several stages can provide the flexibility to
accommodate possible alternatives along the way.
Making a management policy for digital assets
The planning process should include the formulation of a policy for the
management of digital assets. Just as an organization requires a collection
management policy, it should also have a policy for generating and
maintaining digital assets, which form a new type of valuable 'collection'.
The policy should define at least the following:
• Copyright and legal policies for staff
• The method of managing digital images after they are
created
• The method of documenting image content and technical
information
• Plans for safe conservation, storage and preservation of
surrogate images and master images to ensure their longevity
• Making plans for migration to new formats and technologies
as needed
• Making plans for digitization and documentation of new
objects
The policy should be reviewed periodically to determine whether project
plans or policies need any adjustment.

Defining the audience


The end-users, both within and outside the organization should be
identified well before the digitization process is carried out. Moreover, the
organization should ensure the participation of the users at the development
stage.
Identification of potential internal users will help carve out the
digitization strategies of the organization. The members of the organization
should help define their imaging needs and how digital images can suit their
needs. The member should identify the departments and staff for the
participation and establish institutional goals for preparing the use of digital
images.
The project leader should interview staff members, volunteers and
others who will use images, about not only immediate uses, but also future
ones. Internally, digital images can be used in many ways across an
organization. Images may connect to management systems for the illustration
of artifacts and collection records for loans, insurance and other collections
management functions. Moreover, they can also be used for documenting the
intellectual property (IP) of the organization. For publication and illustration
of newsletters, brochures and postcards, high-resolution images may be
needed. Specialty uses for high-resolution images like detailed conservation
or analysis should also be considered. In order to be accessed by the general
public, the digitized content could be uploaded on the organization website or
public access terminal. The digitized material can also be used in creating
CD-ROMs or publications.

Decisions regarding these requirements should be made well before the
start of the process. This is because the method of using images will determine
their quality and the resolution required, which will later affect both the choice
of scanning technology and overall system requirements.
Though future use determines the choice of quality and resolution,
images digitized at the highest possible resolution serve the greatest number
of purposes.

Evaluating assets
A careful assessment is required of what images the organization currently
has, considering the following questions:
• In what formats are those images?
• What objects have already been photographed?
• How are the images stored?
• Are digitized images from a previous project available?
• What is the quality of the images?
• At what resolution have the digital images been stored?
To determine what images are held in all parts of an organization and in
what formats these images are currently available, a survey of all the
photographic holdings should be carried out. In a large organization,
generally all departments hold images for their own use, whereas a smaller
organization possesses fewer existing digital or photographic resources.
Next is an assessment of the currently available images. Digitizing
already available images, such as colour transparencies, costs less and
consumes less time than beginning 'from scratch'. If you scan images from
photographs or transparencies, it is recommended to use only good-quality
images. Retaking photographs of some objects may be required if the images
are not in good condition or do not represent the original object properly.
Ideally, good, professionally photographed images created with a colour bar
or greyscale should be digitized.

Even if the digital images exist already; you have to make sure that their
resolution is appropriate for the current needs, and whether the related
documentation is sufficient. New photography adds significantly to the time
and money required for a digitization project, especially when the objects to
be photographed require considerable time for preparation. For instance, large
objects, such as canoes, may need to be transported from storage to a suitable
place to be photographed; complex objects, such as costumes, may require a
great deal of preparation.
Appropriate information (metadata) relating to the object in an image,
technical capture information, and attribution must be provided or created
as the images are produced. The documentation procedures require significant
amounts of staff time, but are important for the long-term success of an
imaging project and the future management and repurposing of the digital
assets created in the project.
The following are the other important aspects of an evaluation of assets:
• Think about the quality of documentation available for each image.
• Ensure that the institution has copyright to both the photograph and
the object.
• Survey the current software and equipment.
• The requirements for physical space (both physical space for staff
and equipment and disk storage spaces) should be considered.
• Examine the existing staff resources to help define needs.
Understanding the importance of planning:
If we are to use digitization as a tool to provide worthwhile, enduring
access to some treasured cultural and historical resources, we necessarily must
take time at the beginning to become informed, to establish guidelines, and to
proceed in rational, measured steps to assure that such reformatting of visual
matter is accomplished as well and as cost-effectively as possible.
Once the current image assets of the organization are determined, the
scope of the project must be defined.
Few organizations systematically digitize all or very large parts of
their collections; most take a 'project' approach to digitization. Whether
the aim is to digitize all or only part of the collection, a plan outlining
what is to be digitized and in what order should be formulated before
proceeding.
Digitizing projects successfully require sufficient resources, including the
following:
• Trained personnel
• Digitization technology and equipment (hardware and software)
• Adequate physical space for the process
• Funding
Consider also the following issues:
• Does the material to be digitized have enough intrinsic value to
warrant digitization?
• What institutional or project goals (institutional process, internal
or external visibility) might be secured by digitizing?
• Will the process of digitization significantly facilitate or
increase use by an identifiable constituency?
• What are the benefits/costs of digitizing images vs digitizing an
entire collection for which there is a particular requirement?
• Does the existing product meet the identified needs?

• Are rights and permissions for electronic distribution secured or
securable?
• Does the current technology produce images of high quality to
meet the stated requirements and uses?
• Does technology permit digital capture from a photo
intermediate? Does the project need to begin 'from scratch',
with either new photography or direct digital image capture?
• Does the institution have capability in the necessary
technology?
• Will all or part of the collection be digitized to support effective
collection management practices or public access to collections
information?
• How will the objects to be digitized be selected?
• Will the ongoing activities or exhibit development help
determine what objects are digitized?
• Will digitization take place in-house or be outsourced?
• What quality of digitization is needed? Is the cost reasonably
priced? What compromises might be needed between cost and
quality?
• How will digital objects be categorized and stored? What
metadata (or information) about each one will be inserted?
• How will they be linked to the original object? How will digital
objects be searched for and located once they have been
created?
• How will digital assets thus created be managed on an ongoing
basis?

Preparing the project plan


As we discussed at the start of this unit, planning is very important for
any project. As the project proceeds, we should determine its goals and
requirements. Members associated with the project should think of all the
ways images can be used and reused, to exploit the material to the greatest
possible extent.
While determining the requirements, the following questions should be
considered:
• What will the images be used for?
• Will digital images replace traditional photographic images?
• How will images be made available?
• What standards will be followed?
Decisions taken while planning the project affect the entire process of
digitization. For example, decisions about the resolution of scanned images or

the amount of documentation can dictate how the images themselves are used.
Poor initial choices of technology or documentation may force the images to
be rescanned within a few years for the project to remain successful.
The following broadly defined tasks or phases should be part of the overall
Plan:
i. Planning
• Define the purpose, goals, scale and scope of the project.
• Survey the current images to assess the strengths of the collection.
• Evaluate the current documentation and standards used for creating
it.
• Analyze technical standards.
• Look at available equipment for inventory.
• Set priorities.
• Develop and document a plan, including the workflow strategy
• Identify the staffing needs.
• Assess the costs and implications of implementing projects in-house
vs contracting out the work.
• Secure funding.
• Select/hire/recruit and train staff to form a working group or project
team.

ii. Data preparation


• Select data documentation standards, and technical formats and
standards.
• Take care of the copyright of the material.
• Determine and record the information about copyright restrictions
and permissions, conserving/preserving and tracking the movement
of digital objects.
• Document the photographs of collection material properly, whether
they are being outsourced or digitized in-house.
• Where images exist, ensure that images and their documentation are
stored together.

iii. Image capture


• Purchase and set up equipment.
• Take high-quality photographs of objects.

• Where photographs already exist, scan the photographs of objects
(or send them to an outside source, with explicit instructions about
requirements).
• Store high-resolution images securely.
• Perform quality control and evaluation.

iv. Storage and delivery


• Store the photographs of collection materials properly.
• Store quality digital images on a server.
• Store any CDs produced securely in proper environmental
conditions.
• Link digital images to collection management database.
• Execute internal evaluation.
• Maintain and refresh data.
• Make images available to a variety of online users.
• Ensure the offsite storage of copies for security purposes.

Make a realistic timeframe for the project, realizing that the time
allocated to each stage depends on the size of the collection, the
preparation time required, the staff available, and the current state of the
documentation and collections management system. It is also necessary to
decide whether all the material needs to be digitized or only parts of the
collection.
Prioritizing the work
Although the long-term goal may be to digitize the entire collection, the
project will probably be accomplished over time in accordance with financial
and staff constraints. The work should be prioritized in accordance with the
project plan defined earlier. Usually, priority should be given to the
following:
• Images for which the copyright clearance of both the object and
the image is available.
• Iconic images much associated with the institution
• Images for which good documentation is available.
• Objects used in exhibits, current or upcoming projects
• New objects
• Images of the museum that could be developed into a
promotional publication or virtual tour

• Well-maintained collections of particular significance or special
public and/or educational appeal
• Images depicting a certain theme or following subject area
• Natural groupings in the collection
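As a purely hypothetical sketch, the priority criteria above could be turned into a simple scoring routine; all field names here are illustrative and not part of any collections-management standard:

```python
# Hypothetical sketch: ranking candidate items for digitization by counting
# how many of the priority criteria listed above each item satisfies.

CRITERIA = ["copyright_cleared", "iconic", "well_documented",
            "in_exhibit", "new_object"]

def priority_score(item):
    """Count how many priority criteria an item satisfies."""
    return sum(1 for c in CRITERIA if item.get(c))

items = [
    {"id": "A", "copyright_cleared": True, "well_documented": True},
    {"id": "B", "iconic": True},
]
ranked = sorted(items, key=priority_score, reverse=True)
print([i["id"] for i in ranked])   # ['A', 'B']
```

In practice an institution would weight the criteria (copyright clearance, for instance, is usually a prerequisite rather than one vote among many), but even a crude score makes the prioritization repeatable and auditable.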
Documenting the plan
It is essential to document the plan and process. Normally, a project
plan consists of a timeline indicating the start and end dates of the major
activities as well as milestones or major deliverables. The documentation may
also identify the staff members or departments responsible for each activity.
Moreover, this documentation will help identify the appropriate staff should
some members leave the organization.
While devising a long-term strategy, the key plan should include periods
of assessment for determining whether strategies should be altered. A
well-defined project uses resources optimally, thereby yielding good results.
Defining the resources required
A digitization project has an impact on staffing, budget, workload,
available space and equipment. It is essential to hire or train staff with
the necessary skills (at the least, to document and manipulate images if the
work is outsourced). If existing staff are trained, it is important to assess
how the ongoing workload will be affected. Consider how the digitization
project will affect the overall plans of the institution and whether the
institution has other major plans that need to draw on the same resources.

Skills required
The following are the skills required in a digitization project:
Collections management/subject specialists
• Knowledge of cultural material documentation practices
• Descriptive information about objects and information about images
• Familiarity with requirements for reproducing cultural materials
• Cataloguing and properly documenting digital objects

Administration
• Project leadership
• Project management
• Supervision of production

Preparation
• Preparation of detailed instructions for digitization, whether the
work is accomplished in-house or outsourced

• Preparation of objects for digitization
• Preservation, archiving and disposal of digital objects

Systems support
• Technical expertise in the operation of digitization hardware and
software
• Experience with image scanning, processing and quality control
Reproduction services
• Performing quality reviews and upgrading the procedure of
monitoring digitization
In small organizations, the same people perform many of these tasks;
some of them may be volunteers. In other cases, many of these works may be
outsourced.
Securing preservation through proper storage
Digitization helps preserve original materials: it becomes unnecessary
to expose objects to handling and light very frequently, and from one
high-quality digital image other image formats can be derived.
However, high-resolution digital assets have preservation and storage
needs of their own. Although it may still be possible to retrieve the images
in the future, it is better to plan for the preservation of the image
collection, since retrieving images becomes costly if the software or
hardware used to store and retrieve them becomes obsolete.
Digital images require considerable storage space, and this costs money;
the cost should be added to the project budget. High-resolution archival copies
of the images should be kept even if the most important aim is to add
low-resolution images to the collections management system. A storage area
separate from that of the working collection management system is required
for these high-resolution archival copies. These archival copies should be
stored on a fixed medium such as CD-ROM, Digital Versatile Disc (DVD),
tape backup or a related device. Even though such storage mechanisms cost
much to implement, they turn out to be economical in the long run. A
prioritized plan is needed, with built-in review periods to assess potential
changes to technology and storage media.

Establishing responsibility
For any digitization project to become successful, it should have strong
support from the management.
The capabilities of the current staff and their interest in learning new
technologies must be realistically assessed. The project leader can survey
various divisions of the institution to make sure that staff members understand
the goals of the project. Managerial and departmental tasks will change as
new priorities are set and new skills are acquired. Instead of forcing staff
members to take on new assignments that they had not anticipated, it is much
better to stress the positive opportunities for professional development that the
digitization project makes available.
After determining the responsibilities for these tasks, it is important to
ensure that all staff members understand that this responsibility has been
assigned. Proper communication among the staff members is the key to a
successful project.
Ensuring access
Technology keeps changing. Refreshing and migrating data are the two
recommended strategies for avoiding obsolescence. They are described as
follows:
• Copying digital files from one storage device to another of the
same type is known as refreshing. It is similar to creating a
duplicate CD-ROM. This method is viable when the digital files
are in a non-proprietary format and independent of hardware and
software, although hardware and software will still be required to
read non-proprietary formats. When files in a proprietary format
are refreshed, problems may arise if the specifications of the file
format have changed, and there may be difficulty in accessing the
files.
• Changing or converting data into newer or non-proprietary
standard formats and then transferring onto a newer type of
storage media is known as data migration.
The above strategies protect valuable data. However, they also entail some
cost in terms of time and equipment.
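The refreshing strategy above depends on the copy being bit-for-bit identical to the original. As a minimal sketch (not part of the text, and assuming a Python environment), one way to verify a refreshed copy is to compare checksums:

```python
import hashlib

def file_checksum(path, chunk_size=8192):
    """Compute a SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_refresh(original_path, copy_path):
    """A refreshed copy is acceptable only if it is bit-for-bit
    identical to the original file."""
    return file_checksum(original_path) == file_checksum(copy_path)
```

Running such a check after every refresh cycle gives early warning of media deterioration before the original becomes unreadable.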

10.3 COPYRIGHT ISSUES ASSOCIATED WITH DIGITIZING IMAGES
With the latest technologies, digitized images can not only be made available
and accessed via the Internet, but also reproduced more quickly, and with more
astonishing clarity, than ever before. Therefore, copyright protection becomes
an issue. While digitizing images, the protection of the copyright of artists and
creators is of paramount importance. Some of the issues which need to be
considered are as follows.
• When the underlying work is still protected by copyright:
Authorizations should be obtained from the artist or creator of
the photograph which has to be digitized. An organization should
ensure that the work being photographed and then digitized has
been licensed for reproduction prior to photographing and
digitizing an image. No authorization is required if the work at
issue is in the public domain.
• When the photograph is still protected by copyright:
Organizations should also ensure that they hold the rights to
digitize the photograph, since the digitization of an existing
photograph is itself a reproduction. There are two ways in which
the rights can be obtained. First, ensure that the organization
holds the copyright on the photograph through an agreement
with the photographer. Second, negotiate these rights when the
photograph is digitized at a later phase. If the photograph
being digitized falls into the public domain, these
authorizations are no longer needed.
• When the digitized image will be modified: If, in the course of
digitization, the image is modified, cropped or discoloured,
rights associated with copyright, such as moral rights, may
become an issue. Moral rights are always held by the artist or
author of the original work that is the subject matter of the
image; photographers hold moral rights in their photographs
even when the copyright has been assigned to another party.
Moral rights can never be transferred from one party to another,
but they can be waived; they run for the length of the copyright.
In the above cases, if the image is manipulated by discolouring,
cropping or modifying in any way that may prejudice the artist,
creator or photographer, the organization should ensure that it
obtains a waiver of moral rights from the artist or creator and/or
photographer. The moral rights of the artist, creator or
photographer are no longer an issue if the work (the image or the
photograph being digitized) falls into the public domain.

Rights management and protection technologies
Licensing may provide very little protection for aesthetic goods that
retain their value over a long period, even if supported by
registration. For any work distributed over networks, licensing to
end-users, as in the software industry, alleviates some problems.
However, this may require a moderation of demands on the part of
owners, as well as an emphasis on educating end-users. As the issue
of protecting digital images has attracted considerable attention, a
number of technologies including watermarking, encryption, digital
signatures and fingerprinting have been developed and are being
deployed.

Many organizations depend upon encryption for securing their
material, even though it cannot ensure absolute protection. These
days, other mechanisms such as watermarks, signatures and
fingerprints are used to discourage misuse and copyright
infringement.

The following are some of the latest technologies which have
been developed to protect the copyright holder's interests:
• Digital signature as proof of ownership

• Visible and invisible watermarking
• Digital fingerprint
• Various rights management systems along with the secure
container technology
• Encryption technology

10.4 DETERMINING THE COSTS OF A DIGITIZATION PROJECT

The cost of any digitization project depends on the purpose and requirements
of that project, and on whether the work is digitized in-house or contracted
out. It is important to be realistic about the 'savings' from the digitization of
images and to anticipate and budget for the costs.
Management should expect initial costs based on the requirements
determined in the project planning phase. However, they should also
recognize that the long-term benefits are great. These include preservation of
original objects, enhanced collections management documentation, enhanced
information on museum intellectual property, increased visibility for the
institution, and so on.
The following are the various constituents of total cost for a digitization
project:
• Material or capital costs, including equipment such as hardware,
software and image manipulation tools
• Documentation
• Human resources; either hiring new or training the existing staff
• Equipment costs for image capture, digital image storage and
maintenance of digital images
• Transportation and handling of objects to be photographed or
images going to an outside source (mainly for two-dimensional
objects — costs will be higher for three-dimensional objects)
• Sufficient space and facilities for equipment and any necessary
new staff
• Set-up time
• Insurance costs related to transportation
• Film processing and/or scanning
• Photography and/or treatment of current photographs
• Image manipulation, i.e., adjusting images for their intended
purpose

• Quality control
• Ongoing maintenance
It may be helpful to consider sharing costs with another institution by
pooling resources for equipment and/or staff.
The largest expense will not be the actual scanning or photography, but
the subject expertise required for documentation, locating, reviewing and
assembling source material, preparing and tracking it, and quality control. If
the project is done in-house, the cost will take the form of training the current
staff, hiring new staff and purchasing new equipment. One good way to handle
image manipulation is to investigate possibilities such as hiring interns or
students from a community or technical college. For short-term projects, the
salary of each member of the team can be estimated on an hourly basis.
Imposing extra work on particular staff members may lead to stress, so
redistribution of tasks is essential for these projects. Even if a project is
contracted out, it may still require some staff training to carry out the work.
Expert staff is required for project preparation: transporting heavy
objects, unbinding manuscripts, conservator checks of objects for damage,
and the photographic set-up if photographs suitable for scanning are not
available for all the objects.
Digitizing Images In-House vs Contracting Out
Advantages:
In-House
i. Retain control over all aspects of imaging
ii. Some flexibility in defined requirements
iii. Learn by doing and developing in-house expertise
iv. Build production capability
v. Security of source material

Contracting Out
i. Lower labour cost.
ii. Costs of technological obsolescence are absorbed by the digital service
provider.
iii. Expertise and training of the digital service provider.
iv. Set-up cost per image, prices can be negotiated based on volume which
facilitates budget and project planning.
v. Limited risk.
vi. Variety of options and services.

Disadvantages:
In-House
i. Limits on production capabilities and facilities.
ii. Institution incurs costs of technological obsolescence.

The Digitization
136
iii. Need to set up technical infrastructure: space,
digitization equipment, and computers.
iv. Larger investment.
v. No set-up price per image.
vi. Impact on other activities.
vii. Need for trained staff, training.
viii. Institution pays for equipment, maintenance and
personnel rather than for product.
ix. Equipment support.
Contracting Out
i. Quality control not on site.
ii. Images will still need to be manipulated by museum
staff; random samples of the images produced should be
inspected.
iii. Possible inexperience with organization needs.
iv. Transporting material—security and handling issues,
especially with 3-D objects.
v. Needs must be clearly defined in the contract or there
will be communication problems.
vi. Vulnerability due to instability of digital service
providers (companies in business for over 2 years are
considered viable).

If it is decided to contract out the project, the contract specifications
for the digital service provider must be carefully defined, with a clear
statement of the need for consistent results. As a possible compromise
between the two approaches, a professional photographer may be hired
to work with the museum staff.

10.5 STANDARDS AND GUIDELINES TO CONSIDER

Museums are better able to manage their collections when they make use
of proper database management technologies and documentation in
conjunction with digital imaging projects. The type of data, along with the
digitized material, determines how it can be searched, sorted and displayed.
10.5.1 Metadata
Metadata is an indispensable part of any responsible digitization program.
Considerable attention has been paid to defining high-quality metadata
standards for various purposes. In many instances, institutions will already
have substantial metadata about the analog object (for instance, catalogue
records), much of which can be applied to the digital object. The availability
of accurate metadata is as important as the digital surrogates themselves: it
supports accessibility, usability and effective asset management. Building on
existing metadata reduces the cost of creating new metadata. While selecting
material for digitization, you may wish to give priority to material for which
partial metadata already exists.

While assessing resource requirements, it is crucial to determine the status
of the existing metadata. Many libraries, archives and museums have a backlog
of cataloguing work, and part of the collection selected for digitization may
fall into this category. In an ideal world, the existing catalogue or finding aid
would be complete and up to date; in practice, some time must be devoted to
locating the missing information for the metadata records. A decision must be
taken whether to seek information just for those fields required for the
metadata, or to update the original catalogue record in its entirety. Seeking
extra funds, or devoting more resources to the process of digitization, gives
institutions an economical opportunity to expand their metadata. Some of the
new elements required for the metadata record of the digital object can be
generated automatically; for example, automatic metadata creation is a
feature of much high-end digital camera software and of some OCR systems.
The efficiency and accuracy of technical metadata can be greatly improved by
developing such a system. Creating a metadata record will usually take as long
as creating the digital surrogate, and if detailed encoding schemes such as
Encoded Archival Description or the Text Encoding Initiative are used, this
process can be considerably longer. The lack of adequate metadata tools poses
a problem for the efficient creation and management of metadata in many
projects. There is therefore likely to be a significant element of manual work,
whether this lies in adding digital objects to existing electronic catalogues,
creating records for web-based delivery such as Dublin Core, or implementing
encoded metadata schemes such as EAD.
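As an illustration of the web-based delivery records mentioned above, a simple Dublin Core record and a policy check might be sketched as follows; the field values and the required-element policy are invented for illustration, not taken from any real catalogue:

```python
# A hypothetical Dublin Core record for a digitized photograph.
# Element names follow the 15-element Dublin Core set; the values
# are invented examples, not drawn from any real collection.
dublin_core_record = {
    "title": "Portrait of an unidentified sitter",
    "creator": "Unknown photographer",
    "date": "circa 1890",
    "type": "StillImage",
    "format": "image/tiff",
    "identifier": "ACC-0001-master",
    "rights": "Copyright status under review",
}

def missing_elements(record, required=("title", "identifier", "rights")):
    """Flag records lacking elements that project policy requires.
    The default required set is an assumption for this sketch."""
    return [e for e in required if e not in record or not record[e]]
```

A check like this, run over a whole batch, is one simple way to accommodate images with incomplete metadata before they enter the catalogue.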

10.5.2 Image Standards and Guidelines
There are no published standards or guidelines for determining the level of
image quality required when creating digital images. Of the several studies
conducted to determine the optimum image resolution and image file formats,
most indicated that the higher the image quality, the greater the longevity of
the images. Image processing becomes easier if a common process and format
are chosen when digitizing a collection.
These master images, or archived images, should be stored offline or
kept in an accessible read-only mode. They should be accessed only
occasionally, to ensure their security and to keep them in the original format.
The image capture process should produce digital images of the highest
quality feasible in terms of resolution and colour depth. Master images can
then be used to create subsequent surrogate images.

Low-resolution formats are adequate for digital images used for visual
reference in an electronic database or on the World Wide Web. From the
master images, many surrogate images, or working copies, can be produced
for a variety of purposes without having to repeat the image capture process.
For thumbnail access, a lower image resolution may be required; for digital
images intended for high-quality printing, a substantially higher resolution
may be required. Each type of surrogate image may require different image
editing and enhancements.
First, the intended uses for the images must be determined in order to
ascertain the quality required for digital imaging. Larger or more detailed
reproductions require images of higher quality. The most common use for
digital images is to make them available over the World Wide Web as
low-quality thumbnail images via a collections management system. Digital
reproduction or printing is less common, but is increasing in importance.
Detailed analysis of works of art, especially images for conservation work,
requires substantially higher quality images.
The recommended rule of thumb is to capture images at the highest
quality feasible, depending on the requirements previously determined, the
resources available, and the size and scope of the project.
10.5.3 Preservation and Storage Standards and Guidelines
The preservation and storage of digital assets must be an integral
component of the overall digitization project. Provision should be made for
long-term, continued access to digital resources.
The resolution of the images, along with the colour depth, is a
significant factor in determining image quality and image storage
requirements. High-quality images such as digital master images require
substantial amounts of computer storage: the higher the quality of image
selected, the greater the storage requirements. Surrogate images created from
master images generally require much less storage space.
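The relationship between resolution, colour depth and storage can be made concrete with the standard uncompressed-size calculation. The pixel dimensions below are illustrative assumptions, not figures from the text:

```python
def uncompressed_size_mb(width_px, height_px, bits_per_pixel):
    """Uncompressed image size: total pixels times bit depth,
    converted from bits to megabytes (1 MB = 1024 * 1024 bytes)."""
    total_bits = width_px * height_px * bits_per_pixel
    return total_bits / 8 / (1024 * 1024)

# An assumed 3000 x 2000 pixel master at 24-bit colour: about 17 MB.
master_mb = uncompressed_size_mb(3000, 2000, 24)
# A 150 x 100 pixel thumbnail at 24-bit colour needs far less space.
thumb_mb = uncompressed_size_mb(150, 100, 24)
```

The contrast between the two results shows why surrogate images created from masters require so much less storage.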
Master images are generally kept in offline or semi-online storage
formats and are accessed rarely. CD-ROM is a common storage device, but it
has limited space for storing digitized data. Nowadays, Digital Versatile Discs
(DVDs) have become quite popular, as their storage capacity far exceeds that
of CD-ROMs.
Digital tape, although it has the drawback of relatively slow access, is
another format used primarily for large storage requirements. Digital Audio
Tape (DAT) and Digital Linear Tape (DLT) are the common digital tape
formats. Large-capacity jukeboxes (large disc or tape changers) are available
for each of these formats, allowing direct digital access.
Magnetic tape is one of the devices that can be used to store digitized
images. However, it is relatively impermanent owing to its inherent
instability, which leads to chemical deterioration and physical wear from
use. Optical discs may fail because of warping, corrosion or cracking in the
reflective layer, dye deterioration, or delamination.

Storage conditions are also important in preserving digital images;
cooler and dryer conditions will extend life expectancy. The recommended
conditions for storing digital media are temperatures in the range of
10-20° C, and a relative humidity between 20 per cent and 50 per cent. As a
security measure, a backup copy of all masters should be generated and
stored off-site.
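The recommended ranges above (10-20° C and 20-50 per cent relative humidity) can be expressed as a simple monitoring check; this sketch only encodes the thresholds stated in the text:

```python
def storage_conditions_ok(temp_c, relative_humidity_pct):
    """Check a storage-room reading against the recommended ranges:
    temperature 10-20 deg C and relative humidity 20-50 per cent."""
    return 10 <= temp_c <= 20 and 20 <= relative_humidity_pct <= 50
```

A routine like this could be run against periodic environmental readings, flagging any excursion outside the recommended envelope.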

10.5.4 Presentation Devices
The monitors and printers of personal computers are the most common
presentation devices. Display monitors are generally relatively low-resolution
devices. Most colour printers, by contrast, are capable of printing at high
resolution and colour depth, and images intended for printing tend to be of
much higher resolution.
10.5.5 Transmission Issues
The key factors in the transmission of digital images are the size of the
image files and the speed of the network: the smaller the image file, the
faster the access. Display monitors are mostly low-resolution devices, and
the primary reason for transmitting images is to display them. Therefore, a
low-resolution surrogate image should be created for display on both
internal and external networks such as the World Wide Web.
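The trade-off stated above, smaller files mean faster access, can be sketched numerically. The link speed used here is an illustrative assumption:

```python
def transfer_seconds(file_size_mb, bandwidth_mbps):
    """Approximate transfer time: file size in megabytes (converted
    to megabits) divided by link speed in megabits per second."""
    return (file_size_mb * 8) / bandwidth_mbps

# An assumed 17 MB master vs. a 0.05 MB thumbnail over a 2 Mbps link:
master_time = transfer_seconds(17, 2)    # 68 seconds
thumb_time = transfer_seconds(0.05, 2)   # 0.2 seconds
```

The two figures illustrate why low-resolution surrogates, not masters, are served over the Web.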

Check your progress-1
1. What is metadata?
2. What are the key factors which are responsible for the transmission of
digital images?

10.6 SELECTING THE EQUIPMENT AND SOFTWARE

Computers for the imaging process should be selected based on the
requirements identified and on their efficiency in handling high-resolution
images. The computer platforms often used for image capture and processing
include PC, Macintosh and UNIX. A high-end computer is required to provide
the processing power that digital images demand. The factors which must be
considered while choosing a PC-equivalent imaging workstation are as
follows:
• CPU: A Pentium 400 MHz or better processor is
recommended for intensive image editing, since digital images
make heavy demands on the central processing unit and can
slow the system down.
• RAM (Random Access Memory): Advanced imaging software
normally requires memory of about three times the size of the
image file; a 30-MB image file therefore requires 90 MB of
memory. More memory may be required if additional software
is used simultaneously.
• Disc storage: Storage requirements are at a premium when
working with large image files. Auxiliary storage is therefore
recommended, such as high-density removable drives (for
example, Zip drives) and a CD-ROM writer.
• Display monitor: The monitor is a major part of the system for
image processing and verification. Monitors should be as large
as possible, capable of displaying 24-bit colour (16.7 million
colours), support a 72 Hz refresh rate, and have a video board
with sufficient memory.
• Image software: In order to optimize images, high-end
imaging software such as Adobe Photoshop should be used.
Several freeware and shareware products are also available on
the Web.
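The memory rule of thumb above (imaging software typically needs about three times the size of the image file) can be sketched as a small estimate; the overhead parameter is an added assumption for software running alongside:

```python
def recommended_ram_mb(image_file_mb, factor=3, overhead_mb=0):
    """Rule of thumb from the text: memory of roughly three times
    the image file size, plus any overhead for other software
    running at the same time (the overhead figure is an assumption)."""
    return image_file_mb * factor + overhead_mb

# The text's example: a 30-MB image file needs about 90 MB of memory.
needed = recommended_ram_mb(30)   # 90
```

Such an estimate helps when sizing an imaging workstation against the largest master files the project expects to handle.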

10.6.1 Preparing Materials for Digitization
As mentioned earlier, both time and expertise are required to move large
objects from storage to the photographic set-up, and some objects (such as
costumes) require installation with other objects. Hence, time and skill are the
two major requirements for imaging projects that entail either traditional or
digital photography of three-dimensional objects. When photographing
three-dimensional objects, different views of the same object may be required.
In order to avoid unnecessary delays, all the equipment, including supports
and accessories, should be on hand before photographing.
When capturing two-dimensional objects, the materials should be
reviewed before an imaging plan is decided upon. Historical photographs, as
opposed to photographs of objects in the collection, may be scanned directly.
Medieval manuscripts are more delicate and may require more caution, as
well as expert curatorial and conservation help, before the image capture
technique is decided.
Pre-scanning quality control is most important in achieving the
highest quality digital images. Projects which require scanning of images
already on hand will require some staff to check the images for quality and to
ensure that the images are not blemished and that the accession numbers are
correct.

10.7 WORKFLOW PROCESS

Pilot studies of the workflow process, and benchmarking, were carried out by
the projects interviewed. These were undertaken for a variety of reasons:
• Technical forecasting
• Technical feasibility
• Training needs
• Workflow analysis
When considering technical forecasting or prototyping, it must be remembered
that there may be no corresponding benefit, and that costs and benefits will
vary for different types of content. The cost-benefit may simply be realized by
the project's ability to pay off the equipment through small, regular payments;
few projects charge users for the digital deliverables. A device that enables the
digitization of material that previously could not be captured, such as a 3D
modeler, may not make financial sense if the project has to build in a profit or
depreciation margin. Similarly, a new high-resolution camera may pay
dividends for fine textual or line-art material, but not for colour images. Even
so, if the device makes an important collection more widely available, the
public access benefit may outweigh the financial costs.
Any pilot study that is undertaken should be built into the project design and
development cycle.
If you are considering using a cost model, it is important to include all the
relevant costs, not just the obvious items such as equipment and staff time. A
checklist of these factors should be built into the cost model. While digitizing
an image collection, for instance, a number of different digital objects may
well be generated (archival masters, delivery masters, thumbnails and other
deliverables), which in turn will require storage, tracking, documentation and
upkeep. Digital asset management is an area where one must be especially
aware of cost estimates: it requires a significant commitment of resources and
needs to be planned carefully.

10.8 MAINTENANCE/MANAGEMENT AND QUALITY CONTROL

Typical imaging projects consist of about 50,000 images or more. With
this quantity of data, planning for the management of the digital assets must
become an integral part of the overall digitization project. Digital imaging
projects must include a policy for managing the digital assets, as mentioned
in the planning process.

10.8.1 Quality Control

Quality control (QC) and quality assurance (QA) are the processes used to
ensure that digitization and metadata creation are done properly. QC/QA plans
should address accuracy requirements and acceptable error rates for all aspects
evaluated. Plans and procedures should address issues relating to the image
files, the associated metadata, and the storage of both (file transfer, data
integrity, etc.). For large digitization projects, it may be appropriate to use a
statistically valid sampling procedure to inspect files and metadata. QC/QA is
mostly carried out in two steps: first, the scanning technician performs initial
quality checks during production; second, another person performs a further
check.

A quality control program should be initiated, documented and
maintained throughout all phases of digital conversion. A quality control plan
associated with each phase of the conversion project should address all
specifications and reporting requirements.
1. Completeness
Completeness means verifying that all required image files are present
and that the associated metadata has been provided.
2. Inspection of digital image files
The overall quality of the digital images and metadata will be evaluated
using the following procedures. The visual evaluation of the images shall be
conducted while viewing them at a 1:1 pixel ratio, or 100 per cent
magnification, on the monitor.
We recommend that, at a minimum, ten images or 10 per cent of each
batch of digital images, whichever quantity is larger, be inspected for
compliance with the digital imaging specifications.
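The sampling rule above, ten images or 10 per cent of each batch, whichever is larger, can be sketched as:

```python
import math

def qc_sample_size(batch_size, minimum=10, fraction=0.10):
    """Inspect ten images or 10 per cent of the batch, whichever is
    larger, capped at the batch size itself for very small batches."""
    return min(batch_size, max(minimum, math.ceil(batch_size * fraction)))
```

For a batch of 500 images this yields 50 inspections, while a batch of 50 still receives the minimum of ten.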
3. Quality control of metadata
Quality control of metadata should be integrated into any digital imaging
project. Since metadata is generated frequently and altered at many points
during an image's life cycle, metadata review should be a continuous process
that extends across all phases of an imaging project and beyond. Because
metadata is critical to the identification, discovery, management, access,
preservation and use of digital resources, it should be subject to quality
control procedures similar to those used for verifying the quality of digital
images.
A formal review process should be designed for metadata, just as for
image quality. Questions to ask include who will review the metadata, the
scope of the review, and how great a tolerance is allowed for errors.
Automated techniques are less likely to be effective in assessing the
accuracy, completeness and utility of metadata content (depending on its
complexity), which will require some level of manual analysis. Practical
approaches to metadata review may depend on how and where the metadata is
stored and on the extent of metadata recorded. Skilled human evaluation,
rather than machine evaluation, is required to assess metadata quality.
However, some aspects of managing metadata stored within a system can be
monitored using automated system tools.
Although there are no clearly defined metrics for evaluating metadata
quality, the following areas can serve as a starting point for metadata review.
In general, it is good practice to review metadata at the time of image
quality review.
• Adherence to standards set by institutional policy or by the
requirements of the imaging project
• Procedures for accommodating images with incomplete metadata
• Relevancy and accuracy of metadata
• Consistency in the creation of metadata and in interpretation of
metadata
• Consistency and completeness in the level at which metadata is
applied
• Evaluation of the usefulness of the metadata being collected
• Synchronization of metadata stored in more than one location
• Representation of different types of metadata
• Mechanics of the metadata review process
Specifically, we consider:
• Verifying the accuracy of file identifiers
• Verifying the accuracy and completeness of information in image
header tags
• Verifying the correct sequence and completeness of multi-page
items
• Adherence to agreed-upon conventions and terminology

4. Documentation
Quality control data, such as logs, reports and decisions, should be
captured in a formal system and become an integral part of the image metadata
at the file or project level. This data may have long-term value that could
affect future preservation decisions.
5. Testing results and acceptance/rejection
If more than 1 per cent of the images and associated metadata in a
randomly sampled batch are found to be defective for any of the reasons
listed above, the entire batch should be re-inspected. Any specific errors
found in the random sampling should be corrected, along with any
additional errors found in the re-inspection. If less than 1 per cent of the
batch is found to be defective, only the specific defective images and
metadata that are found should be redone.
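One possible reading of the acceptance rule above can be sketched as follows; whether the 1 per cent threshold applies to the sample or to the whole batch is an assumption made here for illustration:

```python
def batch_decision(defects_found, sample_size, threshold=0.01):
    """Decide whether a batch passes, needs spot fixes, or needs full
    re-inspection, based on the defect rate in the random sample.

    If the defect rate exceeds the 1 per cent threshold, the entire
    batch must be re-inspected; otherwise only the specific defective
    items found are redone."""
    defect_rate = defects_found / sample_size
    if defect_rate > threshold:
        return "re-inspect entire batch"
    if defects_found > 0:
        return "redo defective items only"
    return "accept batch"
```

For example, five defects in a sample of 100 (5 per cent) would trigger re-inspection of the whole batch.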

10.9 MIGRATION OF DATA TO NEW FORMAT

In order to preserve the integrity of digital objects and to retain the ability
to retrieve, display and use them, the data must be transferred to new media
types and formats.
The storage media should be inspected at regular intervals to detect any
deterioration. Moreover, the latest technology should be continuously
reviewed, so that images at risk of becoming obsolete are migrated to new
media or formats.
Protecting the integrity of digital images must be a top priority of any
digital preservation strategy. Image content, defined in terms of structure and
format, poses integrity problems for digital archives. Planning a migration
strategy is difficult, as it can be hard to anticipate when migration will be
necessary, how much reformatting will be required, and how much the entire
process will cost. The migration process itself can degrade data quality, a fact
that has implications for the overall integrity of the data.

10.10 STORAGE, BACKUP AND PRESERVATION

Both the master images and the surrogate images must be considered for
storage. Secure off-site storage is essential. A backup strategy covering
all image formats must be put in place for all data created, including all
work in progress during the image creation and image enhancement phases.

A long-term preservation strategy for the master images should be
considered because the storage medium will deteriorate with time, depending
on the environmental conditions of the storage area. The contents should be
migrated to another storage medium of the same type, or to another type of
storage medium, as required. In addition, the advancement of technology may
make current storage media outdated, making it necessary to migrate to
newer devices.
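A backup and preservation strategy of this kind is often implemented with a checksum manifest: record a fixity value for every master and surrogate image when it is stored, then periodically re-hash the stored copies to detect deterioration of the medium. The sketch below is an assumption-laden illustration, not a prescribed workflow; the function names and the manifest structure are invented for the example.

```python
import hashlib
from pathlib import Path

def _sha256(path):
    # Helper: SHA-256 digest of a file's full contents.
    return hashlib.sha256(path.read_bytes()).hexdigest()

def build_manifest(storage_dir):
    """Record a checksum for every file in the storage area so that
    later audits can detect silent deterioration of the medium."""
    root = Path(storage_dir)
    return {str(p.relative_to(root)): _sha256(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def audit(storage_dir, manifest):
    """Re-hash every file listed in the manifest and report any that
    are missing or no longer match their recorded fixity value."""
    damaged = []
    for rel, expected in manifest.items():
        path = Path(storage_dir) / rel
        if not path.is_file() or _sha256(path) != expected:
            damaged.append(rel)
    return damaged
```

Running `audit` on a schedule (and against the off-site copy as well) turns the section's advice about regular inspection into a concrete, repeatable check.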

Check your progress-2


What is quality control?

10.11 SUMMARY
• In the technology domain, change and unpredictability are facts
of life, and often represent opportunities rather than disasters for
a well-planned project.
• Before an institution embarks on a digitization project, it should
allocate adequate resources of time and money. In addition,
future requirements should be taken into consideration, so that
future options are not limited by rapid technological change.
• Implementation of a digitization project in several stages can
provide the flexibility to accommodate possible alternatives
along the way.
• Establishment of a policy for the management of digital assets
should be part of the planning process. The policy should be
reviewed periodically to determine whether project plans or
policies need any adjustment.
• Prior to digitization, the targeted users of images, both inside and
outside the institution, should be determined. Also, the users
should be involved in the development of the project, if possible.
• Identification of potential internal uses will help carve out the
digitization strategies of the institution.
• Images may be linked to collections management systems to
illustrate artifacts and collection records for loans, insurance
and other collections management functions.
• Whether the aim of the project is to digitize all or only part of
the collection, a plan outlining what is to be digitized and in
what order is needed before proceeding.
• Successful digitization projects require sufficient resources.
With the latest technologies, digitized images can not only be
made available and accessed via the Internet, but also
reproduced more quickly and with greater clarity than ever.
• There are no published standards or guidelines for determining
the level of image quality required when creating digital images.
• The choice of computers for imaging projects should be based on
the requirements identified, with adequate power to handle
high-resolution images.
• If you are considering using a cost model, it is important to
include all the relevant costs, not just the obvious items such
as equipment and staff time.
• A checklist of the factors should be built into a cost model.

• In order to preserve the integrity of digital objects and to retain
the ability to retrieve, display and use them, the data must be
transferred to new media types and formats. Both the master
images and the surrogate images must be considered for storage.

10.12 KEY TERMS


• Metadata: Structured data that describes a digital object. It is
an indispensable part of any responsible digitization program.

• Digital Audio Tape (DAT): A common digital tape format.
Large-capacity jukeboxes (similar to large CD changers) are also
available for each of these formats.
• Display monitor: A major part of the system for image
processing and verification. Monitors should be as large as
possible, be capable of displaying 24-bit colour (16.7 million
colours), support a 72 Hz refresh rate, and have a video board
with sufficient memory.
• Quality Control (QC): Together with quality assurance (QA),
the processes used to ensure that digitization and metadata
creation are done properly.

10.13 END QUESTIONS


88) Explain the importance of planning.
89) What issues should be taken into consideration while doing
digitization?
90) Write a brief note on the process of project planning.
91) What are the skills required in a digitization project?
92) What is metadata? Explain in detail.
93) What factors need to be taken into consideration while choosing
the workstation?
94) What do you understand by quality control?
95) What are the copyright issues associated with digitizing images?
96) What can be done to achieve the highest quality of digital images?
97) Explain the process of determining the costs of a digitization
project.

Answer to check your progress questions


Check your progress -1:

Metadata is an indispensable part of any responsible
digitization program.

The key factors affecting the transmission of digital images
are the size of the image files and the speed of the network.

Check your progress -2:

Quality control (QC) and quality assurance (QA) are the
processes used to ensure that digitization and metadata
creation are done properly.

10.14 FURTHER READING


1. Digital Projects Guidelines. Arizona State Library, Archives and Public
Records. http://www.lib.az.us/digital/

2. The NINCH Guide to Good Practice in the Digital Representation and
Management of Cultural Heritage Materials (Version 1.1 of the First
Edition, published February 2003,
http://www.nyu.edu/its/humanities/ninchguide/)

3. RLG Tools for Digital Imaging.
http://www.rlg.org/preserv/RLGtools.html

4. SOLINET. Disaster Mitigation and Recovery Resources.
http://www.solinet.net/preservation/preservation_templ.cfm?doc_id=71
