
Chapter 3 Data Formats

Uploaded by

liyana saidin

CHAPTER 3:

Data Formats
The Architecture of Computer Hardware
and Systems Software:
An Information Technology Approach
Contents
1. Alphanumeric Data
2. Image Data
3. Audio Data
4. Data Compression
5. Internal Computer Data Format

Chapter 3 Data Formats 3-2


Introduction
 Chapter 2 showed that all data in a computer are stored in binary form
 Human beings, however, do not normally work in binary form
 Our communication is made up of
   Language, images and sounds
   Written communication: alphanumeric characters and symbols
   Others: photographs, charts, diagrams and other images (still frames or moving)
 In this chapter, we consider what it takes to get different types of data into computer-usable form and the different ways in which the data may be represented, stored and processed
 Original data (characters, images, sounds and other forms) are first brought into the computer by different input devices
 The data are then converted into an appropriate computer representation to be processed, stored and used within the computer system



Sources of Data
 Binary input
 Begins as discrete input
 Example: keyboard input such as A 1+2=3 math
 Keyboard generates a binary number code for each key
 Analog
 Continuous data such as sound or images
 Requires hardware to convert data into binary numbers

[Figure 3.1: the keyboard input "A 1+2=3 math" passes through an input device and enters the computer as a binary stream, 1101000101010101…]



Different Input Devices
 Keyboard
   Straightforward
   Generates a binary number code for each key
 Video camera or microphone
   Analog data
   Data change continuously with time
   Require hardware designed to convert the analog signal to binary numbers
   Might use special software
 Image
   Number of colors represented by each data point
   Number of horizontal and vertical data points



Common Data Representations
Type of Data                 Standard(s)
Alphanumeric                 Unicode, ASCII, EBCDIC
Image (bitmapped)            GIF (Graphics Interchange Format), TIFF (Tagged Image File Format), PNG (Portable Network Graphics), JPEG
Image (object)               PostScript, SWF (Macromedia Flash), SVG
Outline graphics and fonts   PostScript, TrueType
Sound                        WAV, MP3, MIDI, WMA
Page description             PDF (Adobe Portable Document Format), HTML, XML
Video                        QuickTime, MPEG-2, RealVideo, WMV, AVI
Alphanumeric Character Data
 Much of the data used in a computer is originally provided in human-readable form
 It takes the form of letters of the alphabet, numbers and punctuation
 Data entered as characters, number digits and punctuation are known as alphanumeric data



Data Types: Alphanumeric
 Alphanumeric:
 Characters: b T
 Number digits: 7 9
 Punctuation marks: ! ;
 Special-purpose characters: $ &
 Numeric characters vs. numbers
 Numbers are often processed differently from text
 Numbers may consist of more than a single digit
 The conversion between character and number is not
“automatic” within the computer
 Whether data are treated as characters or as numbers depends on the data and how they are used
   A phone number remains as characters
   In C++, the variable declaration determines whether numeric characters are accepted as a number
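The difference between numeric characters and numbers can be sketched in Python, used here only for illustration. The phone number is a made-up value.

```python
# Digits typed at a keyboard arrive as characters, not as numbers.
typed = "12"
codes = [ord(ch) for ch in typed]   # character codes of '1' and '2'

# Conversion to a number is an explicit step, not an automatic one.
as_number = int(typed)

# Characters concatenate; numbers add.
joined = "1" + "2"
added = int("1") + int("2")

# A phone number (hypothetical value) stays as characters:
# converting it to an integer would drop the leading zero.
phone = "0123456789"

print(codes, as_number, joined, added, phone)
```

The same two keystrokes give either the text "12" or the value 12, depending on how the program declares and converts them.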



 Alphanumeric data are stored in the computer in binary form, so each character must be translated to a corresponding binary code as it enters the computer
 The choice of code is ARBITRARY (any agreed mapping would work)
 The computer does not "recognize" letters, only binary numbers, so it does not matter to the computer which code is selected
 What does matter is CONSISTENCY!



 Output, including numbers, also leaves the computer in alphanumeric form, either as printed output or as output on a video screen
 Therefore, the output device must perform the same conversion in reverse
 It is important that the input device and the output device recognize the same code!



Representing Characters
 Three alphanumeric codes are in common use:
   ASCII
     Most widely used coding scheme
     Codes can be stored in a byte, e.g. "G" = 47₁₆
     Defined as a 7-bit code = 128 entries in the ASCII table
   EBCDIC
     IBM mainframes (legacy)
     Codes can be stored in a byte, e.g. "G" = C7₁₆
     Defined as an 8-bit code = 256 entries in the EBCDIC table
   Unicode: developed for worldwide use
ASCII (“as-key”)
 Developed by ANSI (American National
Standards Institute)
 Represents
 Latin alphabet, Arabic numerals, standard
punctuation characters
 Plus small set of accents and other
European special characters
 ASCII
 7-bit code: 128 characters



ASCII Reference Table
Columns give the most significant hexadecimal digit (MSD); rows give the least significant digit (LSD).

LSD  0    1    2    3    4    5    6    7
0    NUL  DLE  SP   0    @    P    `    p
1    SOH  DC1  !    1    A    Q    a    q
2    STX  DC2  "    2    B    R    b    r
3    ETX  DC3  #    3    C    S    c    s
4    EOT  DC4  $    4    D    T    d    t
5    ENQ  NAK  %    5    E    U    e    u
6    ACK  SYN  &    6    F    V    f    v
7    BEL  ETB  '    7    G    W    g    w
8    BS   CAN  (    8    H    X    h    x
9    HT   EM   )    9    I    Y    i    y
A    LF   SUB  *    :    J    Z    j    z
B    VT   ESC  +    ;    K    [    k    {
C    FF   FS   ,    <    L    \    l    |
D    CR   GS   -    =    M    ]    m    }
E    SO   RS   .    >    N    ^    n    ~
F    SI   US   /    ?    O    _    o    DEL

Example: t is in column 7, row 4, so its code is 74₁₆ = 111 0100
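A table lookup combines the column digit (MSD) and the row digit (LSD) into one 7-bit code. The short Python sketch below illustrates this; the helper name `ascii_code` is made up for the example.

```python
def ascii_code(msd: int, lsd: int) -> int:
    """Combine the table's column digit (MSD) and row digit (LSD) into one code."""
    return (msd << 4) | lsd

code = ascii_code(0x7, 0x4)     # column 7, row 4
char = chr(code)                # the character stored at that table position
bits = format(code, "07b")      # the same code written as 7 binary digits

print(hex(code), char, bits)
```

Going the other way, `ord("G")` returns 0x47, which matches column 4, row 7 of the table.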



EBCDIC (“ebb-see-dick”)
 Extended Binary Coded Decimal Interchange
Code developed by IBM
 Restricted mainly to IBM or IBM compatible
mainframes
 Conversion software to/from ASCII available
 Common in archival data
 Character codes differ from ASCII
Character  ASCII  EBCDIC
Space      20₁₆   40₁₆
A          41₁₆   C1₁₆
b          62₁₆   82₁₆
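The code differences can be checked with a small Python sketch. It assumes that the `cp037` codec, one of the EBCDIC code pages shipped with Python, is an acceptable stand-in for "EBCDIC"; other EBCDIC code pages differ in some punctuation positions.

```python
# Compare the ASCII and EBCDIC (code page 037) codes of a few characters.
for ch in [" ", "A", "b", "G"]:
    ascii_value = ch.encode("ascii")[0]
    ebcdic_value = ch.encode("cp037")[0]
    print(f"{ch!r}: ASCII {ascii_value:02X}  EBCDIC {ebcdic_value:02X}")

# Conversion software works the same way: decode with one code, encode with the other.
ebcdic_bytes = "Adam".encode("cp037")
ascii_bytes = ebcdic_bytes.decode("cp037").encode("ascii")
```

The text is unchanged by the conversion; only the stored byte values differ.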


Unicode
 The most common 16-bit form represents 65,536 characters
 ASCII and Latin-1 are subsets of Unicode
   They occupy values 0 to 255 in the Unicode table
 Multilingual: defines codes for
   Nearly every character-based alphabet
   A large set of ideographs for Chinese, Japanese and Korean
   Composite characters for vowels and syllabic clusters required by some languages
 Allows software modifications for local languages
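The subset relationship can be illustrated in Python, whose strings are sequences of Unicode code points. The three sample characters are chosen only for this sketch.

```python
# A Latin letter, an accented Latin-1 letter, and a CJK ideograph.
samples = ["A", "\u00e9", "\u4e2d"]

for ch in samples:
    code_point = ord(ch)                 # position in the Unicode table
    utf16 = ch.encode("utf-16-be")       # 16-bit form of the same character
    print(f"U+{code_point:04X} -> {utf16.hex()} ({len(utf16) * 8} bits)")

# ASCII and Latin-1 values are the same numbers as the Unicode code points.
ascii_value = "A".encode("ascii")[0]
latin1_value = "\u00e9".encode("latin-1")[0]
```

Each of these characters fits in a single 16-bit unit, and the first two keep the values they have in ASCII and Latin-1.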


Collating Sequence
 Both ASCII and EBCDIC are designed
so that the order of the letters is such
that a simple numerical sort on the
codes can be used within the computer
to perform alphabetization
 The order of the codes in the representation table is known as its collating sequence (to collate is to arrange in order)



Collating Sequence
 Sorting is alphabetic only if the software handles mixed upper- and lowercase codes
 In ASCII, numbers collate first; in EBCDIC, last
 ASCII collating sequence for strings of characters:
Letters (sorted)   Compared character by character
Adam               A d a m
Adamian            A d a m i a n
Adams              A d a m s

Numeric characters (sorted)   ASCII codes
1                             011 0001
12                            011 0001 011 0010
2                             011 0010
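Python's default string sort compares character codes one position at a time, so it follows the same collating sequence. The sketch below reuses the slide's examples; the fruit names are added only to show the mixed-case problem.

```python
names = ["Adams", "Adamian", "Adam"]
digits = ["2", "12", "1"]
mixed = ["apple", "Banana", "cherry"]

by_code_names = sorted(names)      # compared code by code, shorter prefix first
by_code_digits = sorted(digits)    # numeric characters, not numeric values
by_value_digits = sorted(digits, key=int)

# Uppercase codes are smaller than lowercase codes, so "Banana" sorts first
# unless the software folds case before comparing.
by_code_mixed = sorted(mixed)
alphabetical = sorted(mixed, key=str.lower)

print(by_code_names, by_code_digits, by_value_digits, by_code_mixed, alphabetical)
```

Sorting the digit strings by character code places "12" before "2", which is why numeric characters must be converted to numbers before a numerical sort.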



2 Classes of Codes
 Printing characters
   Produced on the screen or printer
 Control characters
   Control position of output on screen or printer
     VT: vertical tab
     LF: line feed
   Cause action to occur
     BEL: bell rings
     DEL: delete current character
   Communicate status between computer and I/O device
   ESC: provides extensions by changing the meaning of a specified number of contiguous following characters



Control Code Definitions



Keyboard Input
 Scan code
   Every key on the keyboard has two different scan codes
   One is generated when the key is struck and another when the key is released
   Converted to Unicode, ASCII or EBCDIC by software in the terminal or PC
 Advantages
   Easily adapted to different languages or keyboard layouts
   Separate scan codes for key press and release support multiple-key combinations
     Examples: shift and control keys
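The translation step can be sketched as follows. The scan code values and the two-entry layout table are illustrative assumptions, not a real keyboard's full table, and press and release are modelled as a flag rather than as two distinct codes.

```python
# Illustrative scan codes (assumed values for this sketch only).
SHIFT, KEY_A, KEY_1 = 0x2A, 0x1E, 0x02

# Layout table: scan code -> (unshifted character, shifted character).
# Swapping this table adapts the same keyboard to another language or layout.
LAYOUT = {KEY_A: ("a", "A"), KEY_1: ("1", "!")}

def translate(events):
    """Turn (scan_code, pressed) events into characters, tracking the shift key."""
    shift_down = False
    out = []
    for code, pressed in events:
        if code == SHIFT:
            shift_down = pressed          # the release event ends the combination
        elif pressed and code in LAYOUT:
            out.append(LAYOUT[code][1 if shift_down else 0])
    return "".join(out)

events = [(SHIFT, True), (KEY_A, True), (KEY_A, False),
          (SHIFT, False), (KEY_1, True), (KEY_1, False)]
text = translate(events)
print(text)
```

Because the shift release is reported separately, the software knows that the second key was struck without shift.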



Other Alphanumeric Input
 OCR (optical character reader)
 Scans text and inputs it as character data
 Used to read specially encoded characters
 Example: magnetically printed check numbers
 General use limited by high error rate
 Bar Code Readers
 Used in applications that require fast, accurate and repetitive input
with minimal employee training
 Examples: supermarket checkout counters and inventory control
 Alphanumeric data in bar code read optically using wand
 Magnetic stripe reader: alphanumeric data from credit cards
 Voice
 Digitized audio recording common but conversion to alphanumeric
data difficult
 Requires knowledge of sound patterns in a language (phonemes) plus
rules for pronunciation, grammar, and syntax



[Figures: optical character reader; bar code reader and bar code; magnetic stripe reader]


Image Data
 Alphanumeric data are the traditional medium of business
 More recently, images have become important in the business computing environment, such as
   Photographs, figures, icons, drawings, charts and graphs
 Images come in different shapes, sizes, textures, colors and shadings



 Images used within the computer fall into 2
distinct categories:
 Bitmap or raster images of photos and paintings
with continuous variation (GIF,JPEG)
 Object or vector images composed of graphical
objects like lines and curves defined geometrically
(graph in spreadsheet)
 Differences include:
 Quality of the image
 Storage space required
 Time to transmit
 Ease of modification



Bitmap Images
 Bitmap images are made up of pixels, each representing an individual point on the image
 Storing a bitmap image needs a large amount of memory
 Reducing the pixel size improves the resolution

Bitmap Images
 Used for realistic images with continuous variations in
shading, color, shape and texture
 Examples:
 Scanned photos
 Clip art generated by a paint program
 Preferred when image contains large amount of detail
and processing requirements are fairly simple
 Input devices:
 Scanners
 Digital cameras and video capture devices
 Graphical input devices like mice and pens
 Managed by photo editing software or paint software
 Editing tools to make tedious bit by bit process easier



Bitmap Images
 Each individual pixel (picture element) in a graphic is stored as a binary number
 Pixel: A small area with associated coordinate
location
 Example: each point below represented by a 4-bit
code corresponding to 1 of 16 shades of gray



Bitmap Display
 Monochrome: black or white
 1 bit per pixel
 Gray scale: black, white or 254 shades
of gray
 1 byte per pixel
 Color graphics: 16 colors, 256 colors,
or 24-bit true color (16.7 million colors)
 4, 8, and 24 bits respectively



Storing Bitmap Images
 Frequently large files
   Example: 600 rows of 800 pixels with 1 byte for each of 3 colors gives 1,440,000 bytes, a file of about 1.5 MB
 File size affected by
   Resolution (the number of pixels per inch)
     Amount of detail affecting clarity and sharpness of an image
   Levels: number of bits for displaying shades of gray or multiple colors
     Palette: color translation table that uses a code for each pixel rather than actual color value
   Data compression
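The relationship between bits per pixel, number of colors and uncompressed file size can be worked through in a few lines of Python. The function names are made up for this sketch, and file headers and compression are ignored.

```python
def color_count(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel of the given depth can hold."""
    return 2 ** bits_per_pixel

def bitmap_bytes(rows: int, cols: int, bits_per_pixel: int) -> int:
    """Uncompressed size of the pixel data in bytes."""
    return rows * cols * bits_per_pixel // 8

for bits in (1, 4, 8, 24):
    print(bits, "bits/pixel:", color_count(bits), "colors,",
          bitmap_bytes(600, 800, bits), "bytes for 800 x 600")
```

At 24 bits per pixel the 800 x 600 image needs 1,440,000 bytes, while the same image in monochrome needs only 60,000 bytes.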



GIF (Graphics Interchange Format)
 First developed by CompuServe in 1987
 GIF89a enabled animated images
 Allows images to be displayed sequentially at fixed time intervals
 Color limitation: 256
 Image compressed by LZW (Lempel-Ziv-Welch) algorithm
 Preferred for line drawings, clip art and
pictures with large blocks of solid color
 Lossless compression



JPEG
(Joint Photographic Experts Group)
 Allows more than 16 million colors
 Suitable for highly detailed photographs and paintings
 Employs lossy compression algorithm that
   Discards data to decrease file size and transmission time
   May reduce image resolution, tends to distort sharp lines



Other Bitmap Formats
 TIFF (Tagged Image File Format): .tif (pronounced tif)
 Used in high-quality image processing, particularly in
publishing
 BMP (BitMaPped): .bmp (pronounced dot bmp)
 Device-independent format for Microsoft Windows
environment: pixel colors stored independent of output device
 PCX: .pcx (pronounced dot p c x)
 Windows Paintbrush software
 PNG: (Portable Network Graphics): .png (pronounced
ping)
 Designed to replace GIF and JPEG for Internet applications
 Patent-free
 Improved lossless compression
 No animation support



Object Images
 Created by drawing packages or output from
spreadsheet data graphs
 Composed of lines and shapes in various
colors
 Computer translates geometric formulas to
create the graphic
 Storage space depends on image complexity
 number of instructions to create lines, shapes, fill
patterns
 Movies Shrek and Toy Story use object
images



Object Images
 Based on mathematical formulas
 Easy to move, scale and rotate without losing
shape and identity as bitmap images may
 Require less storage space than bitmap
images
 Cannot represent photos or paintings
 Cannot be displayed or printed directly
 Must be converted to bitmap since output
devices except plotters are bitmap



Popular Object Graphics Software
 Most object image formats are proprietary
 File extensions include .wmf, .dxf, .mgx, and .cgm
 Macromedia Flash: low-bandwidth animation
 Micrografx Designer: technical drawings to illustrate products
 CorelDraw: vector illustration, layout, bitmap creation,
image-editing, painting and animation software
 Autodesk AutoCAD: for architects, engineers,
drafters, and design-related professionals
 W3C SVG (Scalable Vector Graphics) based on XML
Web description language
 Not proprietary



PostScript
 Page description language: list of
procedures and statements that
describe each of the objects to be
printed on a page
 Stored in ASCII or Unicode text file
 Interpreter program in computer or output
device reads PostScript to generate image
 Scalable font support
 Font outline objects specified like other
objects


Bitmap vs. Object Images
Bitmap (Raster)                                        Object (Vector)
Pixel map                                              Geometrically defined shapes
Photographic quality                                   Complex drawings
Paint software                                         Drawing software
Larger storage requirements                            Higher computational requirements
Enlarging images produces jagged edges                 Objects scale smoothly
Resolution of output limited by resolution of image    Resolution of output limited by output device



Video Images
 Require massive amount of data
 Video camera producing full screen 640 x 480 pixel true color image at 30 frames/sec = 27.65 MB of data/sec
 1-minute film clip = 1.6 GB storage
 Options for reducing file size: decrease size of image,
limit number of colors, reduce frame rate
 Method depends on how video delivered to users
 Streaming video: video displayed as it is downloaded from
the Web server
 Example: video conferencing
 Local data (file on DVD or downloaded onto system) for
higher quality
 MPEG-2 & MPEG-4: movie quality images with high
compression require substantial processing capability
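The uncompressed video figures follow directly from the frame size, color depth and frame rate. A minimal Python check, assuming 3 bytes per true color pixel and decimal megabytes and gigabytes:

```python
WIDTH, HEIGHT = 640, 480        # pixels per frame
BYTES_PER_PIXEL = 3             # 24-bit true color
FRAMES_PER_SECOND = 30

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
bytes_per_second = bytes_per_frame * FRAMES_PER_SECOND
bytes_per_minute = bytes_per_second * 60

print(f"{bytes_per_second / 1e6:.2f} MB per second")
print(f"{bytes_per_minute / 1e9:.2f} GB per minute")
```

Halving the frame rate or the image dimensions reduces these totals in proportion, which is why those are the first options for reducing file size.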



Audio Data
 Sound is normally digitized from an audio source,
such as microphone or amplifier
 Since the original sound wave is analog in nature, it
is necessary to convert it to digital form for use in the
computer
 The technique used is…
 The analog waveform is sampled electronically at regular
time intervals
 Each time a sample is taken, the amplitude of the sample is
measured by an electronic circuit that converts the analog
value to a binary equivalent
 The circuit that performs this function is known as an A-to-D
converter
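The sampling and conversion steps can be simulated in Python. This is only a software sketch of what an A-to-D converter does in hardware; the 1 kHz test tone, the 8 kHz sampling rate and the function name are assumptions chosen for the example.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, bits=8):
    """Sample a sine wave at regular intervals and convert each amplitude to a binary code."""
    levels = 2 ** bits
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                               # time of this sample
        amplitude = math.sin(2 * math.pi * freq_hz * t)      # analog value, -1.0 to 1.0
        codes.append(round((amplitude + 1) / 2 * (levels - 1)))  # map to 0..levels-1
    return codes

codes = sample_and_quantize(freq_hz=1000, sample_rate_hz=8000, n_samples=8)
print([format(c, "08b") for c in codes])
```

Each list entry is one measured amplitude stored as an 8-bit number; a higher sampling rate or more bits per sample follows the original waveform more closely.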



Waveform Audio
[Figure: sampled audio waveform; sampling rate normally 50 kHz]



Sampling Rate
 Number of times per second that sound is
measured during the recording process.
 1000 samples per second = 1 kHz (kilohertz)
 Example: Audio CD sampling rate = 44.1 kHz
 Height of each sample saved as:
 8-bit number for radio-quality recordings
 16-bit number for high-fidelity recordings
 2 x 16-bits for stereo
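The storage cost of waveform audio follows from these three numbers. A short Python calculation using the audio CD values from this slide (16-bit samples, 2 channels, 44.1 kHz), with container overhead ignored:

```python
SAMPLE_RATE = 44_100       # samples per second (audio CD)
BYTES_PER_SAMPLE = 2       # 16-bit high-fidelity samples
CHANNELS = 2               # stereo

bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
bytes_per_minute = bytes_per_second * 60

print(bytes_per_second, "bytes per second")
print(f"{bytes_per_minute / 1e6:.1f} MB per minute")
```

One minute of uncompressed CD-quality stereo therefore needs a little over 10 MB.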



MIDI (Musical Instrument Digital
Interface)
 Music notation system that allows computers
to communicate with music synthesizers
 Instructions that MIDI instruments and MIDI
sound cards use to recreate or synthesize
sounds.
 Do not store or recreate speaking or singing
voices
 More compact than waveform
 3 minutes = 10 KB



Audio Formats
 MP3
 Derivative of MPEG-2 (ISO Moving Picture
Experts Group)
 Uses psychoacoustic compression techniques to
reduce storage requirements
 Discards sounds outside human hearing range:
lossy compression
 WAV
 Developed by Microsoft as part of its multimedia
specification
 General-purpose format for storing and
reproducing small snippets of sound



Data Compression
 The volume of multimedia data, particularly video, but
also sound and even high resolution still images,
often makes it impossible or impractical to store,
transmit and manipulate the data in its normal form
 Therefore, it is necessary to compress the data
 There are many different data compression algorithms, but they all fall into 2 categories
   Lossless
     Compresses data in such a way that applying the matching inverse algorithm restores the data to its original form
   Lossy
     Assumes that the user can accept a certain amount of data degradation as a trade-off to save storage space
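The two categories can be contrasted in a few lines of Python. The lossless half uses the standard `zlib` module; the lossy half is a deliberately crude stand-in (dropping the low 4 bits of each sample) chosen for this sketch, not the method used by JPEG or MP3.

```python
import zlib

# Lossless: the inverse algorithm restores every byte.
data = bytes([200] * 500 + [17] * 500)
packed = zlib.compress(data)
restored = zlib.decompress(packed)
print(len(data), "->", len(packed), "bytes; identical:", restored == data)

# Lossy: keep only the top 4 bits of each 8-bit sample.
samples = [200, 201, 203, 17, 18, 19]
reduced = [s >> 4 for s in samples]       # half the bits per sample
approx = [r << 4 for r in reduced]        # close to the original, not equal
print(samples, "->", approx)
```

The lossy version saves space on every sample but can never recover the discarded detail.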



Compression Algorithms
 Repetition
   0587000034000 becomes 01587043403 (each run of zeros is replaced by 0 followed by the run length)
   Example: large blocks of the same color
 Pattern Substitution
   Scans data for patterns
   Substitutes a new symbol for each pattern and makes a dictionary entry
   Dictionary entries: Pe, Pi, pe, pi, er, ed, ck
   Example: 45 to 30 bytes plus dictionary
   Peter Piper picked a peck of pickled peppers.
   [Pe]t[er] [Pi]p[er] [pi][ck][ed] a [pe][ck] of [pi][ck]l[ed] [pe]pp[er]s. (each bracketed pair stands for one symbol)
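Both techniques are small enough to sketch in Python. The function names, the limit of 9 zeros per run and the use of private-use code points as substitute symbols are assumptions made for this illustration.

```python
def rle_zeros_encode(digits: str) -> str:
    """Replace each run of zeros with '0' followed by the run length (1 to 9)."""
    out, i = [], 0
    while i < len(digits):
        if digits[i] == "0":
            run = 0
            while i < len(digits) and digits[i] == "0" and run < 9:
                run += 1
                i += 1
            out.append("0" + str(run))
        else:
            out.append(digits[i])
            i += 1
    return "".join(out)

def rle_zeros_decode(encoded: str) -> str:
    """Inverse of rle_zeros_encode: expand '0n' back into n zeros."""
    out, i = [], 0
    while i < len(encoded):
        if encoded[i] == "0":
            out.append("0" * int(encoded[i + 1]))
            i += 2
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)

def substitute(text: str, patterns):
    """Replace each pattern with a single symbol and return the text plus the dictionary."""
    table = {p: chr(0xE000 + k) for k, p in enumerate(patterns)}
    for pattern, symbol in table.items():
        text = text.replace(pattern, symbol)
    return text, table

def restore(encoded: str, table) -> str:
    """Expand every symbol back into its pattern."""
    for pattern, symbol in table.items():
        encoded = encoded.replace(symbol, pattern)
    return encoded

packed_digits = rle_zeros_encode("0587000034000")
sentence = "Peter Piper picked a peck of pickled peppers."
packed_text, dictionary = substitute(sentence, ["er", "ed", "ck", "Pe", "Pi", "pi", "pe"])
print(packed_digits, len(sentence), len(packed_text))
```

Both schemes are lossless: decoding with the same rule or the same dictionary returns the original data exactly.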



Internal Computer Data Format
 All data stored as binary numbers
 Interpreted based on
 Operations computer can perform
 Data types supported by programming
language used to create application
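The point that the same binary data can be interpreted in several ways is easy to demonstrate in Python. The four byte values are arbitrary, chosen for this sketch.

```python
import struct

raw = bytes([0x41, 0x42, 0x43, 0x44])       # one 32-bit pattern in memory

as_text = raw.decode("ascii")               # four ASCII characters
as_integer = int.from_bytes(raw, "big")     # one unsigned 32-bit integer
as_real = struct.unpack(">f", raw)[0]       # one 32-bit floating point number
as_pixels = list(raw)                       # four 8-bit gray-scale values

print(as_text, as_integer, as_real, as_pixels)
```

Nothing in the bits themselves says which reading is correct; the program's operations and declared data types decide.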



5 Simple Data Types
 Boolean: 2-valued variables or constants with values
of true or false
 Char: Variable or constant that holds alphanumeric
character
 Enumerated (constant)
 User-defined data types with possible values listed in
definition
 Type DayOfWeek = Mon, Tues, Wed, Thurs, Fri, Sat, Sun
 Integer: positive or negative whole numbers
 Real
 Numbers with a decimal point
 Numbers whose magnitude, large or small, exceeds
computer’s capability to store as an integer
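The five simple types map onto most programming languages. A rough Python equivalent is sketched below; Python has no separate char type, so a one-character string stands in for it, and the variable names are made up.

```python
from enum import Enum

is_ready = True                      # Boolean: true or false
grade = "A"                          # Char: a single alphanumeric character

# Enumerated: user-defined type whose possible values are listed in the definition.
DayOfWeek = Enum("DayOfWeek", "Mon Tues Wed Thurs Fri Sat Sun")
today = DayOfWeek.Wed

count = -42                          # Integer: positive or negative whole number
avogadro = 6.022e23                  # Real: decimal point or very large/small magnitude

for value in (is_ready, grade, today, count, avogadro):
    print(type(value).__name__, value)
```

Each variable's type tells the language which operations are valid and how the underlying bits should be interpreted.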



That’s all for today.
Thank you.
