Lecture 4 24 3 2022
Data Representation
Chapter Goals
• Distinguish between analog and digital information
• Explain data compression and calculate compression ratios
• Explain the binary formats for negative and floating-point values
• Describe the characteristics of the ASCII and Unicode character sets
• Perform various types of text compression
• Explain the nature of sound and its representation
• Explain how RGB values define a color
• Distinguish between raster and vector graphics
• Explain temporal and spatial video compression
Data and Computers
Data compression
Reduction in the amount of space needed to store a piece of data or the bandwidth to transmit it
Compression ratio
The size of the compressed data divided by the size of the original data
A data compression technique can be
– lossless, which means the data can be retrieved without any loss of original information
– lossy, which means some information may be lost in the process of compression
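As a minimal sketch of the definition, assuming Python and its standard zlib module (a general-purpose lossless compressor used here only for illustration; it is not one of the techniques covered in these slides), the ratio is just the compressed size divided by the original size. The round trip also shows what "lossless" means in practice.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Size of the compressed data divided by the size of the original data."""
    return len(compressed) / len(original)


# Hypothetical sample data: repetitive text compresses well.
original = b"We hold these truths to be self-evident. " * 20
compressed = zlib.compress(original)    # lossless compression
restored = zlib.decompress(compressed)  # undo it

print(f"compression ratio = {compression_ratio(original, compressed):.3f}")
print("lossless:", restored == original)  # every original byte is recovered
```

A ratio below 1 means the data got smaller; the smaller the ratio, the better the compression.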
Analog and Digital Information
Analog data
A continuous representation, analogous to the actual information it represents
Digital data
A discrete representation, breaking the information up into separate elements
A mercury thermometer is an analog device, and so is a vinyl record
Digitize
Breaking data into pieces and representing those pieces separately
Electronic Signals
Binary Representations
Why?
Representing Negative Values
Problem with signed-magnitude representation: two zeroes (positive zero and negative zero)
Adding in signed-magnitude: 48 + (-1) = 47
Try these: [figure: practice problems with signed operands such as 4, -3 and -(-3)]
Two’s Complement
-127 + 1 = -126, i.e. 10000001 + 00000001 = 10000010
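A minimal sketch of 8-bit two's complement, assuming Python integers as input: a negative value v is stored as the bit pattern of 256 + v, which is exactly what masking with 0xFF produces. The helper names are illustrative; the sketch reproduces the -127 + 1 = -126 example.

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Return the bit pattern of `value` in two's complement using `bits` bits."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking with 2**bits - 1 maps a negative v to 2**bits + v.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    raw = int(pattern, 2)
    # A leading 1 means the value is negative: subtract 2**bits.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


print(to_twos_complement(-127))          # 10000001
print(to_twos_complement(1))             # 00000001
print(from_twos_complement("10000010"))  # -126
```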
Number Overflow
If each value is stored using 8 bits, then 127 + 3 overflows:
Overflow error: 01111111 + 00000011 = 10000010
Apparently, 127 + 3 is -126. Remember when we said we would always fail in our attempt to map an infinite world onto a finite machine?
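The sketch below mimics an 8-bit adder under the two's complement assumption: it keeps only the low 8 bits of the true sum, reads them back as a signed value, and flags overflow when that differs from the true sum. The function name add_8bit is hypothetical.

```python
def add_8bit(a: int, b: int) -> tuple[int, str, bool]:
    """Add two 8-bit two's complement values the way fixed-width hardware would."""
    raw = (a + b) & 0xFF                          # keep only the low 8 bits
    result = raw - 0x100 if raw & 0x80 else raw   # sign bit set -> negative
    overflow = result != a + b                    # true sum did not fit in 8 bits
    return result, format(raw, "08b"), overflow


print(add_8bit(127, 3))   # (-126, '10000010', True)
print(add_8bit(-127, 1))  # (-126, '10000010', False)
```

Both calls produce the same bit pattern; only the first one is an overflow, because 130 cannot be represented in 8-bit two's complement.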
Representing Real Numbers
Real numbers are numbers with a whole part and a fractional part (either of which may be zero)
104.32
0.999999
357.0
3.14159
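Same rules apply in binary as in decimal: the positions to the right of the binary point are worth
$2^{-1}$ (halves position),
$2^{-2}$ (quarters position),
$2^{-3}$ (eighths position),
…
instead of $10^{-1}$, $10^{-2}$, … (1/10, 1/100, …) in base 10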
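A small sketch of the positional rule for binary fractions: each digit after the point is worth half as much as the one before it. The function name binary_to_decimal is illustrative, and the input is assumed to be a plain string of 0s and 1s with at most one point.

```python
def binary_to_decimal(bits: str) -> float:
    """Convert a binary numeral such as '101.101' to its decimal value."""
    whole, _, fraction = bits.partition(".")
    value = float(int(whole, 2)) if whole else 0.0
    # Fraction digits are worth 2**-1 (halves), 2**-2 (quarters), 2**-3 (eighths), ...
    for position, bit in enumerate(fraction, start=1):
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        value += int(bit) * 2 ** -position
    return value


print(binary_to_decimal("0.1"))      # 0.5
print(binary_to_decimal("101.101"))  # 5.625 = 4 + 1 + 1/2 + 1/8
```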
A real value in base 10 can be defined by the following formula, where the mantissa is an integer:
$\text{sign} \times \text{mantissa} \times 10^{\text{exp}}$
Fundamentally, the floating-point representation used by computers is very similar, but it works in base 2 and uses additional encoding tricks to represent more numbers and improve efficiency
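To check the decomposition value = sign × mantissa × 10^exp (with an integer mantissa) on the example values, Python's decimal module can split a decimal literal into exactly these parts: Decimal.as_tuple() returns the sign bit, the digits, and the exponent. The helper name decimal_parts is illustrative, and it assumes a finite input (no infinities or NaN).

```python
from decimal import Decimal


def decimal_parts(text: str) -> tuple[int, int, int]:
    """Split a finite decimal literal into (sign, mantissa, exp),
    so that value == sign * mantissa * 10**exp with an integer mantissa."""
    sign_bit, digits, exp = Decimal(text).as_tuple()
    mantissa = int("".join(str(d) for d in digits))
    return (-1 if sign_bit else 1), mantissa, exp


print(decimal_parts("104.32"))    # (1, 10432, -2)
print(decimal_parts("-3.14159"))  # (-1, 314159, -5)
```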
Scientific notation
A form of floating-point representation in which the decimal point is kept to the right of the leftmost digit
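Python's `e` format specifier produces this normalized form (one digit left of the decimal point, then a power-of-ten exponent), as this small sketch shows on the earlier example values; to_scientific is an illustrative helper name.

```python
def to_scientific(value: float, digits: int = 6) -> str:
    """Format value with one digit left of the decimal point, then an exponent."""
    return f"{value:.{digits}e}"


for v in (104.32, 0.999999, 357.0, 3.14159):
    print(to_scientific(v))  # e.g. 104.32 -> 1.043200e+02
```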
Representing Text
What must be provided to represent text?
Character set
A list of characters and the codes used to represent each one
The ASCII Character Set
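As a quick sketch, Python's built-in `ord` and `chr` convert between characters and their character-set codes, and encoding a string as ASCII stores one byte per character. The sample string is arbitrary.

```python
text = "Hi!"
codes = [ord(ch) for ch in text]       # ASCII codes: [72, 105, 33]
print(codes)
print("".join(chr(c) for c in codes))  # back to "Hi!"
print(text.encode("ascii"))            # one byte per character: b'Hi!'
```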
The Unicode Character Set
Extended ASCII is not enough for international use
One Unicode mapping uses 16 bits per character
How many characters can this mapping represent?
The first 256 characters correspond exactly to the Latin-1 (ISO 8859-1) extended ASCII character set
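The 16-bit question has a one-line answer, 2^16 = 65,536. This sketch also shows the overlap of the first 256 code points with Latin-1 and, as a caveat worth knowing, that present-day Unicode has more code points than 16 bits can hold, so UTF-16 stores those characters as a pair of 16-bit units (a surrogate pair).

```python
print(2 ** 16)                       # 65536 code values in 16 bits

e_acute = "\u00e9"                   # é
print(ord(e_acute))                  # 233: inside the first 256 code points
print(e_acute.encode("latin-1"))     # b'\xe9': the same value in Latin-1
print(e_acute.encode("utf-16-be"))   # b'\x00\xe9': 16 bits per character

emoji = "\U0001F600"                 # a character beyond the 16-bit range
print(ord(emoji) > 0xFFFF)           # True
print(len(emoji.encode("utf-16-be")))  # 4 bytes: a surrogate pair
```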
Text Compression
If storage or bandwidth is scarce, how can we store and transmit data more efficiently?
Keyword Encoding
Replace frequently used words with single characters, for example: as = ^, the = ~, and = +, that = $, these = #
Given the following paragraph,
We hold these truths to be self-evident, that all men
are created equal, that they are endowed by their
Creator with certain unalienable Rights, that among
these are Life, Liberty and the pursuit of Happiness.
— That to secure these rights, Governments are
instituted among Men, deriving their just powers from
the consent of the governed, — That whenever any
Form of Government becomes destructive of these
ends, it is the Right of the People to alter or to abolish
it, and to institute new Government, laying its
foundation on such principles and organizing its
powers in such form, as to them shall seem most
likely to effect their Safety and Happiness.
The encoded paragraph is
We hold # truths to be self-evident, $ all men are
created equal, $ ~y are endowed by ~ir Creator with
certain unalienable Rights, $ among # are Life,
Liberty + ~ pursuit of Happiness. — $ to secure #
rights, Governments are instituted among Men,
deriving ~ir just powers from ~ consent of ~ governed,
— $ whenever any Form of Government becomes
destructive of # ends, it is ~ Right of ~ People to alter
or to abolish it, + to institute new Government, laying
its foundation on such principles + organizing its
powers in such form, ^ to ~m shall seem most likely to
effect ~ir Safety + Happiness.
What did we save?
– Original paragraph: 656 characters
– Encoded paragraph: 596 characters
– Characters saved: 60 characters
– Compression ratio: 596/656 = 0.9085
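A minimal sketch of keyword encoding, assuming the substitutions that can be read off the encoded paragraph (# for "these", $ for "that", ~ for "the", + for "and", ^ for "as"). Like the slide's example it ignores case and also replaces keywords inside longer words ("they" becomes "~y"); it does not handle symbols that already occur in the text, which a real scheme would have to. The names KEYWORDS, keyword_encode and compression_ratio are illustrative.

```python
import re

# Substitutions inferred from the encoded paragraph.
# Order matters: "these" must be replaced before "the".
KEYWORDS = [("these", "#"), ("that", "$"), ("the", "~"), ("and", "+"), ("as", "^")]


def keyword_encode(text: str) -> str:
    """Replace each keyword with its symbol, ignoring case (also inside words)."""
    for word, symbol in KEYWORDS:
        text = re.sub(re.escape(word), symbol, text, flags=re.IGNORECASE)
    return text


def compression_ratio(original: str, encoded: str) -> float:
    return len(encoded) / len(original)


sample = "that they are endowed by their Creator"
encoded = keyword_encode(sample)
print(encoded)  # $ ~y are endowed by ~ir Creator
print(round(compression_ratio(sample, encoded), 3))
```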
Could we use this
Run-Length Encoding
In some types of data files, a single value may be repeated over and over again in a long sequence
Replace a repeated sequence with
– a flag
– the repeated value
– the number of repetitions
*n8
– * is the flag
– n is the repeated value
– 8 is the number of times n is repeated
Original text
bbbbbbbbjjjkllqqqqqq+++++
Encoded text
*b8jjjkll*q6*+5 (Why isn't j encoded? l?)
The compression ratio is 15/25 or 0.6
Encoded text
*x4*p4l*k7
Original text
xxxxpppplkkkkkkk
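The slides do not state when a run is worth encoding. Judging from the examples (jjj and ll are left alone, while runs of four are encoded), this sketch assumes only runs of four or more are replaced, and it caps each count at 9 so the count stays a single digit. It also assumes the flag character * never appears in the input. Function names are illustrative.

```python
from itertools import groupby

FLAG = "*"    # flag character, as on the slides
MIN_RUN = 4   # assumption: shorter runs are left as-is (no saving)
MAX_RUN = 9   # assumption: keep each count to one digit


def rle_encode(text: str) -> str:
    out = []
    for ch, group in groupby(text):
        n = len(list(group))
        while n > 0:
            chunk = min(n, MAX_RUN)
            # flag, repeated value, number of repetitions
            out.append(f"{FLAG}{ch}{chunk}" if chunk >= MIN_RUN else ch * chunk)
            n -= chunk
    return "".join(out)


def rle_decode(encoded: str) -> str:
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == FLAG:  # flag: next char is the value, then the count
            out.append(encoded[i + 1] * int(encoded[i + 2]))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


print(rle_encode("bbbbbbbbjjjkllqqqqqq+++++"))  # *b8jjjkll*q6*+5
print(rle_decode("*x4*p4l*k7"))                 # xxxxpppplkkkkkkk
```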
Huffman Encoding
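The characters ‘X’ and ‘z’ occur much less frequently than ‘e’ and the space character in most text, so why should they take up the same number of bits? Huffman encoding uses variable-length bit strings: frequent characters get short codes and rare characters get longer ones.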
“ballboard” would be
10100010 01001010 11000111 1011xxxx
(28 code bits; xxxx marks the unused padding bits in the last byte)
Compression ratio: 4 bytes / 18 bytes = 0.222, assuming 16-bit Unicode (9 characters × 2 bytes)
Try “roadbed”
Note: only the part of the code needed to encode “ballboard” and “roadbed” is
shown. In the full code, every character would have an encoding, and the most
common characters would have the shortest encodings.
To decode
Look for match left to right, bit by bit
Record letter when a match is found
Begin where you left off, going left to right
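The three decoding steps can be sketched in Python as below. The table CODES is hypothetical (a small prefix code invented for illustration, not the table from the slides), so it will not decode the slide's exercises. Any padding bits, like the xxxx above, are assumed to have been dropped before decoding.

```python
# Hypothetical prefix code, NOT the table used on the slides.
CODES = {
    "00": "a",
    "11": "e",
    "010": "o",
    "0110": "b",
    "0111": "d",
    "100": "r",
    "101": "l",
}


def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    """Scan left to right, bit by bit, recording a letter on each match."""
    decoded = []
    current = ""
    for bit in bits:
        current += bit
        if current in codes:           # a match: record the letter...
            decoded.append(codes[current])
            current = ""               # ...and begin again where we left off
    if current:
        raise ValueError(f"trailing bits {current!r} match no code")
    return "".join(decoded)


print(huffman_decode("011011000111", CODES))  # bead
```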
Try it!
Decode
1011111001010
The technique for determining the codes guarantees the prefix property: no code is a prefix of any other code, so a bit string can be decoded unambiguously
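A minimal sketch of the usual Huffman construction, assuming the codes are built from the character counts of the text being encoded (the slides' table instead reflects typical text, so its codes differ). It repeatedly merges the two least frequent subtrees, prefixing 0 to one side's codes and 1 to the other's; because every character ends up at a leaf, no code can be a prefix of another. Function names are illustrative.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code from the character counts in `text`."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:                  # one distinct character: give it "0"
        return {next(iter(freq)): "0"}
    order = count()                     # tie-breaker so dicts are never compared
    heap = [(n, next(order), {ch: ""}) for ch, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; one side gets 0, the other 1.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (n1 + n2, next(order), merged))
    return heap[0][2]


def is_prefix_free(codes: dict[str, str]) -> bool:
    """True if no code is a prefix of another (sorted neighbours suffice)."""
    words = sorted(codes.values())
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


word = "ballboard"
codes = huffman_codes(word)
bits = "".join(codes[ch] for ch in word)
print(codes, is_prefix_free(codes))
print(len(bits), "bits, versus", 16 * len(word), "bits in 16-bit Unicode")
```

For “ballboard” the exact codes depend on how ties are broken, but the total is always 23 bits, against 144 bits in 16-bit Unicode.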
Representing Audio Information
A stereo sends an electrical signal to each speaker, which then vibrates to produce sound. Your MP3 player and ear buds do the same thing.
Some data is lost, but a reasonable sound is reproduced
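The loss comes from measuring the continuous signal only at discrete moments and storing each measurement with limited precision. A minimal sketch, assuming hypothetical parameters (8 samples per cycle of a sine wave, each rounded to one of 8 levels, i.e. 3 bits); real audio uses far higher rates and more bits per sample.

```python
import math

SAMPLES_PER_CYCLE = 8  # hypothetical sampling rate: 8 samples per wave cycle
LEVELS = 8             # hypothetical 3-bit samples: integers 0..7


def digitize_sine(cycles: int = 1) -> list[int]:
    """Sample a sine wave at discrete times and round each sample to a level."""
    samples = []
    for n in range(SAMPLES_PER_CYCLE * cycles):
        t = n / SAMPLES_PER_CYCLE                      # time, in cycles
        value = math.sin(2 * math.pi * t)              # analog value in [-1, 1]
        level = round((value + 1) / 2 * (LEVELS - 1))  # nearest of the 8 levels
        samples.append(level)
    return samples


print(digitize_sine())
```

Everything between the sample times, and every detail finer than one level, is discarded; that is the data the caption says is lost.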
• CDs store audio (or other) information digitally
– Pits (reflect poorly)
– Lands (reflect well)
• Read by low-intensity laser
• Receptor converts reflections into binary digits
• Bit string represents audio signal
Audio Formats
– WAV, AU, AIFF, VQF, and MP3
– Use various compression techniques
MP3 is dominant
– MPEG-1 (and MPEG-2) Audio Layer III file
– MPEG = Moving Picture Experts Group
– Based on studies of the interrelation between ear and brain, discards frequency information that isn’t perceived by humans (science!)
– Additional compression by a form of Huffman encoding
Representing Images and Graphics
Color
• We take it for granted, but what is it really?
Color is expressed as an RGB (red-green-blue) value: three numbers that indicate the relative contribution of each of these three primary colors
(Figure: a few TrueColor RGB values and the colors they represent)
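TrueColor uses 8 bits for each of the three channels, 24 bits per color. This sketch packs an RGB triple into a single 24-bit number and into the hex notation used on the web; the function names are illustrative.

```python
def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Format an RGB triple (each 0-255) as a web-style hex color."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits (0-255)")
    return f"#{r:02X}{g:02X}{b:02X}"


def rgb_to_int(r: int, g: int, b: int) -> int:
    """Pack the three 8-bit channels into one 24-bit value."""
    return (r << 16) | (g << 8) | b


print(rgb_to_hex(255, 0, 0))      # #FF0000 (red)
print(rgb_to_hex(255, 255, 0))    # #FFFF00 (yellow: full red plus full green)
print(rgb_to_int(255, 255, 255))  # 16777215, the largest 24-bit value
print(2 ** 24)                    # 16777216 possible TrueColor colors
```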
A color palette is a set of colors, for example
• Colors supported by a monitor
• Web-safe colors for use with Internet browsers
• Colors from which a user can choose
• Colors used in an image
Digitized Images and Graphics
• Pixels (picture elements)
– Dots of color in image (or display device)
• Resolution
– Number of pixels in image (or device)
• Raster Graphics
– Treat image as collection of pixels
– Most common formats: BMP, GIF, PNG, and JPEG
• Vector Graphics
– Treat image as collection of geometric objects
– Most important formats: SVG and, historically, Flash (discontinued at the end of 2020)
• BMP (bitmap)
– TrueColor color depth, or less to reduce file size
– Well suited for compression by run-length encoding
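A quick size calculation shows why an uncompressed TrueColor bitmap benefits from compression. This sketch assumes 24 bits per pixel for TrueColor and ignores the file header and any row padding, so real BMP files are slightly larger; raw_image_bytes is an illustrative name.

```python
def raw_image_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Uncompressed pixel data size in bytes (no header, no row padding)."""
    return width * height * bits_per_pixel // 8


print(raw_image_bytes(1920, 1080))     # 6220800 bytes at 24-bit TrueColor
print(raw_image_bytes(1920, 1080, 8))  # 2073600 bytes with an 8-bit palette
```

Reducing the color depth to a palette cuts the size by two thirds here; run-length encoding can shrink it further when long runs of identical pixels occur.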
• JPEG (Joint Photographic Experts Group)
– Averages hues over short distances
• Why? Human vision tends to blur colors together within small areas (science!)
• How? Transform from the spatial domain to the frequency domain, then discard high-frequency components (math!)
• Sound familiar? Essentially the same idea used in MP3
– Adjustable degree of compression
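A minimal one-dimensional sketch of the "How?" step, using a hand-written orthonormal discrete cosine transform (the transform family JPEG is built on) on a hypothetical row of 8 brightness values. Real JPEG applies a 2-D DCT to 8x8 blocks and quantizes the coefficients rather than simply zeroing them, so treat this only as an illustration. The `keep` parameter plays the role of the adjustable degree of compression.

```python
import math


def _scale(k: int, n: int) -> float:
    # Orthonormal scaling factor for coefficient k of an n-point DCT.
    return math.sqrt((1 if k == 0 else 2) / n)


def dct(samples: list[float]) -> list[float]:
    """DCT-II: spatial samples -> frequency coefficients (low to high)."""
    n = len(samples)
    return [
        _scale(k, n) * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                           for i, x in enumerate(samples))
        for k in range(n)
    ]


def idct(coeffs: list[float]) -> list[float]:
    """Inverse transform (DCT-III): frequency coefficients -> samples."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]


def compress_row(pixels: list[int], keep: int) -> list[int]:
    """Keep the `keep` lowest-frequency coefficients and zero the rest."""
    coeffs = dct(pixels)
    trimmed = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return [round(v) for v in idct(trimmed)]


row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical brightness values
print(compress_row(row, keep=8))         # all coefficients kept: the original row
print(compress_row(row, keep=3))         # smoother approximation
```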
Representing Video
Temporal compression
A technique based on differences between consecutive frames: if most of an image in two frames has not changed, why should we waste space duplicating information?
Spatial compression
A technique based on removing repetitive information within a frame: this problem is essentially the same as that faced when compressing still images
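A minimal sketch of the temporal idea, assuming each frame is a flat list of pixel values of the same length (a simplification; real video codecs work on blocks and use motion compensation). Only the pixels that changed since the previous frame are stored. The names Frame, frame_delta and apply_delta are illustrative.

```python
Frame = list[int]


def frame_delta(prev: Frame, curr: Frame) -> list[tuple[int, int]]:
    """Record only the pixels that changed, as (index, new value) pairs."""
    return [(i, new) for i, (old, new) in enumerate(zip(prev, curr)) if old != new]


def apply_delta(prev: Frame, delta: list[tuple[int, int]]) -> Frame:
    """Rebuild the next frame from the previous one plus the changes."""
    frame = list(prev)
    for i, value in delta:
        frame[i] = value
    return frame


frame1 = [10, 10, 10, 10, 200, 10, 10, 10]
frame2 = [10, 10, 10, 10, 10, 200, 10, 10]  # the bright pixel moved one step
delta = frame_delta(frame1, frame2)
print(delta)                                # [(4, 10), (5, 200)]
print(apply_delta(frame1, delta) == frame2) # True
```

Two pairs replace a whole eight-pixel frame; the saving grows with frame size when little changes between frames.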
Thanks
Queries