MPEG-2 Video Compression
November 29, 1999

Michael Isnardi
Sarnoff Corporation
e-mail: [email protected]

Reproduction in any form requires written permission from the Sarnoff Corporation.

© 1995-99 Sarnoff Corporation
MPEG Video Outline
• Introduction
– Video Basics
– Human Vision Basics
– Colorimetry Basics
– Video Compression Basics
• MPEG-1 Video
• MPEG-2 Video
• Rate Control, VBV, Stat Mux
• Practicing the Art of MPEG
• ATSC Video Constraints and Extensions

[Figure: A video camera feeds a video monitor over a video cable. The waveform of a single scan line is shown as voltage (proportional to brightness) against time: a sync and blanking interval, then active video whose level follows the scene along the line (wall, hair, forehead, hair, wall).]

[Figure: Raster layout. The active video area is bordered by horizontal blanking and vertical blanking. Total lines: 525 (NTSC) or 625 (PAL-Europe).]

[Figure: Progressive scanning, with scan lines viewed edge-on (axes x, y and time). Active video and vertical blanking repeat once per frame period.]

Note: All scan lines are sampled at each time instant.

[Figure: Interlaced scanning, with scan lines viewed edge-on (axes x, y and time). Fields are one field period apart; two fields make up one frame period.]

Note: Alternate scan lines are sampled at each time instant.

Nominal Frame Rates: 30 Hz (NTSC), 25 Hz (PAL-Europe)

[Figure: Active-video raster sizes (luminance values shown).
“601” Video: 720 pixels x 480 lines (NTSC) or 576 lines (PAL), Interlaced Raster (30 frames/sec NTSC, 25 frames/sec PAL).
Reduced-size raster (label not recoverable): 240 lines (NTSC) or 288 lines (PAL), Progressive Raster (30 frames/sec NTSC, 25 frames/sec PAL).
“CIF” (Common Intermediate Format): 360 pixels x 288 lines, Progressive Raster (30 frames/sec).]
Why Interlace?
• Background
– In the 1930s, interlaced scanning was developed as a
bandwidth-saving technique.
– Persistence of vision causes two fields to fuse into single
image, without flicker.
– All broadcasting today uses interlaced scanning.

• Advantages:
– High vertical detail retained for still portions of the scene.

• Drawbacks:
– Reduced vertical detail for moving areas
– Flicker at edges of objects (e.g., text), which is why the computer
industry uses progressive scanning for monitors.
– More complicated signal processing for resizing, frame rate
conversion, etc.

Human Vision Basics
• Human Visual System (HVS) has
limitations that can be exploited for
video system design:
– limited response to black-and-white detail
– even more limited response to color detail
– image motion appears fluid at rates above 24 Hz
– foveal flicker not annoying at picture rates above 24 Hz
– limited ability to track rapidly moving objects
– insensitivity to “noise”
• at object edges
• in highly detailed areas of a scene
• in bright areas of a scene
• immediately after scene changes
Colorimetry Basics
[Figure: Color video camera → gamma-corrected R’G’B’ signals → RGB-to-YC1C2 conversion → transmission channel(s) carrying Y, C1 and C2 → YC1C2-to-RGB conversion → R’G’B’ → color video monitor.]

• In broadcast and studio applications, the gamma-corrected RGB “taking” primaries are transformed to YC1C2 “transmission” primaries.
• Y is the luminance (luma) component; C1 and C2 are the chrominance (chroma, or color difference) components.
• To exploit the HVS’ reduced spatial response to chroma, C1 and C2 are further bandlimited in spatial frequency compared to Y.
• The exact transformation matrix is system-dependent.

CCIR Rec. 601 Transformation
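$$
\begin{bmatrix} Y \\ Cr \\ Cb \end{bmatrix} =
\begin{bmatrix} 0.30 & 0.59 & 0.11 \\ 0.50 & -0.42 & -0.08 \\ -0.17 & -0.33 & 0.50 \end{bmatrix}
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}
$$

601:

$$
\begin{bmatrix} Cr \\ Cb \end{bmatrix} =
\begin{bmatrix} 0.00 & 0.71 \\ 0.56 & 0.00 \end{bmatrix}
\begin{bmatrix} B'-Y \\ R'-Y \end{bmatrix}
$$

[Figure: color-difference plane with axes B’-Y (Cb) and R’-Y (Cr).]

• In 8-bit implementations,
– Y occupies 220 levels: [16, 235]
– Cr, Cb occupy 225 levels: [16, 240]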
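
The matrix and level ranges above can be tried out with a short sketch. It assumes R’, G’, B’ are normalized to [0, 1], and it assumes the usual 8-bit scaling of 219 steps for Y above an offset of 16 and 224 steps for Cr/Cb around a mid-point of 128; the function name is hypothetical and the two-decimal coefficients are the rounded values from the slide.

```python
# Coefficients copied from the slide (rounded to two decimals).
M = [
    (0.30, 0.59, 0.11),    # Y
    (0.50, -0.42, -0.08),  # Cr
    (-0.17, -0.33, 0.50),  # Cb
]

def rgb_to_ycrcb_8bit(r, g, b):
    """Convert gamma-corrected R'G'B' in [0, 1] to 8-bit (Y, Cr, Cb)."""
    y, cr, cb = (kr * r + kg * g + kb * b for kr, kg, kb in M)
    # Assumed scaling: Y spans [16, 235]; Cr and Cb span [16, 240] around 128.
    return (round(16 + 219 * y), round(128 + 224 * cr), round(128 + 224 * cb))

print(rgb_to_ycrcb_8bit(1.0, 1.0, 1.0))  # white
print(rgb_to_ycrcb_8bit(1.0, 0.0, 0.0))  # saturated red
```

White and black land on the ends of the luma range with both color differences at mid-scale, while saturated red pushes Cr to the top of its range.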

Video Compression Basics
What is Video Compression?
...Orange Juice Analogy...

[Figure: Fresh-squeezed OJ → H2O removed → concentrate (shipped, stored and sold) → H2O added back → tastes like fresh-squeezed!]

Water is the redundant element.

In video compression, the encoder removes spatial and temporal redundancy; the decoder puts it back in.

Video Compression Techniques
• Remove spatial and temporal redundancy
that exist in natural video imagery
– correlation itself can be removed in a lossless fashion
– important for medical applications
– only realizes about 2:1 compression efficiency

• Exploit limitations in Human Visual System
– limited luminance and very limited color response
– reduced sensitivity to noise in high frequencies (e.g., edges of
objects)
– reduced sensitivity to noise in brighter areas
– goal is to throw away bits in a psychovisually lossless manner
– can realize 50:1 or more compression efficiency
Major Image and Video
Compression Technologies

• DCT Based: Int’l Standards, Economy of Scale
– Motion JPEG: Studio Applications
– H.261: Videoconferencing
– MPEG-1: CD-ROM Multimedia
– MPEG-2: DTV Broadcast, DVD

• Subband/Wavelet
– EZW: VLBR and browsing applications

• Other
– DVI/Indeo: Multimedia
– Fractal: Multimedia
– DPCM: Broadcast
– Lossless (e.g., special JPEG mode): Medical

Evolution of Video
Compression Standards
• JPEG (Joint Photographic Experts Group)
- mostly used for coding still images
- introduced DCT and Quantization as part of "Tool Kit"
- "Motion JPEG" is intra frame only, low compression, and
low delay
• H.261 (px64)
- used for video teleconferencing
- px64 kbps (p=1, ..., 32)
- introduced motion compensated DCT (I and P frames)
- medium compression, low delay
• MPEG-1, MPEG-2
- used for digital storage media and broadcast
- 1-15+ Mbps
- introduced concept of B frames and field modes
- high compression, medium delay

Coding Efficiency
• How does one compare the efficiency of
various video compression methods?
• For example, the following video
encoders all have the same quality.
Which has the best coding efficiency?
Which one has the worst?
Parameter          Coder 1     Coder 2     Coder 3     Coder 4
Image Size (HxV)   720x480     544x480     480x480     1920x1080
Bit Rate (R)       6 Mbps      4 Mbps      6 Mbps      19 Mbps
Frame Rate (F)     29.97 fps   30 fps      24 fps      29.97 fps
Chroma Format      4:2:0       4:2:2       4:4:4       4:2:0

Normalized Bit Rate
• A meaningful comparative metric is the
normalized bit rate, in units of bits/color
pixel.

$$
\text{Normalized Bit Rate} = \frac{C \times R}{H \times V \times F}\ \text{bits/color pixel}
$$

where C = Chroma Format Factor
(C = 1/3 for 4:4:4, 1/2 for 4:2:2, 2/3 for 4:2:0)
R = Bit Rate (bits/second)
F = Frame Rate (frames/second)
H, V = Horizontal and Vertical Size (luma pixels/frame)

• Now let’s compare the four coders using Normalized Bit Rate:

Parameter          Coder 1     Coder 2     Coder 3     Coder 4
Image Size (HxV)   720x480     544x480     480x480     1920x1080
Bit Rate (R)       6 Mbps      4 Mbps      6 Mbps      19 Mbps
Frame Rate (F)     29.97 fps   30 fps      24 fps      29.97 fps
Chroma Format      4:2:0       4:2:2       4:4:4       4:2:0
Norm. Bit Rate     0.39        0.26        0.36        0.20

Coder 1 has the worst coding efficiency. It uses the most bits/pixel.
Coder 4 has the best coding efficiency. It uses the fewest bits/pixel.
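
The table values can be reproduced with a few lines of arithmetic. The sketch below applies the normalized bit rate formula to the four coders; the dictionary and function names are illustrative, not part of the original material.

```python
# Chroma format factor C as given on the slide.
CHROMA_FACTOR = {"4:4:4": 1 / 3, "4:2:2": 1 / 2, "4:2:0": 2 / 3}

def normalized_bit_rate(h, v, bit_rate, frame_rate, chroma):
    """Return C*R / (H*V*F) in bits per color pixel."""
    return CHROMA_FACTOR[chroma] * bit_rate / (h * v * frame_rate)

# (H, V, R in bits/second, F in frames/second, chroma format)
coders = {
    "Coder 1": (720, 480, 6e6, 29.97, "4:2:0"),
    "Coder 2": (544, 480, 4e6, 30.0, "4:2:2"),
    "Coder 3": (480, 480, 6e6, 24.0, "4:4:4"),
    "Coder 4": (1920, 1080, 19e6, 29.97, "4:2:0"),
}

rates = {name: round(normalized_bit_rate(*p), 2) for name, p in coders.items()}
worst = max(rates, key=rates.get)  # most bits per color pixel
best = min(rates, key=rates.get)   # fewest bits per color pixel
print(rates, worst, best)
```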

What is MPEG Video?
• MPEG = Moving Picture Experts Group
• Part of the International Organization for Standardization (ISO)
• Aim was to create the best video compression standards
for multimedia and broadcast applications
• MPEG-1 Video aimed at SIF resolution
– 352x240, 30 Hz, non-interlaced, 1.5 Mb/s
– CD-ROM applications

• MPEG-2 Video aimed at CCIR-601 resolution

– 720x480, 30 Hz, interlaced, 4-10 Mb/s
– broadcast applications, including HDTV

• MPEG-1 and MPEG-2 are International Standards

• Standard optimized at “NTSC quality” CCIR-601 video @ 10 Mbps
– 39 algorithms competed in subjective tests, some very different from MPEG-1.
• Large attendance, typically 175-200 participants.
– More than 75 organizations, including representatives of CE, telco, computer, broadcasting and universities.
• Design focus on interlaced CCIR-601 (720x480 pixels) video @ 4 to 9 Mbps.
• Targeted at broadcast and DVD applications.
• Extensible to lower and higher resolutions
1) downward compatibility with MPEG-1
2) includes support of HDTV formats
• MPEG-2 Video (ISO/IEC 13818-2) promoted to International Standard in November, 1995.

MPEG International Standards
• MPEG-1 (ISO/IEC 11172)
– 11172-1: Systems
– 11172-2: Video
– 11172-3: Audio
– 11172-4: Conformance
– 11172-5: Software
• MPEG-2 (ISO/IEC 13818)
– 13818-1: Systems
– 13818-2: Video
– 13818-3: Audio
– 13818-4: Conformance
– 13818-5: Software
– 13818-6: Digital Storage Media - Command & Control (DSM-CC)
– 13818-7: Non-Backward Compatible Audio
– 13818-9: Real-Time Interface
– 13818-10: DSM-CC Conformance

These standards are available from ISO and ANSI.
MPEG-1 vs. MPEG-2 Operating Points

[Figure: Operating points plotted as Image Size & Frame Rate against Bit Rate (Mb/s, axis marked 5 to 20). MPEG-1 CD-ROM: 360x240, 30 Hz. MPEG-2 Standard Definition Broadcast and MPEG-2 Standard Definition Production: 720x480, 30 Hz. MPEG-2 HDTV Broadcast: 1280x720 and 1920x1080, 30 Hz.]

MPEG-2 = MPEG-1 Syntax Elements

+ Interlace Tools
+ New Syntax Structures
+ Scalable Modes
+ Profiles & Levels

• Broadcast video is interlaced

• MPEG-1 does not handle interlaced
video efficiently
• MPEG-2 adds key interlace tools:
– Field Picture Structure
– Field DCT
– Field Prediction Modes
– Alternate Zig-Zag Scan
– 3:2 Pulldown Support
– Field-Based Pan-and-Scan Support
Key Points about MPEG Video
• MPEG only specifies bitstream syntax and decoding
process
• Encoding algorithms (e.g., Motion Estimation, Rate
Control and Mode Decisions) are open to invention
and proprietary techniques
• MPEG is asymmetric in that much less computational
power is required in the decoder.
• Example:
– SDTV MPEG-2 encode: 20 GIPS
– SDTV MPEG-2 decode: 600 MIPS

[Figure: Encoder and decoder joined by the MPEG Syntax. Encoder blocks: DCT, Q, VLC, Motion Est, Rate Control. Decoder blocks: VLD, Q^-1, DCT^-1, Motion Comp.]

[Figure: MPEG video layers.
GOP (Display Order, N=12, M=3): B B I B B P B B P B B P
Picture: Y, Cr and Cb components
Slice
Macroblock: four Y Blocks (0 to 3), one Cr Block and one Cb Block (numbered 4 and 5)
Note: Y = Luma, Cr = Red-Y, Cb = Blue-Y]
MPEG Video Layers (cont’d)
• Important syntax elements in each layer:

Sequence: Picture Size; Frame Rate; Bit Rate; Buffering Requirements; Programmable Coding Parameters
GOP: Random Access Unit; SMPTE Time-Code
Picture: Timing information (buffer fullness, temporal reference); Coding type (I, P, or B)
Slice: Intra-frame addressing information; Coding re-initialization (error resilience)
Macroblock: Basic coding structure; Coding method; Motion Vectors; Quantization
Block: DCT coefficients

Key Concepts
• For a given bit rate, the following coding parameters
greatly affect picture quality:
– GOP Structure
• longer GOP’s improve picture quality but decrease
random access (i.e., lengthen channel change time)
• dynamic GOP’s can be used creatively to handle scene
changes and other effects
– MV Search Range
• Wider searches are better, but more costly
• A large search range is a must for fast action (e.g.,
sports)
– Rate Control
• Mode decisions greatly affect number of coded bits
• Proprietary schemes will continue to dominate
Typical MPEG Encoder Structure
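[Figure: Typical MPEG encoder. The Re-Sequenced Input minus the Predicted Image gives the Prediction Error, which passes through DCT and Q (Quantization Parameters from the Rate Controller); coefficients and motion vectors go to the VLC Encoder. An Embedded Decoder (Q^-1, DCT^-1, Frame Memory 1, Frame Memory 2) produces the Reconstructed Image used by the Motion Estimator and the Motion Compensated Prediction. An inter/intra switch selects either the prediction or “0”.]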

Sequence
• For CD-ROM applications, sequences can be used to
indicate relatively long clips (e.g. shots, scenes or
entire movies)
• For broadcast applications, sequence headers are
usually sent frequently (e.g., every GOP) so that key
bitstream info is obtained at channel changes

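[Figure: Two bitstreams, Video 1 and Video 2, each divided into GOPs (1 GOP marked). Each GOP starts with SEQ Header + GOP Header + I Frame Pic Header. Viewer changes channels here (partway through a GOP), but decoder must wait until next SEQ header to start decoding.]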
MPEG-2 Structures
• Sequence Structures
– Progressive Sequences: contain only frame pictures
– Non-Progressive Sequences: may contain frame and field
pictures

• Frame Structures
– Progressive Frame: its two fields come from same time
instant
– Non-Progressive Frame: its two fields come from different
times

• Picture Structures
– Frame Picture
– Field Picture: must occur in pairs; a frame = two field
pictures
– Both frame and field pictures may be used in the same
non-progressive sequence.
Sequence Types
[Figure: a Progressive Frame Picture, a Non-Progressive Frame Picture, and a frame composed of two Field Pictures.]

• MPEG-2 allows both Progressive and Non-Progressive Sequences.
• A Non-Progressive Sequence may contain both Frame Pictures and Field Pictures.

Group of Pictures (GOP)
• Contains three types of pictures:
- Intra (I) pictures: intraframe-only spatial DCT
- Predicted (P) pictures: DCT with forward prediction
- Bi-directional (B) pictures: DCT with bi-directional prediction

Forward Prediction

I B B P B B P B B P B B I

Time

Bi-directional Prediction

Anchor Pictures
• I and P pictures
– stored in two frame buffers in encoder and decoder
– form the basis for prediction of P and B pictures

I B B P B B P B B P B B I

Time

Anchor Pictures
I Pictures
– DCT coded without reference to any other pictures
– stored in a frame buffer in encoder and decoder
– used as basis of prediction for entire GOP

I B B P B B P B B P B B I

Time

All these P and B pictures depend on the preceding I picture.

P Pictures
– DCT coded with reference to the preceding anchor picture
– stored in a frame buffer in encoder and decoder
– use forward prediction only
Forward Prediction

I B B P B B P B B P B B I

Time

Left callout: This P picture depends on this I picture.
Right callout: This P picture depends on this P picture.

B Pictures
– DCT coded with reference to either the preceding anchor
picture, the following anchor picture, or both
– use forward, backward or bi-directional prediction
Bi-directional Prediction

I B B P B B P B B P B B I

Time

Left callout: This B picture depends on this I picture and this P picture.
Right callout: This B picture depends on this P picture and this P picture.

Forward Prediction
– a forward-predicted macroblock depends on decoded
pixels from the immediately preceding anchor picture
– can be used to code macroblocks in P and B pictures

I B B P B B P B B P B B I

Time

– the arrows, as shown, indicate direction of motion

– if arrows are reversed, they indicate coding dependencies

Backward Prediction
– a backward-predicted macroblock depends on decoded
pixels from the immediately following anchor picture
– can only be used to code macroblocks in B pictures

I B B P B B P B B P B B I

Time

Bi-directional (Interpolated) Prediction
– a bi-directionally-predicted macroblock depends on
decoded pixels from the anchor pictures immediately
following and immediately preceding
– can only be used to code macroblocks in B pictures

I B B P B B P B B P B B I

Time

• A GOP must contain at least one I picture

• This I picture may be followed by any number of
I and P pictures
• Any number of B pictures may occur between
anchor pictures, and B pictures may precede the
first I picture
• A GOP, in coding order, must start with an I
picture
• A GOP, in display order, must start with an I or B
picture and must end with an I or P picture

Regular and Irregular GOP’s
• Regular GOP’s are defined by N and M*:
– N is the I picture interval
– M is the anchor picture interval. There are M-1 B pictures between anchor
pictures
• Irregular GOP’s are not defined by N and M, but are still allowed
as long as they follow the GOP Rules.

Regular: N=1, M=1     I I I I I I I I I I I I    (12 GOP’s shown)
Regular: N=6, M=2     B I B P B P B I B P B P    (2 GOP’s shown)
Regular: N=12, M=3    B B I B B P B B P B B P    (1 GOP shown)
Irregular             B B I B B B B B P P B P

*N and M are not MPEG syntax elements and are not used in any way by the specification.

All GOP’s in Display Order
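
The regular patterns can be generated directly from N and M. The following is a minimal sketch that assumes N is a multiple of M; the function name is hypothetical, and since N and M are not MPEG syntax elements this is purely an encoder-side convenience.

```python
def regular_gop(n, m):
    """Display-order picture types for one regular GOP with I interval n and anchor interval m."""
    if n < 1 or m < 1 or n % m != 0:
        raise ValueError("this sketch assumes N is a positive multiple of M")
    types = []
    for i in range(n):
        if i % m != m - 1:
            types.append("B")  # M-1 B pictures precede each anchor
        elif i == m - 1:
            types.append("I")  # the first anchor of the GOP is the I picture
        else:
            types.append("P")  # every later anchor is a P picture
    return " ".join(types)

for n, m in [(1, 1), (6, 2), (12, 3)]:
    print(n, m, regular_gop(n, m))
```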

Closed and Open GOP’s
• Closed GOP’s can be decoded independently, without
using decoded pictures in previous GOP’s.
• Open GOP’s require such pictures to be available.

Closed GOP’s

Regular: N=4, M=2 B I B P B I B P B I B P

(3 GOP’s shown)
Note that first B picture must be restricted to use
backward prediction only.

Open GOP’s

Regular: N=4, M=2 B I B P B I B P B I B P

(3 GOP’s shown)

Note that first B picture depends on last anchor picture from previous GOP.

GOP Picture Orderings
• Two Distinct Picture Orderings
– Display Order (input to encoder, output of decoder)
– Coding Order (output of encoder, input to decoder)
– These are different if B frames are present
– B frames must be reordered so that “future” anchor pictures are
available for prediction. Note that reordering causes DELAY!

One GOP shown in each ordering:

Display Order (Input to Encoder):    B B I B B P B B P B B P
Coding Order (Output of Encoder):    I B B P B B P B B P B B
Display Order (Output of Decoder):   B B I B B P B B P B B P
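
The reordering rule can be sketched in a few lines: each anchor picture is emitted ahead of the B pictures that precede it in display order. This is an illustrative simplification with a hypothetical function name; it handles a single GOP and does not model B pictures that would have to wait for an anchor in the next GOP.

```python
def display_to_coding_order(display):
    """Reorder picture types so every anchor (I or P) precedes the B pictures that need it."""
    coding, pending_b = [], []
    for pic in display:
        if pic == "B":
            pending_b.append(pic)     # hold B pictures until their future anchor arrives
        else:
            coding.append(pic)        # the anchor goes out first
            coding.extend(pending_b)  # then the B pictures that were waiting for it
            pending_b = []
    coding.extend(pending_b)          # simplification: trailing B pictures are flushed as-is
    return coding

display = "B B I B B P B B P B B P".split()
print(" ".join(display_to_coding_order(display)))
```

The held-back B pictures are the source of the delay noted above: the encoder cannot emit them until the following anchor has been coded.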

Slice Structures
• A slice is a collection of macroblocks in raster scan order.
• Restriction on slice sizes:
- MPEG-1 has none. Can be single MB or entire picture.
- MPEG-2 restricts a slice to be contained within a row of macroblocks
• MPEG-2 allows gaps between slices in “General Slice
Structure”
• MPEG-2 defines “Restricted Slice Structure”, in which no
gaps are allowed. This is used in most Profiles and Levels.
[Figure: Example of Restricted Slice Structure. Slices labeled A through Z cover the whole picture with no gaps between them.]

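Chroma formats (size of each component):

Format   Y       Cr      Cb      Status                Typical use
4:2:0    2Hx2V   HxV     HxV     Required in MPEG-1    CD-ROM and Broadcast Apps.
4:2:2    2Hx2V   Hx2V    Hx2V    Option in MPEG-2      Studio Apps.
4:4:4    2Hx2V   2Hx2V   2Hx2V   Option in MPEG-2

Blocks per macroblock (block numbers as drawn on the slide; Y first, then the two chroma components):

4:2:0    6 Blocks
4:2:2    8 Blocks     Y: 0 1 / 2 3    chroma: 4 / 6 and 5 / 7
4:4:4    12 Blocks    Y: 0 1 / 2 3    chroma: 4 8 / 6 10 and 5 9 / 7 11

[Figure: Spatial Sampling Relationship between luma and chroma samples.]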

[Figure: Image (8x8 pixels, spatial domain) → Forward DCT → 8x8 coefficients (transform domain) → Inverse DCT → Reconstructed Image (8x8 pixels, spatial domain).]

• DCT is an orthogonal transformation
• 2-D DCT is separable in x and y dimensions
• Has good energy compaction properties
• Close to Karhunen-Loeve Transform (KLT), which is optimal but depends on image statistics.
• Efficient hardware realization
• Theoretically lossless, but slightly lossy in practice due to round off errors

Discrete Cosine Transform (cont’d)
• Transforms 8x8 pixel block into 8x8 frequency coefficient matrix

• Organizes video information in a way that is easy to compress and

manipulate

• DCT applied to Intra blocks as well as motion-compensated blocks

8x8 pixels:

255 255 255 255 255 255 255 255
255 187 204 255 255 255 255 255
255 122  20 102 230 255 255 255
255 153   0   0  35 136 213 255
255 196   0   0   0   0  17  94
255 247  43   0   0   0   0   0
255 255  82   0   0   0   0   0
255 255 128   0   0   0   0   0

8x8 DCT coefficients after the Forward DCT (“DC” at top left; horizontal frequency runs low to high from left to right, vertical frequency low to high from top to bottom):

1105  238  358  158   30  -56  -49  -31
 548 -379 -143   19   71   66   32    9
 207  103 -171  -81  -58    7   24   31
 -52  162  -34  -66  -18  -20  -20  -21
 -33   13   71  -52  -18   -3    9   -4
  11  -56   56   23  -28   -3   -6    1
  -5  -14  -11   49   -1  -18   -9    8
 -27    9  -24   28   34  -24   -4    3

8x8 Blocks and Their Transforms
[Figure: Blocks of 8x8 Pixels taken from the MPEG Flower Garden sequence and Their DCT Coefficients, for five cases: Flat Area, Vertical Edge, Horizontal Edge, Single Pixel, Diagonal Line.]

DCT and IDCT Formulas
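[Figure: Pixels f(x,y) → 2-D DCT → DCT Coefficients F(u,v). The top-left coefficient is the DC Coeff.; all others are AC Coeffs.]

Forward DCT:

$$
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\, \cos\!\left[\frac{(2x+1)u\pi}{2N}\right] \cos\!\left[\frac{(2y+1)v\pi}{2N}\right]
$$

Inverse DCT:

$$
f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, F(u,v)\, \cos\!\left[\frac{(2x+1)u\pi}{2N}\right] \cos\!\left[\frac{(2y+1)v\pi}{2N}\right]
$$

where: C(u), C(v) = {1/√2 for u,v = 0; 1 otherwise}

N=8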
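
The two formulas can be checked numerically with a direct, unoptimized implementation. The sketch below is for illustration only (practical codecs use fast separable transforms); the pixel block is the one from the earlier worked example, and the function names are hypothetical.

```python
import math

N = 8

def c(k):
    """Normalization factor C(k) from the formulas above."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(x, u):
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))

def dct_2d(f):
    """Forward DCT. f[y][x] holds pixels; the result F[v][u] holds coefficients."""
    return [[(2 / N) * c(u) * c(v) * sum(f[y][x] * basis(x, u) * basis(y, v)
                                         for y in range(N) for x in range(N))
             for u in range(N)] for v in range(N)]

def idct_2d(F):
    """Inverse DCT. F[v][u] holds coefficients; the result f[y][x] holds pixels."""
    return [[(2 / N) * sum(c(u) * c(v) * F[v][u] * basis(x, u) * basis(y, v)
                           for v in range(N) for u in range(N))
             for x in range(N)] for y in range(N)]

PIXELS = [
    [255, 255, 255, 255, 255, 255, 255, 255],
    [255, 187, 204, 255, 255, 255, 255, 255],
    [255, 122, 20, 102, 230, 255, 255, 255],
    [255, 153, 0, 0, 35, 136, 213, 255],
    [255, 196, 0, 0, 0, 0, 17, 94],
    [255, 247, 43, 0, 0, 0, 0, 0],
    [255, 255, 82, 0, 0, 0, 0, 0],
    [255, 255, 128, 0, 0, 0, 0, 0],
]

F = dct_2d(PIXELS)
RECON = idct_2d(F)
print(F[0][0])  # DC term: sum of all pixels divided by 8
```

For this block the DC term is 1104.875, which rounds to the 1105 shown in the example, and the inverse transform recovers the pixels to within floating-point round-off.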
2-D DCT Basis Images
[Figure: The 2-D DCT basis images arranged in an 8x8 grid, indexed 0 to 7 by u (Horizontal Frequency) and v (Vert. Freq.).]

[Figure: Image → DCT → Q → Quantized coefficients → Q^-1 → DCT^-1 → Reconstructed Image.]

• Quantization can be thought of as dividing each transform coefficient by a frequency-dependent value, and then rounding or truncating to the nearest integer
• Inverse quantization is like multiplication
• Quantization coefficients can be tailored to noise sensitivity of Human Visual System
• Quantization is LOSSY! Reconstructed pixels usually differ in value from original
• Quantization causes information to be irretrievably lost

Quantization Tools
• Quantization Matrix (QM)
– 8x8 matrix can be shaped so that coarser quantization of
high spatial frequencies occurs
– coarser quantization of high spatial frequencies saves bits
but causes little or no subjective degradation
– In MPEG-2, up to four QM’s (luma intra/non-intra and
chroma intra/non-intra) can be changed at the picture rate
– Default matrices are specified and need not be sent, but
different ones can be downloaded

• Quantizer Scale (QS)

– QS can change on a macroblock basis
– rate control’s job is to modify QS in a way that keeps
picture quality high for a given bit rate

[Figure: Quantizer scale (axis 0 to 100) plotted against quantizer_scale_code [1, 31] (sent in bitstream), with one curve for the Linear Quantizer Scale (q_scale_type = 0) and one, reaching higher values, for the Nonlinear Quantizer Scale (q_scale_type = 1).]
Quantization Example
A: DCT Frequency Coefficients T[u][v] (DC at top left), multiplied by 16:

276   59   89   39    7  -13  -12   -7
137  -94  -35    4   17   16    7    2
 51   25  -42  -20  -14    1    5    7
-12   40   -8  -16   -4   -4   -5   -5
 -8    3   17  -13   -4    0    2   -1
  2   14   14    5   -7    0   -1    0
 -1   -3   -2   12    0   -4   -2    1
 -6    2   -6    6    8   -5   -1    0

B: Default Intra Quantization Matrix QM[u][v] (DC at top left), multiplied by QS, where Quantizer Scale QS = 40 (from Rate Controller):

 8  16  19  22  26  27  29  34
16  16  22  24  27  29  34  37
19  22  26  27  29  34  34  38
22  22  26  27  29  34  37  40
22  26  27  29  32  35  40  48
26  27  29  32  35  40  48  58
26  27  29  34  38  46  56  69
27  29  35  38  46  56  69  83

Pointwise Division A/B and Rounding gives the Quantized DCT Coefficients T’[u][v] (DC at top left):

35   1   2   1   0   0   0   0
 3   2  -1   0   0   0   0   0
 1   0  -1   0   0   0   0   0
 0   1   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0

Note: Quantization of DC term is fixed and does not depend on QM or QS.
Default Quantization Matrices QM[u][v]

Intra Matrix: QMI[u][v] (DC at top left)

 8  16  19  22  26  27  29  34
16  16  22  24  27  29  34  37
19  22  26  27  29  34  34  38
22  22  26  27  29  34  37  40
22  26  27  29  32  35  40  48
26  27  29  32  35  40  48  58
26  27  29  34  38  46  56  69
27  29  35  38  46  56  69  83

Note: AC coefficients (all coefficients except DC) are first multiplied by 16, then divided by QS*QMI[u][v]. DC term is treated specially.

Non-Intra Matrix: QMN[u][v]

16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16
16  16  16  16  16  16  16  16

Note: All coefficients are first multiplied by 16, then divided by QS*QMN[u][v].
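
The intra quantization rule can be applied to the coefficient block from the Quantization Example (QS = 40). The sketch below makes two assumptions that the slides do not spell out: rounding is to the nearest integer with halves away from zero, and the fixed DC step is 8. Function and variable names are hypothetical.

```python
QMI = [
    [8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]

T = [
    [276, 59, 89, 39, 7, -13, -12, -7],
    [137, -94, -35, 4, 17, 16, 7, 2],
    [51, 25, -42, -20, -14, 1, 5, 7],
    [-12, 40, -8, -16, -4, -4, -5, -5],
    [-8, 3, 17, -13, -4, 0, 2, -1],
    [2, 14, 14, 5, -7, 0, -1, 0],
    [-1, -3, -2, 12, 0, -4, -2, 1],
    [-6, 2, -6, 6, 8, -5, -1, 0],
]

def div_round(n, d):
    """Integer n/d rounded to nearest, halves away from zero (assumed rounding rule)."""
    q = (2 * abs(n) + d) // (2 * d)
    return q if n >= 0 else -q

def quantize_intra(coeffs, qm, qs, dc_step=8):
    """AC terms: 16*T / (QS*QM). DC term: fixed step (assumed to be 8), independent of QM and QS."""
    out = [[div_round(16 * coeffs[v][u], qs * qm[v][u]) for u in range(8)]
           for v in range(8)]
    out[0][0] = div_round(coeffs[0][0], dc_step)
    return out

T_Q = quantize_intra(T, QMI, qs=40)
for row in T_Q:
    print(row)
```

Computed this way, every entry agrees with the quantized block printed in the Quantization Example except the second entry of the second row, where -94 quantizes to -2 while the slide prints 2.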

• For improved quality in certain coding situations, quantization matrices for Intra and Non-Intra macroblocks can be downloaded.
• The decoder uses these instead of the defaults (which are not sent in the bitstream)
• The example below shows an improved Non-Intra Quant Matrix used by the MPEG-2 Test Model 5 (TM5)

Example of Downloadable Matrix (TM5 Non-Intra Matrix):

16  17  18  19  20  21  22  23
17  18  19  20  21  22  23  24
18  19  20  21  22  23  24  25
19  20  21  22  23  24  26  27
20  21  22  23  25  26  27  28
21  22  23  24  26  27  28  30
22  23  24  26  27  28  30  31
23  24  25  27  28  30  31  33

[Figure: Two pairs of plots against “freq”, each showing coefficients Before Quantization and After Quantization, with the DC term and the Reconstruction Levels marked. The second pair is labeled Tilted Matrix.]
Quantization Artifacts
[Figure: Original 8x8 Block and its reconstructions at QS = 2, QS = 5, QS = 10 and QS = 15, for three cases: Vertical Edge, Corner Edge, Diagonal Edge.]

Shown after DCT, Quantization, Inverse Quantization and Inverse DCT using default Intra Quantization Matrix and Linear Quantizer Scale
Variable Length Coding (VLC) and
Decoding (VLD)

[Figure: Image → DCT → Q → VLC → (variable bit rate) → VLD → Q^-1 → DCT^-1 → Reconstructed Image.]

• Quantization zeros out many DCT coefficients
• Zig-Zag scanning of the quantized DCT coefficients yields runs of zeros
• Non-Zero Levels and Runs of Zeros can be coded efficiently using VLC's
• VLC causes variable bit rate output!

• To optimize the runs, the block is zig-zag scanned

Quantized DCT coefficients (DC at top left):

35   1   2   1   0   0   0   0
 3   2  -1   0   0   0   0   0
 1   0  -1   0   0   0   0   0
 0   1   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0

Zig-zag scan (MPEG-1 pattern) through quantized DCT coefficients gives:

DC: 35 (DC Coefficients are differenced from block to block and VLC’d)

Corresponding Run/Level Pairs (Common Run/Level Pairs are VLC’d):
0, 1
0, 3
0, 1
0, 2
0, 2
0, 1
0, -1
3, 1
0, -1
End of Block
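
The scan and the run/level pairing can be reproduced with a short sketch. It generates the normal zig-zag order, walks the quantized block exactly as printed on the slide, and emits (run of zeros, level) pairs for the AC terms; the function names are hypothetical, and the DC differencing and the VLC tables themselves are not modelled.

```python
def zigzag_order(n=8):
    """(row, col) positions of the normal zig-zag scan, starting at DC."""
    order = []
    for s in range(2 * n - 1):                       # s = row + col indexes each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                    # even diagonals run from bottom-left up
        order.extend((r, s - r) for r in rows)
    return order

def run_level_pairs(block):
    """Return (dc, pairs): pairs are (zero run, nonzero level) for the AC coefficients."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for level in scan[1:]:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return scan[0], pairs                            # trailing zeros are implied by End of Block

Q_BLOCK = [
    [35, 1, 2, 1, 0, 0, 0, 0],
    [3, 2, -1, 0, 0, 0, 0, 0],
    [1, 0, -1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(4)]

dc, pairs = run_level_pairs(Q_BLOCK)
print(dc, pairs)
```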

MPEG-2 Enhancements
[Figure: MPEG-2 encoder with the enhancements marked on the block diagram: Field and Frame Pictures at the input, Field & Frame DCT, Quantization Parameters with Linear & Nonlinear QS, Alternate Zig-Zag and VLC coding, and Field & Frame Prediction. DCT coefficients, motion vectors and Headers feed the VLC & Bitstream Packer, which outputs the MPEG-2 Video Bitstream. The Embedded Decoder comprises IQ, IDCT, Frame Mem 1, Frame Mem 2, Motion Estimator and Motion Comp, with a “0” input for intra coding.]
MPEG-2 Zig-Zag Scan Options

[Figure: Two 8x8 Blocks of Quantized DCT Coefficients, each with DC at top left, showing the two scan paths.]

Normal Zig-Zag Scan. Mandatory in MPEG-1. Option in MPEG-2.
Alternate Zig-Zag Scan. Not used in MPEG-1. Option in MPEG-2.

For Frame DCT coding of interlaced video, more energy exists in the region marked on the figure, so run length coding is more efficient.

• Field DCT: Split into top and bottom fields

• MPEG-2 encoder may choose Field DCT on any macroblock.

• Decoder must interpret coding flag correctly, or severe errors will occur.

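[Figure: A Luminance Macroblock (x horizontal, y vertical) shown under Field DCT Coding, where the lines of the top and bottom fields are grouped into separate blocks, and under Frame DCT Coding, where the lines stay in frame order.]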

Note: Chrominance blocks in 4:2:0 mode are always DCT coded in Frame order

Variable Length Coding
• Huffman type “entropy” coding
• Shorter codewords assigned to more probable symbols (like Morse Code)
• Used for motion vectors, run/level pairs, type of macroblocks, etc.

Example: DCT AC coefficients:

Run,Level   Codeword
0,1         110
1,1         0110
0,-1        111
7,-1        0001001
EOB         10

Example: Vectors delta coded:

Delta   Codeword
0       1
1       010
2       0010
3       00010
4       0000110
5       00001010
...
15      000000011010
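
Because no codeword is a prefix of another, a decoder can read the bitstream bit by bit and emit a symbol as soon as the accumulated bits match a codeword. The sketch below uses only the vector-delta codewords 0 to 5 listed above; the function names are hypothetical, and the subset is for illustration rather than a complete MPEG table.

```python
# Codewords copied from the vector-delta example (illustrative subset only).
MV_DELTA_VLC = {0: "1", 1: "010", 2: "0010", 3: "00010", 4: "0000110", 5: "00001010"}

def vlc_encode(symbols, table):
    """Concatenate the codeword of each symbol into one bit string."""
    return "".join(table[s] for s in symbols)

def vlc_decode(bits, table):
    """Decode a bit string produced by vlc_encode with the same table."""
    inverse = {code: sym for sym, code in table.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free code: the first match is a whole codeword
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return symbols

bits = vlc_encode([0, 1, 2, 0], MV_DELTA_VLC)
print(bits, vlc_decode(bits, MV_DELTA_VLC))
```

Small deltas, which are the most probable, cost the fewest bits: four symbols take nine bits here.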

Rate Control
[Figure: Image → DCT → Q → VLC → Buffer → (Constant Bit-Rate channel) → Buffer → VLD → Q^-1 → DCT^-1 → Reconstructed Image, with a Rate Controller connecting the encoder Buffer to Q.]

• A buffer is used to smooth out the bit rate
• Rate controller adjusts quantizer to control buffer fullness and prevent overflow and underflow of decoder’s buffer (Video Buffer Verifier)
• Buffer size affects image quality and overall delay
• Rate control algorithm is crucial for high quality compression
• Shown above is basic structure for:
- Motion JPEG
- Intraframe H.261
- Intraframe MPEG

[Figure: Frame-difference coder. Image minus Predicted Image → DCT → Q → VLC → Buf → (CBR channel) → Buf → VLD → Q^-1 → DCT^-1 → adder → Reconstructed Image. In the encoder, Q^-1, DCT^-1, an adder and a Frame Delay form the Predicted Image; the decoder uses a matching adder and Frame Delay.]

• To exploit redundancy in still portions of an image sequence, the difference between the input and the reconstructed previous frame is coded
• Encoder gets more complex and includes copy of decoder (called an embedded decoder)
• Moving areas are not coded well using this scheme, so MPEG uses Motion Compensated Prediction.

[Figure: Motion-compensated coder. Image minus Predicted Image → DCT → Q → VLC → BUF → (CBR channel) → BUF → VLD → Q^-1 → DCT^-1 → adder → Reconstructed Image. In the encoder, Motion Estimation produces Motion Vectors, and Q^-1, DCT^-1, an adder and a Motion Compensator form the Predicted Image from the Reconstructed Image; the decoder has a matching Motion Compensator driven by the same Motion Vectors.]

• Most motion is predictable, and motion compensation exploits this fact.
• Motion Estimation is the process by which motion vectors are computed in the encoder. It can be quite computationally intensive.
• Motion vectors are used by the Motion Compensators in the encoder and decoder to produce Predicted Images from Reconstructed Images.
• We now have P frames.

[Figure: Two-stage motion estimation. Input image minus predicted image → DCT/Q → VLC; Q^-1/DCT^-1 and an adder give the recon. image, which feeds MC. Coarse ME works on input images and passes coarse motion vectors to Fine ME, which passes refined motion vectors to MC.]

• Coarse motion vectors computed from input images.
• “Refined” motion vectors, e.g., half-pel refinement, computed from reconstructed images.
• Good compromise between “true motion” and small error.
• Used in MPEG-2 Test Model 5.

[Figure: Current Macroblock X in the Current P or B Picture, and its match F in the Previous I or P Picture, related by motion vector MVF.]

• Instead of sending quantized DCT coefficients of X, send:

1. quantized DCT coefficients of X-F (prediction error). If prediction
is good, error will be near zero and will code with fewer bits.
2. MVF, the motion vector. This will be differentially coded with
respect to its neighboring vector, and will code efficiently.
• This will typically result in 50% - 80% savings in bits.

Gray-Scale Statistics of Prediction Error
[Figure: One Frame of Original Image Pair and the Prediction Error, each with its gray-scale Histogram. Axes: original image, gray level -100 to 400 and frequency 0 to 0.02; prediction error, gray level -250 to 250 and frequency 0 to 0.25.]

[Figure: Motion vector geometry between the Previous I or P Picture and the Current P or B Picture, along a Time axis.
– Position of current Macroblock (aligned to MB grid)
– Search Area in the previous picture
– Position of "zero motion vector" MB (center of search area)
– Position of "best match" MB (to half-pixel accuracy - need not be aligned to MB grid)
– Motion Vector (e.g., [-20.5, +20.5])]

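[Figure: Current MB X on the MB grid and prediction MB X’ displaced by offset (k,l); pixels within each macroblock are indexed by (i,j).]

X = 16x16 current MB
X’ = 16x16 prediction MB

• Minimum Mean Absolute Error:

$$
\text{MMAE} = \min_{k,l} \frac{1}{256} \sum_{i,j} |X - X'|
$$

• Minimum Mean Squared Error:

$$
\text{MMSE} = \min_{k,l} \frac{1}{256} \sum_{i,j} (X - X')^2
$$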
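
The mean-absolute-error criterion leads directly to an exhaustive (full) search over all offsets in the search area. The sketch below is one straightforward way to do it at integer-pixel accuracy; half-pel refinement is not modelled, pictures are plain lists of rows, and the function names are hypothetical. The block size is a parameter so that the idea can be tried on small pictures.

```python
def mean_abs_error(cur, ref, bx, by, k, l, size):
    """Mean |X - X'| between the current block at (bx, by) and the reference block offset by (k, l)."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[by + j][bx + i] - ref[by + l + j][bx + k + i])
    return total / (size * size)

def full_search(cur, ref, bx, by, search_range, size=16):
    """Return (k, l, error) for the offset with the minimum mean absolute error."""
    height, width = len(ref), len(ref[0])
    best = None
    for l in range(-search_range, search_range + 1):
        for k in range(-search_range, search_range + 1):
            x0, y0 = bx + k, by + l
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue  # candidate block would fall outside the reference picture
            err = mean_abs_error(cur, ref, bx, by, k, l, size)
            if best is None or err < best[2]:
                best = (k, l, err)
    return best

# Tiny example: a two-pixel "object" that sits at a different place in each picture.
ref = [[0] * 12 for _ in range(12)]
cur = [[0] * 12 for _ in range(12)]
ref[3][5], ref[4][6] = 200, 100
cur[5][4], cur[6][5] = 200, 100
print(full_search(cur, ref, bx=4, by=4, search_range=2, size=4))
```

The cost grows with the square of the search range, which is why the search range is such an important encoder design parameter.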

[Figure: Search Area in the previous picture and Macroblock Grid in the current picture.]

Previous I or P Picture. Within the search area, a good match is found for this still object.
Current P Picture. Current MB is shown with heavy outline. Since a match is found, this MB is intercoded.

[Figure: Search Area in the previous picture and Macroblock Grid in the current picture.]

Previous I or P Picture. Within the search area, many good matches are found. Encoder must pick one and send appropriate motion vector.
Current P Picture. Current MB is shown with heavy outline. Since a match is found, this MB is intercoded.

Example of Forward Motion Estimation
Case: Good prediction for linearly translating objects.

[Figure: Search Area in the previous picture and Macroblock Grid in the current picture.]

Previous I or P Picture. Within the search area, a good match is found for this moving object. Encoder sends appropriate forward motion vector.
Current P Picture. Current MB is shown with heavy outline. Since a match is found, this MB is intercoded.

Example of Forward Motion Estimation
Case: A good prediction might be missed because it is outside the search area.

[Figure: Search Area in the previous picture and Macroblock Grid in the current picture.]

Previous I or P Picture. Within the search area, no good match is found. Note that a good match would be found with a larger search area. Search area is an important encoder design parameter.
Current P Picture. Current MB is shown with heavy outline. Since no match is found, this MB is intracoded.

Example of Forward Motion Estimation
Case: A good prediction might come from an unrelated object.

[Figure: Search Area in the previous picture and Macroblock Grid in the current picture.]

Previous I or P Picture. Within the search area, a good match is found, but within a different object. There is no requirement that motion vectors represent true motion of objects.
Current P Picture. Current MB is shown with heavy outline. Since a match is found, this MB is intercoded.

[Figure: Previous I or P Picture; Current P Picture; Prediction Error Picture, with MB Type and Motion Vectors superimposed on the Macroblock Grid (I = Intra, P = Inter). In the portion shown, one macroblock is Intra and the rest are Inter.]

Example of Backward Motion Estimation
Case: Handles uncovered objects missed by forward prediction.

Previous I or P Picture: Searching here finds no good match because some features are partially hidden.

Current B Picture: Current MB is shown with heavy outline.

Next I or P Picture: Searching here finds a good match because features are now uncovered.

Forward/Backward/Interpolated Decision
...must be made for every non-intra macroblock in a B picture...

[Figure: current MB X in the Current B Picture, with MVF pointing to F in the Previous I or P Picture and MVB pointing to B in the Next I or P Picture.]

Define: X = Current MB
F = “Best” MB in previous I or P Picture
B = “Best” MB in next I or P Picture
MVF = MV corresponding to F’s displacement from X
MVB = MV corresponding to B’s displacement from X

Compute: “Goodness” of F, B and (F+B)/2 as predictors for X

Decide: If F is best, send MVF (Forward Prediction)
If B is best, send MVB (Backward Prediction)
If (F+B)/2 is best, send MVF and MVB (Interpolated Prediction)
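The Define/Compute/Decide steps above can be sketched as follows. This is an illustration under stated assumptions: "goodness" is taken to be mean absolute error, (F+B)/2 is rounded to the nearest integer, and ties go to the first candidate listed. None of these choices is specified on the slide, and the function names are hypothetical.

```python
def mean_abs_error(x, pred):
    """Mean absolute error between two equally sized lists of samples."""
    return sum(abs(a - b) for a, b in zip(x, pred)) / len(x)


def choose_b_prediction(x, f, b):
    """Return (mode, vectors_to_send) for a non-intra B-picture macroblock.

    x, f, b are flat lists of samples: the current MB, the best forward
    match F, and the best backward match B.
    """
    # Rounded average of the two predictions (rounding is an assumption).
    interp = [(p + q + 1) // 2 for p, q in zip(f, b)]
    candidates = [
        ("forward", ["MVF"], mean_abs_error(x, f)),
        ("backward", ["MVB"], mean_abs_error(x, b)),
        ("interpolated", ["MVF", "MVB"], mean_abs_error(x, interp)),
    ]
    # min() returns the first candidate with the lowest error.
    mode, vectors, _ = min(candidates, key=lambda c: c[2])
    return mode, vectors
```

A real encoder would likely also weigh the extra bits needed to send two vectors for interpolated prediction; this sketch compares prediction error only.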

Motion Vectors (MV’s) shown for 8 successive macroblocks.

MV Field

MV     x     3   10   30   30  -14  -16   27   24
       y   -10  -10   -9   -9  -11  -11  -10  -10

DMV    x     3    7   20    0  -44   -2   43   -3
       y   -10    0    1    0   -2    0    1    0

DMV’   x     3    7   20    0   20   -2  -21   -3
       y   -10    0    1    0   -2    0    1    0

DMV’’ (VLC)
       x   0101, 000101, 00000100100, 10, 00000100100, 0110, 00000100110, 0110
       y   000010110, 10, 11, 10, 0111, 10, 11, 10

MV: Assume all [x, y] for picture in RANGE [-32, 31] => f_code = 2, MODULUS = 64.
DMV = Differential MV. [0,0] used as predictor for first MV.
DMV’: Add or subtract MODULUS if out of RANGE. Keeps all values in RANGE.
DMV’’: Convert to VLC’s using Table 2-B.4 in the MPEG-1 Video spec. VLC’s used in this example are for illustration only.

• Note that the vertical components of the MV’s are much more
correlated than the horizontal components.
• Therefore, the MV differentials for the vertical components
code with fewer bits.
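The DMV and DMV’ rows of the example can be reproduced with a short sketch. The RANGE [-32, 31] and MODULUS 64 come from the slide; the function names and the decoder-side reconstruction are assumptions added for illustration, and the VLC step is omitted.

```python
def differential_mvs(components, f_range=(-32, 31), modulus=64):
    """Return (dmv, wrapped) for one MV component along a run of MBs.

    dmv[n]     = mv[n] - mv[n-1], with 0 as the predictor for the first MV
    wrapped[n] = dmv[n] folded back into f_range by adding or subtracting
                 the modulus
    """
    low, high = f_range
    dmv, wrapped = [], []
    prev = 0
    for mv in components:
        d = mv - prev
        dmv.append(d)
        if d < low:
            d += modulus
        elif d > high:
            d -= modulus
        wrapped.append(d)
        prev = mv
    return dmv, wrapped


def reconstruct_mvs(wrapped, f_range=(-32, 31), modulus=64):
    """Decoder side: undo the differential coding and the wrap."""
    low, high = f_range
    mvs, prev = [], 0
    for d in wrapped:
        mv = prev + d
        if mv < low:
            mv += modulus
        elif mv > high:
            mv -= modulus
        mvs.append(mv)
        prev = mv
    return mvs
```

Because every true MV lies in RANGE, the decoder can always tell whether a wrapped differential needs the modulus added back, so the wrap costs nothing in accuracy.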
MPEG-2 Prediction Modes
• Frame Prediction
– in a frame picture, field prediction or frame prediction is selected
on a macroblock basis
• Field Prediction
– predictions are made independently for each field
– in a field picture, all predictions are field predictions
• Dual Prime
– can be used in field pictures or frame pictures
– can only be used in P pictures
– one MV plus a differential MV sent per macroblock
• 16x8 Motion Compensation
– can only be used in field pictures
– two MV’s are sent for forward or backward prediction
– first MV used for upper 16x8 region, second MV for lower
– four MV’s are sent for bi-directional prediction
Allowable MPEG-2 Prediction Modes

Frame Pictures        Field Pictures

Frame Prediction      16x8 Motion Compensation
Field Prediction      Field Prediction
Dual Prime            Dual Prime

Frame Prediction
[Figure: 16x16 Current MB predicted from a 16x16 region of the Reference Picture.]
Best 16x16 region in Reference Picture determines frame MV for 16x16 MB. Only mode allowed in MPEG-1.

Field Prediction
[Figure: 16x8 Top Field and 16x8 Bottom Field of Current MB, each predicted from a 16x8 region in the Top Field or Bottom Field of the Reference Picture.]
Best 16x8 region in Top or Bottom field in Reference Picture determines field MV’s for Top and Bottom portions of 16x16 MB.

In Frame Pictures
[Figure: 16x8 Top Field and 16x8 Bottom Field of Current MB, each formed as the Average of 16x8 predictions from the Top Field and Bottom Field of the reference.]
Single MV (heavy arrow) sent in bitstream; this represents predictions from fields of same parity. Small differential MV’s are also sent; these represent offset predictions from fields of opposite parity. Same and opposite field predictions are averaged to form final prediction for each 16x8 region of current MB.

In Field Pictures
[Figure: 16x16 current MB formed as the Average of 16x16 predictions from the First Field and Second Field. One field is marked “This Field not yet decoded.”]
Single MV (heavy arrow) sent in bitstream; this represents prediction from field of same parity. A small differential MV is also sent; this represents an offset prediction from field of opposite parity. Same and opposite field predictions are averaged to form final prediction for current 16x16 MB.

[Figure: Reference Picture and Predicted Picture, showing:
– Vector Transmitted in Bitstream for Same Parity Fields
– Differential Vector Transmitted in Bitstream (limited to values -1, 0, +1)
– Vector Derived at Decoder for Opposite Parity Fields]

Concealment Motion Vectors
• An MPEG-2 enhancement; not a requirement
• Helps in concealing errors when data is lost
• Concealment motion vectors (CMV’s), if sent,
are coded with Intra macroblocks (MB’s)
• CMV’s should be used in MB’s immediately
below the one in which the CMV occurs

[Figure: a group of Intra-coded Macroblocks with CMV’s in one row; Macroblocks in the row below are lost. Use CMV’s in this row for MB’s below.]

[Figure: codec block diagram. Encoder: Image, subtractor, Intra/Inter switch (0/1), DCT, Q, VLC, BUF, CBR output, with a local decoding loop (Q⁻¹, DCT⁻¹, adder, Motion Compensator, Reconstructed Image), Motion Estimation, and an Intra/Inter Decider that supplies the Intra/Inter Mode and Motion Vectors. Decoder: BUF, VLD, Q⁻¹, DCT⁻¹, adder, Motion Compensator, Reconstructed Image.]

• On a macroblock basis, decide whether it's more efficient to code original signal or motion compensated prediction error
• Some pictures are coded entirely intraframe (I-pictures). This is useful for resetting prediction loop and for editing
• Basic structure of H.261 codec

1. MC vs. No MC
– if Motion Compensation is best, select “MC” and transmit
motion vector(s); if B picture, select forward, backward or
interpolated
– otherwise, select “No MC”; do not transmit motion vector; it
is assumed to be 0
2. Intra vs. Inter
– should MV found in step 1 be used? If so, select “Inter”
3. Coded vs. Not Coded
– if quantized prediction error is zero, select “Not Coded”
4. Quant vs. No Quant
– if quantizer scale needs to be changed, select “Quant”

Example of MB Type Selection
for P Pictures
Begin
  MC
    Non Intra
      Coded
        Quant ............ pred-mcq
        No Quant ......... pred-mc
      Not Coded .......... pred-m
  Intra
    Quant ................ intra-q
    No Quant ............. intra-d
  No MC
    Non Intra
      Coded
        Quant ............ pred-cq
        No Quant ......... pred-c
      Not Coded .......... skipped

Example of MB Type Selection
for B Pictures
Begin
  MC
    Forward
      Coded
        Quant ............ pred-fcq
        No Quant ......... pred-fc
      Not Coded .......... pred-f or skipped
    Backward
      Coded
        Quant ............ pred-bcq
        No Quant ......... pred-bc
      Not Coded .......... pred-b or skipped
    Interpolated
      Coded
        Quant ............ pred-icq
        No Quant ......... pred-ic
      Not Coded .......... pred-i or skipped
  No MC
    Intra
      Quant .............. intra-q
      No Quant ........... intra-d
Macroblocks and Quantizer Scale Codes
• Quantizer Scale Codes are 5-bit integers sent in
every slice header and selected MB headers
• Decoder uses most recent value for all subsequent
MB’s until another Quantizer Scale Code is
encountered.
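[Figure: a single slice, made up of a Slice Header followed by single MB’s.]

Slice Header   9 (9) (9) (9) 5 (5) 4 (4) 6 (6) (6) (6)

Values without parentheses: these quant scales are coded in the bit stream.
Values in parentheses: decoder uses the values shown.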
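The "use the most recent value" rule can be sketched in a few lines. This assumes the leading 9 in the example is the value from the slice header and that the remaining eleven entries are macroblocks; the function name and the use of `None` for "no code sent" are illustrative choices.

```python
def effective_quant_scales(slice_quant, mb_quant_codes):
    """Return the quantizer scale code the decoder uses for each MB.

    slice_quant:    value sent in the slice header
    mb_quant_codes: one entry per MB; an int if that MB header carries a
                    new quantizer scale code, or None if it does not
    """
    current = slice_quant
    used = []
    for code in mb_quant_codes:
        if code is not None:
            current = code  # a new code replaces the running value
        used.append(current)
    return used


# Slide example, under the assumption stated above:
mbs = [None, None, None, 5, None, 4, None, 6, None, None, None]
print(effective_quant_scales(9, mbs))
```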

Skipped Macroblocks
• MB’s cannot be skipped in I Pictures
• MB’s can be skipped in P and B pictures if certain
rules apply
[Figure: portion of a P or B Picture, showing a slice.]

– The first MB of a slice must be coded
– The last MB of a slice must be coded
– The MB’s in between can be skipped if:
1) all quantized DCT coeffs = 0, and
2) all MV’s = 0 (in P pictures), or
all MV differentials = 0 (in B pictures)
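The skip rules above can be written as a single predicate. This is a sketch of the rules exactly as listed on the slide; the function name, the argument layout, and the representation of vectors as (x, y) tuples are assumptions.

```python
def can_skip_macroblock(picture_type, first_in_slice, last_in_slice,
                        quantized_coeffs, motion_vectors, mv_differentials):
    """Return True if the slide's conditions for skipping an MB hold.

    picture_type:     "I", "P" or "B"
    quantized_coeffs: flat list of quantized DCT coefficients for the MB
    motion_vectors:   list of (x, y) MV's (checked for P pictures)
    mv_differentials: list of (x, y) MV differentials (checked for B pictures)
    """
    # No skipping in I pictures; first and last MB of a slice must be coded.
    if picture_type == "I" or first_in_slice or last_in_slice:
        return False
    # Rule 1: all quantized DCT coefficients must be zero.
    if any(c != 0 for c in quantized_coeffs):
        return False
    # Rule 2 depends on the picture type.
    if picture_type == "P":
        return all(v == (0, 0) for v in motion_vectors)
    if picture_type == "B":
        return all(d == (0, 0) for d in mv_differentials)
    return False
```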

Forward Analysis and Resequencing
Forward Analysis is a look-ahead technique that can be
used to help the Rate Controller adjust quantization in a
more optimal fashion

[Figure: MPEG codec block diagram. Same structure as the motion-compensated DCT encoder and decoder (DCT, Q, VLC, BUF, CBR channel, BUF, VLD, Q⁻¹, DCT⁻¹, Motion Estimation, Intra/Inter Decider, Motion Compensator, Motion Vectors, Reconstructed Image), with a Forward Analyzer and Rate Controller added at the encoder and Reseq (resequencing) blocks at the encoder input and decoder output.]

• B frames must be resequenced from display to coding order
• Basic structure of MPEG codec

[Figure: MPEG video bitstream syntax, shown as nested layers.]

• Sequence layer: Sequence Header | Sequence | Sequence Header | Sequence
  – Sequence Header contains: Picture width, Picture height, Aspect ratio, Picture rate, Bitrate, ...
• GOP layer: GOP Header | Picture Header | Picture | ... | Picture Header | Picture
• Picture layer
  – Picture Header contains: Temporal Reference, Picture Type, VBV Delay, ..., Extension Start Code, Picture Structure, ...
• Slice layer: Picture Header | Slice Header | Macroblock ... Macroblock | Slice Header | Macroblock ...
• Macroblock layer
  – Macroblock Header contains: Address, Type, Quantizer Scale, Motion Vectors, Coded Block Pattern
  – followed by Block | Block | ... | Block
• Block Layer

3:2 Pulldown
• MPEG-2 provides a mechanism for film-originated content to
be coded at 24 frame/sec but displayed at 30 frames/sec
• The lower frame rate of film means it can be coded at the
same quality as 30 frame/sec video, but at a lower bit rate.
• The repeat_first_field (rff) and top_field_first (tff) flags allow
decoders to recreate the 3:2 pulldown sequence for display.
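[Figure: Film Frames (1/24 sec each), coded as progressive frames at 24 frames/sec, with flags for four successive frames:
  frame 1: rff=1, tff=1
  frame 2: rff=0, tff=0
  frame 3: rff=1, tff=0
  frame 4: rff=0, tff=1
3:2 pulldown alternately creates 3 and 2 displayed fields for each input frame. Each displayed field lasts 1/60 sec (two fields = 1/30 sec). Frames with rff=1 repeat their first field.]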
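How the two flags recreate the 3:2 cadence can be sketched as below. The flag values are those shown for the four film frames in the figure; the function name, the tuple layout, and the frame labels A to D are assumptions for illustration.

```python
def displayed_fields(coded_frames):
    """Expand coded progressive frames into the displayed field sequence.

    coded_frames: list of (name, top_field_first, repeat_first_field)
    Returns a list of (name, parity) tuples, parity being "T" or "B".
    """
    fields = []
    for name, tff, rff in coded_frames:
        first, second = ("T", "B") if tff else ("B", "T")
        fields.append((name, first))
        fields.append((name, second))
        if rff:
            fields.append((name, first))  # repeat the first field
    return fields


# Flags from the figure: (rff, tff) = (1,1), (0,0), (1,0), (0,1)
film = [("A", True, True), ("B", False, False),
        ("C", False, True), ("D", True, False)]
for name, parity in displayed_fields(film):
    print(name, parity)
```

Four film frames expand to ten fields, i.e. five interlaced frames, which is the 24-to-30 conversion; the tff values are what keep top and bottom fields alternating across frame boundaries.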

Pan-and-Scan
• MPEG-2 provides a mechanism for panning a display
rectangle around a reconstructed frame
• Horizontal and vertical offsets are specified to 1/16 pixel
resolution and can be sent for every displayed field.
• This allows widescreen material to be viewed on 4:3
displays.

[Figure: a 4:3 Display Rectangle positioned inside a 16:9 Reconstructed Frame, with frame_centre_horizontal_offset marked. In this example the horizontal frame center offset is a positive number.]

[Figure: simplified MPEG-2 video decoder. Coefficient path: MPEG-2 Bitstream, Bitstream Parsing, VLD, Inv Scan (Zig-Zag Scan Mode), Q⁻¹ (Quant Scale Factor & Quant Matrices), DCT⁻¹ (DCT Coeffs), adder, Sat., Decoded Pixels. Prediction path: Motion Vectors, VLD, Vector Predictors, Dual Prime Arithmetic, Chroma Scaling, Framestore Addressing (Field/Frame Prediction Selection), Frame Stores, Half-Pel Prediction Filtering (Half-Pel Info), Combine Predictions.]

NOTE: This is a simplified, high-level functional diagram that integrates several separate diagrams in the MPEG-2 Video Spec (ISO/IEC 13818-2).

• More About Rate Control

• The Video Buffer Verifier
• MPEG-2 Profiles and Levels
• Statistical Multiplexing
• Practicing the Art of MPEG

Rate-Distortion Curve
• As the rate increases, the distortion decreases.
• For a given distortion, the rate increases with complexity.
• At zero distortion, the source is coded at its entropy, $R_n$.
• At zero rate, the source is not coded. The distortion is equal to the source energy, $\sigma_n^2$.

[Figure: Rate versus Distortion for three sources of increasing complexity. The curves meet the Rate axis at R1, R2, R3 and the Distortion axis at $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$.]

Distortion and Quant Scale
• As quant scale increases, so does distortion.
• For a given quant scale, the distortion generally increases
with complexity.

[Figure: Distortion versus Quantizer Scale Code (1 to 25) for sources of increasing complexity, with $\sigma_n^2$ marked on the Distortion axis.]

• As quant scale decreases, the bit rate increases.
• For a given quant scale, the bit rate increases with complexity.
• For minimum distortion, use the smallest quant scale.

[Figure: Rate (e.g., bits/picture) versus Quantizer Scale Code (1 to 25) for sources of increasing complexity, with R1, R2, R3 marked on the Rate axis.]

Constant Quality Encoding
• For a given picture type (I, P or B), constant quality is
achieved with a fixed quant scale.
• For sequences with mixed picture types, B pictures can
be coded with somewhat lower picture quality, since they
are not used as the basis for prediction.
[Figure: Quant Scale Code (axis marked 5, 10, 15) versus frames (display order) for a sequence of I, P and B pictures. Example showing B pictures with higher quant scale (i.e., lower quality).]

Constant Quality => VBR
• With a fixed quant scale, the bit rate increases with
complexity.
• This implies variable bit rate (VBR) encoding.

[Figure: Constant Quality Encoding for All I-Frame Sequence (Fixed Quant Scale). Bits/Picture (kbits; axis marked 100, 300, 500) versus frames (display order). Bits per picture rise from the simple scene through the moderately complex scene to the complex scene.]

CBR => Variable Quality
• For many applications, constant bit rate (CBR) encoding is
required.
• This can lead to highly variable image quality.
[Figure: 300 kbit/picture (CBR) Encoding for All I-Frame Sequence (Variable Quant Scale). Bits/Picture (kbits; axis marked 100, 300, 500) versus frames (display order).
– simple scene: these pictures need more bits (lower quant scale or add stuffing)
– moderately complex scene: these pictures are just about right
– complex scene: these pictures need fewer bits (increase quant scale)]

CBR Rate Control
• Goal is to achieve high quality at constant bit rate.
• To achieve a constant bit rate, a buffer is used to
smooth out high variability in bits/frame.
• In practice, I frames are often given highest quality,
since they form the basis of prediction for all other
pictures in the GOP.
• As complexity increases, the quant scale, on average, is
increased to avoid buffer overflow.
• To approach constant quality from frame to frame, bits
are “stolen” from simple frames and given to complex
frames.
• To approach constant quality within a frame, bits are
“stolen” from simple areas and given to complex areas.
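The buffer feedback described above (fuller buffer, coarser quantization) can be illustrated with a deliberately simple control law. The slides do not specify one; the linear mapping below, the function name, and the use of the code range 1 to 31 (the nonzero values of a 5-bit code) are assumptions for illustration only.

```python
def quant_scale_from_buffer(fullness, buffer_size, min_code=1, max_code=31):
    """Map encoder output-buffer fullness onto a quantizer scale code.

    An empty buffer gives the finest quantization (min_code); a full
    buffer gives the coarsest (max_code). Values in between are
    interpolated linearly and rounded.
    """
    fraction = min(max(fullness / buffer_size, 0.0), 1.0)
    return min_code + round(fraction * (max_code - min_code))


# Hypothetical numbers: a 1,000,000-bit buffer at three fullness levels.
for level in (0, 500_000, 1_000_000):
    print(level, quant_scale_from_buffer(level, 1_000_000))
```

A practical rate controller would also account for picture type and local complexity, which is where "stealing" bits from simple frames and simple areas comes in.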

What is the Video Buffer Verifier (VBV)?
• The VBV is a hypothetical input rate buffer for the video decoder, which is connected to
the output of an encoder.
• The encoder keeps track of the VBV fullness, and must ensure that it does not overflow or
underflow.
• Assuming constant end-to-end delay, the encoder buffer is the mirror image of the VBV.

[Figure: Video enters the MPEG Encoder, which ends in an Output Rate Buffer; the MPEG Video Bitstream feeds the MPEG Decoder, which begins with an Input Rate Buffer (VBV) and outputs Video.]

[Figure: a water tank of capacity B with a Constant Flow in at the top and a Shuttered Bottom through which a volume of water is extracted instantaneously; alongside, a plot of Tank Fullness versus time (0 to 6T) with levels B2 and B marked.]

• Tank fills at constant rate B2/2T until fullness B2 is reached. (Slope = flow rate)
• Volume of water (B2-B1) is extracted instantaneously every T seconds starting at 2T.

MPEG Analogs:
Tank = Video Buffer Verifier (Hypothetical Decoder Buffer)
B = VBV Buffer Size (in Bits)
T = Output Frame Period
Constant Flow = Constant Input Bit Rate = B2/2T bits/sec
Extracted Volume = Coded Bits in Each Picture (B2-B1)
2T = VBV Delay for Each Picture

NOTE: In general, coded bits per picture varies greatly!

Overflow!
[Figure: same tank; plot of Tank Fullness versus time (0 to 6T), rising past B.]
• Tank fills at constant rate B2/2T.
• Volume of water (B2-B1)/2 is extracted instantaneously every T seconds starting at 2T.

Underflow!
[Figure: same tank; plot of Tank Fullness versus time (0 to 6T), with level B2 marked, falling below zero.]
• Tank fills at constant rate B2/2T.
• Volume of water 3*(B2-B1)/2 is extracted instantaneously every T seconds starting at 2T.
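The three tank cases (steady, overflow, underflow) can be reproduced with a small simulation of the hypothetical decoder buffer in constant bit rate operation. The function name, argument names, and the choice to signal violations with `ValueError` are assumptions; the numbers in the usage lines are made up so that B2 = 200, B1 = 100 and B = 300.

```python
def simulate_vbv(buffer_size, bit_rate, frame_period, startup_delay,
                 picture_bits):
    """Track VBV fullness when bits arrive at a constant rate.

    The buffer fills at `bit_rate` bits/sec. The first picture is removed
    instantaneously at `startup_delay` seconds, and each later picture one
    `frame_period` after the previous one.
    Returns a list of (time, fullness_before, fullness_after) tuples and
    raises ValueError on overflow or underflow.
    """
    fullness = bit_rate * startup_delay
    time = startup_delay
    trace = []
    for n, bits in enumerate(picture_bits):
        if n > 0:
            fullness += bit_rate * frame_period
            time += frame_period
        if fullness > buffer_size:
            raise ValueError(f"overflow before picture {n}")
        if bits > fullness:
            raise ValueError(f"underflow at picture {n}")
        trace.append((time, fullness, fullness - bits))
        fullness -= bits
    return trace


# Rate 100 bits/sec, T = 1 sec, first extraction at 2T, so B2 = 200.
# Extracting B2 - B1 = 100 bits per picture holds the buffer steady.
print(simulate_vbv(300, 100, 1, 2, [100] * 5))
```

Halving the bits per picture (50) makes the buffer climb until it overflows, and using one and a half times as many (150) drains it until it underflows, matching the two failure cases above.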

NOTE: Slopes are all equal in Constant Bit Rate operation!

[Figure: VBV fullness versus time (-T/2 to 8T). Every rising segment has Slope = R. Fullness levels b(1), b(2) and b(3) are marked; at each extraction instant all bits for one picture are removed (labelled “All bits for Picture 1” and “All bits for Picture 4”). The intervals vbv_delay(1), vbv_delay(2) and vbv_delay(3) are marked along the time axis.]

vbv_delay(n) tells decoder how long to wait before extracting bits for n’th picture, assuming initially empty buffer.

vbv_delay(n) = 90,000*b(n)/R, where R = bit rate in bits/sec.

Note that vbv_delay(n) is therefore proportional to fullness.

[Figure: bitstream layout: Sequence Header | GOP Header | Picture Header | Coded Bits for Pict 1 | Picture Header | Coded Bits for Pict 2 | Picture Header | Coded Bits for Pict 3 | Picture Header | Coded Bits for Pict 4]

– Sequence Header carries vbv_buffer_size (in units of 16*1024 bits)
– Picture Headers carry vbv_delay(1), vbv_delay(2), vbv_delay(3) (in units of 90kHz clocks)
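The two header fields use different units, which a short sketch makes concrete. The formula vbv_delay(n) = 90,000*b(n)/R and the 16*1024-bit unit are taken from the slides; the function names, the rounding to whole clock ticks, and the sample values 112, 1,000,000 and 4,000,000 are assumptions chosen for illustration.

```python
def vbv_delay_ticks(fullness_bits, bit_rate):
    """vbv_delay in 90 kHz clock ticks for fullness b(n) and bit rate R
    (bits/sec), rounded to the nearest whole tick."""
    return round(90_000 * fullness_bits / bit_rate)


def vbv_buffer_bits(vbv_buffer_size):
    """Convert the coded vbv_buffer_size (units of 16*1024 bits) to bits."""
    return vbv_buffer_size * 16 * 1024


# 1,000,000 bits of fullness at 4 Mbit/sec is a quarter of a second.
print(vbv_delay_ticks(1_000_000, 4_000_000))

# A coded vbv_buffer_size of 112 corresponds to 1,835,008 bits.
print(vbv_buffer_bits(112))
```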

[Figure: VBV Fullness versus Time; Slope = Ract]

VBR: VBV fills at max bit rate until full, then waits

[Figure: VBV Fullness versus Time; Slope = Rmax]

Problem: A Decoder that could decode any MPEG-2 bitstream would be prohibitive in terms of memory and performance. Decoder manufacturers might choose proprietary subsets of the syntax, preventing interoperability.

Solution: Pre-defined subsets of the syntax: Profiles & Levels create “compliance points”

Profile: A defined subset of syntax elements in MPEG-2 (e.g., 4:2:0 only, I/P frames only, field DCT, etc.)

Level: Parameter constraints on those syntax elements (e.g., max Picture Size, max Bit Rate, max Vertical Motion Vector, max Buffer Size, etc.)

Profiles and Levels
• Profiles: Simple, Main, SNR, Spatial, High, 4:2:2
• Levels: Low, Main, High-1440, High
• Not all Profile/Level combinations are allowed.
• Main Profile:
- B Frames supported (not so in Simple Profile)
- 4:2:2 and 4:4:4 not supported
- Scalable Modes not supported
- Restricted slice structure
• Main Level:
- max Picture size: 720x576, 30 frames/sec
- max Bitrate: 15 Mbps
- max Buffer size: 1.835008 Mbits
• A Compliance Point is a Profile at a Level,
- e.g., Main Profile at Main Level, “MP@ML”

Profiles and Levels
Key: each entry gives Max H Size, Max V Size, Max Frame Rate. Two entries separated by "/" are a split box: Enhancement Layer / Base Layer.

Level     | Simple         | Main             | SNR            | Spatial                            | High                               | 4:2:2
High      |                | 1920H 1152V 60Hz |                |                                    | 1920H 1152V 60Hz / 960H 576V 30Hz  | SMPTE 308M
High-1440 |                | 1440H 1152V 60Hz |                | 1440H 1152V 60Hz / 720H 576V 30Hz  | 1440H 1152V 60Hz / 720H 576V 30Hz  |
Main      | 720H 576V 30Hz | 720H 576V 30Hz   | 720H 576V 30Hz |                                    | 720H 576V 30Hz / 352H 288V 30Hz    | 720H 512V/608V 30Hz
Low       |                | 352H 288V 30Hz   | 352H 288V 30Hz |                                    |                                    |

The annotation “ATSC Formats” appears in the High row, next to the Main Profile entry.

Notes: 1) A split box shows constraints on Enhancement Layer (left) and Base Layer (right)
2) In general, a compliant decoder must also handle all lower Profile and Level compliance points.

• Stat mux exploits the fact that the coding

complexities of a selection of video sources,
at any given time, are usually quite different.
• For a large group of video sources, there
might be only one or two “difficult” scenes at
any given time.
• Stat mux uses variable bit rate (VBR)
encoding to give more bits to the more
difficult scenes.

Typical Stat Mux Encoder
[Figure: Video 1, Video 2, ... Video 3 feed VBR Encoder 1, VBR Encoder 2, ... VBR Encoder 3, producing Bitstream 1, Bitstream 2, ... Bitstream 3. A Mux combines them into a CBR Multi-Program Multiplex Bitstream. A Stat Mux Controller is connected to the encoders.]

• The bit rates of the individual encoders are adjusted so that the
total bit rate is constant.
• Depending on the algorithm, the individual bit rates can be
adjusted at, for instance, a picture or GOP level.
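One simple way a controller could keep the total constant is to split the channel in proportion to each program's current coding complexity. This is purely an illustrative assumption: actual stat mux algorithms are proprietary, and the function name, the per-program minimum rate, and the complexity measure are hypothetical.

```python
def allocate_bit_rates(total_rate, complexities, min_rate=0.0):
    """Split total_rate across programs in proportion to complexity.

    Each program first receives min_rate; the remainder is shared in
    proportion to its complexity value. The returned rates always sum
    to total_rate.
    """
    n = len(complexities)
    spare = total_rate - n * min_rate
    if spare < 0:
        raise ValueError("total_rate cannot cover the minimum rates")
    total_complexity = sum(complexities)
    if total_complexity == 0:
        return [total_rate / n] * n  # nothing to distinguish the programs
    return [min_rate + spare * c / total_complexity for c in complexities]


# Hypothetical: an 18 Mbit/sec channel, three programs, 1 Mbit/sec floor.
print(allocate_bit_rates(18.0, [1, 2, 3], min_rate=1.0))
```

The allocation would be recomputed at whatever interval the algorithm uses, for instance once per picture or once per GOP.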

Bit Rate and Buffer Issues
• The bit rates and buffer sizes in a stat mux
system cannot be arbitrarily chosen.
• To prevent buffer underflow or overflow, it is
sufficient that the following relationship hold:

$$D_{size} = \frac{r_{max}}{r_{min}} \, E_{size}$$

where $D_{size}$ = decoder buffer size
$E_{size}$ = encoder buffer size
$r_{max}$ = maximum instantaneous bit rate
$r_{min}$ = minimum instantaneous bit rate

• Stat Mux can increase the number of coded

programs in a fixed bandwidth, without
decreasing the quality of any program.
• Broadcasters love this, since it means
squeezing even more programs into a
channel or transponder!
• Stat Mux R&D is still in its infancy, and
algorithms are highly proprietary.
• Existing Stat Mux products achieve this goal
with varying degrees of success.
Practicing the
Art of MPEG

• Blocky Artifacts
– seen when the eye tracks a fast-moving, detailed object
– may also be seen during dissolves and fades
– blocky grid remains fixed while the object moves under it
– caused by poor motion estimation and/or insufficient
allocation of bits

• “Mosquito Noise”
– may be seen at the edges of text, logos and other sharply
defined objects
– the edge causes high frequency DCT terms, which are
coarsely quantized and spread spatially when transformed
back into the pixel domain

• Dirty Window
– streaks or noise appear to remain stationary while objects
move beneath them (like looking through a dirty window)
– the encoder may not be sending enough bits to code the
residual (prediction) error in P and B frames

• “Wavy Noise”
– often seen during slow pans across highly detailed objects,
such as crowds in a stadium
– the coarsely quantized high frequency terms cause
reconstruction errors to modulate spatially as details shift
within the DCT blocks.

Where MPEG Compression
Can Perform Poorly
• For types of motion that don’t fit the
linear translation model
– zooms
– rotations
– transparent/translucent moving objects
– dissolves containing moving objects
• For other things that can’t be
predicted well
– shadows
– changes in brightness (fade-ins, fade-outs)
– scene cuts
– highly detailed, uncovered regions
– noise effects
– additive noise
Tips for Higher Quality Coding
• Remove Noise
– coding noise wastes valuable bits!
– consider using preprocessing technology that can remove
Gaussian noise, impulse noise, NTSC/PAL decoding artifacts,
film grain, film streaks, etc.
• Code film material at its original frame rate.
– Use high-quality inverse telecine algorithms
• Code material at proper image size
– for the same bit rate, a reduction in coding noise can be
achieved by simply reducing the horizontal image size
– because of interlace, use care when reducing vertical image
size
• Use high-quality Stat Mux algorithms
Tips for Higher Quality Encoding (cont’d)

• Rate Control
– over time, improved rate control techniques will become
available
– ultimately, we would like to evaluate the perceptual impact of
each mode decision, and choose the modes that result in the
fewest bits with the lowest perceptual degradation

• Motion Estimation
– the larger the search area, the faster the motion that can be
well predicted
– this comes at a price: full search is good, but is usually too
expensive
– new, hierarchical techniques are being developed that can
approach full search in terms of quality, are closer to “true
motion”, and are not fooled by brightness changes

MPEG-1 Video:
1 - 3 Mbps: CD-ROM Multimedia
Telecommunications and Near Video on Demand

MPEG-2 Video:
3 - 15 Mbps: SDTV Broadcast (e.g., ATSC and DVB)
Digital Video Disk (DVD)

15 - 20 Mbps HDTV Broadcast (e.g., ATSC)

25 - 50 Mbps SDTV Production

100 - 300 Mbps HDTV Production

ATSC Video = MPEG-2 Video

+ ATSC Constraints
+ ATSC Extensions

ATSC Video Constraints
• Sequence Layer
– Video Formats as per Table 3 in ATSC Doc. A/53,
Annex A
– Bit Rate <= 19.4 Mb/s
– VBV Buffer Size <= 7.99 Mbit
– Chroma Format 4:2:0
– Component Video Format
• Picture Layer
– VBV Delay <= 0.5 sec (limits channel change
delay)

[Figure: ATSC picture sizes and aspect ratios]

1920 x 1080        16:9
1280 x 720         16:9
704 x 480          16:9
704 x 480          4:3
640 x 480 (VGA)    4:3

24, 30 and 60 frames/sec allowed

18 formats all together
ATSC Video Extension: DTV CC
• DTV Closed Captioning Sent in Video User
Data
• 9600 bps: 10x more capacity than analog VBI
method (EIA-608)
• DTV CC Descriptor indicates
– number of services
– language of each service
– whether CC is limited to Line 21 data
– whether text is tailored to beginning readers
– whether text is formatted to widescreen displays
• See EIA-708B for more details

Concluding Remarks
• The MPEG video compression standard is the result
of many years of competitive and, ultimately,
collaborative effort among many commercial and
academic laboratories
• MPEG video compression can increase a
broadcaster’s channel capacity by 8x or more
• MPEG video compression is being used successfully
in many application areas, such as:
– CD-ROM and DVD multimedia
– Satellite Broadcast
– Terrestrial Broadcast
– Cable Broadcast
– Telco Video-on-Demand Systems

MPEG-2 Video References
• MPEG-2 Books
– Mitchell, J.L., Pennebaker, W.B., Fogg, C.E., and LeGall, D.J., MPEG Video
Compression Standard, Chapman & Hall, 1997.
– Haskell, B.G., Puri, A. and Netravali, A.N., Digital Video: An Introduction to
MPEG-2, Chapman & Hall, 1997.
– Rao, K.R. and Hwang, J.J., Techniques & Standards for Image, Video and Audio
Coding, Prentice Hall, 1996.
– Weiss, S. Merrill, Issues in Advanced Television Technology, Focal Press, 1996.

• MPEG-2 Video Specification

– ISO/IEC IS 13818-2, “Generic Coding of Moving Pictures and Associated Audio
Information: Video”, January 20, 1995.

• MPEG Web sites

– MPEG Pointers and Resources http://www.mpeg.org
– ATSC http://www.atsc.org
– DVB http://www.dvb.org
