MPEG-2 Basics
This tutorial, based on slides and speaker notes, covers the very basics of
how MPEG-2 works: how a high bit rate digital video signal is taken from a
studio camera, compressed down to a bit rate low enough to make economic use
of the available transmission bandwidth, and converted into a form that a
consumer set-top box can decode.
The tutorial covers the creation of the elementary stream, the
packetised elementary stream, and the transport stream multiplex.
It discusses the use of Programme Specific Information, and also the
extensions created by the DVB, known as the DVB Service
Information.
It also tries to give a feel for the fragility of the MPEG-2 transport
stream. This is of major importance to service providers wishing to
guarantee quality of service.
[Figure: protocol levels from studio to consumer. Labels: Compress (reduce
the overall video bit rate; audio into MPEG Audio); MPEG-2 Packetized
Elementary Stream (convert the video, audio, and data into a product);
MPEG-2 Transport Stream; Data Link (FEC); AAL-1; DVB standards; ITU-T
standards; QAM; VSB; QPSK; SDH; PDH (DS1, E1, J2, DS3, E3).]
This slide shows the protocol levels that are used to transport the digital
video signal, from the studio to the consumer.
We are interested in the section between the red dashed lines.
Transport stream protection is covered in another tutorial.
The protected transport stream is then fed into whichever physical channel is
used to deliver the video signal to the consumer.
Note that this architecture is point-to-multipoint, constant bit rate, and
broadcast oriented, which is why only AAL-1 is shown and not AAL-5.
[Figure: video compression overview. Labels: RGB to YUV; Blocks (remove
spatial redundancy); Macro blocks (remove temporal redundancy);
Quantization; Zig Zag Scan; Huffman Coding; output bit stream.]
Spatial redundancy:
Pixel coding using the DCT
[Figure: RGB converted to YUV and divided into blocks of pixel amplitude
values; the resulting coefficient grid is mostly zeros.]
Spatial redundancy:
Quantization & Entropy coding
[Figure: Discrete Cosine Transform, then Quantization (HF DCT frequency
coefficients are lost), then Zig Zag Scan, then Entropy Coding, producing
the output bit stream.]
The higher the DCT frequency, the higher the quantization matrix value it is
divided by. This makes many coefficients go to zero.
The fixed-value scale factor reduces even more of the DCT coefficients to zero.
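The division step described in these notes can be sketched as follows. This is a minimal illustration, not the exact MPEG-2 arithmetic: the 4x4 block size, the matrix values, the coefficient values, and the function name `quantize_block` are all assumptions chosen to keep the example small (real blocks are 8x8).

```python
def quantize_block(coefs, quant_matrix, scale):
    """Divide each DCT coefficient by (matrix entry * scale factor) and round."""
    return [
        [int(round(c / (q * scale))) for c, q in zip(c_row, q_row)]
        for c_row, q_row in zip(coefs, quant_matrix)
    ]

# Hypothetical quantization matrix: entries grow with DCT frequency,
# so high-frequency coefficients are divided by larger numbers.
quant_matrix = [
    [8, 16, 24, 32],
    [16, 24, 32, 40],
    [24, 32, 40, 48],
    [32, 40, 48, 56],
]

# Hypothetical DCT coefficients: large DC term, small high-frequency terms.
coefs = [
    [620, 45, -18, 6],
    [-52, 20, 9, -4],
    [14, -8, 5, 2],
    [-6, 3, -2, 1],
]

quantized = quantize_block(coefs, quant_matrix, scale=2)
zeros = sum(v == 0 for row in quantized for v in row)
print(quantized)
print("zero coefficients:", zeros, "of 16")
```

With these made-up numbers only three coefficients survive, which is the effect the notes describe.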
The next stage is to lengthen the runs of zeros fed into the entropy coder.
This is done by zig-zag scanning the DCT values of the 8x8 pixel block, which
helps the entropy coder do its job.
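A zig-zag scan can be sketched like this. The helper names `zigzag_order` and `zigzag_scan` and the 4x4 sample block are assumptions made for illustration; the scan walks the anti-diagonals of the block so that low-frequency values come first and the zeros bunch up at the end.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    def key(pos):
        r, c = pos
        d = r + c  # index of the anti-diagonal this position lies on
        # Even diagonals are walked with the column increasing,
        # odd diagonals with the row increasing.
        return (d, c if d % 2 == 0 else r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def zigzag_scan(block):
    """Flatten a square block into a list following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

# Hypothetical quantized block (4x4 for brevity; MPEG-2 blocks are 8x8).
block = [
    [39, 1, 0, 0],
    [-2, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
scanned = zigzag_scan(block)
print(scanned)
```

The non-zero values end up at the front of the list and the rest is one long run of zeros.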
Entropy coding essentially sizes coefficients by how often they occur.
The more often a coefficient value occurs, the shorter the binary code it is given.
Since any frame is going to contain a large number of identical 8x8 blocks,
this reduces the overall binary data rate.
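The idea of giving frequent values shorter codes can be illustrated with a small Huffman coder. This is only a sketch: MPEG-2 itself uses predefined variable-length code tables rather than building a code from each picture, and the function name `huffman_codes` and the sample values are assumptions.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_built_so_far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical scanned coefficients: mostly zeros, a few other values.
values = [39, 1, -2] + [0] * 13
codes = huffman_codes(values)
encoded = "".join(codes[v] for v in values)
print(codes)
print(len(encoded), "bits for", len(values), "values")
```

Here the value 0, which dominates the input, receives a one-bit code, while the rare values receive longer ones.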
To summarize, then: quantization makes many higher frequency DCT values
go to zero. Entropy coding removes duplication among the DCT values, in
effect assigning each DCT position a pointer to its value.
This all has a cost. That's shown in the pictures above: the upper
picture is unquantized, the lower one quantized.
Temporal redundancy:
Inter-frame prediction & motion estimation
[Figure: macroblocks in successive frames over time. The current macroblock
is compared against a search area in the reference frame to find the best
match position.]
This is where the real bit rate reduction kicks in. As we'll cover in the next
slide, there are three different frame types.
By just doing spatial redundancy removal on a frame you create an I frame.
This has all the information necessary to decode the picture.
The next stage is to look at the frame that follows and see how similar it is.
You can do three things to minimize this frame's bit rate.
First, check whether the macroblock in the same position in the next frame
has changed. If it hasn't, don't do any coding; just transmit that it's the same.
Next, search around in the I-frame to see if this macroblock exists but in a
different place. If so, transmit motion vectors pointing to its old location.
Only if it's completely new do you go through the complete intra-coding process.
This really reduces the overall bit rate from frame to frame.
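The three-way decision in these notes can be sketched with an exhaustive block search. Everything here is an illustrative assumption: the sum-of-absolute-differences cost, the function names, the `threshold` parameter, and the tiny synthetic frames. A real encoder also codes the residual left after motion compensation, which this sketch ignores.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
        for j in range(size) for i in range(size)
    )

def best_match(reference, current, x, y, size=16, search=4):
    """Search `reference` around (x, y) for the block of `current` at (x, y).

    Returns ((dx, dy), cost) for the lowest-cost position in the search area.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(current, x, y, reference, rx, ry, size)
                if best is None or cost < best[1]:
                    best = ((dx, dy), cost)
    return best

def classify_macroblock(reference, current, x, y, size=16, search=4, threshold=0):
    """Apply the three checks in order: unchanged, moved, or new (intra)."""
    if sad(current, x, y, reference, x, y, size) <= threshold:
        return ("unchanged", (0, 0))
    vector, cost = best_match(reference, current, x, y, size, search)
    if cost <= threshold:
        return ("moved", vector)
    return ("intra", None)

# Synthetic 16x16 frames: `current` is `reference` shifted right 2, down 1.
reference = [[r * 16 + c for c in range(16)] for r in range(16)]
current = [
    [reference[r - 1][c - 2] if r >= 1 and c >= 2 else 0 for c in range(16)]
    for r in range(16)
]
moved = classify_macroblock(reference, current, 6, 6, size=4, search=3)
same = classify_macroblock(reference, reference, 6, 6, size=4, search=3)
print(moved, same)
```

The returned vector points from the current block back to where its content sat in the reference frame, which is the "old location" the notes refer to.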
But note that if you kept predicting each frame from the last, it would only
take a small error for the whole process to start unravelling fast.
That's why there are three different frame types, and a specific frame
transmission process.
The Intra (I) frames contain full picture information. These are your lifeline
if errors occur or the decoder loses a frame. Without periodic transmission of
these, the whole process falls apart. But the I-frames are the least
compressed.
Predicted (P) frames are predicted from past I or P frames.
Bi-directionally predicted (B) frames offer the greatest compression and use
past and future I and P frames for motion compensation. But they are the most
sensitive to errors.
The encoder will cycle through each frame and decide whether to do I, P, or
B coding. The order will depend on the application. But roughly every twelve
frames, an I-frame is created.
If the encoder didn't do this, any small errors would build up and the MPEG
compressor would rapidly descend into an electronic form of Entropic "heat
death".
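A possible frame pattern and its transmission order can be sketched as below. The twelve-frame cycle comes from the notes; the spacing of one P frame every three frames (`m=3`) and both function names are assumptions. Because a B frame needs a future I or P frame, that future frame has to be sent first.

```python
def gop_display_order(n=12, m=3):
    """Frame types in display order: an I frame, then an I/P anchor every m frames."""
    return ["I" if i == 0 else "P" if i % m == 0 else "B" for i in range(n)]

def coded_order(display):
    """Reorder frames so each B frame follows the later anchor it depends on."""
    out, pending_b = [], []
    for index, frame_type in enumerate(display):
        if frame_type == "B":
            pending_b.append((index, frame_type))  # hold until the next anchor is sent
        else:
            out.append((index, frame_type))
            out.extend(pending_b)
            pending_b = []
    # Trailing B frames would wait for the next I frame in a continuous
    # stream; in this single-cycle sketch they are simply appended.
    out.extend(pending_b)
    return out

display = gop_display_order()
coded = coded_order(display)
print("display:", "".join(display))
print("coded:  ", " ".join(f"{t}{i}" for i, t in coded))
```

The display indices in the coded sequence show the P frames jumping ahead of the B frames that reference them.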
The process detailed in the last few slides does the real work. But a decoder
needs additional information to reconstruct the frames.
[Figure: elementary stream structure, from the outermost layer inwards.
Sequence: sequence header (picture height, aspect ratio, bit rate, picture
rate, ...) followed by groups of pictures.
Group of pictures: GOP header followed by frames 1 to N.
Frame: frame header (temporal reference, frame type, VBV delay, extension
start code, structure, ...) followed by slices.
Slice: slice header (quantizer scale, ...) followed by macroblocks 1 to N.
Macroblock: address, type, quantizer scale, motion vectors, coded block
pattern, etc., followed by blocks 1, 2, 3, ...]
This slide shows how the actual blocks, slices, frames etc. are all put
together to form the elementary stream.
Along with the actual picture data, header information is required to
reconstruct the I, B, P frames. This header structure is shown.
Each slice will contain a header detailing its contents & location.
Each frame will have a header, and each group of I, B, P frames, known as a
Group Of Pictures (GOP) will have a header.
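As an illustration of how a decoder reads this header information, the sketch below extracts the sequence header fields named on the slide (picture size, aspect ratio, picture rate, bit rate). The bit layout follows the MPEG-2 video sequence header; the function name and the hand-built sample bytes are assumptions, and the remaining header fields are not parsed.

```python
SEQUENCE_HEADER_CODE = b"\x00\x00\x01\xb3"

def parse_sequence_header(es):
    """Find the first sequence header in an elementary stream and decode
    its leading fields. Returns None if no complete header is found."""
    pos = es.find(SEQUENCE_HEADER_CODE)
    if pos < 0 or len(es) < pos + 11:
        return None
    b = es[pos + 4: pos + 11]
    bit_rate_value = (b[4] << 10) | (b[5] << 2) | (b[6] >> 6)  # 18 bits
    return {
        "width": (b[0] << 4) | (b[1] >> 4),            # 12 bits
        "height": ((b[1] & 0x0F) << 8) | b[2],         # 12 bits
        "aspect_code": b[3] >> 4,                      # 4-bit table index
        "frame_rate_code": b[3] & 0x0F,                # 4-bit table index
        "bit_rate_bps": bit_rate_value * 400,          # coded in units of 400 bit/s
    }

# Hand-built sample: 720x576, aspect code 2, frame rate code 3, 6 Mbit/s.
sample = bytes([0x00, 0x00, 0x01, 0xB3,
                0x2D, 0x02, 0x40, 0x23,
                0x0E, 0xA6, 0x20, 0x00])
header = parse_sequence_header(sample)
print(header)
```

The aspect ratio and frame rate are carried as indices into tables defined by the standard, so the sketch reports the raw codes rather than guessing at a presentation format.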
The next stage is to take this elementary stream (ES) and convert it into
something that can be transmitted and decoded at the other end.
At this stage, the elementary stream is a continual stream of encoded video
frames. Although all the data required to reconstruct the frames exists here,
no timing information or systems data is contained.
That's the job of the MPEG-2 multiplexer.
First, a few words on what we do with the audio signal associated with the
video.
Sub-Band Masking
[Figure: audio spectrum before and after sub-band masking; frequency axis
from 0 to 20,000.]
[Figure: a run of coded frames.]
PES Packet
[Figure: PES packet structure.
PES packet: stream id, PES packet length, optional PES header, stuffing
bytes (FF), data.
Optional PES header: '10', PES scrambling control, PES priority, data
alignment indicator, copyright, original or copy, 7 flags, PES header data
length, optional fields.
Optional fields: PTS, DTS, ESCR, ES rate, DSM trick mode, additional copy
info, PES CRC, PES extension.
PES extension: 5 flags, optional fields.
PES extension optional fields: PES private data, pack header field, program
packet sequence counter, P-STD buffer, PES extension field.]
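To make the PES header fields on this slide more concrete, here is a minimal sketch that reads the stream id, packet length, and the PTS and DTS timestamps. It assumes a stream id that carries the optional PES header (such as a video stream), ignores all other optional fields, and uses invented names and a hand-built sample packet.

```python
def decode_timestamp(b):
    """Decode a 33-bit PTS or DTS from its 5-byte form, skipping marker bits."""
    return (
        (((b[0] >> 1) & 0x07) << 30)
        | (b[1] << 22)
        | ((b[2] >> 1) << 15)
        | (b[3] << 7)
        | (b[4] >> 1)
    )

def parse_pes_header(pes):
    """Read the fixed PES fields plus PTS/DTS when their flags are set."""
    if pes[0:3] != b"\x00\x00\x01":
        raise ValueError("missing PES start code prefix")
    flags = pes[7]                      # the "7 flags" byte
    header_data_length = pes[8]
    pts = decode_timestamp(pes[9:14]) if flags & 0x80 else None
    dts = decode_timestamp(pes[14:19]) if (flags & 0xC0) == 0xC0 else None
    return {
        "stream_id": pes[3],
        "packet_length": (pes[4] << 8) | pes[5],
        "pts": pts,
        "dts": dts,
        "payload": pes[9 + header_data_length:],
    }

# Hand-built sample: video stream id 0xE0, PTS = 90000 (one second at 90 kHz).
sample = bytes([0x00, 0x00, 0x01, 0xE0, 0x00, 0x0C,
                0x80, 0x80, 0x05,
                0x21, 0x00, 0x05, 0xBF, 0x21,
                0xDE, 0xAD, 0xBE, 0xEF])
info = parse_pes_header(sample)
print(info)
```

The timestamps count in units of a 90 kHz clock, so a PTS of 90000 corresponds to one second.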
[Figure: transport stream packet.
Header fields shown: PID (13 bits), continuity counter, adaptation field.
Payload: PES 1, PES 2, ... PES N.
Adaptation field: length (8 bits), discontinuity indicator, random access
indicator, PES priority, flags, optional fields, stuffing bytes.
Optional fields: PCR (42 bits), OPCR (42 bits), splice countdown, private
data length (8 bits), private data, ...]
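The header fields on this slide can be read with a few bit operations. The sketch below assumes the standard 188-byte transport packet beginning with sync byte 0x47; the function names and the hand-built packets are hypothetical. The second function counts continuity counter gaps, a simple indicator of lost packets, and it deliberately ignores the duplicate-packet case that the standard permits.

```python
def parse_ts_header(packet):
    """Decode the 4-byte header of one 188-byte transport stream packet."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a 188-byte packet starting with sync byte 0x47")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],   # 13-bit PID
        "has_adaptation_field": bool(packet[3] & 0x20),
        "has_payload": bool(packet[3] & 0x10),
        "continuity_counter": packet[3] & 0x0F,         # 4-bit counter
    }

def continuity_gaps(headers):
    """Count packets whose counter does not follow the previous one on the same PID."""
    last, gaps = {}, 0
    for h in headers:
        if not h["has_payload"]:
            continue  # the counter only advances on packets carrying payload
        pid, cc = h["pid"], h["continuity_counter"]
        if pid in last and cc != (last[pid] + 1) % 16:
            gaps += 1
        last[pid] = cc
    return gaps

def make_packet(pid, cc):
    """Hypothetical helper: payload-only packet with the given PID and counter."""
    header = bytes([0x47, 0x40 | (pid >> 8), pid & 0xFF, 0x10 | cc])
    return header + bytes([0xFF] * 184)

# Counters 7, 8, 10 on PID 54: the packet with counter 9 is missing.
headers = [parse_ts_header(make_packet(54, cc)) for cc in (7, 8, 10)]
print(headers[0])
print("gaps:", continuity_gaps(headers))
```

A single missing packet shows up as one gap, which is the kind of check a monitoring tool might run per PID.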
[Figure: Programme Specific Information tables and the transport packets
that carry them.]
Programme Association Table (PAT):
  Program 0    PID = 16
  Program 1    PID = 22
  Program 2    PID = 33
  ...
  Program K    PID = 55
Programme Map Table for programme 1 (PID 22):
  Stream 1    PCR        31
  Stream 2    Video 1    54
  Stream 3    Audio 1    48
  Stream 4    Audio 2    49
  ...
  Stream k    Data k     66
Programme Map Table for programme 2 (PID 33):
  Stream 1    PCR        41
  Stream 2    Video 1    19
  Stream 3    Audio 1    81
  Stream 4    Audio 2    82
  ...
  Stream k    Data k     88
Conditional Access Table (always PID 1):
  CA Section 1 (programme 1)
  CA Section 2 (programme 2)
  CA Section 3 (programme 3)
  ...
  CA Section k (programme k)
Network Information Table:
  Private Section 1    NIT info
  Private Section 2    NIT info
  Private Section 3    NIT info
  ...
  Private Section k    NIT info
Transport stream packets (PID, contents):
  PAT
  22     Prog 1 PMT
  33     Prog 2 PMT
  99     Prog 1 EMM
  31     Prog 1, PCR
  48     Prog 1, audio 1
  54     Prog 1, video 1
  109    Prog 2 EMM
  etc.
To reconstruct the PES, the PSI uses a series of identifiers known as
Packet Identifiers, or PIDs.
Once the programme to be decoded is known, the decoder searches for
PID 0, the Programme Association Table (PAT) PID.
The PAT contains the PIDs of all the Programme Map Tables (PMTs) for the
programmes contained in the transport stream.
We assume programme 1 has been chosen for decode.
The PMT for programme 1 is identified via its PID (22), extracted from the
transport stream packets containing it, and decoded.
Programme 1's PMT contains all the PIDs for programme 1's video, audio and
data packets. These must be put together to reconstruct the PES.
Programme 1's timing information, required for decode, is contained in a
transport packet identified by the PCR PID (31). Each programme has a PCR.
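The lookup chain in these notes (PAT, then PMT, then the stream PIDs) can be sketched with plain dictionaries. The PID values are the ones shown on the slide; the table layout, the names, and the packet list are simplifications invented for this example, standing in for tables that a real decoder would first have to parse out of PSI sections.

```python
PAT_PID = 0

# Tables as they might look after section parsing (hypothetical layout).
tables = {
    0: {"type": "PAT", "programmes": {0: 16, 1: 22, 2: 33}},
    22: {"type": "PMT", "pcr_pid": 31,
         "streams": {"video 1": 54, "audio 1": 48, "audio 2": 49}},
    33: {"type": "PMT", "pcr_pid": 41,
         "streams": {"video 1": 19, "audio 1": 81, "audio 2": 82}},
}

def pids_for_programme(tables, programme_number):
    """Follow PAT -> PMT and return (pmt_pid, set of PIDs the decoder needs)."""
    pmt_pid = tables[PAT_PID]["programmes"][programme_number]
    pmt = tables[pmt_pid]
    wanted = set(pmt["streams"].values()) | {pmt["pcr_pid"]}
    return pmt_pid, wanted

def filter_packets(packets, wanted):
    """Keep only the (pid, contents) packets that belong to the chosen programme."""
    return [(pid, contents) for pid, contents in packets if pid in wanted]

# A short stretch of the multiplex, using the PIDs from the slide.
packets = [
    (0, "PAT"), (22, "Prog 1 PMT"), (33, "Prog 2 PMT"), (99, "Prog 1 EMM"),
    (31, "Prog 1, PCR"), (48, "Prog 1, audio 1"), (54, "Prog 1, video 1"),
    (109, "Prog 2 EMM"),
]
pmt_pid, wanted = pids_for_programme(tables, 1)
selected = filter_packets(packets, wanted)
print(pmt_pid, sorted(wanted))
print(selected)
```

Only the packets carrying programme 1's PCR, audio and video survive the filter; everything else in the multiplex is discarded by this decoder.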
PID 1 is always used to identify the Conditional Access Table (CAT). This is
needed to find out whether the consumer is allowed to decode and view
programme 1.
The CAT contains all the PIDs identifying the Entitlement Management
Messages (EMMs) for all programmes.
The NIT contains information about the user-selected service, such as
channel frequencies, transponder numbers, etc. The PID of the NIT is always
given by the PAT entry for programme 0. See DVB-SI for more about the NIT.
Within the PSI structure is a table known as a private table. This was
created by MPEG so that service providers could create their own extensions
to the MPEG-2 PSI.
This has been used by the DVB and the ATSC to create what are called
Service Information tables. We discuss the DVB-SI only.
These are used to carry information about what TV events are contained in the
transport stream (via the Event Information Table), what services are being
carried (via the Service Description Table), what groups of services with
common themes exist (via the Bouquet Association Table), and what the
physical parameters of the network carrying a transport stream are (via the
Network Information Table).
The tables are highly complex constructs and, like the PSI, are protected by
a CRC-32.
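The CRC check can be sketched in a few lines. This is a bitwise implementation of the CRC-32 variant used by MPEG-2 sections (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no bit reflection, no final XOR); the function name and the sample section bytes are assumptions, and a production decoder would normally use a table-driven version for speed.

```python
def crc32_mpeg2(data):
    """Bitwise CRC-32 as used to protect MPEG-2 PSI and DVB-SI sections."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc

# Hypothetical section body; the sender appends the CRC as 4 big-endian bytes.
section_body = bytes([0x00, 0xB0, 0x0D, 0x00, 0x01, 0xC1, 0x00, 0x00,
                      0x00, 0x01, 0xE0, 0x16])
section = section_body + crc32_mpeg2(section_body).to_bytes(4, "big")

# The receiver runs the CRC over the whole section; zero means no error detected.
print("received intact:", crc32_mpeg2(section) == 0)

# Flip one bit to imitate a transmission error.
corrupted = bytes([section[0] ^ 0x01]) + section[1:]
print("corrupted detected:", crc32_mpeg2(corrupted) != 0)
```

A receiver that sees a non-zero result discards the section and waits for the next repetition of the table.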
The SI are used to create Electronic Programme Guides (EPGs). These help
the consumer find the programme they want to watch.
The example shows the process by which a user goes from the display of
bouquets all the way to viewing a rugby match.
MPEG Technology
Program Clock Reference - PCR
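[Figure: PCR clock recovery. Labels, sending side: 27 MHz system time clock;
clock increments the PCR counter; PCR values are carried in transport stream
packets (TSP) across the transport network. Labels, receiving side: received
PCR; local PCR counter; 27 MHz VCXO increments the local counter; frequency
locked.]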
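As a rough illustration of what the receiver does with a PCR, the sketch below decodes the 42-bit value (a 33-bit base counting at 90 kHz plus a 9-bit extension, giving 27 MHz ticks) and compares it with a local counter. The function names, the sample bytes, and the simple difference used as an error signal are assumptions; a real decoder filters this error before steering its VCXO.

```python
PCR_CLOCK_HZ = 27_000_000

def decode_pcr(b):
    """Decode the 6-byte PCR field into a count of 27 MHz ticks."""
    base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
    extension = ((b[4] & 0x01) << 8) | b[5]
    return base * 300 + extension

def pcr_seconds(pcr):
    """Convert a PCR tick count to seconds."""
    return pcr / PCR_CLOCK_HZ

def clock_error(received_pcr, local_counter):
    """Positive when the local counter lags the sender, negative when it leads."""
    return received_pcr - local_counter

# Hand-built sample: base = 90000, extension = 0, reserved bits set to 1.
sample = bytes([0x00, 0x00, 0xAF, 0xC8, 0x7E, 0x00])
pcr = decode_pcr(sample)
print(pcr, "ticks =", pcr_seconds(pcr), "s")

# Hypothetical local counter that has fallen 270 ticks (10 microseconds) behind.
print("error:", clock_error(pcr, pcr - 270), "ticks")
```

A persistent positive error would tell the receiver to speed its oscillator up slightly, and a negative one to slow it down.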