Chapter 4: Audio and Video Compression (1)
• 4.1 Introduction
• 4.2 Audio compression
• 4.3 Video compression
4.1 Introduction
• Both audio and most video signals are
continuously varying analog signals
• The compression algorithms associated with
digitized audio and video are different
4.2 Audio compression
• Pulse code modulation (PCM)
– requires sampling of the analog signal at the Nyquist
rate (twice the highest frequency component)
• Bandlimited signal
• The bandwidth of the communication
channels that are available often dictates rates that
are less than these. This can be achieved in
one of two ways:
– Audio signal is sampled at a lower rate
– A compression algorithm is used
4.2.1 Differential pulse code modulation
• DPCM is a derivative of standard PCM
The decoder adds each received DPCM value to the previously computed signal held in its register, as in the sketch below
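A minimal DPCM sketch (illustrative only; the step size and function names are assumed, not from the text): the encoder transmits quantized differences, and the decoder adds each received difference to the signal value held in its register.

def dpcm_encode(samples, step=4):
    register = 0                           # previously computed signal value
    codes = []
    for s in samples:
        q = round((s - register) / step)   # quantized difference (the DPCM value)
        codes.append(q)
        register += q * step               # track the decoder's register
    return codes

def dpcm_decode(codes, step=4):
    register = 0
    out = []
    for q in codes:
        register += q * step               # add DPCM value to register contents
        out.append(register)
    return out

samples = [0, 3, 9, 14, 18, 20, 19, 15]
print(dpcm_decode(dpcm_encode(samples)))   # tracks the input to within ~one step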
In subband coding, the outputs of the analysis filters are a lower-subband signal and an upper-subband signal; a minimal two-band split is sketched below
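As a rough illustration of a two-band analysis split (a Haar-style filter pair chosen for brevity, not the filters used in practice):

import numpy as np

def analysis_filters(x):
    """Split a signal into lower and upper subbands, each at half the rate."""
    x = np.asarray(x, dtype=float)
    low = (x[0::2] + x[1::2]) / 2      # lower subband signal (averages)
    high = (x[0::2] - x[1::2]) / 2     # upper subband signal (differences)
    return low, high

low, high = analysis_filters([4, 6, 10, 8, 2, 0])
# perfect reconstruction: even samples = low + high, odd samples = low - high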
4.2.4 Linear predictive coding (LPC)
The speech signal generated by the vocal-tract model in the decoder is the
present output of the speech synthesizer, formed as a linear combination of
previous outputs weighted by the current set of model coefficients
The encoder determines and sends a new set of coefficients for each quantized
segment
The output of the encoder is a set of frames, each frame consisting of fields for
pitch and loudness
Bit rates as low as 2.4 or 1.2 kbps are possible. The sound generated at these
rates is very synthetic, so LPC encoders are used mainly in military
applications, where bandwidth is at a premium
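A hedged sketch of the analysis step: model coefficients are derived per segment via autocorrelation and the standard Levinson-Durbin recursion (pitch and loudness extraction are omitted, and the model order is an assumed value).

import numpy as np

def lpc_coefficients(segment, order=8):
    """Return `order` vocal-tract model coefficients for one speech segment.
    Assumes a non-silent segment (r[0] > 0)."""
    x = np.asarray(segment, dtype=float)
    # Autocorrelation values r[0..order]
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):          # Levinson-Durbin recursion
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / err                     # reflection coefficient
        a[1:i] = a[1:i] + k * a[1:i][::-1]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]                           # the coefficients sent in each frame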
4.2.5 Code-excited LPC
• Code-excited LPC
– The synthesizers used in most LPC decoders are
based on a very basic model of the vocal tract
• In the CELP model, instead of treating each
digitized segment independently for encoding
purposes, a limited set of waveform segments
(templates) is used.
• All coders of this type have a delay
associated with them which is incurred while
each block of digitized samples is analyzed
by the encoder and the speech is
reconstructed at the decoder
• The associated delay comprises processing delay,
algorithmic delay and lookahead
CELP
• The same template codebook is held by both the
encoder and the decoder, computed beforehand
• The codeword that is sent selects the particular
template from the codebook whose difference
values best match those quantized by the
encoder
• Hence an improvement in sound quality is
obtained; a codebook-search sketch follows.
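A minimal sketch of the codebook search (the codebook contents and names here are hypothetical):

import numpy as np

def best_codeword(segment_diff, codebook):
    """Return the index of the template closest to the difference segment."""
    errors = [np.sum((segment_diff - t) ** 2) for t in codebook]
    return int(np.argmin(errors))      # only this index is transmitted

# Both sides hold the same (hypothetical) template codebook beforehand.
codebook = np.array([[0, 1, 0, -1], [1, 1, -1, -1], [2, 0, -2, 0]])
idx = best_codeword(np.array([1, 1, 0, -1]), codebook)
decoded = codebook[idx]                # decoder looks up the same template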
4.2.6 Perceptual coding
• Perceptual encoders have been designed for
the compression of general audio
• The approach is known as perceptual coding since it
exploits a number of the limitations of the human ear.
• Sensitivity of the ear
– A strong signal may reduce the level of sensitivity
of the ear to other signals which are near to it in
frequency
– The model used is called a psychoacoustic model
4.2.6 Perceptual coding -cont
– The sensitivity of the ear varies with the frequency
of the signal; the perception threshold of the ear,
that is, its minimum level of sensitivity as a
function of frequency, is shown in Figure 4.5(a)
– It is most sensitive to signals in the range 2-5 kHz
– In Figure 4.5(a) the vertical axis gives the minimum
amplitude a signal must have to be heard; signals A and B
have the same level, but only A is heard because B falls
below the hearing threshold at its frequency
Figure 4.5(b) shows how the sensitivity of the ear
changes in the vicinity of a loud signal
• ENCODING
– The input signal is first sampled and quantized
using PCM
– The bandwidth available for transmission is
divided into a number of frequency subbands
using a bank of analysis filters (critical-band filters)
– Scaling factor:
• The analysis filter bank also determines the maximum
amplitude of the 12 subband samples in each subband,
known as the scaling factor
The output of this stage is passed both to the psychoacoustic
model and to the quantizer block
MPEG audio layer 3 delivers CD quality at 64 kbps per channel,
with a processing delay of about 60 ms
• The psychoacoustic model controls the accuracy
of the quantization by computing the masking thresholds,
and hence the signal-to-mask ratios, for the subband
samples; a bit-allocation sketch follows.
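A simplified sketch of the idea (the levels, thresholds and allocation rule are assumed, not taken from the standard): subbands masked below threshold get no bits, and the remainder share the bit pool in proportion to their signal-to-mask ratio (SMR).

import numpy as np

def allocate_bits(signal_db, mask_db, total_bits):
    """Distribute total_bits across subbands in proportion to their SMR."""
    smr = np.maximum(np.array(signal_db) - np.array(mask_db), 0.0)
    if smr.sum() == 0:
        return np.zeros(len(smr), dtype=int)        # everything is masked
    return np.floor(smr / smr.sum() * total_bits).astype(int)

signal_db = [60, 55, 30, 42]       # per-subband signal levels (assumed)
mask_db   = [35, 50, 40, 20]       # per-subband masking thresholds (assumed)
print(allocate_bits(signal_db, mask_db, 48))        # e.g. [23, 4, 0, 20]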
• Frame types
– I-frame: intracoded
• I-frames are encoded without reference to any
other frames
• Group of Pictures (GOP): the number of frames
between successive I-frames
– P-frame: intercoded
• The encoding of a P-frame is relative to the contents of
either a preceding I-frame or a preceding P-frame
GOPs consist of I-frames (keyframes), P-frames (predicted frames), and B-frames
(bidirectionally predicted frames).
• The number of P-frames between successive I-frames is
limited, since any errors present in the first P-frame
will be propagated to the next
– B-frame: their contents are predicted using search
regions in both past and future frames (see the
reordering sketch after this list)
– PB-frame: this does not refer to a new frame type
as such, but rather to the way two neighbouring P- and
B-frames are encoded as if they were a single
frame
– D-frame: only used in a specific type of application;
it has been defined for use in movie/video-on-demand
applications
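Because a B-frame needs both a past and a future reference, an encoder transmits frames out of display order. The sketch below (illustrative only) reorders a display sequence so that each B-frame follows both of its references.

def transmission_order(display):
    out, pending_b = [], []
    for f in display:
        if f == 'B':
            pending_b.append(f)     # hold B-frames until the next reference
        else:
            out.append(f)           # I- or P-frame: a reference frame
            out.extend(pending_b)   # both references have now been sent
            pending_b = []
    return out + pending_b

print(transmission_order(list("IBBPBBP")))
# ['I', 'P', 'B', 'B', 'P', 'B', 'B']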
MOTION ESTIMATION AND COMPENSATION
P-frame encoding
For a P-frame, the contents of each macroblock in the
target frame are compared with the corresponding macroblock in
the preceding I- or P-frame, known as the reference frame.
Motion vector resolution is to half a pixel, termed half-pixel resolution.
B-frame encoding
Motion vectors and difference matrices are computed relative to both the
preceding and the succeeding reference frames; a third motion vector and
difference matrix are calculated using the target and the mean of the two
predicted values.
The set with the smallest difference matrices is selected and encoded in
the same way as a P-frame.
A block-matching sketch follows.
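A minimal block-matching sketch (the search range and block size are assumed): for one target macroblock, a window in the reference frame is searched for the best match by sum of absolute differences (SAD); the offset of the best match is the motion vector, and the residual is what gets encoded.

import numpy as np

def motion_vector(ref, target_block, top, left, search=4):
    """Find the (dy, dx) offset in `ref` that best matches `target_block`."""
    n = target_block.shape[0]
    best, best_sad = (0, 0), float('inf')
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > ref.shape[0] or x + n > ref.shape[1]:
                continue               # keep the candidate inside the frame
            sad = np.abs(ref[y:y+n, x:x+n].astype(int)
                         - target_block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad              # motion vector and its match error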
IMPLEMENTATION ISSUES
For an I-frame the three steps are: forward DCT, quantization and entropy coding.
Each macroblock requires six 8x8 pixel blocks to be encoded: four blocks of Y
and two for chrominance.
If the two contents are the same, only the address of the macroblock in the
reference frame is encoded.
If the two contents are very close, both the motion vector and the difference
matrices are encoded.
After the target frame has been completely compressed, the difference values
are used to update the reference frame contents ready for encoding the next frame
The coded block pattern defines which of the six 8x8 pixel blocks making up the
macroblock are present, and the JPEG-encoded DCT coefficients are given for each
such block
Temporal reference field: a time stamp used to synchronize the video block with
the associated audio block carrying the same time stamp
GOB: group of macroblocks (the size is chosen so that CIF and QCIF frames
contain an integral number of GOBs)
The output rate of the buffer is determined by the transmission bit rate; two
threshold values are defined, low and high.
If the buffer contents fall below the low threshold, the quantization threshold is
reduced and the encoder output rate increases; if they rise above the high
threshold, the quantization threshold is increased and the output rate is reduced.
A sketch of this feedback loop follows.
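A sketch of the two-threshold feedback loop (the threshold and quantizer values are assumed):

def adjust_quantizer(buffer_fill, q, low=0.25, high=0.75, q_min=1, q_max=31):
    """buffer_fill is the fraction of the output buffer currently occupied."""
    if buffer_fill < low:
        q = max(q_min, q - 1)   # buffer draining: finer quantization, more bits
    elif buffer_fill > high:
        q = min(q_max, q + 1)   # buffer filling: coarser quantization, fewer bits
    return q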
• Frame types:
– I-frame
– P-frame
– B-frame
– PB-frame: supported because of its much reduced
encoding overhead
• Unrestricted motion vectors
• Normally the motion vectors associated with predicted
macroblocks are restricted to a defined area, with the
search area confined by the edge of the frame; in this
mode the restriction is removed, improving prediction
for macroblocks at the frame edge.
• Error resilience
– Errors cause error propagation; see Figure 4.17(a)
– Over a PSTN, errors in the received bitstream are more likely
– It is difficult to locate the errored macroblock
– A GOB (group of macroblocks) may contain an errored
macroblock
– When an error in a GOB is detected, the decoder skips the
remaining macroblocks in the affected GOB and searches for
the next resynchronization marker
– Masking of the error is done by an error concealment scheme
Error propagation
An error leads to propagation into other regions of
subsequent frames predicted from the affected region
• Error tracking
The encoder can select any of the previously decoded frames as the reference
Mixed mode: motion vectors for both field and frame modes are computed
and the mean value is selected
4.3.6 MPEG-2 -cont
• Three standards are associated with HDTV:
– ATV (Advanced Television) in North America
– DVB (Digital Video Broadcasting) in Europe
– MUSE (multiple sub-Nyquist sampling encoding) in
Japan and Asia
• These standards define the transmission of the bitstream
over a network
– DVB HDTV
• 4/3 aspect ratio (1440 x 1152 pixels)
• SSP@H1440: the spatially-scaleable profile at
high 1440 level
• Audio compression standard: MPEG audio layer 2
– MUSE
• 16/9 aspect ratio
• 1920 samples per line and 1035 lines per frame
4.3.7 MPEG-4
• MPEG-4 is aimed at the audio and video
associated with interactive multimedia
applications over the Internet and entertainment
networks
• It contains features that enable a user to access
and manipulate the individual elements of a picture
• Owing to its high coding efficiency, it is also used over
low bit rate networks such as wireless networks and PSTNs
• It is an alternative to H.263 and likewise supports very
low bit rates
• MPEG-4 has:
– Content-based functionalities: each scene is defined in the form
of a background and one or more foreground audio-visual
objects (AVOs)
– Each AVO is in turn defined as one or more audio and/or video objects
– Object descriptor: a description of the origin of each audio and
video object, required so that the object can be manipulated
– Binary format for scenes (BIFS): the language used for modifying
objects (for example, changing the shape, colour or appearance of
video objects, or the volume of audio objects)
– Scene descriptor: describes the composition of a scene
➢ Defines how the various AVOs are related to each other in the
context of the complete scene
– Figure 4.23: a frame/scene is defined in the form of a number of
AVOs
– Each video frame is segmented into a number of video object
planes (VOPs), each of which corresponds to an AVO of interest
• In the example, the frame is shown as consisting of three
VOPs:
➢ VOP 0 represents the person approaching the car
➢ VOP 1 the remainder of the car parked outside the
house
➢ VOP 2 the remainder of the background
➢ Each VOP is encoded separately based on its shape,
motion and texture
➢ Each VOP is encapsulated within a rectangle, chosen
so that it completely covers the related AVO using the
minimum number of macroblocks
The motion and texture of each VOP are encoded, and the resulting bitstream is
multiplexed together with the related object and scene descriptor information.
Different audio algorithms (for example G.723.1, Dolby AC-3 and MPEG audio
layer 2) are used for different applications.
Each object in a VOP is identified on the basis of similar properties of texture,
colour and brightness; each such object, bounded by a rectangle containing the
minimum number of macroblocks, is encoded based on its shape, motion and
texture.
Any VOP which has no motion associated with it produces a minimum of
compressed information.
VOPs which move often occupy only a small portion of the scene/frame, so the
bit rate of the multiplexed video stream is much lower than that obtained with the
other standards.
4.3.7 MPEG-4 -cont
• Transmission format (Figure 4.25)
– Transport stream (TS): carries frame and network information
– Packetized elementary stream (PES): the TS consists of a
multiplexed stream of PESs
– Elementary stream (ES): the compressed audio and
video information relating to each AVO
– Each PES packet contains a type field in its header
– FlexMux layer: routes each PES to the appropriate
synchronization block
– Synchronization layer: consists of the synchronization blocks
– Elementary stream descriptor (ESD): carries the
compressed audio and video for each AVO and routes it
to the appropriate decoder
– Time stamps are carried for the timing synchronization block
– Composition and rendering block: the decompressed data
and the scene descriptor information are used to compose
each frame
MPEG-4 decoder
4.3.7 MPEG-4 -cont
• Error resilience techniques for error-prone
transmission channels (Figure 4.26)
– Use of fixed-length video packets instead of GOBs
– Reversible VLCs (RVLCs): a new form of variable-length
coding
– Errors may occur in
• a macroblock
• a header
Video packets:
In MPEG-4 the compressed bitstream is divided into groups with an equal
number of bits; each such group, a video packet, is separated from the next by
a resynchronization marker (see the sketch below).
In the video packet format of Figure 4.26(c), the header carries the number of
the first macroblock in the packet, and motion vectors are limited to the
boundaries of the packet.
The header also contains a copy of the picture/frame-level parameters,
identified by the header extension code (HEC) bit.
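An illustrative recovery routine (the marker bit pattern here is assumed, not the one defined in the standard): on an error, the decoder discards the rest of the current video packet and resumes at the next resynchronization marker.

MARKER = "0000000000000001"          # assumed resynchronization bit pattern

def resync(bitstring, error_pos):
    """Return the bit position at which decoding can safely resume."""
    nxt = bitstring.find(MARKER, error_pos)   # search beyond the error
    return None if nxt < 0 else nxt + len(MARKER)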
4.3.7 MPEG-4 -cont
• Reversible VLCs (Figure 4.27)
– The associated set of RVLCs is produced by
adding a fixed-length prefix and suffix to each of
the corresponding VLCs; they can be decoded in
– the forward scan direction
– the reverse scan direction
– Errors at different points in the bitstream result
in an overlap region
– In the example each VLC has a Hamming weight of 1, and
the fixed-length prefix and suffix are each a single binary 1,
so each RVLC contains three binary 1s; when decoding, a
codeword can be delimited by counting the 1s, the last
(third) 1 indicating its end. A decoding sketch follows.
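A sketch of decoding this example codeword set: since each RVLC contains exactly three 1s and ends on its third 1, codewords can be delimited by counting 1s, in the forward direction or (on the reversed bitstream) in the reverse direction.

def split_rvlc(bits):
    """Delimit RVLC codewords by counting 1s; the third 1 ends a codeword."""
    words, current, ones = [], [], 0
    for b in bits:
        current.append(b)
        ones += b
        if ones == 3:                  # third 1 marks the end of a codeword
            words.append(current)
            current, ones = [], 0
    return words

# Example set built as in the text: VLCs 1, 01, 001 with prefix and suffix 1.
stream = [1,1,1, 1,0,1,1, 1,0,0,1,1]
print(split_rvlc(stream))              # forward scan
print(split_rvlc(stream[::-1]))        # reverse scan of the reversed stream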
Example 4.2