
Chapter 4 Audio and video compression

• 4.1 Introduction
• 4.2 Audio compression
• 4.3 Video compression
4.1 Introduction
• Both audio and most video signals are continuously varying analog signals.
• The compression algorithms associated with digitized audio and video are different.
4.2 Audio compression
• Pulse code modulation (PCM)
– requires sampling of the analog signal at the required rate
• Bandlimited signal
• The bandwidth of the communication channels that are available often dictates rates that are less than these. This can be achieved in one of two ways:
– The audio signal is sampled at a lower rate
– A compression algorithm is used
4.2.1 Differential pulse code modulation
• DPCM is a derivative of standard PCM.
• For most audio signals, the range of the differences in amplitude between successive samples of the audio waveform is less than the range of the actual sample amplitudes, so fewer bits are needed to encode each difference.
• Figure 4.1
The previous digitized sample value is held in register R.

The difference signal is formed by subtracting the register contents (Ro) from the current digitized sample output by the ADC.

Register R is then updated with the difference signal, ready for the next sample.

The decoder adds each received DPCM value to the previously computed signal held in its own register.

The difference signal output by the subtractor is also known as the residual.

More elaborate schemes predict the previous signal more accurately by combining proportions of several earlier samples.

The proportions used are determined by predictor coefficients.
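A minimal numerical sketch of this principle is shown below. The 6-bit difference quantizer, the signal range of [-1, 1) and the zero initial register value are illustrative assumptions, not values taken from Figure 4.1.

```python
import numpy as np

def dpcm_encode(samples, bits=6):
    """Transmit the quantized difference between each sample and the
    previously reconstructed value held in 'register' (register R)."""
    step = 2.0 ** -(bits - 1)            # quantizer step for signals in [-1, 1)
    register = 0.0
    codes = []
    for s in samples:
        diff = s - register              # the residual
        q = int(np.clip(np.round(diff / step), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1))
        codes.append(q)
        register += q * step             # update R with the quantized difference
    return codes

def dpcm_decode(codes, bits=6):
    """The decoder mirrors the encoder: add each received difference to the
    previously reconstructed value held in its own register."""
    step = 2.0 ** -(bits - 1)
    register = 0.0
    out = []
    for q in codes:
        register += q * step
        out.append(register)
    return np.array(out)

# Example: encode and decode one cycle of a 1 kHz tone sampled at 8 kHz
t = np.arange(8) / 8000.0
x = 0.5 * np.sin(2 * np.pi * 1000 * t)
print(dpcm_decode(dpcm_encode(x)))
```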


4.2.2 Adaptive differential PCM
• Additional savings in bandwidth, or improved quality, can be obtained by varying the number of bits used for the difference signal depending on its amplitude.
• A second ADPCM standard is G.722, which adds subband coding to give better sound quality.
• A third standard based on ADPCM, defined in G.726, also uses subband coding but with a speech bandwidth of 3.4 kHz.
For the higher signal bandwidth, before sampling, the input signal is first passed through a pair of filters.

The outputs of the filters are a lower subband signal and an upper subband signal.

Each subband is then sampled and encoded independently using ADPCM.

The two bitstreams are multiplexed to produce the transmitted signal, and the decoder divides the received stream back into separate streams for decoding.
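The subband split can be sketched as follows. Note that the real G.722 uses quadrature mirror filters; the Butterworth filters, 4 kHz crossover and 16 kHz sample rate below are assumptions made purely for illustration.

```python
import numpy as np
from scipy.signal import butter, lfilter

def split_subbands(x, fs=16000, crossover=4000):
    """Split the input into a lower and an upper subband before independent
    ADPCM encoding (illustrative stand-in for the G.722 QMF pair)."""
    b_lo, a_lo = butter(4, crossover / (fs / 2), btype="low")
    b_hi, a_hi = butter(4, crossover / (fs / 2), btype="high")
    lower = lfilter(b_lo, a_lo, x)[::2]   # each band can be decimated by 2
    upper = lfilter(b_hi, a_hi, x)[::2]   # and fed to its own ADPCM coder
    return lower, upper

# Example: split one second of 16 kHz input
x = np.random.randn(16000)
lo, hi = split_subbands(x)
print(lo.shape, hi.shape)   # (8000,) (8000,)
```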
4.2.3 Adaptive predictive coding (APC)

• Even higher levels of compression, but at higher levels of complexity, can be obtained by also making the predictor coefficients adaptive. This is the principle of adaptive predictive coding.
Linear predictive coding

The algorithms considered so far (sampling, digitization and quantization using DPCM/ADPCM) operate directly on the waveform.

DSP circuits instead analyze the signal to extract the required perceptual features, which are then quantized.

The origin of the sound is also important: the vocal tract excitation parameters.

Voiced sounds are generated through the vocal cords.

Unvoiced sounds are produced with the vocal cords open.

These parameters are used with a suitable model of the vocal tract to produce synthesized speech.
4.2.4 Linear predictive coding
• After the audio waveform has been analyzed, the extracted perceptual parameters are quantized and sent, and the destination uses them, together with a sound synthesizer, to regenerate a sound that is perceptually comparable with the source audio signal. This is the LPC technique.
• Three features which determine the perception of a signal by the ear are its:
– Pitch
– Period
– Loudness
• Basic features of an LPC encoder/decoder: Figure 4.4
The input waveform is first sampled and quantized at a defined rate.

A segment, a block of sampled signals, is analyzed to determine the perceptual parameters of the speech.

The speech signal generated by the vocal tract model in the decoder is a function of the present output of the speech synthesizer and a linear combination of the previous set of model coefficients.

Hence the vocal tract model is adaptive.

The encoder determines and sends a new set of coefficients for each quantized segment.

The output of the encoder is a set of frames, each frame containing fields for pitch and loudness.

Bit rates as low as 2.4 or 1.2 kbps are possible. The generated sound at these rates is very synthetic, so LPC encoders are mainly used in military applications, where bandwidth is more important than quality.
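A hedged sketch of the analysis/synthesis idea is shown below, assuming an order-10 all-pole vocal tract model and 20 ms segments at 8 kHz; neither value is fixed by the chapter.

```python
import numpy as np

def lpc_coefficients(segment, order=10):
    """Estimate all-pole vocal-tract model coefficients for one speech
    segment by solving the autocorrelation normal equations."""
    seg = segment * np.hamming(len(segment))
    # autocorrelation values r[0] .. r[order]
    r = np.correlate(seg, seg, mode="full")[len(seg) - 1:len(seg) + order]
    # solve the Toeplitz system R a = r for the predictor coefficients
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def synthesize(excitation, a):
    """Regenerate speech by driving the all-pole filter with an excitation
    signal (periodic pulses for voiced sounds, noise for unvoiced)."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        past = sum(a[k] * out[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        out[n] = excitation[n] + past
    return out

# Example: analyze a 20 ms segment (160 samples at 8 kHz) and resynthesize it
segment = np.random.randn(160)
coeffs = lpc_coefficients(segment)
resynth = synthesize(np.random.randn(160), coeffs)
print(coeffs.shape, resynth.shape)
```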
4.2.5 Code-excited LPC
• Code-excited LPC (CELP)
– The synthesizers used in most LPC decoders are based on a very basic model of the vocal tract.
• In the CELP model, instead of treating each digitized segment independently for encoding purposes, a limited set of waveform templates (a codebook) is used.
• All coders of this type have a delay associated with them, incurred while each block of digitized samples is analyzed by the encoder and the speech is reconstructed at the decoder.
• The associated delay comprises the processing delay, the algorithmic delay and lookahead.
CELP
• A codebook of templates is held by both the encoder and the decoder before encoding starts.
• The codeword that is sent selects the particular template from the codebook whose difference values best match those computed by the encoder.
• Hence an improvement in sound quality is obtained.
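A toy illustration of the codebook lookup described above; the randomly generated codebook and 40-sample segment length stand in for the real excitation templates.

```python
import numpy as np

def choose_template(residual, codebook):
    """Pick the codebook entry that best matches the residual of the current
    segment and return its index, i.e. the codeword that is transmitted."""
    errors = [np.sum((residual - entry) ** 2) for entry in codebook]
    return int(np.argmin(errors))

# Example: a toy codebook of 8 random excitation templates, 40 samples each
codebook = np.random.randn(8, 40)
residual = codebook[3] + 0.05 * np.random.randn(40)   # segment resembling entry 3
print(choose_template(residual, codebook))            # -> 3 (usually)
```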
4.2.6 Perceptual coding
• Perceptual encoders have been designed for the compression of general audio.
• The technique is called perceptual coding since its role is to exploit a number of the limitations of the human ear.
• Sensitivity of the ear
– A strong signal may reduce the level of sensitivity of the ear to other signals which are near to it in frequency.
– The model used is a psychoacoustic model.
4.2.6 Perceptual coding (cont.)
– The sensitivity of the ear varies with the frequency of the signal; the perception threshold of the ear, that is, its minimum level of sensitivity, as a function of frequency is shown in Figure 4.5(a).
– The ear is most sensitive to signals in the range 2-5 kHz.
– In Figure 4.5(a) the vertical axis is the minimum amplitude a signal must have to be heard; signals A and B have the same amplitude, but only A is heard because of the frequency-dependent threshold level.
Figure 4.5(b) shows how the sensitivity of the ear changes in the vicinity of a loud signal.

Frequency masking: when multiple signals are present, a strong signal reduces the sensitivity of the ear to other signals that are near to it in frequency.
4.2.6 Perceptual coding (cont.)
– The masking effect also varies with frequency, as shown in Figure 4.6.
– Critical bandwidth: the width of each masking curve at a particular signal level for that frequency. For frequencies below 500 Hz the critical bandwidth is about 100 Hz; above 500 Hz it increases with frequency.
• Temporal masking:
– When the ear hears a loud sound, it takes a short but finite time before it can hear a quieter sound.
– The masking effect varies with frequency (Figure 4.6).
– The effect of temporal masking is shown in Figure 4.7: after the loud sound ceases, the masking amplitude decays over a period of time, and during this time any signal whose amplitude is less than the decay envelope will not be heard.
4.2.6 Perceptual coding (cont.) (Figure 4.6)
4.2.7 MPEG audio coders
MPEG (Moving Picture Experts Group)
– A standard for multimedia applications

• Encoding
– The input signal is first sampled and quantized using PCM.
– The bandwidth that is available for transmission is divided into a number of frequency subbands using a bank of analysis filters, also known as critical-band filters.
– Scaling factor:
• The analysis filter bank also determines the maximum amplitude of the 12 subband samples in each subband, known as the scaling factor for that subband.
The output of this stage is passed both to the psychoacoustic model and to the quantizer block.

Temporal and frequency masking are evaluated in the model.

The 12 sets of 32 PCM samples are transformed into frequency components using a mathematical technique.
4.2.7 MPEG audio coders (cont.)

• Discrete Fourier transform (DFT)
– The 12 sets of 32 PCM samples are first transformed into an equivalent set of frequency components using this mathematical technique.
• Signal-to-mask ratios (SMRs)
– Using the known hearing thresholds and masking properties of each subband, the model determines the various masking effects of this set of signals.
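A toy illustration of this side-chain is given below. The flat -40 dB masking threshold is an assumption standing in for the real per-subband thresholds derived from Figures 4.5-4.7.

```python
import numpy as np

def signal_to_mask_ratios(pcm_block, n_subbands=32, mask_threshold_db=None):
    """Transform a block of 12 x 32 PCM samples to the frequency domain and
    compare the peak energy in each subband with a masking threshold."""
    if mask_threshold_db is None:
        mask_threshold_db = np.full(n_subbands, -40.0)   # assumed flat mask
    spectrum = np.abs(np.fft.rfft(pcm_block.reshape(-1)))   # 12 * 32 = 384 samples
    bands = np.array_split(spectrum, n_subbands)
    signal_db = np.array([20 * np.log10(np.max(b) + 1e-12) for b in bands])
    return signal_db - mask_threshold_db                     # SMR per subband

# Example: 12 sets of 32 PCM samples taken from a sine wave
block = np.sin(2 * np.pi * 0.05 * np.arange(384)).reshape(12, 32)
print(signal_to_mask_ratios(block)[:4])
```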
Frame format
• Header: indicates the sampling frequency used.
• Quantization is performed at two levels using companding.
• Each scaling factor is quantized using 6 bits (1 of 64 levels).
• 4 bits are used to quantize the 12 frequency components in each subband; the subband sample format therefore contains all the data needed for decoding.
• In the decoder, after the magnitudes of each set of 32 subband samples have been determined by the dequantizers, they are passed to the synthesis filter bank, which produces the PCM samples that are decoded to produce the time-varying analog output.
• Ancillary data contains additional information about the sound.
• The encoder uses the psychoacoustic model, so it is more complex than the decoder.
• Because the encoder uses different scaling factors, the frequency components in different subbands have varying quantization noise levels.
• The ISO recommended standard is 11172-3, which defines three levels of processing.
• Layers 1, 2 and 3: layer 1 is the basic mode and does not include temporal masking.
• Layers 2 and 3 have increasing levels of processing, with increased compression and perceptual quality.
4.2.7 MPEG audio coders (cont.): Table 4.2
Table 4.2 Summary of MPEG layer 1, 2 and 3 perceptual encoders

Layer | Application | Compressed bit rate | Quality | Example input-to-output delay
1 | Digital audio cassette | 32-448 kbps | Hi-fi quality at 192 kbps per channel | 20 ms
2 | Digital audio and digital video broadcasting | 32-192 kbps | Near CD-quality at 128 kbps per channel | 40 ms
3 | CD-quality audio | 64 kbps | CD-quality at 64 kbps per channel | 60 ms
• The psychoacoustic model controls the accuracy of the quantization by computing and allocating the number of bits used to quantize each sample.

• The number of quantization bits varies from one sample to another.

• The bit allocation data is sent along with the samples and is used for dequantization of the samples.

• This is known as the forward adaptive bit allocation mode.
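One possible greedy sketch of forward adaptive bit allocation driven by the per-subband SMRs; the bit budget, per-band cap and the 6 dB-per-bit rule are assumptions, not values from the text.

```python
import numpy as np

def allocate_bits(smr_db, total_bits=96, max_bits_per_band=6):
    """Repeatedly give one more bit to the subband whose quantization noise is
    furthest above its masking threshold (highest remaining SMR)."""
    bits = np.zeros(len(smr_db), dtype=int)
    need = np.array(smr_db, dtype=float)          # noise still above the mask, dB
    for _ in range(total_bits):
        candidates = np.where(bits < max_bits_per_band, need, -np.inf)
        band = int(np.argmax(candidates))
        if candidates[band] <= 0:                 # every audible band is covered
            break
        bits[band] += 1
        need[band] -= 6.0                         # each extra bit lowers noise ~6 dB
    return bits

# Example: allocate bits to 8 subbands with differing signal-to-mask ratios
print(allocate_bits([30, 12, 5, 0, 22, 8, 2, 15]))
```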


4.2.8 Dolby audio coders
• MPEG vs Dolby AC-1 (see Figure 4.9)
– MPEG:
• Advantage: the psychoacoustic model is required only in the encoder.
• Disadvantage: a significant portion of each encoded frame contains bit allocation information.
– Dolby AC-1:
• Uses a fixed bit allocation strategy for each subband, which is then used by both the encoder and the decoder.
Dolby AC-1
• Uses fixed bit allocation.
• The bit allocation for each subband is based on the sensitivity of the ear.
• The bit allocation information therefore need not be sent in the frame.
• AC stands for acoustic coder.
Dolby AC-2
• The bit allocation per sample is adaptive.
• The decoder also has a copy of the psychoacoustic model.
• Instead of each frame containing bit allocation information, it contains the encoded frequency coefficients of the sampled waveform segment, the encoded spectral envelope. This mode of operation is the backward adaptive bit allocation mode.
• Disadvantage: the model in the encoder cannot be modified without changing all decoders.
• To overcome this, a hybrid backward and forward adaptive bit allocation mode is used.
• The hybrid approach is used in the Dolby AC-3 standard.
4.2.8 Dolby audio coders (cont.)
• The Dolby AC-2 standard is utilized in many applications, including the compression associated with the audio of a number of PC sound cards.
• The hybrid approach is used in the Dolby AC-3 standard, which has been defined for use in a similar range of applications as the MPEG audio standards, including the audio associated with advanced television (ATV).
4.3 Video compression
• The digitization format defines the sampling rate that is used for the luminance, Y, and the two chrominance signals, Cb and Cr.
Principles
• The technique used is based on a combination of the preceding and succeeding frames.
• Instead of sending the video as a set of independently compressed frames, the difference between the actual frame and the predicted frame contents is sent; this involves motion estimation and motion compensation.
4.3.1 Video compression principles

• Frame types
– I-frame: intracoded
• I-frames are encoded without reference to any other frames.
• Group of pictures (GOP): the number of frames between successive I-frames.
– P-frame: intercoded
• The encoding of a P-frame is relative to the contents of either a preceding I-frame or a preceding P-frame.
• GOPs consist of I-frames (keyframes), P-frames (predicted frames) and B-frames (bidirectionally predicted frames).
• The number of P-frames between I-frames is limited, since any errors present in the first P-frame will be propagated to the next.
– B-frame: their contents are predicted using search regions in both past and future frames.
– PB-frame: this does not refer to a new frame type as such but rather to the way two neighboring P- and B-frames are encoded as if they were a single frame.
– D-frame: only used in a specific type of application; it has been defined for use in movie/video-on-demand applications.
Motion estimation and compensation

The encoded contents of both P- and B-frames are predicted by estimating any motion that has taken place between the present frame and the preceding I- or P-frame and, for B-frames, the succeeding P- or I-frame as well.

The digitized contents of the Y matrix of each frame are divided into two-dimensional blocks of 16x16 pixels, each known as a macroblock.

Assuming 4:1:1 subsampling, the corresponding Cb and Cr blocks are each 8x8 pixels.

The block size used for the DCT is also 8x8, so a macroblock contains four DCT blocks for Y and one each for the two chrominance signals.

P-frame encoding
To encode a P-frame, the contents of each macroblock in the target frame are compared with the corresponding macroblock in the preceding I- or P-frame, the reference frame.

If there is a close match, only the address of the macroblock is encoded; otherwise the search is continued over neighboring macroblocks in a defined search area.

If a close match is found elsewhere in the search area, two parameters are encoded: the motion vector, the (x, y) offset between the macroblock being encoded and the location of the matching block of pixels in the reference frame, and the prediction error.

The offset can be on a macroblock or a pixel boundary.

When the offset is on a pixel boundary, the motion vector is said to have single-pixel resolution.

The prediction error comprises three matrices, for Y, Cb and Cr, each containing the difference values between the target macroblock and the set of pixels in the search area which produced the close match.

Motion vectors are encoded using differential encoding and the resulting codewords are Huffman coded.
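A simple exhaustive block-matching sketch of this search is shown below. The sum-of-absolute-differences (SAD) criterion and the +/-8-pixel search range are assumptions, and only single-pixel resolution is illustrated.

```python
import numpy as np

def motion_estimate(target_mb, ref_frame, mb_row, mb_col, search=8, block=16):
    """Scan a +/-search window in the reference frame around the macroblock
    position and keep the offset with the smallest SAD, returning the motion
    vector and the prediction-error block."""
    best_sad, best_mv, best_block = np.inf, (0, 0), None
    h, w = ref_frame.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = mb_row + dy, mb_col + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue                          # candidate falls outside the frame
            candidate = ref_frame[y:y + block, x:x + block]
            sad = np.sum(np.abs(target_mb.astype(int) - candidate.astype(int)))
            if sad < best_sad:
                best_sad, best_mv, best_block = sad, (dx, dy), candidate
    prediction_error = target_mb.astype(int) - best_block.astype(int)
    return best_mv, prediction_error

# Example: the target macroblock content sits at (row 14, col 35) in the
# reference, i.e. offset dx = +3, dy = -2 from the macroblock position (16, 32).
ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
target = ref[14:30, 35:51]
mv, err = motion_estimate(target, ref, 16, 32)
print(mv, np.abs(err).sum())   # expected: (3, -2) 0
```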
B-frame encoding

Motion estimation is carried out with respect to both the preceding I- or P-frame and the immediately succeeding I- or P-frame.

The motion vector and difference matrices are computed twice, first using the preceding frame as the reference and then using the succeeding frame as the reference.

A third motion vector and set of difference matrices are calculated using the target and the mean of the two predicted values.

The set with the smallest difference matrices is selected and encoded in a similar way to a P-frame.
The motion vector resolution used is termed half-pixel resolution.
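A sketch of the three-way selection described above, taking as input the best-match blocks already found in the forward and backward searches (see the P-frame sketch earlier); the block values are illustrative.

```python
import numpy as np

def encode_b_macroblock(target, prev_match, next_match):
    """Compute the prediction error against the previous-frame match, the
    next-frame match and their mean, and keep whichever is smallest."""
    candidates = {
        "forward": target - prev_match,
        "backward": target - next_match,
        "interpolated": target - (prev_match + next_match) / 2.0,
    }
    mode = min(candidates, key=lambda k: np.abs(candidates[k]).sum())
    return mode, candidates[mode]

# Example with toy 16x16 blocks
t = np.random.rand(16, 16)
mode, err = encode_b_macroblock(t, t + 0.10, t + 0.02)
print(mode)   # 'backward', since the succeeding-frame match is closer
```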
Implementation issues

For an I-frame the three steps are: forward DCT, quantization and entropy coding.

Each macroblock requires six 8x8 pixel blocks to be encoded: four for Y and two for chrominance.

For P-frames the encoding of each macroblock depends on the output of the motion estimation, which in turn depends on the contents of the macroblock being encoded and the contents of the macroblock in the search area of the reference frame which produces the closest match.

The three possibilities are:

If the two contents are the same, only the address of the macroblock in the reference frame is encoded.
If the two contents are very close, both the motion vector and the difference matrices are encoded.

If no match is found, the target macroblock is encoded in the same way as an I-frame macroblock.


The motion estimation unit contains the search logic; it uses the computed difference values, derived from the target frame and the reference frame contents, which are decompressed by the dequantizer and IDCT blocks.

After the target frame has been completely compressed, the difference values are used to update the reference frame contents ready for encoding the next frame.

The type of encoding used for each macroblock is identified by the formatter.

A typical macroblock format is as shown (a minimal sketch of it as a data structure is given below).

The type field indicates the type of frame being encoded: I, P or B.

The address identifies the location of the macroblock in the frame.

The quantization value is the threshold value used to quantize all the DCT coefficients.

The motion vector is the encoded vector.

Blocks: the six 8x8 blocks that make up the macroblock.
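The macroblock fields can be pictured as a simple record; the field types and sizes below are illustrative assumptions, not the actual bit-level layout of the standard.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Macroblock:
    """Illustrative container mirroring the macroblock fields listed above."""
    frame_type: str                              # 'I', 'P' or 'B'
    address: int                                 # position of the macroblock in the frame
    quantization: int                            # threshold used for all DCT coefficients
    motion_vector: Optional[Tuple[int, int]]     # None for intracoded macroblocks
    blocks: List[List[int]]                      # six 8x8 blocks: 4 x Y, 1 x Cb, 1 x Cr

# Example: an intercoded macroblock carrying a (3, -2) motion vector
mb = Macroblock("P", address=42, quantization=16,
                motion_vector=(3, -2), blocks=[[0] * 64 for _ in range(6)])
print(mb.frame_type, mb.motion_vector)
```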


Decoding the received bitstream is simpler, since it does not require motion estimation.

On receipt of the bitstream, each new frame is assembled one macroblock at a time.

Decoding of an I-frame is the same as for JPEG.

To decode a P-frame, the decoder keeps a copy of the preceding I- or P-frame in a buffer and uses it, along with the encoded information for each macroblock, to build the Y, Cb and Cr matrices for the new frame in a second buffer.

For uncoded macroblocks, the macroblock address is used to locate the macroblock in the previous frame and its contents are transferred to the second buffer.

Fully encoded macroblocks are decoded directly and their contents sent to the second buffer.

For macroblocks with a motion vector and a set of difference matrices, these are used together with the matrices in the first buffer to define the values of the macroblocks in the second buffer.

For B-frame decoding, three buffers are used.


H.261

• Defined for the provision of video telephony and videoconferencing services over an ISDN.
• Transmission channels are multiples of 64 kbps.
• The digitization format used is either the common intermediate format (CIF) or the quarter CIF (QCIF).
• Progressive scanning is used with a frame refresh rate of 30 fps for CIF and 15 or 7.5 fps for QCIF.
– CIF: Y = 352x288, Cb = Cr = 176x144
– QCIF: Y = 176x144, Cb = Cr = 88x72

• H.261 encoding format: see Figure 4.15.

Only I-frames and P-frames are used, with three P-frames between each pair of I-frames.

Each macroblock has an address for identification.

The type field indicates whether the macroblock is intracoded or intercoded.

The quantization value is the threshold value and the motion vector is the encoded vector.

The coded block pattern defines which of the six 8x8 pixel blocks that make up the macroblock are present, and the JPEG-encoded DCT coefficients are given for each block.

Picture start code: marks the start of each video frame.

Temporal reference field: a time stamp used to synchronize the video block with the associated audio block carrying the same time stamp.

Picture type field: the type of frame (I- or P-frame).

GOB: group of (macro)blocks; the size is chosen so that both CIF and QCIF contain an integral number of GOBs.

Each GOB has a unique start code that acts as a resynchronization marker.

Each GOB also has a group number.
To match the available bandwidth, the variable bit rate of the encoder is converted into a constant bit rate by passing the output through a FIFO buffer.

Feedback from the buffer is provided to the quantizer.

The output rate of the buffer is determined by the transmission bit rate, and two threshold values are defined: low and high.

If the buffer contents fall below the low threshold, the quantization threshold is reduced and the encoder output rate increases; if they rise above the high threshold, the quantization threshold is increased and the output rate is reduced.

The control procedure is applied per GOB; a small sketch of this feedback loop is given below.
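A minimal sketch of the feedback rule described above; the fill-level thresholds, quantizer range and step size are assumptions, not values specified by H.261.

```python
def adjust_quantizer(buffer_fill, q_threshold, low=0.25, high=0.75,
                     q_min=1, q_max=31, step=2):
    """Coarser quantization when the FIFO buffer is filling up, finer
    quantization when it is draining."""
    if buffer_fill < low and q_threshold > q_min:
        q_threshold = max(q_min, q_threshold - step)   # finer -> higher output rate
    elif buffer_fill > high and q_threshold < q_max:
        q_threshold = min(q_max, q_threshold + step)   # coarser -> lower output rate
    return q_threshold

# Example: the buffer is 90% full, so the quantization threshold is raised
print(adjust_quantizer(buffer_fill=0.9, q_threshold=16))   # -> 18
```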


4.3.2 H.261 (cont.)
Video encoder principles
• Two thresholds
– Low
– High
4.3.3 H.263
• Defined for use over wireless and public switched telephone networks (PSTNs).
• Applications include video telephony, videoconferencing, security surveillance and interactive games.
• Operates at low bit rates.
• Digitization formats:
– QCIF: Y = 176x144, Cb = Cr = 88x72
– S-QCIF: Y = 128x96, Cb = Cr = 64x48
– Progressive scanning with a frame rate of 15 or 7.5 fps.
4.3.3 H.263 (cont.)

• Frame types:
– I-frame
– P-frame
– B-frame
– PB-frame: used because of its much reduced encoding overhead
• Unrestricted motion vectors
• Normally, the motion vectors associated with predicted macroblocks are restricted to a defined search area that stops at the edge of the frame.
– To overcome this limitation, for those pixels of a potential close-match macroblock that fall outside the frame boundary, the edge pixels themselves are used instead (as sketched below).
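A minimal sketch of the edge-pixel substitution, assuming a simple clamp of out-of-range coordinates; real codecs implement this inside the motion search itself.

```python
import numpy as np

def padded_pixel(frame, y, x):
    """When a candidate block reaches past the frame boundary, reuse the
    nearest edge pixel instead of rejecting the match."""
    h, w = frame.shape
    return frame[min(max(y, 0), h - 1), min(max(x, 0), w - 1)]

def candidate_block(frame, top, left, size=16):
    """Extract a size x size candidate block that may extend beyond the frame."""
    return np.array([[padded_pixel(frame, top + i, left + j)
                      for j in range(size)] for i in range(size)])

# Example: a candidate block whose top-left corner lies 4 pixels outside the frame
frame = np.arange(64 * 64).reshape(64, 64)
blk = candidate_block(frame, top=-4, left=-4)
print(blk.shape)   # (16, 16)
```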
4.3.3 H.263 (cont.)

• Error resilience
– Transmission errors cause error propagation, as shown in Figure 4.17(a).
– For PSTN connections, errors in the bitstream are more frequent.
– It is difficult to locate the exact macroblock in error.
– A GOB (group of macroblocks) may therefore contain erroneous macroblocks.
– When an error in a GOB is detected, the decoder skips the remaining macroblocks in the affected GOB and searches for the next resynchronization marker.
– The error is then masked using an error concealment scheme.
Error propagation
An error in one GOB leads to error propagation to other regions of the frame.

To avoid this, the schemes used are:

• Error tracking

• Independent segment decoding

• Reference picture selection

Error tracking and resilience (see Figure 4.17(b))

A two-way channel is used so that the decoder can inform the encoder about errors in a GOB.

Errors are detected from:

• One or more out-of-range motion vectors
• One or more invalid variable-length codewords
• One or more out-of-range DCT coefficients
• An excessive number of coefficients within a macroblock
The encoder retains error prediction information for all the GOBs in the recently transmitted frames.

When an error is detected, the decoder sends a NAK (negative acknowledgment) back to the encoder in the source codec, giving the frame number and the location of the GOB in error.

The encoder then identifies the macroblocks that are likely to be affected in later frames.

The affected macroblocks are intracoded in subsequent frames.


Independent segment decoding

This prevents errors in a GOB from affecting neighboring GOBs in succeeding frames (see Figure 4.18).

Motion estimation and compensation are restricted to the boundaries of each GOB, with the GOB's own boundary pixels used as references.

An error in a GOB will therefore only affect the same GOB in successive frames, until a new intracoded GOB is sent by the encoder.

It is normally used in conjunction with the other schemes.


Reference picture selection

This is similar to the error tracking scheme.

The decoder sends acknowledgment messages so that error propagation can be avoided.

During the encoding of intercoded frames, a copy of each preceding frame is retained in the encoder.

The encoder can select any of the previously decoded frames as the reference.

In the example, when a NAK for frame 2 is received, the encoder selects GOB 3 of frame 1 as the reference to encode the corresponding GOB of the next frame.

Since the error still propagates over a number of frames with NAKs, an alternative method is to use the ACK mode.
4.3.4 MPEG
• MPEG-1
• Recommended by ISO as standard 11172.
– Source intermediate digitization format (SIF)
– Resolution: 352x288
– VHS-quality audio and video
– Video on CD-ROM at bit rates up to 1.5 Mbps
• MPEG-2
• Used for the recording and transmission of studio-quality audio and video.
– Four levels of video resolution:
• Low: VHS-quality video
• Main: 4:2:2, studio-quality digital video
• High 1440: 4:2:0, HDTV
• High: 4:2:0, wide-screen HDTV
4.3.4 MPEG (cont.)
• MPEG-4
– Similar to H.263
– Low bit rates, ranging from 4.8 to 64 kbps
– Interactive multimedia applications
4.3.5 MPEG-1
• Uses the SIF (source intermediate format).
• Supports two spatial resolutions for the two types of video source:
– NTSC
– PAL
• Frame types: I-, P- and B-frames (Figure 4.20).
• The coding is based on H.261; there are two main differences:
– Temporal: time stamps are inserted within each frame so that the decoder can resynchronize more quickly in the event of macroblock corruption.
– The group of macroblocks between two time stamps is called a slice; the standard number of macroblocks in a slice is 22.
– The use of B-frames increases the time interval between I- and P-frames.
• Video bitstream structure (Figure 4.21)
4.3.5 MPEG-1 (cont.) (Figure 4.20)
A sequence consists of groups of pictures.

A GOP is a sequence of I-, P- or B-pictures/frames.

Each picture/frame is made up of slices, each of which consists of multiple macroblocks.

Start code: indicates the start of the sequence.

Video parameters: screen size and aspect ratio.

Bitstream parameters: bit rate and the size of memory required.

Quantization parameters: the contents of the quantization tables.

Each GOP has a start code, a time stamp for synchronization, and parameters giving the sequence of frame types.

A slice is similar to a GOB in H.261.


4.3.6 MPEG-2
• Supports four levels and five profiles.
• The four levels are: low, main, high 1440 and high.
• The five profiles are: simple, main, spatial resolution, quantization accuracy and high.
• MP@ML: main profile at main level.
• Used for digital television broadcasting:
– Resolution of either 720x480 pixels at 30 Hz or 720x576 pixels at 25 Hz
– Bit rate from 4 Mbps to 15 Mbps
– Uses interlaced scanning (Figure 4.22(a)): each frame is divided into two fields
– Field mode (Figure 4.22(b))
– Frame mode (Figure 4.22(c))
Interlaced scanning: each frame has two fields, with alternate lines in each field.
There are two encoding modes, field mode and frame mode, chosen according to the amount of motion in the video.

For larger amounts of movement, encoding is carried out on the lines in a field for better compression, for example a live event.

For smaller amounts of motion, frame mode is used, for example a studio-based program.

For motion estimation when encoding the macroblocks in P- and B-frames, three different modes are available: field, frame and mixed.

In field mode, the motion vector is computed using a search window around the corresponding macroblock in the preceding I- or P-fields.

For B-frames, the immediately succeeding P- or I-field is also used.

In frame mode, a macroblock in an odd or even field is encoded relative to that in the preceding/succeeding odd/even field.

In mixed mode, motion vectors for both field and frame modes are computed and the mean value is selected.
4.3.6 MPEG-2 (cont.)
• Three standards are associated with HDTV: ATV in North America,
• digital video broadcasting (DVB) in Europe,
• and multiple sub-Nyquist sampling encoding (MUSE) in Japan and Asia.
• These standards define the transmission of the bitstream over the network.

• The ITU-R HDTV specification is intended for TV studios and the international exchange of programs (1920 x 1152 pixels).

• The ATV standard was defined by the TV manufacturers of the Grand Alliance:
– ATV HDTV
• 16/9 aspect ratio
• 1280 x 720 pixels
• MP@HL: main profile at high level
• Audio compression: Dolby AC-3

– DVB HDTV
• 4/3 aspect ratio (1440 x 1152 pixels)
• SSP@H1440: spatially scalable profile at high 1440
• Audio compression standard: MPEG audio layer 2
– MUSE
• 16/9 aspect ratio
• 1920 samples/line and 1035 lines per frame
4.3.7 MPEG-4
• MPEG-4 is concerned with the audio and video associated with interactive multimedia applications over the Internet and entertainment networks.
• It contains features that enable a user to access and manipulate individual elements of the picture.
• Due to its high coding efficiency, it is also used over low bit rate networks such as wireless networks and PSTNs.
• It is an alternative to H.263 and also supports low bit rates.
• MPEG-4 has:
– Content-based functionalities: each scene is defined in the form of a background and one or more foreground audio-visual objects (AVOs).
– Each AVO is defined as one or more audio and video objects.
– Object descriptor: a description of the origin of each audio and video object, required so that it can be manipulated.
– Binary format for scenes (BIFS): the language used for modifying objects (for example, changing the shape, color and appearance of video objects, or the volume of audio objects).
– Scene descriptor: describes the composition of a scene.
➢ It defines how the various AVOs are related to each other in the context of the complete scene.
– Figure 4.23: a frame/scene is defined in the form of a number of AVOs.
– Each video frame is segmented into a number of video object planes (VOPs), each of which corresponds to an AVO of interest.
• In the example, the frame is shown as consisting of three VOPs:
➢ VOP 0 represents the person approaching the car.
➢ VOP 1 is the remainder of the car parked outside the house.
➢ VOP 2 is the remainder of the background.
➢ Each VOP is encoded separately based on its shape, motion and texture.
➢ Each VOP is encapsulated within a rectangle, chosen so that it completely covers the related AVO using the minimum number of macroblocks.

The shape, motion and texture of each VOP are encoded, and the resulting bitstream is multiplexed together with the related object and scene descriptor information.

At the receiver the bitstream is demultiplexed and the individual streams are decoded.

The decompressed information, scene descriptor and object information are then used together to create the video frame.

Audio and video compression

The audio associated with an AVO is compressed using one of the audio compression algorithms; the choice depends on the available bandwidth/bit rate of the transmission channel and the sound quality required.

Different algorithms, such as G.723.1, Dolby AC-3 and MPEG layer 2, are used for different applications.

Each VOP is identified, defined and encoded separately.

Each object in a VOP is identified on the basis of similar properties of texture, color and brightness, and such video objects, bounded by a rectangle containing the minimum number of macroblocks, are encoded based on shape, motion and texture.
Any VOP which has no motion associated with it produces a minimum of compressed information.

VOPs which move often occupy only a small portion of the scene/frame.

The bit rate of the multiplexed video stream is therefore much lower than that obtained with the other standards.
DECODER
4.3.7 MPEG-4 (cont.)
• Transmission format (Figure 4.25)
– Transport stream (TS): carries the frame information over the network.
– Packetized elementary streams (PESs): the TS consists of a multiplexed stream of PESs.
– Elementary stream (ES): the compressed audio and video information relating to each AVO.
– Each PES contains a type field in the packet header.
– FlexMux layer: routes each PES to the appropriate synchronization block.
– Synchronization layer: consists of the synchronization blocks.
– Elementary stream descriptor (ESD): identifies the compressed audio and video stream for each AVO and routes it to the decoder.
– Time stamps are passed to the timing synchronization block.
– Composition and rendering block: the decompressed data and the scene descriptor information are used to compose each frame.
MPEG4 DECODER
4.3.7 MPEG-4 (cont.)
• Error resilience techniques for error-prone transmission channels (Figure 4.26):
– Use of fixed-length video packets instead of GOBs
– Reversible VLCs, a new form of variable-length coding
– Errors may occur in:
• a macroblock
• a header
Video packets

In H.261/H.263, the resynchronization markers inserted by the encoder at the start of each GOB (and of each slice in MPEG-1/2) allow errors to be detected and the affected GOB or slice to be discarded.

In MPEG-4, the compressed bitstream is instead divided into groups containing an equal number of bits; each group is called a video packet and is separated from the next by a resynchronization marker.

Fixed-length video packets mean that there are multiple resynchronization markers in the high-activity regions of a frame, so only a few macroblocks are affected by an error.

In Figure 4.26(c) the video packet format shows the macroblock field, which gives the number of the first macroblock in the packet; motion vectors are limited to the boundaries of the packet's macroblocks.

The motion vectors for the macroblocks are separated from the DCT information, and the two fields are separated by a motion boundary marker (MBM) so that the affected field can be identified.

The header contains a copy of the picture/frame-level parameters, the presence of which is indicated by the header extension code (HEC) bit.
4.3.7 MPEG-4 (cont.)
• Reversible VLCs (Figure 4.27)
– The associated set of RVLCs is produced by adding a fixed-length prefix and suffix to each of the corresponding VLCs, so that they can be decoded in:
– a forward direction scan
– a reverse direction scan
– An error is then detected at different points in the two scans of the bitstream, resulting in an overlap region that localizes the error.
– In the example, each VLC has a Hamming weight of 1 and the fixed-length prefix and suffix are each a single binary 1, so each RVLC contains three binary 1s. When decoding, a codeword can therefore be identified by counting the number of 1s, and the third 1 indicates its end.
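A sketch of how codeword boundaries can be found in either scan direction for the example code described above (each codeword contains exactly three 1s). Only the boundary-finding rule is shown; the codebook itself is not reproduced here.

```python
def decode_rvlc_forward(bits):
    """Forward scan: a codeword ends as soon as its third binary 1 is seen."""
    words, current, ones = [], [], 0
    for b in bits:
        current.append(b)
        ones += b
        if ones == 3:                    # third 1 marks the end of this codeword
            words.append(current)
            current, ones = [], 0
    return words, current                # a non-empty remainder signals an error

def decode_rvlc_reverse(bits):
    """Reverse scan: apply the same rule to the reversed bitstream. Comparing
    where the forward and reverse scans fail brackets the corrupted region."""
    words, tail = decode_rvlc_forward(list(reversed(bits)))
    return [list(reversed(w)) for w in reversed(words)], tail

# Example: two complete codewords (1011 and 111) followed by a truncated one
stream = [1, 0, 1, 1, 1, 1, 1, 1, 0]
print(decode_rvlc_forward(stream))   # ([[1, 0, 1, 1], [1, 1, 1]], [1, 0])
```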
Ex 4.2
