Chapter 4: Audio and Video Compression (1)
• 4.1 Introduction
• 4.2 Audio compression
• 4.3 Video compression
4.1 Introduction
• Both audio and most video signals are
continuously varying analog signals
• The compression algorithms associated with
digitized audio and video are different
4.2 Audio compression
• Pulse code modulation (PCM)
– requires sampling of the analog signal at the Nyquist
rate (twice the highest frequency component)
• Bandlimited signal
• The bandwidth of the communication
channels that are available often dictates rates that
are less than these. This can be achieved in
one of two ways:
– Audio signal is sampled at a lower rate
– A compression algorithm is used
4.2.1 Differential pulse code modulation
• DPCM is a derivative of standard PCM
The decoder adds each received DPCM value to the previously computed signal held in its register, as in the sketch below
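A minimal DPCM sketch (illustrative only; the step size and function names are assumed, not from the text): the encoder transmits quantized differences, and the decoder adds each received difference to the signal value held in its register.

def dpcm_encode(samples, step=4):
    register = 0                           # previously computed signal value
    codes = []
    for s in samples:
        q = round((s - register) / step)   # quantized difference (the DPCM value)
        codes.append(q)
        register += q * step               # track the decoder's register
    return codes

def dpcm_decode(codes, step=4):
    register = 0
    out = []
    for q in codes:
        register += q * step               # add DPCM value to register contents
        out.append(register)
    return out

samples = [0, 3, 9, 14, 18, 20, 19, 15]
print(dpcm_decode(dpcm_encode(samples)))   # tracks the input to within ~one step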
In subband coding, the outputs of the analysis filters are a lower-subband signal and an upper-subband signal; a minimal two-band split is sketched below
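As a rough illustration of a two-band analysis split (a Haar-style filter pair chosen for brevity, not the filters used in practice):

import numpy as np

def analysis_filters(x):
    """Split a signal into lower and upper subbands, each at half the rate."""
    x = np.asarray(x, dtype=float)
    low = (x[0::2] + x[1::2]) / 2      # lower subband signal (averages)
    high = (x[0::2] - x[1::2]) / 2     # upper subband signal (differences)
    return low, high

low, high = analysis_filters([4, 6, 10, 8, 2, 0])
# perfect reconstruction: even samples = low + high, odd samples = low - high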
4.2.4 Linear predictive coding (LPC)
The speech signal generated by the vocal-tract model in the decoder is the
present output of the speech synthesizer, formed as a linear combination of
previous outputs weighted by the current set of model coefficients
The encoder determines and sends a new set of coefficients for each quantized
segment
The output of the encoder is a set of frames, each frame consisting of fields for
pitch and loudness
Bit rates as low as 2.4 or 1.2 kbps are possible. The sound generated at these
rates is very synthetic, so LPC encoders are used mainly in military
applications, where bandwidth is at a premium
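A hedged sketch of the analysis step: model coefficients are derived per segment via autocorrelation and the standard Levinson-Durbin recursion (pitch and loudness extraction are omitted, and the model order is an assumed value).

import numpy as np

def lpc_coefficients(segment, order=8):
    """Return `order` vocal-tract model coefficients for one speech segment.
    Assumes a non-silent segment (r[0] > 0)."""
    x = np.asarray(segment, dtype=float)
    # Autocorrelation values r[0..order]
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):          # Levinson-Durbin recursion
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / err                     # reflection coefficient
        a[1:i] = a[1:i] + k * a[1:i][::-1]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]                           # the coefficients sent in each frame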
4.2.5 Code-excited LPC
• Code-excited LPC
– The synthesizers used in most LPC decoders are
based on a very basic model of the vocal tract
• In the CELP model, instead of treating each
digitized segment independently for encoding
purposes, a limited set of waveform segments
(templates) is used.
• All coders of this type have a delay
associated with them which is incurred while
each block of digitized samples is analyzed
by the encoder and the speech is
reconstructed at the decoder
• The associated delay comprises processing delay,
algorithmic delay and lookahead
CELP
• The same template codebook is held by both the
encoder and the decoder, computed beforehand
• The codeword that is sent selects the particular
template from the codebook whose difference
values best match those quantized by the
encoder
• Hence an improvement in sound quality is
obtained; a codebook-search sketch follows.
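A minimal sketch of the codebook search (the codebook contents and names here are hypothetical):

import numpy as np

def best_codeword(segment_diff, codebook):
    """Return the index of the template closest to the difference segment."""
    errors = [np.sum((segment_diff - t) ** 2) for t in codebook]
    return int(np.argmin(errors))      # only this index is transmitted

# Both sides hold the same (hypothetical) template codebook beforehand.
codebook = np.array([[0, 1, 0, -1], [1, 1, -1, -1], [2, 0, -2, 0]])
idx = best_codeword(np.array([1, 1, 0, -1]), codebook)
decoded = codebook[idx]                # decoder looks up the same template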
4.2.6 Perceptual coding
• Perceptual encoders have been designed for
the compression of general audio
• The approach is known as perceptual coding since it
exploits a number of the limitations of the human ear.
• Sensitivity of the ear
– A strong signal may reduce the level of sensitivity
of the ear to other signals which are near to it in
frequency
– The model used is called a psychoacoustic model
4.2.6 Perceptual coding -cont
– The sensitivity of the ear varies with the frequency
of the signal; the perception threshold of the ear,
that is, its minimum level of sensitivity as a
function of frequency, is shown in Figure 4.5(a)
– It is most sensitive to signals in the range 2-5 kHz
– In Figure 4.5(a) the vertical axis gives the minimum
amplitude a signal must have to be heard; signals A and B
have the same level, but only A is heard because B falls
below the hearing threshold at its frequency
Figure 4.5(b) shows how the sensitivity of the ear
changes in the vicinity of a loud signal
• ENCODING
– The input signal is first sampled and quantized
using PCM
– The bandwidth available for transmission is
divided into a number of frequency subbands
using a bank of analysis filters (critical-band filters)
– Scaling factor:
• The analysis filter bank also determines the maximum
amplitude of the 12 subband samples in each subband,
known as the scaling factor
The output of this stage is passed both to the psychoacoustic
model and to the quantizer block
MPEG audio layer 3 delivers CD quality at 64 kbps per channel,
with a processing delay of about 60 ms
• The psychoacoustic model controls the accuracy
of the quantization by computing the masking thresholds,
and hence the signal-to-mask ratios, for the subband
samples; a bit-allocation sketch follows.
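A simplified sketch of the idea (the levels, thresholds and allocation rule are assumed, not taken from the standard): subbands masked below threshold get no bits, and the remainder share the bit pool in proportion to their signal-to-mask ratio (SMR).

import numpy as np

def allocate_bits(signal_db, mask_db, total_bits):
    """Distribute total_bits across subbands in proportion to their SMR."""
    smr = np.maximum(np.array(signal_db) - np.array(mask_db), 0.0)
    if smr.sum() == 0:
        return np.zeros(len(smr), dtype=int)        # everything is masked
    return np.floor(smr / smr.sum() * total_bits).astype(int)

signal_db = [60, 55, 30, 42]       # per-subband signal levels (assumed)
mask_db   = [35, 50, 40, 20]       # per-subband masking thresholds (assumed)
print(allocate_bits(signal_db, mask_db, 48))        # e.g. [23, 4, 0, 20]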
• Frame types
– I-frame: intracoded
• I-frames are encoded without reference to any
other frames
• Group of Pictures (GOP): the number of frames
between successive I-frames
– P-frame: intercoded
• The encoding of a P-frame is relative to the contents of
either a preceding I-frame or a preceding P-frame
GOPs consist of I-frames (keyframes), P-frames (predicted frames), and B-frames
(bidirectionally predicted frames).
• The number of P-frames between successive I-frames is
limited, since any errors present in the first P-frame
will be propagated to the next
– B-frame: their contents are predicted using search
regions in both past and future frames (see the
reordering sketch after this list)
– PB-frame: this does not refer to a new frame type
as such, but rather to the way two neighbouring P- and
B-frames are encoded as if they were a single
frame
– D-frame: only used in a specific type of application;
it has been defined for use in movie/video-on-demand
applications
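Because a B-frame needs both a past and a future reference, an encoder transmits frames out of display order. The sketch below (illustrative only) reorders a display sequence so that each B-frame follows both of its references.

def transmission_order(display):
    out, pending_b = [], []
    for f in display:
        if f == 'B':
            pending_b.append(f)     # hold B-frames until the next reference
        else:
            out.append(f)           # I- or P-frame: a reference frame
            out.extend(pending_b)   # both references have now been sent
            pending_b = []
    return out + pending_b

print(transmission_order(list("IBBPBBP")))
# ['I', 'P', 'B', 'B', 'P', 'B', 'B']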
MOTION ESTIMATION AND COMPENSATION
P-frame encoding
For a P-frame, the contents of each macroblock in the
target frame are compared with the corresponding macroblock in
the preceding I- or P-frame, known as the reference frame.
Motion vector resolution is to half a pixel, termed half-pixel resolution.
B-frame encoding
Motion vectors and difference matrices are computed relative to both the
preceding and the succeeding reference frames; a third motion vector and
difference matrix are calculated using the target and the mean of the two
predicted values.
The set with the smallest difference matrices is selected and encoded in
the same way as a P-frame.
A block-matching sketch follows.
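A minimal block-matching sketch (the search range and block size are assumed): for one target macroblock, a window in the reference frame is searched for the best match by sum of absolute differences (SAD); the offset of the best match is the motion vector, and the residual is what gets encoded.

import numpy as np

def motion_vector(ref, target_block, top, left, search=4):
    """Find the (dy, dx) offset in `ref` that best matches `target_block`."""
    n = target_block.shape[0]
    best, best_sad = (0, 0), float('inf')
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > ref.shape[0] or x + n > ref.shape[1]:
                continue               # keep the candidate inside the frame
            sad = np.abs(ref[y:y+n, x:x+n].astype(int)
                         - target_block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad              # motion vector and its match error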
IMPLEMENTATION ISSUES
For an I-frame the three steps are: forward DCT, quantization and entropy coding.
Each macroblock requires six 8x8 pixel blocks to be encoded: four blocks of Y
and two for chrominance.
If the two contents are the same, only the address of the macroblock in the
reference frame is encoded.
If the two contents are very close, both the motion vector and the difference
matrices are encoded.
After the target frame has been completely compressed, the difference values
are used to update the reference frame contents ready for encoding the next frame
The coded block pattern defines which of the six 8x8 pixel blocks making up the
macroblock are present, and the JPEG-encoded DCT coefficients are given for each
such block
Temporal reference field: a time stamp used to synchronize the video block with
the associated audio block carrying the same time stamp
GOB: group of macroblocks (the size is chosen so that CIF and QCIF frames
contain an integral number of GOBs)
The output rate of the buffer is determined by the transmission bit rate; two
threshold values are defined, low and high.
If the buffer contents fall below the low threshold, the quantization threshold is
reduced and the encoder output rate increases; if they rise above the high
threshold, the quantization threshold is increased and the output rate is reduced.
A sketch of this feedback loop follows.
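A sketch of the two-threshold feedback loop (the threshold and quantizer values are assumed):

def adjust_quantizer(buffer_fill, q, low=0.25, high=0.75, q_min=1, q_max=31):
    """buffer_fill is the fraction of the output buffer currently occupied."""
    if buffer_fill < low:
        q = max(q_min, q - 1)   # buffer draining: finer quantization, more bits
    elif buffer_fill > high:
        q = min(q_max, q + 1)   # buffer filling: coarser quantization, fewer bits
    return q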
• Frame types:
– I-frame
– P-frame
– B-frame
– PB-frame: supported because of its much reduced
encoding overhead
• Unrestricted motion vectors
• Normally the motion vectors associated with predicted
macroblocks are restricted to a defined area, with the
search area confined by the edge of the frame; in this
mode the restriction is removed, improving prediction
for macroblocks at the frame edge.
• Error resilience
– Errors cause error propagation; see Figure 4.17(a)
– Over a PSTN, errors in the received bitstream are more likely
– It is difficult to locate the errored macroblock
– A GOB (group of macroblocks) may contain an errored
macroblock
– When an error in a GOB is detected, the decoder skips the
remaining macroblocks in the affected GOB and searches for
the next resynchronization marker
– Masking of the error is done by an error concealment scheme
Error propagation
An error leads to propagation into other regions of
subsequent frames predicted from the affected region
• Error tracking
The encoder can select any of the previously decoded frames as the reference
Mixed mode: motion vectors for both field and frame modes are computed
and the mean value is selected
4.3.6 MPEG-2 -cont
• Three standards are associated with HDTV:
– ATV (Advanced Television) in North America
– DVB (Digital Video Broadcasting) in Europe
– MUSE (multiple sub-Nyquist sampling encoding) in
Japan and Asia
• These standards define the transmission of the bitstream
over a network
– DVB HDTV
• 4/3 aspect ratio (1440 x 1152 pixels)
• SSP@H1440: the spatially-scaleable profile at
high 1440 level
• Audio compression standard: MPEG audio layer 2
– MUSE
• 16/9 aspect ratio
• 1920 samples per line and 1035 lines per frame
4.3.7 MPEG-4
• MPEG-4 is aimed at the audio and video
associated with interactive multimedia
applications over the Internet and entertainment
networks
• It contains features that enable a user to access
and manipulate the individual elements of a picture
• Owing to its high coding efficiency, it is also used over
low bit rate networks such as wireless networks and PSTNs
• It is an alternative to H.263 and likewise supports very
low bit rates
• MPEG-4 has:
– Content-based functionalities: each scene is defined in the form
of a background and one or more foreground audio-visual
objects (AVOs)
– Each AVO is in turn defined as one or more audio and/or video objects
– Object descriptor: a description of the origin of each audio and
video object, required so that the object can be manipulated
– Binary format for scenes (BIFS): the language used for modifying
objects (for example, changing the shape, colour or appearance of
video objects, or the volume of audio objects)
– Scene descriptor: describes the composition of a scene
➢ Defines how the various AVOs are related to each other in the
context of the complete scene
– Figure 4.23: a frame/scene is defined in the form of a number of
AVOs
– Each video frame is segmented into a number of video object
planes (VOPs), each of which corresponds to an AVO of interest
• In the example, the frame is shown as consisting of three
VOPs:
➢ VOP 0 represents the person approaching the car
➢ VOP 1 the remainder of the car parked outside the
house
➢ VOP 2 the remainder of the background
➢ Each VOP is encoded separately based on its shape,
motion and texture
➢ Each VOP is encapsulated within a rectangle, chosen
so that it completely covers the related AVO using the
minimum number of macroblocks
The motion and texture of each VOP are encoded, and the resulting bitstream is
multiplexed together with the related object and scene descriptor information.
Different audio algorithms (for example G.723.1, Dolby AC-3 and MPEG audio
layer 2) are used for different applications.
Each object in a VOP is identified on the basis of similar properties of texture,
colour and brightness; each such object, bounded by a rectangle containing the
minimum number of macroblocks, is encoded based on its shape, motion and
texture.
Any VOP which has no motion associated with it produces a minimum of
compressed information.
VOPs which move often occupy only a small portion of the scene/frame, so the
bit rate of the multiplexed video stream is much lower than that obtained with the
other standards.
4.3.7 MPEG-4 -cont
• Transmission format (Figure 4.25)
– Transport stream (TS): carries frame and network information
– Packetized elementary stream (PES): the TS consists of a
multiplexed stream of PESs
– Elementary stream (ES): the compressed audio and
video information relating to each AVO
– Each PES packet contains a type field in its header
– FlexMux layer: routes each PES to the appropriate
synchronization block
– Synchronization layer: consists of the synchronization blocks
– Elementary stream descriptor (ESD): carries the
compressed audio and video for each AVO and routes it
to the appropriate decoder
– Time stamps are carried for the timing synchronization block
– Composition and rendering block: the decompressed data
and the scene descriptor information are used to compose
each frame
MPEG-4 decoder
4.3.7 MPEG-4 -cont
• Error resilience techniques for error-prone
transmission channels (Figure 4.26)
– Use of fixed-length video packets instead of GOBs
– Reversible VLCs (RVLCs): a new form of variable-length
coding
– Errors may occur in
• a macroblock
• a header
Video packets:
In MPEG-4 the compressed bitstream is divided into groups with an equal
number of bits; each such group, a video packet, is separated from the next by
a resynchronization marker (see the sketch below).
In the video packet format of Figure 4.26(c), the header carries the number of
the first macroblock in the packet, and motion vectors are limited to the
boundaries of the packet.
The header also contains a copy of the picture/frame-level parameters,
identified by the header extension code (HEC) bit.
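An illustrative recovery routine (the marker bit pattern here is assumed, not the one defined in the standard): on an error, the decoder discards the rest of the current video packet and resumes at the next resynchronization marker.

MARKER = "0000000000000001"          # assumed resynchronization bit pattern

def resync(bitstring, error_pos):
    """Return the bit position at which decoding can safely resume."""
    nxt = bitstring.find(MARKER, error_pos)   # search beyond the error
    return None if nxt < 0 else nxt + len(MARKER)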
4.3.7 MPEG-4 -cont
• Reversible VLCs (Figure 4.27)
– The associated set of RVLCs is produced by
adding a fixed-length prefix and suffix to each of
the corresponding VLCs; they can be decoded in
– the forward scan direction
– the reverse scan direction
– Errors at different points in the bitstream result
in an overlap region
– In the example each VLC has a Hamming weight of 1, and
the fixed-length prefix and suffix are each a single binary 1,
so each RVLC contains three binary 1s; when decoding, a
codeword can be delimited by counting the 1s, the last
(third) 1 indicating its end. A decoding sketch follows.
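A sketch of decoding this example codeword set: since each RVLC contains exactly three 1s and ends on its third 1, codewords can be delimited by counting 1s, in the forward direction or (on the reversed bitstream) in the reverse direction.

def split_rvlc(bits):
    """Delimit RVLC codewords by counting 1s; the third 1 ends a codeword."""
    words, current, ones = [], [], 0
    for b in bits:
        current.append(b)
        ones += b
        if ones == 3:                  # third 1 marks the end of a codeword
            words.append(current)
            current, ones = [], 0
    return words

# Example set built as in the text: VLCs 1, 01, 001 with prefix and suffix 1.
stream = [1,1,1, 1,0,1,1, 1,0,0,1,1]
print(split_rvlc(stream))              # forward scan
print(split_rvlc(stream[::-1]))        # reverse scan of the reversed stream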
Example 4.2