
MPEG-1 / MPEG-2 BC Audio

Prof. Dr.-Ing. K. Brandenburg, [email protected] Dr.-Ing. G. Schuller, [email protected] Page 1


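The Basic Paradigm of T/F Domain Audio Coding

[Figure: encoder block diagram. Digital audio input samples → Filter Bank → Bit or Noise Allocation → quantized samples → Bitstream Formatting → encoded bitstream. A Psychoacoustic Model supplies the Signal-to-Mask Ratio to the Bit or Noise Allocation stage.]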



MPEG Audio Standardization Philosophy (1)
• Definition of a complete transmission chain
consists of specification of
– Encoding algorithm
– Bitstream format
– Decoding algorithm

ITU-T approach
• ITU-T standardizes all three parts
⇒ Encoder output predictable

MPEG approach
• MPEG standardizes only the bitstream format and the decoder, not the encoder (the encoder is described only in an “informative part”)



MPEG Audio Standardization Philosophy (2)

• Motivation: open for further improvements, room for specific corporate know-how

• But: no sound quality guaranteed!



MPEG-1/2 Audio
• MPEG-1 Audio
– Audio coding 32 - 48 kHz, mono/stereo
– Layer 1, 2, 3
– Layer-3 (aka .mp3) optimized for lower bit-rates
– Copy protection via SCMS included

• MPEG-2 Audio
– Low sampling frequency audio: adds 16, 22.05 and 24 kHz to Layers 1, 2, 3
– Multichannel audio, backward compatible (BC)



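MPEG-1 Development History

[Figure: contributions to the MPEG-1 Audio layers. Contributors and proposals shown: IRT (MASCAM), Philips, MUSICAM, CCETT, AT&T (PXFM), Fraunhofer-IIS, ASPEC, UNI-Erl. (OCF), Thomson, Brandt (MSC), CNET. Results shown: Layer 1 (“simplified version”), Layer 2, Layer 3 and the optional Psychoacoustic model 2.]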



MPEG-1 Audio
• Developed Dec. 88 to Nov. 92
• Coding of mono and stereo signals
• Bitrates from 32 kbit/s to 448 kbit/s
• Three "Layers":
– Layer 1: lowest complexity
– Layer 2: increased complexity and quality
– Layer 3: highest complexity and quality at
low bit-rates
• Target bitrates: 384 kbit/s, 256 kbit/s and < 192 kbit/s for Layers 1, 2 and 3, respectively
• Sampling frequencies supported:
– 48 kHz, 44.1 kHz, 32 kHz
The main building blocks
• Perceptual model
– using psychoacoustics, mostly proprietary

• Filter bank
– subdividing the input signal into spectral
components
– more lines ⇒ more coding gain
– longer impulse response ⇒ pre-echo artifacts

• Quantization and coding
– this is the step that introduces quantization noise
– the spectral shape of the quantization noise determines its audibility
– can be designed to leave the encoding methods optional
MPEG Audio - Short Description of the Layers (1)
• Layer I
– Frame length: 384 samples (8 ms @ 48 kHz)
– Frequency resolution: 32 subbands
– Quantization: Block-companding (12 samples),
amplitude of subband samples indicated by
“scalefactors” (SCF); 2 dB resolution

• Layer II
– Frame length: 1152 samples (24 ms @ 48 kHz)
– Frequency resolution: 32 subbands
– Quantization: Block-companding (12 samples)
– Use of Scalefactor select information
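Block companding, as described for Layer I above, can be illustrated with a short Python sketch. It assumes the 63-entry scalefactor table 2·2^(−i/3) (steps of about 2 dB) and replaces the standard's exact quantizer coefficients with a plain symmetric uniform quantizer; `compand_block` and `expand_block` are hypothetical names, not part of any library.

```python
import math

# Assumed Layer I/II scalefactor table: 2.0 * 2**(-i/3) for i = 0..62,
# i.e. steps of 20*log10(2**(1/3)), about 2 dB.
SCALEFACTORS = [2.0 * 2.0 ** (-i / 3.0) for i in range(63)]


def scalefactor_index(block):
    """Index of the smallest tabulated scalefactor that still covers the block peak
    (assumes |samples| <= 2.0, the largest table entry)."""
    peak = max(abs(s) for s in block)
    index = 0
    # Walk down the (decreasing) table while the next entry is still >= peak.
    while index + 1 < len(SCALEFACTORS) and SCALEFACTORS[index + 1] >= peak:
        index += 1
    return index


def compand_block(block, bits):
    """Normalize 12 subband samples by one scalefactor, then quantize them
    with a symmetric uniform quantizer of 2**bits - 1 levels (simplified)."""
    assert len(block) == 12, "Layer I companding works on blocks of 12 samples"
    assert bits >= 2, "need at least 3 quantizer levels"
    index = scalefactor_index(block)
    half = (2 ** bits - 1) // 2          # codes run from -half to +half
    codes = [round(s / SCALEFACTORS[index] * half) for s in block]
    return index, codes


def expand_block(index, codes, bits):
    """Decoder side: map the codes back and rescale with the scalefactor."""
    half = (2 ** bits - 1) // 2
    return [c / half * SCALEFACTORS[index] for c in codes]
```

Only the scalefactor index (6 bits in the real bitstream) and the codes need to be transmitted per block; the reconstruction error per sample is at most half a quantizer step, i.e. scalefactor / (2·half).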
MPEG Audio - Short Description of the Layers (2)

• Layer III
– Standard frame length: 1152 samples (24 ms @ 48 kHz)
– Frequency resolution: 576 subbands (long blocks) / 192 subbands (short blocks)
– Quantization: non-uniform with Huffman coding
– Use of Scalefactor Select Information



MPEG - Layer-1, -2 and -3 Compression: Header
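The header figure did not survive extraction. As a rough sketch, the 32-bit frame header shared by Layers 1, 2 and 3 can be decoded as follows. The field layout, bitrate tables and frame-length formulas reflect our reading of ISO/IEC 11172-3 and should be checked against the standard; `parse_header` is a hypothetical helper, and MPEG-2 LSF headers and free-format streams are rejected to keep it short.

```python
# MPEG-1 bitrate tables in kbit/s, indexed by the 4-bit bitrate_index.
# Index 0 means "free format"; index 15 is forbidden.
BITRATES_KBPS = {
    1: [0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],
    2: [0, 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],
    3: [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
}
SAMPLE_RATES_HZ = [44100, 48000, 32000]   # sampling_frequency '11' is reserved
MODES = ["stereo", "joint_stereo", "dual_channel", "single_channel"]


def parse_header(data: bytes) -> dict:
    """Decode a 4-byte MPEG-1 audio frame header (hypothetical helper)."""
    if len(data) < 4:
        raise ValueError("need 4 header bytes")
    word = int.from_bytes(data[:4], "big")
    if word >> 20 != 0xFFF:
        raise ValueError("12-bit syncword not found")
    if (word >> 19) & 1 != 1:
        raise ValueError("ID bit 0 (MPEG-2 LSF) not handled in this sketch")
    layer_bits = (word >> 17) & 0b11
    bitrate_index = (word >> 12) & 0xF
    sf_index = (word >> 10) & 0b11
    if layer_bits == 0 or bitrate_index in (0, 15) or sf_index == 3:
        raise ValueError("reserved or free-format value not handled")
    layer = 4 - layer_bits                 # '11' -> I, '10' -> II, '01' -> III
    bitrate = BITRATES_KBPS[layer][bitrate_index] * 1000
    fs = SAMPLE_RATES_HZ[sf_index]
    padding = (word >> 9) & 1
    if layer == 1:
        frame_bytes = (12 * bitrate // fs + padding) * 4   # 4-byte slots
    else:
        frame_bytes = 144 * bitrate // fs + padding        # 1-byte slots
    return {
        "layer": layer,
        "crc_present": ((word >> 16) & 1) == 0,   # protection_bit 0 -> CRC follows
        "bitrate_kbps": bitrate // 1000,
        "sample_rate_hz": fs,
        "padding": padding,
        "mode": MODES[(word >> 6) & 0b11],
        "mode_extension": (word >> 4) & 0b11,
        "copyright": (word >> 3) & 1,
        "original": (word >> 2) & 1,
        "emphasis": word & 0b11,
        "frame_bytes": frame_bytes,
    }
```

For the header bytes `FF FB 90 64`, the sketch reports Layer III, 128 kbit/s, 44.1 kHz, joint stereo and a frame length of 417 bytes.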



MPEG-1 Layer 1



Block Diagram MPEG-1 Layer 1



MPEG Audio - Layer-1



MPEG Audio - Layer-1 (2)



MPEG Audio Layer-1 Signal-to-Noise Ratios



MPEG-1 Audio Layer 1 Bitstream Syntax



MPEG-1 Layer 2



MPEG Audio - Layer-2 (1)
• Processing of frames 1152 samples long
• 32 subbands used; grouped into 3 granules of 12 subband samples each
• Layer II additionally offers more efficient coding of the bit allocation, scalefactors and samples
• The number of scalefactors transmitted per subband can vary
• Use of different windows
• Theoretical minimum encoder/decoder delay: around 35 ms
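How the number of transmitted scalefactors can vary is sketched below, assuming the usual meaning of Layer II's 2-bit scalefactor select information (scfsi): 0 = three scalefactors sent, 1 = two (the first also covers part 2), 2 = one for all three parts, 3 = two (the second also covers part 3). The merge rule with a tolerance `tol` is a hypothetical simplification of what an encoder might do; it keeps the smaller index (the larger scalefactor) so that the merged scalefactor still covers all samples. `choose_scfsi` and `expand_scfsi` are illustrative names.

```python
def choose_scfsi(scf, tol=0):
    """scf: scalefactor indices of the three 12-sample parts of one subband.
    Returns (scfsi, indices actually transmitted)."""
    a, b, c = scf

    def close(x, y):
        return abs(x - y) <= tol

    if close(a, b) and close(b, c) and close(a, c):
        return 2, [min(a, b, c)]        # one scalefactor for all three parts
    if close(a, b):
        return 1, [min(a, b), c]        # first one valid for parts 1 and 2
    if close(b, c):
        return 3, [a, min(b, c)]        # second one valid for parts 2 and 3
    return 0, [a, b, c]                 # all three transmitted


def expand_scfsi(scfsi, sent):
    """Decoder side: rebuild one scalefactor index per part."""
    if scfsi == 0:
        return list(sent)
    if scfsi == 1:
        return [sent[0], sent[0], sent[1]]
    if scfsi == 2:
        return [sent[0]] * 3
    return [sent[0], sent[1], sent[1]]  # scfsi == 3
```

For stationary signals the three indices are often equal, so a single 6-bit scalefactor plus the 2-bit scfsi replaces three scalefactors.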



MPEG-1 Audio Layer 2 Bitstream Syntax



MPEG-1 Audio Layer 2 Bitstream Syntax (2)



MPEG-1 Layer 3



MPEG Audio – Layer 3
• MP3 is the nickname of the ISO/IEC standard for audio compression described in ISO/IEC IS 11172-3 (MPEG-1 Layer-3) and 13818-3 (MPEG-2 Layer-3)
• A benefit of the MP3 format is that it needs no global file header: each frame carries its own header, so playback does not depend on having the start of the file
• This allows MP3 streaming
• Theoretical minimum encoder/decoder delay: around 59 ms
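Because every frame starts with its own header, a receiver can join a stream at an arbitrary point by scanning for the 12-bit syncword. A minimal sketch follows; `find_sync` and `plausible_header` are hypothetical names. A real decoder would also confirm that another valid header follows one frame length later, and with the bit reservoir the first frame after joining may refer to main data that was never received.

```python
def plausible_header(b: bytes) -> bool:
    """Cheap validity check of 4 candidate header bytes."""
    if len(b) < 4 or b[0] != 0xFF or (b[1] & 0xF0) != 0xF0:
        return False                      # no 12-bit syncword
    layer_bits = (b[1] >> 1) & 0b11       # '00' is reserved
    bitrate_index = b[2] >> 4             # '1111' is forbidden
    sf_index = (b[2] >> 2) & 0b11         # '11' is reserved
    return layer_bits != 0 and bitrate_index != 0xF and sf_index != 0b11


def find_sync(stream: bytes, start: int = 0) -> int:
    """Offset of the first plausible frame header at or after `start`, or -1."""
    for i in range(start, len(stream) - 3):
        if plausible_header(stream[i:i + 4]):
            return i
    return -1
```

Checking the reserved and forbidden field values cheaply rejects many false syncwords that occur by chance inside compressed data.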
Block diagram Layer-3
[Figure: block diagram; label: 32 MDCTs]



MPEG Layer-3
• Same basic configuration as Layer-2:
– Frame length 24 ms at 48 kHz
– Polyphase filter bank

• Specifics of Layer-3:
– Hybrid filter bank (32*18 = 576 subbands or 32*6 = 192 subbands)
– Nonuniform quantization (implicit noise shaping) with a power law (x^0.75)
– Huffman coding
– Analysis-by-synthesis structure
– Bit reservoir (short-time buffer)
– Support for variable bitrate (mandatory)
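The power-law quantizer can be sketched as follows. Assumptions: a single global step size 2^(step_exp/4) (about 1.5 dB per step) stands in for the full gain and scalefactor machinery, and the rounding offset 0.4054 (equivalent to nint(x − 0.0946)) is a value commonly used in encoders, not something the decoder depends on. `quantize` and `dequantize` are illustrative names.

```python
import math


def quantize(xr, step_exp):
    """Nonuniform quantization: |x| is compressed with the power 0.75."""
    step = 2.0 ** (step_exp / 4.0)
    out = []
    for x in xr:
        mag = int(math.floor((abs(x) / step) ** 0.75 + 0.4054))
        out.append(-mag if x < 0 else mag)
    return out


def dequantize(ix, step_exp):
    """Decoder side: expand with the power 4/3 and restore the sign."""
    step = 2.0 ** (step_exp / 4.0)
    return [math.copysign(abs(q) ** (4.0 / 3.0), q) * step for q in ix]
```

The reconstruction levels |q|^(4/3) grow further apart as |q| increases, so loud spectral lines are quantized with coarser steps than quiet ones; this is the implicit noise shaping mentioned above.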



MPEG-1 Layer 3

• Psychoacoustic model: often “Psychoacoustic model 2” is used
• Block size: 1 frame = 1152 samples = 36 samples per polyphase filter bank subband
• Hybrid filter bank: an additional Modified Discrete Cosine Transform (MDCT) and an aliasing reduction stage
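The MDCT stage can be illustrated with a direct O(N²) implementation for N = 18 (36 windowed samples in, 18 coefficients out) and the sine window sin(π/36 · (n + 1/2)). This is a sketch under simplifying assumptions: only the long-block window is used, and block switching, the aliasing reduction stage and any frequency inversion are left out. The demo overlaps two half-shifted blocks and recovers the middle 18 input samples, showing how time-domain aliasing cancels between neighbouring blocks.

```python
import math


def mdct(frame):
    """Direct MDCT: 2N input samples -> N coefficients."""
    two_n = len(frame)
    n_half = two_n // 2
    return [
        sum(frame[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (scaled by 2/N)."""
    n_half = len(coeffs)
    return [
        2.0 / n_half * sum(coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                           for k in range(n_half))
        for n in range(2 * n_half)
    ]


def sine_window(length):
    """Sine window; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi / length * (n + 0.5)) for n in range(length)]


N = 18                      # MDCT lines per polyphase subband (long blocks)
WIN = sine_window(2 * N)


def analyse_synthesise(block):
    """Window -> MDCT -> IMDCT -> window, for one 36-sample block."""
    coeffs = mdct([w * x for w, x in zip(WIN, block)])
    return [w * y for w, y in zip(WIN, imdct(coeffs))]


signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(3 * N)]
out_a = analyse_synthesise(signal[0:2 * N])    # block covering samples 0..35
out_b = analyse_synthesise(signal[N:3 * N])    # block covering samples 18..53
middle = [a + b for a, b in zip(out_a[N:], out_b[:N])]   # overlap-add
```

Each block alone is aliased in time; only the overlap-add of the two halves reproduces `signal[18:36]`, which is why the MDCT is critically sampled yet still allows perfect reconstruction.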



Hybrid Filter Bank & Aliasing (1)
Aliasing in the subbands
[Figure: two-stage hybrid filter bank. 32 polyphase bands, each split again into 6 or 18 bands, giving up to 32*18 = 576 bands; 18 bands per subband give better compression, 6 bands serve pre-echo avoidance.]

• Filter bank is critically sampled
• Problem of aliasing in the analysis filter bank
Hybrid Filter Bank & Aliasing (2)

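[Figure: frequency responses of adjacent subband filters y_k overlapping at the channel boundaries (Nyquist frequencies), with a 3 dB point marked; signal components beyond a boundary are mirrored inside the band.]

• Aliasing cannot be neglected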
Hybrid Filter Bank & Aliasing (3)

[Figure: original spectrum and its mirrored image around the Nyquist frequency f_p.]

• Mirroring of original signal components above the Nyquist frequency
– occurs on downsampling
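The mirroring can be reproduced numerically: keep every 32nd sample of a 1 kHz tone sampled at 48 kHz (new rate 1500 Hz, new Nyquist frequency 750 Hz) and compare it with a 500 Hz tone at the reduced rate. This is a deliberately simplified sketch with no analysis filter at all; in a real filter bank the band-pass filters attenuate, but do not remove, the components that fold over. `mirrored_frequency` is an illustrative helper.

```python
import math


def mirrored_frequency(f, fs):
    """Frequency in [0, fs/2] at which a real tone of frequency f appears
    after sampling (or downsampling) to the rate fs."""
    f = f % fs
    return fs - f if f > fs / 2 else f


fs = 48000.0          # input sampling rate
M = 32                # downsampling factor of a 32-band critically sampled bank
fs_sub = fs / M       # 1500 Hz, so the new Nyquist frequency is 750 Hz
f_tone = 1000.0       # lies above the new Nyquist frequency
f_alias = mirrored_frequency(f_tone, fs_sub)   # folded back to 500 Hz

# Keep every M-th sample of the tone (no filtering, to expose the mirroring).
decimated = [math.cos(2 * math.pi * f_tone * m * M / fs) for m in range(64)]
mirrored = [math.cos(2 * math.pi * f_alias * m / fs_sub) for m in range(64)]
```

After downsampling, the 1 kHz tone is sample-for-sample identical to a 500 Hz tone, i.e. it has been mirrored around the 750 Hz Nyquist frequency.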
Problem of Aliasing in a Cascaded Filter Bank (1)
• Result of cascading:
– Far-off frequencies are aliased into the subbands of the 2nd filter bank
• Observe: aliasing spreads over several subbands of the second stage! This is (practically) not the case for a single-stage filter bank.

[Figure: a signal and its aliased image within one subband of the 1st filter bank, shown against the subbands of the 2nd filter bank, the Nyquist frequency of the first-stage subbands and the attenuation of the filter.]
Problem of Aliasing in a Cascaded Filter Bank (2)
• Frequency response contains peaks from the aliasing:
[Figure: passband with aliasing peaks continuing on both sides; the peaks are reduced by alias reduction.]
⇒ not just one passband, but several, only slightly attenuated
• Signal appears in many subbands
• Must be coded in many subbands
– Worse coding efficiency
Aliasing Reduction Structure (MP3)
• Uses a similar approach as the synthesis filter bank to reduce/cancel aliasing
[Figure: butterflies between the 6/18-band outputs of adjacent bands of the 32-band polyphase filter bank.]
• Suitable fixed factors, obtained from the frequency response of the filters, for an equal level of aliasing
• Factor (−1)^n (1, −1, 1, −1, …): high and low frequencies are mirrored in every second subband
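The alias reduction butterflies can be sketched as below. The eight coefficients c_i are the values commonly reproduced from ISO/IEC 11172-3, with cs_i = 1/√(1 + c_i²) and ca_i = c_i/√(1 + c_i²); treat them and the sign convention as assumptions to be checked against the standard. `alias_reduce` is a hypothetical name; `inverse=True` gives the decoder-side form usually written as lo = bu·cs − bd·ca, hi = bd·cs + bu·ca. Only long blocks (18 lines per subband) are handled.

```python
import math

# Alias reduction coefficients as commonly reproduced from ISO/IEC 11172-3.
C = [-0.6, -0.535, -0.33, -0.185, -0.095, -0.041, -0.0142, -0.0037]
CS = [1.0 / math.sqrt(1.0 + c * c) for c in C]
CA = [c / math.sqrt(1.0 + c * c) for c in C]


def alias_reduce(xr, inverse=False):
    """Apply 8 butterflies across each of the 31 boundaries between the
    32 polyphase subbands (576 MDCT lines, 18 per subband)."""
    assert len(xr) == 576
    out = list(xr)
    sign = -1.0 if inverse else 1.0
    for sb in range(1, 32):
        for i in range(8):
            lo = 18 * sb - 1 - i      # line in the lower subband, mirrored at the boundary
            hi = 18 * sb + i          # line in the upper subband
            bu, bd = out[lo], out[hi]
            out[lo] = bu * CS[i] + sign * bd * CA[i]
            out[hi] = bd * CS[i] - sign * bu * CA[i]
    return out
```

Because cs_i² + ca_i² = 1, each butterfly is a rotation, so the inverse form exactly undoes the forward form and the signal energy is preserved.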
Problem of Aliasing in a Cascaded Filter Bank (3)

• Solutions:
– Less downsampling in the first stage (non-critical sampling)
– Better filters
– Aliasing reduction by subtraction from neighboring bands (e.g. MP3)
– No cascaded filter bank (e.g. AAC)



Huffman Coding Details
• Vector coding
– One codeword for 2 or 4 quantized values (frequency lines)
• Escape sequence
– Large values are coded as the sum of an escape (ESC) codeword and the difference
• Adaptive table selection (table number is signaled)
– Table selection according to the maximum value
– Table selection according to local statistics
• Adaptive sectioning (4 sections)
– Each section defines its Huffman table
• Coding efficiency approaches the theoretical limits!
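Vector coding with escape values can be sketched as follows: quantized lines are paired into symbols (x, y), magnitudes of 15 and above are sent as the escape symbol 15 plus `linbits` extra bits, and each nonzero value gets a sign bit. Unlike a real Layer-3 encoder, which selects among fixed, predefined Huffman tables, this hypothetical sketch builds a Huffman code from the frame's own statistics purely to show the principle; all function names are illustrative.

```python
import heapq
from collections import Counter
from itertools import count


def split_escape(value, linbits):
    """Return (table_symbol, escape_payload) for one quantized value."""
    mag = abs(value)
    if mag < 15:
        return mag, None
    extra = mag - 15                      # sent in `linbits` extra bits
    if extra >= (1 << linbits):
        raise ValueError("value too large for this linbits setting")
    return 15, extra


def huffman_code_lengths(freqs):
    """Classic Huffman construction; returns {symbol: codeword length}."""
    tie = count()                         # tie-breaker so heapq never compares dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: 1 for sym in freqs}
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        merged = {sym: length + 1 for sym, length in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def code_pairs(values, linbits=4):
    """Vector-code lines in pairs and count the bits: Huffman codewords
    + one sign bit per nonzero value + escape bits."""
    values = list(values)
    if len(values) % 2:
        values.append(0)
    symbols, extra_bits = [], 0
    for x, y in zip(values[0::2], values[1::2]):
        sx, ex = split_escape(x, linbits)
        sy, ey = split_escape(y, linbits)
        symbols.append((sx, sy))
        extra_bits += sum(linbits for e in (ex, ey) if e is not None)
        extra_bits += (x != 0) + (y != 0)
    lengths = huffman_code_lengths(Counter(symbols))
    return lengths, sum(lengths[s] for s in symbols) + extra_bits
```

Coding pairs rather than single values lets the code exploit the dependency between neighbouring lines, for example the frequent (0, 0) pair gets a very short codeword.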



Layer-3 Iteration Loops



Possibilities to control the Quantization
• Two basic ways
– Bit Allocation
– Noise Allocation

• Detailed possibilities:
– Bit allocation between fixed “worst-case” and “maximum-SNR” situations
– Bit allocation calculated from the threshold estimation
– Direct calculation of the allowed noise (noise allocation)
– Simplified “noise allocation”
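Bit allocation driven by the signal-to-mask ratio can be sketched as a greedy loop: repeatedly give one more bit per sample to the subband whose noise-to-mask ratio is currently worst. This is a hypothetical illustration, assuming roughly 6.02 dB of SNR per bit and a cost of 12 samples per bit and subband (one Layer I block); the standard's actual SNR tables, the Layer I rule that excludes 1-bit allocations, and the side-information cost are ignored. `allocate_bits` is an illustrative name.

```python
def allocate_bits(smr_db, budget_bits, samples_per_band=12, max_bits=15,
                  db_per_bit=6.02):
    """Greedy allocation: give one more bit per sample to the subband whose
    noise-to-mask ratio NMR = SMR - SNR(bits) is currently the largest."""
    bits = [0] * len(smr_db)
    remaining = budget_bits

    def nmr(band):
        # Rough SNR model: each bit per sample buys about 6.02 dB.
        return smr_db[band] - db_per_bit * bits[band]

    while remaining >= samples_per_band:
        candidates = [b for b in range(len(smr_db)) if bits[b] < max_bits]
        if not candidates:
            break
        worst = max(candidates, key=nmr)
        bits[worst] += 1
        remaining -= samples_per_band
    return bits
```

For SMRs of 20, 5 and −10 dB and a budget of 60 bits, the loop ends with [4, 1, 0]: the band whose signal already lies below the masking threshold gets nothing.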
Layer-3 : Outer Loop
• Distortion loop (control of the distortion)
• Saves the unquantized spectral values
• Compares the reconstructed values with the original ones
• Computes the actual distortion in the frequency domain
• Amplifies (scales) the frequency groups (scalefactor bands) according to their amount of distortion
• Convergence of the iteration is not guaranteed



Layer-3 : Inner Loop
• Rate loop (data rate control)
• Entropy coding: the data rate depends on the actual data set
• Buffer control: controls the number of bits needed
• Convergence through iterations is always possible
• Starting level: calculated from the SFM (Spectral Flatness Measure)
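The interplay of the two loops can be sketched as follows. This is a heavily simplified, hypothetical sketch: `count_bits` is an invented stand-in for the Huffman table lookup, each amplification step multiplies a band by √2, the start value `start_exp` is fixed rather than derived from the SFM, and the bit reservoir is ignored. The rate loop always terminates because a large enough step size quantizes everything to zero; the distortion loop is bounded by `max_iter` and stops once every band has been amplified, since its convergence is not guaranteed.

```python
import math


def quantize(xr, step_exp):
    """Power-law quantizer (magnitudes only)."""
    step = 2.0 ** (step_exp / 4.0)
    return [int(math.floor((abs(x) / step) ** 0.75 + 0.4054)) for x in xr]


def dequantize(ix, signs, step_exp):
    step = 2.0 ** (step_exp / 4.0)
    return [s * q ** (4.0 / 3.0) * step for q, s in zip(ix, signs)]


def count_bits(ix):
    """Hypothetical stand-in for Huffman coding: zeros are free, other
    values cost about 2*log2(value) + 1 bits."""
    return sum(2 * q.bit_length() + 1 for q in ix if q)


def inner_loop(xr, budget, step_exp):
    """Rate loop: raise the global step size until the bits fit the budget."""
    while True:
        ix = quantize(xr, step_exp)
        if count_bits(ix) <= budget:
            return ix, step_exp
        step_exp += 1


def outer_loop(xr, bands, allowed, budget, start_exp=-40, max_iter=30):
    """Distortion loop: amplify bands whose noise exceeds `allowed`, then
    rerun the rate loop. `bands` holds (lo, hi) index ranges."""
    signs = [-1.0 if x < 0 else 1.0 for x in xr]
    amp = [0] * len(bands)               # amplification steps per band
    result = None
    for _ in range(max_iter):
        gain = [1.0] * len(xr)
        for b, (lo, hi) in enumerate(bands):
            for i in range(lo, hi):
                gain[i] = 2.0 ** (0.5 * amp[b])
        scaled = [x * g for x, g in zip(xr, gain)]
        ix, step_exp = inner_loop(scaled, budget, start_exp)
        rec = [y / g for y, g in zip(dequantize(ix, signs, step_exp), gain)]
        noise = [sum((xr[i] - rec[i]) ** 2 for i in range(lo, hi)) for lo, hi in bands]
        result = (ix, step_exp, list(amp), noise)
        too_noisy = [b for b, n in enumerate(noise) if n > allowed[b]]
        if not too_noisy or all(a > 0 for a in amp):
            break                        # noise acceptable, or every band already amplified
        for b in too_noisy:
            amp[b] += 1
    return result
```

Amplifying a band before quantization effectively gives it a finer step size relative to its original values, which is how the outer loop moves noise from bands above the threshold into bands with spare masking.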
MPEG Audio - Layer-3: Bitstream
• Organization of the bit streams
• Fixed length of the side information: 17 bytes for mono, 32 bytes for stereo, independent of the bitrate
• Constant Section
– Header (ISO Standard, like with Layer-1 and -2)
– Additional information for a frame (e.g. Pointer to the
variable section)
– Additional information for each granule (e.g. Number
of the Huffman-Code table)
• Variable Section
– Scalefactors
– Huffman-coded frequency lines
– Additional Data
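The fixed-length side information makes it easy to estimate how many bytes remain for the variable section. A sketch for MPEG-1 sampling rates, using the frame length 144·bitrate/fs (+ padding) bytes; `main_data_bytes` is a hypothetical helper. With the bit reservoir, a frame's main data may start in earlier frames, so this is an average budget rather than a hard per-frame limit.

```python
def main_data_bytes(bitrate_bps, sample_rate_hz, channels, padding=0, crc=False):
    """Bytes left for scalefactors, Huffman data and ancillary data in one
    MPEG-1 Layer-3 frame (sketch; MPEG-1 sampling rates only)."""
    frame = 144 * bitrate_bps // sample_rate_hz + padding   # whole frame in bytes
    header = 4
    crc_bytes = 2 if crc else 0                             # only if protection_bit = 0
    side_info = 17 if channels == 1 else 32                 # fixed length
    return frame - header - crc_bytes - side_info
```

At 128 kbit/s and 44.1 kHz stereo (a 417-byte frame without padding), this leaves 417 − 4 − 32 = 381 bytes.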



MPEG Audio - Layer-3: Bitstream (2)



MPEG Audio - Layer-3: Bitstream (3)



MPEG Audio - Layer-3: Bitstream (4)



MPEG-1 Audio Decoder



MPEG Audio – General Decoder Structure



MPEG - Audio Decoder Process (1) Layer-1 and -2 Decoder flow chart



MPEG - Audio Decoder Process (2) Layer-3 Decoder flow chart



MPEG - Audio Decoder Process (3) Synthesis of the subband filter



MPEG - Audio Decoder Process (4) Layer-3 Decoder Diagram



Stereo Coding in MPEG-1/-2



MPEG Audio - Layer-1 and Layer-2

• For better stereo performance, Intensity Stereo (IS) can be applied

• With IS coding, a mono signal is transmitted in the higher frequency bands, and the decoder places it close to the original stereo position
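The principle of intensity stereo can be sketched per frequency band: transmit one mono signal and two gains that restore each channel's energy. This is a hypothetical simplification (`is_encode` and `is_decode` are illustrative names); the bitstream's actual mechanism, with quantized scalefactors or intensity positions, differs in detail. Phase differences between the channels are lost: if L = −R in a band, the mono signal vanishes.

```python
import math


def is_encode(left, right):
    """One band: mono signal plus left/right gains that preserve energy."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    e_m = sum(m * m for m in mono)
    if e_m == 0.0:
        return mono, 0.0, 0.0            # out-of-phase channels cancel completely
    g_l = math.sqrt(sum(l * l for l in left) / e_m)
    g_r = math.sqrt(sum(r * r for r in right) / e_m)
    return mono, g_l, g_r


def is_decode(mono, g_l, g_r):
    """Decoder: both channels are scaled copies of the mono signal."""
    return [g_l * m for m in mono], [g_r * m for m in mono]
```

Only one set of spectral values is coded for the band, which is why IS saves bits; it relies on the ear judging the position of high-frequency sounds mainly from their level differences.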



MPEG Audio - Layer-3 (3)
• “Joint Stereo” coding to further improve compression: Mid/Side coding (M/S) and Intensity Stereo (IS)

• M/S coding: two channels are coded, where the first (mid) channel carries the sum of the original left and right channels and the second (side) channel carries their difference

• Alternatively, either both channels are coded separately (“stereo” mode) or Intensity Stereo coding is used
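M/S coding is a simple, lossless matrix applied before quantization. The sketch assumes the 1/√2 normalization that, as far as we know, Layer-3 uses (M = (L + R)/√2, S = (L − R)/√2); `ms_encode` and `ms_decode` are illustrative names. For similar channels the side signal is small and cheap to code.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def ms_encode(left, right):
    """Mid = normalized sum, side = normalized difference."""
    mid = [(l + r) * INV_SQRT2 for l, r in zip(left, right)]
    side = [(l - r) * INV_SQRT2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Exact inverse of ms_encode (the matrix is orthogonal)."""
    left = [(m + s) * INV_SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) * INV_SQRT2 for m, s in zip(mid, side)]
    return left, right
```

Because the transform is orthogonal, quantization noise added to M and S has the same total energy after decoding, but it follows the signal's stereo image rather than being placed independently in L and R.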



MPEG-2 Audio



MPEG-2 Audio LSF (lower sampling frequency) extensions:
Technical background
• Run the coder at half the sampling rate
• Only minor changes required (e.g. tables)
Implications
⇒ Increased frequency resolution
⇒ Less bitrate for side information
⇒ Better coding efficiency
⇒ Only half of the workload required (SW implementations!)
• But: transparent coding is not possible due to the restricted bandwidth (max. 12 kHz)
• All current MP3 decoders support this mode
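The gain in frequency resolution and the 12 kHz bandwidth limit follow from simple arithmetic, assuming the 576-line Layer-3 granule is kept unchanged at the halved sampling rate. `layer3_resolution` is an illustrative helper.

```python
def layer3_resolution(sample_rate_hz, lines=576):
    """Audio bandwidth, spacing of the 576 hybrid filter bank lines and the
    duration of one 576-sample granule for a given sampling rate."""
    bandwidth_hz = sample_rate_hz / 2.0
    line_spacing_hz = bandwidth_hz / lines
    granule_s = lines / sample_rate_hz
    return bandwidth_hz, line_spacing_hz, granule_s
```

At 48 kHz this gives 24 kHz bandwidth, about 41.7 Hz line spacing and 12 ms granules; at 24 kHz the spacing halves to about 20.8 Hz (finer frequency resolution), but the bandwidth drops to 12 kHz and each granule lasts 24 ms.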



MPEG-2 MC Features
Sound formats
• Supports 3/2+1 multichannel sound (ITU-R): left, right, center, left surround, right surround
[Figure: loudspeaker layout with L, C, R in front and LS, RS at the rear.]
• Low frequency enhancement channel (LFE)
• “Downward compatibility”: 3/1, 3/0, 2/2, 2/1, 1/0 are also supported
• Multilingual capability: up to 7 channels, e.g. for different languages, a commentary channel, “clean dialog” etc.
MPEG - 2 Audio Multichannel
Structure of the ISO/IEC 13818-3 Layer II multichannel extension, backward compatible with ISO/IEC 11172-3 Layer II
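Backward compatibility means an MPEG-1 decoder must get a usable stereo signal from the base part of the bitstream, which in MPEG-2 BC carries a matrixed downmix. A minimal sketch of such a 3/2 → 2/0 downmix follows; the 1/√2 centre and surround gains follow common ITU-R BS.775 practice, and the overall normalization is an assumption rather than the standard's exact matrixing options. `downmix_3_2` is an illustrative name, and the LFE channel is ignored.

```python
import math


def downmix_3_2(left, right, centre, left_s, right_s,
                centre_gain=1 / math.sqrt(2), surround_gain=1 / math.sqrt(2)):
    """Compatible stereo downmix Lo/Ro of a 3/2 signal (LFE ignored)."""
    norm = 1.0 / (1.0 + centre_gain + surround_gain)   # avoids overload at full scale
    lo = [norm * (l + centre_gain * c + surround_gain * s)
          for l, c, s in zip(left, centre, left_s)]
    ro = [norm * (r + centre_gain * c + surround_gain * s)
          for r, c, s in zip(right, centre, right_s)]
    return lo, ro
```

An MPEG-1 decoder simply plays Lo/Ro; a multichannel decoder uses the additional channels carried in the extension to undo the matrixing.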



Annex: Abbreviations and Companies



Abbreviations and Companies (1)
• AAC: Advanced Audio Coding
• ASPEC: Adaptive Spectral Perceptual Entropy
Coding
• AT&T: American Telephone and Telegraph Company
• CCETT: Centre Commun d’Etudes de
Télédiffusion et Télécommunication
• CNET: Research and Development Center of
France Télécom
• FhG-IIS: Fraunhofer Gesellschaft/Institut für
Integrierte Schaltungen (Erlangen)
Abbreviations and Companies (2)
• IRT: Institut für Rundfunktechnik GmbH,
München, Research and Development Institute
of ARD, ZDF, DLR, ORF and SRG

• ITU-R: International Telecommunication Union – Radiocommunication Sector

• MASCAM: Masking-pattern Adapted Subband Coding and Multiplexing

• MUSICAM: Masking-pattern Universal Subband Integrated Coding and Multiplexing
Abbreviations and Companies (3)

• NTT: Nippon Telegraph and Telephone Corp./Human Interface Laboratories

• Thomson: Thomson, Telefunken, Saba, RCA, GE, ProScan

• TwinVQ: Transform-domain Weighted Interleave Vector Quantization

