Audio Compression Techniques
Introduction
Digital Audio Compression
Removal of redundant or otherwise irrelevant
information from audio signal
Audio compression algorithms are often referred to as
“audio encoders”
Applications
Reduces required storage space
Reduces required transmission bandwidth
Audio Compression
Audio signal – overview
Sampling rate (# of samples per second)
Bit rate (# of bits per second). An uncompressed stereo 16-bit 44.1 kHz signal has a bit rate of about 1.4 Mbit/s
Number of channels (mono / stereo / multichannel)
Reduction by lowering those values or by data
compression / encoding
Why Compression is Needed
Data rate = sampling rate * quantization
bits * channels (+ control information)
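The formula above can be checked with a short calculation (Python used purely for illustration):

```python
# Uncompressed PCM data rate = sampling rate * quantization bits * channels
# (control information ignored in this sketch).
def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels):
    """Bits per second for uncompressed PCM audio."""
    return sampling_rate_hz * bits_per_sample * channels

# CD-quality stereo: 44.1 kHz, 16 bits, 2 channels.
rate = pcm_bit_rate(44_100, 16, 2)
print(rate)                        # 1411200 bits/s, i.e. ~1.4 Mbit/s
print(rate / 8 * 60 / 1_000_000)   # ~10.6 MB per minute of audio
```

This is the figure that motivates compression: over 10 MB per minute before any encoding is applied.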
Irrelevant information
Perceptually insignificant
Once removed, cannot be recovered from the remaining information
Audio Data Compression
Lossless Audio Compression
Removes redundant data
Resulting signal is identical to the original – perfect reconstruction. E.g., Huffman, LZW
Lossy Audio Encoding
Removes irrelevant data
Resulting signal is similar to original
E.g. ADPCM, LPC
Audio Data Compression
Audio vs. Speech Compression
Techniques
Speech Compression uses a human vocal
tract model to compress signals
Audio Compression does not use this technique because of the much larger variety of possible signals
Generic Audio Encoder
Psychoacoustic Model
Psychoacoustics – study of how sounds are
perceived by humans
Uses perceptual coding
eliminates information from the audio signal that is inaudible to the human ear
Detects conditions under which different audio signal components mask each other
Additional Encoding Techniques
Other encoding techniques are available (as alternatives or in combination)
Predictive Coding
Coupling / Delta Encoding
Huffman Encoding
Additional Encoding Techniques
Predictive Coding
Often used in speech and image compression
Estimates the expected value for each sample based
on previous sample values
Transmits/stores the difference between the predicted and actual value
The decoder generates the same prediction for each sample and then adjusts it by the stored difference
Used for additional compression in MPEG-2 AAC
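The idea can be sketched with a toy first-order predictor (the previous sample serves as the prediction; real codecs use higher-order adaptive predictors):

```python
def predict(history):
    """Toy first-order predictor: next sample is expected to equal the previous one."""
    return history[-1] if history else 0

def encode(samples):
    """Store only the difference between each sample and its prediction."""
    history, residuals = [], []
    for s in samples:
        residuals.append(s - predict(history))
        history.append(s)
    return residuals

def decode(residuals):
    """Rebuild each sample as prediction + stored difference."""
    history = []
    for r in residuals:
        history.append(predict(history) + r)
    return history

samples = [100, 102, 105, 104, 101]
res = encode(samples)
print(res)                      # [100, 2, 3, -1, -3] - small residuals compress well
assert decode(res) == samples   # lossless round trip
```

Because neighbouring audio samples are correlated, the residuals cluster near zero and need fewer bits than the raw samples.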
Additional Encoding Techniques
Coupling / Delta encoding
Used in cases where audio signal consists of two or
more channels (stereo or surround sound)
Similarities between channels are used for
compression
A sum and a difference between two channels are derived; the difference is usually close to zero and therefore requires fewer bits to encode
This is a lossless encoding process
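A minimal sketch of the sum/difference (mid/side) idea, assuming integer samples; real coders pack the channels more carefully, but the round trip below is exactly lossless:

```python
def couple(left, right):
    """Replace L/R with sum and difference; the difference is near zero
    whenever the two channels are similar."""
    mid  = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def decouple(mid, side):
    """Exact inverse: (mid + side) is always even, so integer division is exact."""
    left  = [(m + s) // 2 for m, s in zip(mid, side)]
    right = [(m - s) // 2 for m, s in zip(mid, side)]
    return left, right

left, right = [1000, 1001], [998, 1000]
mid, side = couple(left, right)
print(side)                                   # [2, 1] - small values, cheap to encode
assert decouple(mid, side) == (left, right)   # lossless
```

Note that mid and side each need one extra bit of headroom compared with the original samples; the saving comes from the side channel's small magnitudes, not from a narrower sample format.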
Additional Encoding Techniques
Huffman Coding
Information-theory-based technique
An element of a signal that often reoccurs in the
signal is represented by a simpler symbol, and its
value is stored in a look-up table
Implemented using look-up tables in both the encoder and the decoder
Provides substantial lossless compression, but
requires high computational power and therefore is
not very popular
Used by MPEG-1 and MPEG-2 AAC
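A compact illustration of building such a look-up table; the heap-based `huffman_code` helper below is a generic textbook construction, not any standard's actual tables:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code: frequently occurring symbols get shorter bit strings."""
    freq = Counter(symbols)
    # Heap entries: (frequency, unique tiebreak, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol input
        return {sym: "0" for sym in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)      # two least-frequent subtrees...
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))  # ...are merged
        tiebreak += 1
    return heap[0][2]

data = "aaaabbc"
table = huffman_code(data)
encoded = "".join(table[s] for s in data)
# The most frequent symbol 'a' gets a 1-bit code; the total beats 2 bits/symbol.
print(table, len(encoded))
```

The decoder holds the same table and walks the bitstream, emitting a symbol whenever a complete code word is matched; because the code is prefix-free, no code word is a prefix of another.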
Psychoacoustics
Limits of Human Hearing
[Figure: two different waveforms (Wave 1, Wave 2) sharing the same magnitude spectrum]
Limits of Human Hearing
Masking in Amplitude, Time, and Frequency
Spectral Analysis
Transforms
Fast Fourier Transform (FFT)
Discrete Cosine Transform (DCT) – similar to the FFT but uses only real cosine basis functions
Modified Discrete Cosine Transform (MDCT)
[used by MPEG-1 Layer-III, MPEG-2 AAC,
Dolby AC-3] – overlapped and windowed
version of DCT
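The "overlapped and windowed" behaviour can be demonstrated in a few lines: 50%-overlapped blocks below reconstruct the interior of the signal exactly (the time-domain alias cancellation property), using a sine window, which satisfies the Princen-Bradley condition. This is a toy-sized sketch, not production code:

```python
import math

def mdct(x, w):
    """Forward MDCT: 2N windowed time samples -> N frequency coefficients."""
    N = len(x) // 2
    return [sum(w[n] * x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(X, w):
    """Inverse MDCT with synthesis windowing (before overlap-add)."""
    N = len(X)
    return [w[n] * (2.0 / N) * sum(X[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                                   for k in range(N))
            for n in range(2 * N)]

N = 4                                                    # toy half-block length
w = [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]  # sine window

signal = [math.sin(0.3 * n) for n in range(4 * N)]
out = [0.0] * (4 * N)
for start in (0, N, 2 * N):                              # 50%-overlapping blocks
    y = imdct(mdct(signal[start:start + 2 * N], w), w)
    for n in range(2 * N):
        out[start + n] += y[n]

# Samples covered by two overlapping blocks reconstruct exactly:
# the aliasing terms of adjacent blocks cancel in the overlap-add.
mid_err = max(abs(out[n] - signal[n]) for n in range(N, 3 * N))
print(mid_err)   # close to machine precision
```

Despite each block being critically sampled (2N samples in, N coefficients out), the overlap makes the transform free of blocking artifacts, which is why MP3, AAC, and AC-3 all build on it.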
Spectral Analysis
Filter Banks
Time sample blocks are passed through a set
of bandpass filters
Masking thresholds are applied to resulting
frequency subband signals
Poly-phase and wavelet banks are the most popular filter structures
Compression Models
Perceptual Models
Production Models
Perceptual Models
Exploit masking, etc., to discard perceptually
irrelevant information.
Example: Quantize soft sounds more accurately,
loud sounds less accurately
Determine the number of bits needed to represent each coefficient such that the noise introduced by quantization stays below the masking threshold (e.g., 12 dB of quantization noise against a 15 dB mask)
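A hedged sketch of this decision, using the common ~6.02 dB-per-bit rule of thumb for uniform quantizer SNR; the real standards use tables rather than this formula:

```python
import math

def bits_needed(smr_db, db_per_bit=6.02):
    """Bits so that quantization noise falls below the masking threshold.

    Rule of thumb: each quantizer bit buys ~6.02 dB of SNR, and noise is
    masked once SNR >= SMR (the signal-to-mask ratio of the band).
    """
    if smr_db <= 0:            # signal already below the mask: nothing to code
        return 0
    return math.ceil(smr_db / db_per_bit)

print(bits_needed(15.0))   # 3 bits -> ~18 dB SNR, safely above a 15 dB mask
print(bits_needed(-5.0))   # 0 bits: the band is inaudible, transmit nothing
```

This is why soft sounds near the masking threshold can be quantized coarsely or dropped entirely, while clearly audible components get enough bits to keep their quantization noise masked.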
Rate control loop
For a given bit rate allocation, adjust the
quantization steps to achieve the bit rate.
This loop checks whether the number of bits resulting from the coding operation exceeds the number of bits available for a given block of data.
If so, the quantization step is increased to reduce the total number of bits.
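The loop can be sketched as follows; the bit-cost model here is a made-up stand-in for real entropy coding, and the step multiplier is arbitrary:

```python
def quantize(coeffs, step):
    """Uniform quantizer: larger step -> smaller integers -> fewer bits, more noise."""
    return [round(c / step) for c in coeffs]

def bits_for(values):
    """Hypothetical bit-cost model: magnitude bits plus a sign bit per value."""
    return sum(abs(v).bit_length() + 1 for v in values)

def rate_control(coeffs, bit_budget, step=1.0):
    """Coarsen the quantization step until the coded block fits the budget."""
    while bits_for(quantize(coeffs, step)) > bit_budget:
        step *= 1.25            # coarser quantization -> fewer bits, more noise
    return step, quantize(coeffs, step)

step, q = rate_control([10.0, -7.5, 3.2, 0.4], bit_budget=10)
print(step, q)   # the surviving step size and the quantized coefficients
```

The trade-off is explicit: every pass through the loop buys bits at the price of added quantization noise, which is why the distortion control loop described next must check the result against the masking thresholds.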
MPEG Audio Bit Allocation
This process determines number of code bits allocated to
each sub-band based on information from the psycho-
acoustic model
Algorithm:
1. Compute the mask-to-noise ratio: MNR = SNR − SMR
Standard provides tables that give estimates for SNR resulting
from quantizing to a given number of quantizer levels
2. Search for sub-band with the lowest MNR
3. Allocate code bits to this sub-band.
If a sub-band receives additional code bits, look up the new estimate of SNR and repeat from step 1
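The greedy search in steps 1–3 might look like the sketch below, with a ~6 dB-per-bit formula standing in for the SNR tables the standard provides:

```python
# Hypothetical stand-in for the standard's SNR tables: ~6.02 dB per bit.
def snr_db(bits):
    return 6.02 * bits

def allocate(smr_db, total_bits, max_bits_per_band=15):
    """Greedy MPEG-style allocation: repeatedly give one bit to the band
    whose mask-to-noise ratio (MNR = SNR - SMR) is currently lowest."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        mnr = [snr_db(b) - s for b, s in zip(bits, smr_db)]
        worst = min(range(len(mnr)), key=lambda i: mnr[i])
        if bits[worst] >= max_bits_per_band:
            break                       # band cannot usefully take more bits
        bits[worst] += 1
    return bits

# Four sub-bands; higher SMR means quantization noise is more audible there.
print(allocate([20.0, 5.0, -3.0, 12.0], total_bits=8))
```

Bands with high signal-to-mask ratios soak up most of the budget, while a band whose signal already sits below the mask (negative SMR) can legitimately receive zero bits.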
Distortion control loop
This loop shapes the quantization steps according to the perceptual mask threshold
Start with a default scale factor of 1.0 for every band
If the quantization error in a band exceeds the mask threshold, the scale factor is adjusted to reduce this quantization error
This demands more bits, so the rate control loop has to be invoked every time the scale factors are changed
The distortion control loop is executed until the noise level is below the perceptual mask for every band
Decoder
The decoder side is comparatively simple. The recovered gain, scale factors, and quantization steps are used to reconstruct the filter bank responses.
Filter bank responses are combined to reconstruct the decoded audio signal
MPEG Coding Specifications
MPEG Layer I
Filter is applied one frame (12x32 = 384 samples) at a time.
At 48 kHz, each frame carries 8ms of sound.
Uses a 512-point FFT to get detailed spectral information about the signal (in addition to the sub-band filter bank).
Uses equal frequency spread per band.
Psychoacoustic model only uses frequency masking.
MPEG Layer III
[Figure: MP3 file structure. Source: http://wiki.hydrogenaudio.org/images/e/ee/Mp3filestructure.jpg]
Successor of MP3
Advanced Audio Coding (AAC, MPEG-2 AAC) – now part of MPEG-4 Audio
Can deliver 320 kbps for five channels (5.1 Channel
system).
Also capable of delivering high quality stereo sound at
bitrates of below 128 kbps.
Supports up to 48 full-bandwidth audio channels
Supports 3 different profiles: Main, Low Complexity, and Scalable Sampling Rate
Default audio format for iPhone, iPad, PlayStation, Nokia,
Android, BlackBerry
Introduced in 1997 as MPEG-2 Part 7
In 1999 – updated and included in MPEG-4
AAC's Improvements over MP3
More sampling frequencies (8–96 kHz)
Arbitrary bit rates and variable frame
length
Higher efficiency and simpler filterbank
Uses pure MDCT (modified discrete cosine
transform)
Used in Windows Media Audio
MPEG-4 Audio
Variety of applications
General audio signals
Speech signals
Synthetic audio
Synthesized speech (structured audio)
MPEG-4 Audio Part 3
Includes variety of audio coding technologies
Lossy speech coding (e.g., CELP)
CELP – code-excited linear prediction – speech
coding
General audio coding (AAC)
Hardware data compression
Text-to-Speech interface
Structured Audio (e.g., MIDI)
MPEG-4 Part 14
Called MP4, with file extension .mp4
Multimedia container format
Stores digital video and audio streams and
allows streaming over Internet
Container or wrapper format
a meta-file format whose specification describes how different data elements and metadata coexist in a computer file
Conclusion
MPEG Audio is an integral part of the
MPEG standard to be considered together
with video
MPEG-4 Audio represents a major
extension in terms of capabilities to
MPEG-1 Audio