5. Audio Coding and Standards
• Models, Techniques & Requirements of Sound Coding
• Entropy Coding: Run-Length Coding & Huffman Coding
• Differential Coding – DPCM & ADPCM
• LPC and Parametric Coding
• Sound Masking Effect and Sub-band Coding
• ITU G.72x Speech/Audio Standards
• ISO MPEG-1/2/4 Audio Standards
• MIDI and Structured Audio
• Common Audio File Formats
PCM Audio Data Rate and Data Size
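The PCM data rate referred to by the slide title is simply sample rate × bits per sample × channels. A minimal sketch (the function name is illustrative, and CD parameters are used as the example):

```python
def pcm_data_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# CD audio: 44.1 kHz, 16-bit, stereo
cd_rate = pcm_data_rate(44_100, 16, 2)   # 1,411,200 b/s, i.e. ~1.4 Mb/s
bytes_per_minute = cd_rate // 8 * 60     # ~10.6 MB of data per minute
```

This is the ~1.4 Mb/s figure quoted for uncompressed CD audio later in these notes.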
• If the level of the 8th band is 60 dB, it gives a masking threshold of 12 dB in the 7th band and 15 dB in the 9th.
• The level in the 7th band is 10 dB (< 12 dB), so ignore it.
• The level in the 9th band is 35 dB (> 15 dB), so send it.
• [ Only the amount above the masking threshold needs to be sent, so instead of using 6 bits to encode it, we can use 4 bits, saving 2 bits (= 12 dB). ]
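The masking decision above can be sketched as a small bit-allocation routine. This is an illustrative simplification, not the actual MPEG algorithm; it assumes roughly 6 dB of dynamic range per bit, as in the slide's arithmetic:

```python
import math

def bits_for_band(level_db, mask_db, full_bits=6, db_per_bit=6):
    """Bit allocation for one sub-band under a masking threshold.

    A band below its threshold is fully masked and dropped; otherwise
    only the portion above the threshold is coded, at ~6 dB per bit."""
    if level_db < mask_db:
        return 0                       # fully masked: send nothing
    excess_db = level_db - mask_db
    return min(full_bits, math.ceil(excess_db / db_per_bit))

print(bits_for_band(10, 12))   # 7th band: masked -> 0 bits
print(bits_for_band(35, 15))   # 9th band: 20 dB above threshold -> 4 bits
```

The 9th-band result reproduces the slide's saving: 4 bits instead of 6.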
MPEG Audio Layers
• MPEG defines 3 layers for audio. The basic model is the same, but codec complexity increases with each layer.
• Data is divided into frames, each containing 384 samples: 12 samples from each of the 32 filtered sub-bands.
• Layer 1: DCT-type filter with one frame and equal frequency spread per band. The psycho-acoustic model uses only frequency masking.
• Layer 2: Uses three frames in the filter (previous, current, and next, a total of 1152 samples). This models a little of the temporal masking.
• Layer 3: A better critical-band filter is used (non-equal frequencies), the psycho-acoustic model includes temporal masking effects, stereo redundancy is taken into account, and a Huffman coder is used.
• MP3: Music compression format using MPEG Layer 3
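Frame length translates directly into coding delay. As a rough sketch (function name illustrative; 44.1 kHz assumed as the sampling rate), the frame durations behind the layer structure above are:

```python
def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Duration of one audio frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz

layer1  = frame_duration_ms(384, 44_100)    # ~8.7 ms: one 384-sample frame
layer23 = frame_duration_ms(1152, 44_100)   # ~26.1 ms: three frames, 1152 samples
```

These are theoretical minimums; as noted below, real codec delay is about three times larger.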
MPEG Audio Layers
Quality factor: 5 = perfect, 4 = just noticeable, 3 = slightly annoying, 2 = annoying, 1 = very annoying
• Real delay is about 3 times the theoretical delay
MPEG-1 Audio Facts
• MPEG-1: 64~320 kb/s for audio
– Uncompressed CD audio => 1.4 Mb/s
• Compression factor ranging from 2.7 to 24.
• With a compression rate of 6:1 (16-bit stereo sampled at 48 kHz reduced to 256 kb/s) and optimal listening conditions, expert listeners could not distinguish between coded and original audio clips.
• MPEG audio supports sampling frequencies of 32, 44.1 and 48 kHz.
• Supports one or two audio channels in one of four modes:
1. Monophonic -- single audio channel
2. Dual-monophonic -- two independent channels, e.g., English and French
3. Stereo -- stereo channels that share bits, but without joint-stereo coding
4. Joint-stereo -- takes advantage of the correlations between stereo channels
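The compression figures above follow from the PCM rate divided by the coded rate. A quick sketch reproducing the 6:1 example (function name illustrative):

```python
def compression_factor(pcm_bps, coded_bps):
    """Ratio of uncompressed PCM bit rate to coded bit rate."""
    return pcm_bps / coded_bps

pcm = 48_000 * 16 * 2                       # 16-bit stereo at 48 kHz = 1,536,000 b/s
print(compression_factor(pcm, 256_000))     # 6.0  -> the 6:1 listening-test example
print(compression_factor(pcm, 64_000))      # 24.0 -> the top of the 2.7~24 range
```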
MPEG-2 Audio Coding
• MPEG-2/MC: Provides theater-style surround-sound capabilities
– Five channels: left, right, center, rear left, and rear right
– Five different modes: mono, stereo, three ch, four ch, five ch
– Full five-channel surround stereo: 640 kb/s
– 320 kb/s for 5.1 stereo (5 channels + sub-woofer channel)
• MPEG-2/LSF (low sampling frequencies: 16, 22.05, and 24 kHz)
• MPEG-2/AAC (Advanced Audio Coding)
– Up to 7.1 channels
– More complex coding
• Compatibility:
– Forward: an MPEG-2 decoder can decode an MPEG-1 bitstream
– Backward: an MPEG-1 decoder can decode a part of an MPEG-2 bitstream
MPEG-4 Audio Coding
• Consists of natural coding and synthetic coding
• Natural coding
– General coding: AAC and TwinVQ-based coding of arbitrary audio, roughly twice as good as MP3
– Speech coding:
* CELP I: 16K samp., 14.4~22.5 kb/s
* CELP II: 8K & 16K samp., 3.85~23.8 kb/s
* HVXC: 8K samp., 1.4~4 kb/s
• Synthetic coding: structured audio
– Interface to Text-to-Speech synthesizers
– High-quality audio synthesis with Structured Audio
• AudioBIFS: mixes and post-produces multi-track sound streams
Structured Audio
• A description format made up of semantic information about the sounds it represents, and that makes use of high-level (algorithmic) models.
– E.g., MIDI (Musical Instrument Digital Interface).
• Normal music digitization performs waveform coding (we sample the music signal and then try to reconstruct it exactly).
• MIDI records only musical actions such as the key depressed, the time when the key is depressed, the duration for which the key remains depressed, and how hard the key is struck (pressure).
• MIDI is an example of a parameter or event-list representation
– An event list is a sequence of control parameters that, taken alone, do not define the quality of a sound, but instead specify the ordering and characteristics of parts of a sound with respect to some external model.
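An event list of this kind can be sketched as plain data. This is a toy representation for illustration, not the actual MIDI wire format; the field names are invented, and only the note number (60 = middle C) follows MIDI convention:

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    time_s: float      # when the key is depressed
    note: int          # which key (MIDI note number, 60 = middle C)
    velocity: int      # how hard the key is struck (0-127)
    duration_s: float  # how long the key remains depressed

# Three notes of a melody: parameters only, no waveform is stored.
melody = [
    NoteEvent(0.0, 60, 100, 0.5),
    NoteEvent(0.5, 64, 90, 0.5),
    NoteEvent(1.0, 67, 95, 1.0),
]
```

A synthesizer renders sound from these parameters plus an external instrument model, which is exactly why the representation alone does not define the sound's quality.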
Structured Audio Synthesis
• Sampling synthesis
– Individual instrument sounds are digitally recorded and stored in memory
– When the instrument is played, the note recordings are reproduced and mixed (added together) to produce the output sound.
– This can be very effective and realistic, but requires a lot of memory
– Good for playing music, but not realistic for speech synthesis
– Good for creating special sound effects from sample libraries
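The "mixed (added together)" step is literally element-wise addition of the stored recordings. A minimal sketch with toy sample data (the function name is illustrative):

```python
def mix(*notes):
    """Mix pre-recorded note samples by adding them,
    treating samples past the end of a shorter note as silence."""
    length = max(len(n) for n in notes)
    return [sum(n[i] for n in notes if i < len(n)) for i in range(length)]

# Two stored 'recordings' (toy data) played at the same time:
out = mix([0.1, 0.2, 0.3], [0.05, 0.05])   # element-wise sum, ~[0.15, 0.25, 0.3]
```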
Structured Audio Synthesis
• Additive and subtractive synthesis
– Synthesize sound from the superposition of sinusoidal components (additive)
– Or from the filtering of a harmonically rich source sound, typically a periodic oscillator with various waveforms (subtractive)
– Very compact representation of the sound
– The resulting notes often have a distinctive “analog synthesizer” character.
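Additive synthesis, as described above, is a sum of sinusoids per sample. A minimal sketch (function name, partial frequencies, and amplitudes are all illustrative):

```python
import math

def additive_note(partials, duration_s=0.25, sample_rate=8000):
    """Additive synthesis: sum sinusoidal partials, sample by sample.

    `partials` is a list of (frequency_hz, amplitude) pairs; the note
    is fully described by these few numbers -- a very compact form."""
    n = int(duration_s * sample_rate)
    return [
        sum(amp * math.sin(2 * math.pi * freq * t / sample_rate)
            for freq, amp in partials)
        for t in range(n)
    ]

# A 440 Hz fundamental with two decaying harmonics:
tone = additive_note([(440, 1.0), (880, 0.5), (1320, 0.25)])
```

A subtractive synthesizer would instead start from a harmonically rich waveform (e.g., a square or sawtooth oscillator) and shape it with filters.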
Applications of Structured Audio
• Low-bandwidth transmission
– Transmit a structural description and dynamically render it into sound on the client side, rather than rendering in a studio on the server side
• Sound generation from process models
– The sound is not created from an event list, but is dynamically generated in response to evolving, non-sound-oriented environments such as video games
• Music applications
• Content-based retrieval
• Virtual reality, together with VRML/X3D
Common Audio File Formats
• Mulaw (Sun, NeXT) .au
• RIFF Wave (MS WAV) .wav
• MPEG Audio Layer (MPEG) .mp2 .mp3
• AIFC (Apple, SGI) .aiff .aif
• HCOM (Mac) .hcom
• SND (Sun, NeXT) .snd
• VOC (Soundblaster card proprietary standard) .voc
• AND MANY OTHERS!