Multimedia Systems Chapter 6
Chapter Six
Basics of Digital Audio
Audio information is crucial for multimedia presentations and, in a sense, is the
simplest type of multimedia data. However, some important differences
between audio and image information cannot be ignored. For example, while it
is customary and useful to occasionally drop a video frame from a video stream,
to facilitate viewing speed, we simply cannot do the same with sound
information or all sense will be lost from that dimension.
6.1 Digitization of Sound
What Is Sound?
Sound is a wave phenomenon like light, but it is macroscopic and involves
molecules of air being compressed and expanded under the action of some
physical device. For example,
a speaker in an audio system vibrates back and forth and produces a
longitudinal pressure wave that we perceive as sound.
Without air there is no sound - in space, for example. Since sound is a pressure
wave, it takes on continuous values, as opposed to digitized ones, which are
restricted to a finite set of discrete values. Nevertheless, if we wish to use a
digital version of sound waves, we must form digitized representations of audio
information.
A plot of a sound wave shows its one-dimensional nature: values change over time
in amplitude - the pressure increases or decreases with time. The amplitude
value is a continuous quantity. Since we are interested in working with such
data in computer storage, we must digitize the analog signals (i.e.,
continuous-valued voltages) produced by microphones.
For video, we must likewise digitize the time-dependent analog signals
produced by typical video cameras. Digitization means conversion to a stream
of numbers - preferably integers for efficiency.
Since such a graph of amplitude versus time is two-dimensional, to fully digitize
the signal shown we have to sample in each dimension - in time and in amplitude.
Sampling means measuring the quantity we are interested in, usually at evenly
spaced intervals. The first kind of sampling - using measurements only at evenly
spaced time intervals - is simply called sampling (surprisingly), and the rate at
which it is performed is called the sampling frequency. The second kind -
restricting the amplitude to a set of discrete levels - is called quantization.
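As a minimal sketch of these two steps, the following Python fragment samples a
440 Hz sine tone at 8 kHz and quantizes each sample to 8 bits; the rate, tone
frequency, and bit depth here are illustrative choices, not values fixed by the text.

```python
import math

SAMPLE_RATE = 8000   # samples per second (time-axis sampling)
BITS = 8             # bits per sample (amplitude-axis quantization)
FREQ = 440.0         # tone frequency in Hz (illustrative)

LEVELS = 2 ** BITS   # number of discrete amplitude levels

def sample_and_quantize(duration_s):
    """Sample a sine tone at SAMPLE_RATE, then quantize to BITS bits."""
    n_samples = int(duration_s * SAMPLE_RATE)
    out = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE                   # evenly spaced time instants
        x = math.sin(2 * math.pi * FREQ * t)  # continuous amplitude in [-1, 1]
        # Map [-1, 1] onto integer levels 0 .. LEVELS-1 (uniform quantization).
        q = round((x + 1) / 2 * (LEVELS - 1))
        out.append(q)
    return out

print(sample_and_quantize(0.001))  # the first 8 quantized samples of the tone
```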
For audio, typical sampling rates are from 8 kHz (8,000 samples per second) to
48 kHz. The human ear can hear from about 20 Hz (a very deep rumble) up to as
much as 20 kHz; above this frequency we enter the range of ultrasound. The human
voice can reach approximately 4 kHz, and we need to bound our sampling rate
from below by at least double this frequency (see the discussion of the Nyquist
sampling rate, below). Thus we arrive at the useful range of about 8 to 40 or so
kHz.
Nyquist Theorem
Signals can be decomposed into a sum of sinusoids, if we are willing to use
enough sinusoids; appropriately weighted sinusoids can build up quite a complex
signal. Whereas frequency is an absolute measure, pitch is a perceptual,
subjective quality of sound - generally, pitch is relative.
The Nyquist theorem states that, for lossless digitization, the sampling rate
should be at least twice the maximum frequency occurring in the signal; this
minimum rate is called the Nyquist rate. If we sample more slowly, frequencies
above half the sampling rate show up as false lower frequencies, called aliases.
Note that the true frequency and its alias are located symmetrically on the
frequency axis with respect to the Nyquist frequency pertaining to the sampling
rate used. For this reason, the Nyquist frequency associated with the sampling
frequency is often called the "folding" frequency. That is to say, if the sampling
frequency is less than twice the true frequency but greater than the true
frequency itself, then the alias frequency equals the sampling frequency minus
the true frequency.
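A small numeric check of this folding relation (a sketch; the specific
frequencies are illustrative): sampling a 6 kHz tone at 8 kHz should make it
indistinguishable from a 2 kHz tone, since 8 - 6 = 2.

```python
import math

fs = 8000.0            # sampling rate (Hz); folding frequency = fs / 2 = 4000 Hz
f_true = 6000.0        # true frequency, above the folding frequency
f_alias = fs - f_true  # predicted alias = 2000 Hz, mirrored about 4000 Hz

# Sampled at the same instants, the two cosines produce identical sample values,
# so the 6 kHz tone is indistinguishable from its 2 kHz alias.
for n in range(8):
    t = n / fs
    print(round(math.cos(2 * math.pi * f_true * t), 6),
          round(math.cos(2 * math.pi * f_alias * t), 6))
```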
Coding of Audio
Quantization and transformation of data are collectively known as coding of
the data. For audio, the μ-law technique for companding audio signals is
usually combined with a simple algorithm that exploits the temporal
redundancy present in audio signals.
Differences in signals between the present and a previous time can effectively
reduce the size of signal values and, most important, concentrate the histogram
of sample values (now differences) into a much smaller range.
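The μ-law compander itself is a fixed formula, with μ = 255 in the North
American/Japanese standard. The sketch below applies the formula to a
normalized sample in [-1, 1]; it shows only the companding step, not the
subsequent quantization.

```python
import math

MU = 255.0  # mu value used in the North American / Japanese standard

def mu_law_encode(x):
    """Compress a normalized sample x in [-1, 1]: boosts small amplitudes."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_decode(y):
    """Invert the compression: expand y in [-1, 1] back to a linear sample."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

x = 0.1
y = mu_law_encode(x)
print(y, mu_law_decode(y))  # a small amplitude maps to a much larger code value
```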
DPCM
Differential Pulse Code Modulation is exactly the same as Predictive Coding,
except that it incorporates a quantizer step. Quantization is as in PCM and can
be uniform or nonuniform. One scheme for analytically determining the best
set of nonuniform quantizer steps is the Lloyd-Max quantizer, named for Stuart
Lloyd and Joel Max, which is based on a least squares minimization of the
error term.
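A minimal DPCM sketch follows, assuming the simplest predictor (the previous
reconstructed sample) and a uniform quantizer; real coders use better predictors
and Lloyd-Max style nonuniform quantizers. Note that the encoder predicts from
the reconstructed signal, exactly as the decoder will, so the two stay in step.

```python
STEP = 4  # uniform quantizer step size (illustrative)

def quantize(d):
    """Uniform quantization of a difference value."""
    return STEP * round(d / STEP)

def dpcm_encode(samples):
    """Transmit quantized differences from the predicted (previous reconstructed) sample."""
    prev = 0      # predictor state, identical on the decoder side
    codes = []
    for s in samples:
        diff = quantize(s - prev)  # quantize the prediction error, not the sample
        codes.append(diff)
        prev += diff               # reconstruct exactly as the decoder will
    return codes

def dpcm_decode(codes):
    prev = 0
    out = []
    for diff in codes:
        prev += diff
        out.append(prev)
    return out

samples = [0, 3, 8, 14, 18, 19, 17, 12]
codes = dpcm_encode(samples)
print(codes, dpcm_decode(codes))
```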
DM
DM stands for Delta Modulation, a much-simplified version of DPCM often
used as a quick analog-to-digital converter.
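As a sketch, DM reduces the quantizer to a single bit per sample with a fixed
step (the step size below is illustrative): the coder outputs 1 when the input
is above its running estimate and 0 otherwise.

```python
STEP = 2  # fixed step; too small causes slope overload, too large causes granular noise

def dm_encode(samples):
    """One bit per sample: raise the estimate by STEP on a 1, lower it on a 0."""
    estimate = 0
    bits = []
    for s in samples:
        bit = 1 if s > estimate else 0
        bits.append(bit)
        estimate += STEP if bit else -STEP
    return bits

print(dm_encode([0, 1, 3, 6, 8, 8, 7, 5]))
```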
ADPCM
Adaptive DPCM takes the idea of adapting the coder to suit the input much
further. Basically, two pieces make up a DPCM coder: the quantizer and the
predictor. Above, in Adaptive DM,we adapted the quantizer step size to suit the
input. In DPCM, we can adaptively modify the quantizer, by changing the step
size as well as decision boundaries in a nonuniform quantizer.
We can carry this out in two ways: using the properties of the input signal
(called forward adaptive quantization), or using the properties of the quantized
output - if quantized errors become too large, we should change the
nonuniform Lloyd-Max quantizer (this is called backward adaptive
quantization).
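One common backward-adaptive scheme adapts the step size from the transmitted
codes themselves, so the decoder can track it without side information. The
sketch below applies this idea to delta modulation; the grow/shrink multipliers
are illustrative, not taken from any particular standard.

```python
def adaptive_dm_encode(samples, step=2.0, grow=1.5, shrink=0.66):
    """Backward-adaptive DM: widen the step after two equal bits (the signal is
    outrunning the estimate), shrink it after alternating bits (the estimate is
    oscillating). The decoder applies the same rule to the received bits."""
    estimate, prev_bit, bits = 0.0, None, []
    for s in samples:
        bit = 1 if s > estimate else 0
        bits.append(bit)
        estimate += step if bit else -step
        if prev_bit is not None:
            step *= grow if bit == prev_bit else shrink
        prev_bit = bit
    return bits

print(adaptive_dm_encode([0, 2, 6, 14, 20, 21, 20, 18]))
```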
MIDI Overview
MIDI, which dates from the early 1980s, is an acronym that stands for Musical
Instrument Digital Interface. It forms a protocol adopted by the electronic
music industry that enables computers, synthesizers, keyboards, and other
musical devices to communicate with each other. A synthesizer produces
synthetic music and is included on sound cards, using one of the two methods
discussed above. The MIDI standard is supported by most synthesizers, so
sounds created on one can be played and manipulated on another and sound
reasonably close. Computers must have a special MIDI interface, but this is
incorporated into most sound cards. The sound card must also have both DA
and AD converters. MIDI is a scripting language - it codes "events" that stand
for the production of certain sounds. Therefore, MIDI files are generally very
small. For example, a MIDI event might include values for the pitch of a single
note, its duration, and its volume.
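For instance, a Note On event on the wire is just three bytes: a status byte
(0x90 plus the channel number), a note number, and a velocity. The sketch below
only constructs those bytes; it does not talk to an actual MIDI device.

```python
def note_on(channel, note, velocity):
    """Build a 3-byte MIDI Note On message.
    channel: 0-15, note: 0-127 (60 = middle C), velocity: 0-127 (loudness)."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(channel, note):
    """Note Off (status 0x80): ends the note started by Note On."""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, 0x40])

# Middle C on channel 0 at moderate loudness: hex bytes 90 3c 64 / 80 3c 40.
print(note_on(0, 60, 100).hex(), note_off(0, 60).hex())
```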