Chapter 2 SOUND AUDIO Systems
Example 1:
• A baby's voice has a high pitch.
• A grown man's voice has a low pitch.
Example 2:
• The chirping of a bird has a high pitch.
• The roaring of a lion has a low pitch.
Basic Sound Concepts
• Dynamic Range and Bandwidth:
→Dynamic range is the difference between the loudest and softest sound levels.
→For example: a large orchestra can reach 130 dB at its climax and
drop as low as 30 dB at its softest, giving a dynamic range of 100 dB.
→Bandwidth is the range of frequencies a device can produce or a
human can hear.
Fun facts about Sound
1. Sound cannot travel through space since there are no molecules to
travel through. Here on earth, we have air molecules that vibrate in and
around our ears.
2. Do you know what is louder than a car horn? The cry of a human
baby, which is about 115 decibels.
3. The loudest natural sound on earth is caused by an erupting volcano.
4. Dogs are capable of hearing sounds at a much higher frequency than
humans can. They can hear sounds or noises humans cannot.
5. Flies cannot hear any kind of sound, not even their own buzzing.
6. Since particles are closer together in water than in air, sound can
travel four times faster in water.
7. Sound travels at a speed of around 767 miles per hour.
8. Cows that listen to music tend to produce more milk than those that
do not.
9. Horror films like to use infrasound, which is below the range of
human hearing. It can cause shivering, anxiety, and even heart
palpitations in humans when it is played.
Computer Representation of Sound
• Sound waves are continuous while computers are good at handling
discrete numbers. In order to store a sound wave in a computer,
samples of the wave are taken. Each sample is represented by a
number, the ‘code’. This process is known as digitization.
• Digitization is the process of converting an analog signal into a digital
signal. There are three steps in the digitization of sound. These are:
a. Sampling
b. Quantization
c. Sound Hardware
Sampling
• The sampling rate is the number of times per second that samples of
the analog sound are taken.
• A higher sampling rate implies that more samples are taken during
the given time interval and ultimately, the quality of reconstruction is
better.
• CDs, for example, contain audio that was converted at a rate of
44.1 kHz, which means the original analogue recording was
sampled 44,100 times for every second of music.
• Sound is made up of waves of different frequencies. The human ear
can hear frequencies up to about 20,000 hertz (Hz).
• The Nyquist sampling theorem says that in order to accurately record
sound, we need to sample it at a rate of at least twice the highest
frequency we want to record. So, to record sound up to 20,000 Hz, we
need to sample it at least 40,000 times per second.
• This is why one of the most popular sampling rates for high quality
sound is 44,100 samples per second.
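The Nyquist criterion above can be demonstrated with a short sketch. This is a minimal illustration assuming NumPy is available; the helper `sample_sine` is invented here for the example. When a 5 kHz tone is sampled at only 6 kHz (below the required 10 kHz), the samples become indistinguishable from a lower-frequency "alias".

```python
import numpy as np

def sample_sine(freq_hz, rate_hz, duration_s=1.0):
    """Sample a sine wave of the given frequency at the given rate."""
    t = np.arange(0, duration_s, 1.0 / rate_hz)
    return np.sin(2 * np.pi * freq_hz * t)

# A 5 kHz tone sampled at 44.1 kHz: well above the 10 kHz Nyquist minimum.
good = sample_sine(5000, 44100)

# The same tone sampled at 6 kHz violates the Nyquist criterion:
# the samples match a 1 kHz alias (6000 - 5000 = 1000 Hz) exactly.
bad = sample_sine(5000, 6000)
alias = sample_sine(-1000, 6000)
print(np.allclose(bad, alias))  # True: the undersampled tone is ambiguous
```

Once the samples are ambiguous like this, no reconstruction can tell the original 5 kHz tone apart from the 1 kHz alias, which is why recording must stay above the Nyquist rate.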
Quantization
• Quantization is the process of representing the amplitude of each
sample as an integer.
• The number of bits used to represent each sample is called the
sample size or bit depth. A higher sample size means that more detail
can be captured in the recording.
• Commonly used sample sizes are 8 bits and 16 bits.
• An 8-bit sample size provides 256 different levels of amplitude, while
a 16-bit sample size provides 65,536 different levels of amplitude.
• The value of each sample is rounded off to the nearest integer.
• If the amplitude of the signal is greater than the maximum value that
can be represented by the sample size, then clipping occurs.
• Clipping is when the top or bottom of the waveform is cut off, which
can cause distortion.
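Quantization and clipping can be sketched in a few lines. This is an illustrative toy (the function name and scaling convention are choices made for the example, assuming samples lie in the range -1.0 to 1.0): each sample is rounded to the nearest integer level, and values beyond full scale are clipped.

```python
import numpy as np

def quantize(samples, bits):
    """Round samples in [-1.0, 1.0] to signed integers of the given bit
    depth, clipping anything outside the representable range."""
    levels = 2 ** (bits - 1)            # e.g. 128 for 8-bit: range -128..127
    ints = np.round(samples * levels)
    return np.clip(ints, -levels, levels - 1).astype(int)

signal = np.array([0.0, 0.5, -0.25, 1.2])  # the last sample exceeds full scale
quantized = quantize(signal, 8)
# 0.0 -> 0, 0.5 -> 64, -0.25 -> -32, and 1.2 is clipped to the 8-bit maximum 127
```

With 16 bits the same scheme yields 65,536 levels (-32768 to 32767), which is why a larger sample size captures more detail.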
Sound hardware
• Before sound can be processed, a computer needs input/output
devices.
• Microphone jacks and built-in speakers are connected to an
ADC and a DAC respectively, for input and output of audio.
Quality versus File Size
• The size of a digital recording depends on the sampling rate,
resolution and number of channels.
• S = R * (b/8) * C * D
• S → file size in bytes
• R → sampling rate (samples/second)
• b → resolution in bits
• C → number of channels (1 = mono, 2 = stereo)
• D → recording duration in seconds
• A higher sampling rate and higher resolution give higher quality but a
bigger file size.
• For example, if we record 10 seconds of stereo music with a sampling
rate of 44.1 kHz at 16 bits, the size will be:
• S = 44100 * (16/8) * 2 * 10
• = 1,764,000 bytes = 1,722.7 KB ≈ 1.68 MB
• High quality sound files are very big, however, the file size can be
reduced by compression.
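The file-size formula translates directly into code. The function name below is chosen for this sketch; it computes the uncompressed PCM size from the same four parameters as the formula above.

```python
def audio_file_size(rate_hz, bits, channels, duration_s):
    """Uncompressed PCM size in bytes: S = R * (b/8) * C * D."""
    return rate_hz * (bits // 8) * channels * duration_s

# 10 seconds of 16-bit stereo at 44.1 kHz, as in the worked example
size = audio_file_size(44100, 16, 2, 10)
print(size)                  # 1764000 bytes
print(size / 1024)           # about 1722.7 KB
print(size / (1024 * 1024))  # about 1.68 MB
```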
File Size for common sampling rates and resolutions
Audio file format
• The most commonly used digital sound format in Windows systems is .wav files.
• Sound is stored in .wav as digital samples known as Pulse Code Modulation
(PCM).
• Each .wav file has a header containing information about the file:
❑type of format, e.g., PCM or other modulations
❑size of the data
❑number of channels
❑samples per second
❑bytes per sample
• There is usually no compression in .wav files.
• Other formats may use different compression techniques to reduce file size.
• .vox uses Adaptive Differential Pulse Code Modulation (ADPCM).
• .mp3 uses MPEG-1 Layer 3 audio.
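The header fields listed above can be inspected with Python's standard-library `wave` module. This sketch writes one second of 16-bit mono silence (the file name `tone.wav` is just a placeholder), then reads the header fields back.

```python
import struct
import wave

# Write one second of silence as 16-bit mono PCM at 44.1 kHz.
with wave.open("tone.wav", "wb") as f:
    f.setnchannels(1)       # number of channels (mono)
    f.setsampwidth(2)       # bytes per sample (16-bit)
    f.setframerate(44100)   # samples per second
    f.writeframes(struct.pack("<h", 0) * 44100)

# Read the header information back from the file.
with wave.open("tone.wav", "rb") as f:
    print(f.getnchannels())  # 1
    print(f.getframerate())  # 44100
    print(f.getsampwidth())  # 2
    print(f.getnframes())    # 44100
```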
Types of Digital Audio file formats
1. WAV (Waveform Audio File Format): A standard audio format used
primarily on Windows systems. WAV files can contain
uncompressed audio data and are known for their high audio
quality. They are often used for professional audio recording and
editing.
2. MP3 (MPEG Audio Layer III): One of the most popular and widely
used audio formats. MP3 files use lossy compression to reduce file
size while maintaining reasonable audio quality. They are commonly
used for music distribution and playback.
3. AIFF (Audio Interchange File Format): Similar to WAV, AIFF is a high-
quality audio format commonly used on Apple systems. It supports
uncompressed audio data and is often used for professional audio
applications.
4. AAC (Advanced Audio Coding): Another widely used audio format
that offers better sound quality at lower bit rates compared to MP3.
AAC is commonly used for music streaming and is the default format
for Apple's iTunes and iOS devices.
5. FLAC (Free Lossless Audio Codec): A lossless compression format
that retains the original audio quality while reducing file size. FLAC files
are often used by audiophiles and for archiving high-quality audio.
6. Opus: A relatively newer audio format designed for efficient
compression and high audio quality at low bit rates. Opus is suitable for
both voice and music and is often used in real-time communication and
streaming applications.
Audio Hardware
• Recording and Digitizing sound:
❑An analog-to-digital converter (ADC) converts the analog sound signal into
digital samples.
❑A digital signal processor (DSP) processes the sample, e.g. filtering,
modulation, compression, and so on.
• Play back sound:
❑A digital signal processor processes the samples, e.g. decompression,
demodulation, and so on.
❑A digital-to-analog converter (DAC) converts the digital samples back into an
analog sound signal.
• All these hardware devices are integrated into a few chips on a sound
card.
• Different sound cards have different capabilities for processing digital
sound.
• When buying a sound card, you should look at:
❑maximum sampling rate
❑stereo or mono
❑duplex or simplex.
Audio Software
• Windows device driver: controls the hardware device.
• Device manager: the user interface to the hardware for configuring
the devices.
❑You can choose which audio device you want to use.
❑You can set the audio volume.
• Mixer: its functions are:
❑To combine sound from different sources.
❑To adjust the playback volume of sound sources.
❑To adjust the recording volume of sound sources.
• Recording: Windows has a simple Sound Recorder.
• Editing: The Windows Sound Recorder has limited editing functions,
such as changing volume and speed, and deleting part of the sound.
• There are many freeware and shareware programs for sound
recording, editing and processing.
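The core job of a mixer, combining sources and adjusting their volumes, can be sketched in a few lines. This is a toy illustration (the function name and the use of sine waves as stand-in sources are assumptions of the example), assuming NumPy and samples in the range -1.0 to 1.0.

```python
import numpy as np

def mix(sources, gains):
    """Combine several sample arrays, each scaled by its playback
    volume, and clip the sum to full scale as a real mixer would."""
    total = sum(g * s for g, s in zip(gains, sources))
    return np.clip(total, -1.0, 1.0)

# Two stand-in sound sources: one second each at 44.1 kHz.
t = np.linspace(0, 1, 44100, endpoint=False)
voice = np.sin(2 * np.pi * 440 * t)
music = np.sin(2 * np.pi * 220 * t)

# Mix with the voice louder than the background music.
out = mix([voice, music], gains=[0.8, 0.3])
```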
Computer Music
• Sounds, whether they come from nature or are created by people,
can be complicated because they contain many different pitches.
• It's relatively easy to record complicated sounds using digital
technology. But making these kinds of sounds from scratch, known as
synthesis, is harder.
• There's a more effective method to create great music, called MIDI
(Musical Instrument Digital Interface).
Computer MIDI
• Musical Instrument Digital Interface.
• It is a communication standard developed in the early 1980s for
electronic instruments and computers.
• It specifies the hardware connection between equipment as well as
the format in which the data are transferred between the equipment.
• Common MIDI devices include electronic music synthesizers and
sound modules.
General MIDI
• General MIDI is a standard specified by MIDI Manufacturers
Association.
• For a device to work with General MIDI, it needs to follow some rules:
❑It should have at least 24 different sounds it can make.
❑It can play sounds on 16 different channels.
❑It must be able to play 16 sounds at the same time with different kinds of
instruments.
❑It should have at least 128 ready-made sounds that can be used.
❑It has to work with certain controls.
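At the wire level, the data a MIDI device exchanges is just short byte messages. The helper functions below are invented for this sketch, but the byte layout they build (a status byte encoding the event type and channel, followed by data bytes) follows the MIDI 1.0 message format.

```python
def note_on(channel, note, velocity):
    """Note On: status byte 0x90 | channel, then note number (0-127)
    and velocity (0-127)."""
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note):
    """Note Off: status byte 0x80 | channel, note number, velocity 0."""
    return bytes([0x80 | channel, note, 0])

# Middle C (note 60) at moderate velocity on channel 0
print(note_on(0, 60, 64).hex())   # "903c40"
print(note_off(0, 60).hex())      # "803c00"
```

Because a message is only three bytes rather than thousands of audio samples, MIDI files are far smaller than digitized recordings of the same music.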
Speech Generation
Step 1: Generation of a Sound Script. Transcription from text to a sound script using
a library containing (language-specific) letter-to-phone rules. A dictionary of
exceptions is used for words with a non-standard pronunciation.
Step 2: Generation of Speech. The sound script is used to drive the time- or
frequency-dependent sound concatenation process.
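Step 1 can be sketched as a lookup: the exceptions dictionary is consulted first, and only then are the letter-to-phone rules applied. Everything here is a toy (the rules, the phone symbols, and the two dictionaries are invented for illustration, not a real phone set).

```python
# Invented exceptions and letter-to-phone rules, for illustration only.
EXCEPTIONS = {"women": "W IH M IH N"}                 # non-standard words
RULES = {"sh": "SH", "a": "AE", "c": "K", "t": "T"}   # letter-to-phone rules

def to_sound_script(word):
    """Transcribe a word into a sound script (step 1 of speech generation)."""
    if word in EXCEPTIONS:              # dictionary of exceptions wins
        return EXCEPTIONS[word]
    phones, i = [], 0
    while i < len(word):
        if word[i:i + 2] in RULES:      # prefer two-letter rules like "sh"
            phones.append(RULES[word[i:i + 2]])
            i += 2
        elif word[i] in RULES:
            phones.append(RULES[word[i]])
            i += 1
        else:                           # no rule: fall back to the letter
            phones.append(word[i].upper())
            i += 1
    return " ".join(phones)

print(to_sound_script("cat"))    # "K AE T" (rules applied letter by letter)
print(to_sound_script("women"))  # "W IH M IH N" (exception dictionary wins)
```

The ambiguity problems discussed next arise exactly because rules like these cannot capture context-dependent pronunciation on their own.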
Problem of speech synthesis
• Ambiguous pronunciation. In many languages, the pronunciation of
certain words depends on the context.
• Example: ‘lead’
• This is not so much of a problem for the German language
• It is a problem for the English language
• Anecdote attributed to G. B. Shaw:
▪ if we pronounce “gh” as “f” (example: “laugh”),
▪ if we pronounce “o” as “i” (example: “women”),
▪ and if we pronounce “ti” as “sh” (example: “nation”), then why don’t we
write “ghoti” instead of “fish”?
Speech Analysis
• Purpose of Speech Analysis:
▪ Who is speaking: speaker identification for security purposes
▪ What is being said: automatic transcription of speech into text
▪ How was a statement said: understanding psychological factors of a speech
pattern (whether the speaker was angry or calm, lying, etc.).
• The primary goal of speech analysis in multimedia systems is to
correctly determine individual words (speech recognition).
Speech Recognition System
• Speech analysis is of strong interest for multimedia systems.
• By analyzing speech and generating speech, we can support many
kinds of media transformations.
• The main aim of speech analysis is to recognize words correctly, but
recognition is never 100% certain.
• Factors such as background noise, the environment, and the
speaker's emotional state can affect accuracy.