
Lecture 2 – Sound and Audio

Introduction
Basic Sound Concepts
Representation and formats
Basic Music (MIDI) Concepts
Devices
Messages
Standards and software
Speech:
Generation
Analysis and transmission

Basic Sound Concepts
Acoustics
• the study of sound: the generation, transmission and reception of sound waves.
Sound is produced by the vibration of matter.
• During vibration, pressure variations are created in the surrounding air molecules.
• The pattern of oscillation creates a waveform; the wave is made up of pressure differences.
• The waveform repeats the same shape at intervals called a period.
• Periodic sound sources exhibit more periodicity and sound more musical: musical instruments, wind etc.
• Aperiodic sound sources are less periodic: unpitched percussion, a sneeze, a cough.
Basic Sound Concepts
Sound Transmission
• Sound is transmitted by molecules bumping into each other.
• Sound is a continuous wave that travels through the air.
• Sound is detected by measuring the pressure level at a point.
Receiving
• A microphone in the sound field moves according to the varying pressure exerted on it.
• A transducer converts this energy into a voltage level, i.e. energy of another form (electrical energy).
Sending
• A speaker transforms electrical energy into sound waves.
Frequency of a Sound Wave
Frequency is the reciprocal of the period.
[Figure: air pressure plotted against time, with the period and amplitude of the wave labeled]
Basic Sound Concepts
Wavelength is the distance the wave travels in one cycle.
Frequency is the number of periods per second, measured in hertz (Hz, cycles/second).
• Frequency is the reciprocal of the period: for example, a wave with a period of 1 ms has a frequency of 1/0.001 s = 1000 Hz.
• The human hearing range is 20 Hz - 20 kHz; voice lies roughly between 500 Hz and 2 kHz.

Infrasound: 0 - 20 Hz
Human range: 20 Hz - 20 kHz
Ultrasound: 20 kHz - 1 GHz
Hypersound: 1 GHz - 10 THz
Basic Sound Concepts
Amplitude is the measure of the displacement of the air pressure wave from its mean, or quiescent, state.
It is heard subjectively as loudness and is measured in decibels (dB).
0 dB - essentially no sound heard
35 dB - quiet home
70 dB - noisy street
120 dB - discomfort
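A minimal sketch (Python, assuming the standard 20 µPa reference pressure) of how decibel values relate to sound pressure:

```python
import math

P0 = 20e-6  # reference sound pressure in pascals (nominal hearing threshold)

def spl_db(pressure_pa):
    """Sound pressure level in dB for an RMS pressure given in pascals."""
    return 20 * math.log10(pressure_pa / P0)

print(round(spl_db(20e-6)))   # 0 dB: essentially no sound heard
print(round(spl_db(0.063)))   # ~70 dB: roughly a noisy street
```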
Computer Representation of Audio
A transducer converts pressure to voltage levels.
The analog signal is converted into a digital stream by discrete sampling.
• Discretization occurs both in time (sampling) and in amplitude (quantization).
A computer measures the amplitude of the waveform at regular time intervals to produce a series of numbers (samples).
Computer Representation of Audio
Sampling Rate:
• the rate at which a continuous wave is sampled (measured in hertz)
• CD standard: 44,100 Hz; telephone quality: 8,000 Hz.
• There is a direct relationship between sampling rate, sound quality (fidelity) and storage space.
• Question
  - How often do you need to sample a signal to avoid losing information?
• Answer
  - To decide on a sampling rate, one must be aware of the difference between the playback rate and the capturing (sampling) rate.
  - It depends on how fast the signal is changing: in practice, twice per cycle of the highest frequency present, which follows from the Nyquist sampling theorem (see the sketch below).
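A minimal sketch (Python with NumPy, assumed available) of the Nyquist criterion: a 1 kHz tone sampled above twice its frequency is recoverable, while sampling below that rate aliases it to a lower apparent frequency.

```python
import numpy as np

F_SIGNAL = 1000   # signal frequency in Hz
DURATION = 0.005  # seconds of audio to sample

def sample(rate_hz):
    """Return sample times and amplitudes of the test tone at a given rate."""
    t = np.arange(0, DURATION, 1.0 / rate_hz)
    return t, np.sin(2 * np.pi * F_SIGNAL * t)

t_ok, x_ok = sample(8000)    # above the Nyquist rate (2 kHz): recoverable
t_bad, x_bad = sample(1500)  # below it: samples alias to a lower frequency

print(len(x_ok), "samples at 8000 Hz;", len(x_bad), "samples at 1500 Hz")
```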
Sampling
[Figure: a continuous waveform measured at evenly spaced points; sample height plotted against sample index]
Quantization and Sampling
[Figure: the same sampled waveform with each sample height rounded to the nearest discrete level (0.25, 0.5, 0.75, ...)]
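A minimal sketch (assuming NumPy and samples normalized to [-1, 1]) of uniform quantization, which rounds each sample to the nearest of a fixed set of levels; fewer bits means coarser levels and more audible quantization noise:

```python
import numpy as np

def quantize(samples, bits):
    """Round each sample to the nearest of 2**bits evenly spaced levels."""
    step = 2.0 / (2 ** bits - 1)  # level spacing across [-1, 1]
    return np.round(samples / step) * step

x = np.sin(2 * np.pi * np.linspace(0, 1, 16))  # one cycle of a sine wave
print(quantize(x, 2))  # 4 levels: a coarse staircase
print(quantize(x, 8))  # 256 levels: nearly indistinguishable from x
```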
Audio Formats
• Audio formats are characterized by four parameters.
• Sample rate: the sampling frequency.
• Encoding: the audio data representation.
  - µ-law encoding corresponds to CCITT G.711, the standard for voice data in telephone companies in the USA, Canada and Japan.
  - A-law encoding is used for telephony elsewhere.
  - A-law and µ-law audio is sampled at 8,000 samples/second with a precision of 12 bits, compressed to 8-bit samples (see the sketch after this list).
  - Linear Pulse Code Modulation (PCM): uncompressed audio in which samples are proportional to the audio signal voltage.
• Precision: the number of bits used to store an audio sample.
  - µ-law and A-law use 8-bit precision; PCM can be stored at various precisions, 16-bit PCM being common.
• Channel: multiple channels of audio may be interleaved at sample boundaries.
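A minimal sketch (Python/NumPy; samples assumed normalized to [-1, 1]) of µ-law companding with µ = 255. Real G.711 codecs use a segmented approximation of this curve; the continuous formula below shows the idea:

```python
import numpy as np

MU = 255.0

def mu_law_encode(x):
    """Compress samples in [-1, 1] logarithmically, then map to 8-bit codes."""
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    return np.round((y + 1) / 2 * 255).astype(np.uint8)

def mu_law_decode(code):
    """Expand the 8-bit codes back to samples in [-1, 1] (lossy)."""
    y = code.astype(np.float64) / 255 * 2 - 1
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

x = np.array([-0.9, -0.01, 0.0, 0.01, 0.9])
print(mu_law_decode(mu_law_encode(x)))  # close to x, finest near zero
```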
Audio Formats
Available on UNIX:
• au (Sun file format), wav (Microsoft RIFF/waveform format), al (raw A-law), u (raw µ-law), ...
Available on Windows-based systems (RIFF formats):
• wav, midi (file format for Standard MIDI Files), avi
RIFF (Resource Interchange File Format):
• a tagged file format (similar to TIFF); allows multiple applications to read files in RIFF format
Others: RealAudio, MP3 (MPEG Audio Layer 3)
Computer Representation of Voice
The best-known technique for voice digitization is pulse code modulation (PCM).
It consists of the two-step process of sampling and quantization, and is based on the sampling theorem.
• If voice data are band-limited to 4,000 Hz, then PCM takes 8,000 samples per second, which is sufficient to characterize the input voice signal.
PCM produces analog samples which must be converted to a digital representation.
• Each of these analog samples must be assigned a binary code: each sample is approximated by being quantized.
• At 8,000 samples/second and 8 bits per sample, this yields the 64 kbit/s rate of digital telephony.
Computer Representation of Music
• MIDI (Musical Instrument Digital Interface)
  - a standard that manufacturers of musical instruments use so that instruments can communicate musical information via computers.
• MIDI has two distinct components.
• Hardware
  - Connects the equipment: it specifies the physical connection between musical instruments.
  - A MIDI port is built into an instrument so that a MIDI cable can be plugged in to connect different instruments.
  - It deals with the electronic signals that are sent over the cable.
MIDI Contd…
Data Format
• Encodes the information sent through the hardware.
• The data format does not include individual music samples but describes instrumental data.
• The encoding includes the notion of the beginning and end of notes, the fundamental frequency, and other musical information.
• The MIDI data format is digital, and the data are grouped into MIDI messages.
MIDI Devices
Any musical instrument that satisfies both components of the MIDI specification is a MIDI device.
MIDI hardware includes:
• Sound generator: produces the audio signal.
• Microprocessor: processes the produced sound.
• Keyboard: gives direct control over the synthesizer.
• Control panel: controls functions that are not directly concerned with notes and duration, e.g. menus and volume.
• Auxiliary controllers: give more control over the notes played on the keyboard; pitch bend and modulation are very common.
• Memory: for storing data.
• Sequencer: a computer application that can store data.
• Synthesizer: looks like a simple piano keyboard with a panel full of buttons.
MIDI Modes
There are two categories of MIDI modes, OMNI and POLY, which can be combined in four different ways:
• Mode 1 -- Omni On / Poly
• Mode 2 -- Omni On / Mono
• Mode 3 -- Omni Off / Poly
• Mode 4 -- Omni Off / Mono
The Omni (meaning "all") modes determine whether a synthesizer responds to incoming data on an individual MIDI channel or to data on any channel.
In Omni On mode, a receiving instrument plays all incoming MIDI information, regardless of the MIDI channel.
In Omni Off mode, an instrument responds only to information on the single channel to which it is set, called the instrument's basic channel (see the sketch below).
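A minimal sketch (Python; the function is illustrative, not part of the MIDI specification) of the reception rule the Omni modes express:

```python
def accepts(message_channel, basic_channel, omni_on):
    """True if a receiver in the given mode should play this message."""
    return omni_on or message_channel == basic_channel

print(accepts(5, 1, omni_on=True))   # True: Omni On plays every channel
print(accepts(5, 1, omni_on=False))  # False: Omni Off ignores other channels
```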
MIDI Messages
MIDI messages transmit information between MIDI devices and determine what kinds of musical events can be passed between different devices.
MIDI messages consist of a status byte and data bytes:
• the status byte describes the kind of message;
• the data bytes describe the message itself.
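A minimal sketch of this byte layout using the standard Note On message: the status byte (upper nibble 0x9) has its high bit set and carries the channel number in its low nibble, while data bytes stay in the range 0-127:

```python
NOTE_ON = 0x90  # status nibble for "Note On" channel voice messages

def note_on(channel, note, velocity):
    """Build a 3-byte Note On message (channel 0-15, note/velocity 0-127)."""
    status = NOTE_ON | (channel & 0x0F)
    return bytes([status, note & 0x7F, velocity & 0x7F])

# Middle C (note 60) on channel 0 at velocity 100.
print(note_on(0, 60, 100).hex())  # '903c64'
```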
MIDI Messages Contd…
There are two types of MIDI messages.
Channel Messages
• Go only to specified devices.
• Channel Voice Messages
  - Send the actual performance data between MIDI devices, describing instrument action, controller action and control panel changes.
  - Describe music by defining pitch, amplitude, duration and other sound information.
• Channel Mode Messages
  - Determine the way the receiving channel responds to the voice messages.
  - Set the channel reception mode, stop the playing of spurious notes, and affect the local control of devices.
MIDI Messages Contd…
System Messages
• Go to all devices in the system.
• System Real-Time Messages
  - Short, simple one-byte messages used to synchronize the timing of MIDI devices.
  - To avoid delays, they may be transmitted in the middle of other messages.
• System Common Messages
  - Commands that prepare sequencers and synthesizers to play music.
  - These messages are system-generic.
• System Exclusive Messages
  - MIDI manufacturers' customized messages.
  - Specific to particular MIDI devices and systems.
MIDI Standards
• MIDI reproduces traditional note lengths using the MIDI clock. Using the MIDI clock, a receiver can synchronize with the clock cycle of the sender.
• As an alternative, the SMPTE timing standard (Society of Motion Picture and Television Engineers) can be used: a set of cooperating standards for labeling individual frames of video or film with a defined time code.
• The time code format was originally developed by NASA and is very precise.
MIDI Software
• There are four major categories:
  - Music recording and performance applications
  - Musical notation and printing applications
  - Synthesizer patch editors and librarians
  - Music education applications
• Current MIDI-based computer systems are interactive.
• The processing chain of an interactive computer music system can be conceptualized in three stages:
  - Sensing stage
  - Processing stage
  - Response stage
Speech
Speech is any sound that can be generated, perceived and understood, naturally by humans and artificially by machines.
It has the following properties:
• During certain intervals of time, speech signals show a periodic nature.
• The spectrum of speech signals shows characteristic maxima, usually in three to five frequency bands.
Speech Processing
Speech processing involves the following processes.
• Speech Generation
  - Helmholtz built a mechanical vocal tract, coupling together several mechanical resonators to generate sound.
  - Dudley produced the first speech synthesizer by imitating mechanical vibration with electrical oscillation.
  - Speech generation has two basic requirements: generation of a real-time speech signal, and generation of a natural and understandable speech signal.
Speech Processing
• Vowels
  - Created by the free passage of air through the larynx and oral cavity:
  - a, e, i, o, u.
• Consonants
  - Created by partial or complete obstruction of the air passing through the larynx and oral cavity:
  - b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z.
Methodology
• Time-dependent concatenation
• Frequency-dependent concatenation
[Figure: speech recognition and synthesis front ends]
Time-dependent Concatenation
• Individual speech units are composed like building blocks; the composition can occur at different levels.
• In the simplest case, the individual phones are used as the speech units.
• With just a few phones it is possible to create an unlimited vocabulary.
• However, the transitions between individual phones prove to be extremely problematic.
• Therefore, at the second level, the phones are considered together with their environment.
• To ease the transition problem further, syllables are created, and the speech is generated from the set of syllables.
• The best pronunciation is achieved by storing whole words.
Speech Generation (Time-dependent)
[Figure: phone sound concatenation: the word "crumb" assembled from the phones k, r, ^ and m]
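A purely illustrative sketch of concatenation at the phone level; the unit inventory below is a hypothetical placeholder, with short token lists standing in for recorded waveforms:

```python
# Hypothetical unit inventory: each phone maps to its stored "samples".
PHONES = {"k": ["k0", "k1"], "r": ["r0"], "^": ["v0", "v1"], "m": ["m0"]}

def synthesize(phone_sequence):
    """Concatenate the stored unit of each phone, in time order."""
    out = []
    for p in phone_sequence:
        out.extend(PHONES[p])  # transition smoothing is omitted in this sketch
    return out

# "crumb" as in the figure above: k r ^ m (the final b is silent).
print(synthesize(["k", "r", "^", "m"]))
```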
Frequency-dependent Concatenation
• Speech generation can also be based on frequency-dependent sound concatenation, e.g. formant synthesis.
• Formants are frequency maxima in the spectrum of the speech signal.
• Formant synthesis simulates the vocal tract with a filter; the characteristic values are the filter's center frequencies and their bandwidths.
• A pulse signal at a chosen frequency is used to simulate voiced sounds; unvoiced sounds are created with a noise generator (see the sketch below).
• Newer, sound-specific methods provide sound concatenation with combined time and frequency dependencies.
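A minimal sketch (Python/NumPy; the formant frequencies and bandwidths are illustrative values, not measured ones) of formant synthesis: a voiced excitation is passed through second-order resonators centered on the formant frequencies:

```python
import numpy as np

FS = 8000  # sample rate in Hz

def resonator(x, freq, bw):
    """Two-pole IIR filter with the given center frequency and bandwidth."""
    r = np.exp(-np.pi * bw / FS)  # pole radius derived from the bandwidth
    c = 2 * r * np.cos(2 * np.pi * freq / FS)
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] + c * y[n - 1] - r * r * y[n - 2]
    return y

# Voiced excitation: a 100 Hz pulse train. Unvoiced sounds would instead
# use a noise generator, e.g. np.random.randn(len(x)).
x = np.zeros(FS // 10)
x[:: FS // 100] = 1.0

# Cascade two hypothetical formants; a real vowel uses measured values.
y = resonator(resonator(x, 700, 80), 1100, 90)
print(y[:5])
```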
Speech Analysis
[Figure: research areas of speech analysis: who is speaking (verification, identification), what was said (recognition, understanding), and how it was said]
Speech Analysis
Human speech has certain characteristics determined by the speaker, so speech analysis can serve to determine who is speaking.
The computer identifies and verifies the speaker using an acoustic fingerprint.
• An acoustic fingerprint is a digitally stored speech probe of a person.
Speech Analysis
Another main task of speech analysis is to analyze what has been said: based on the speech sequence, the corresponding text is generated.
This can lead to a speech-controlled typewriter, a translation system, or part of a workplace for the handicapped.
Another area of speech analysis researches speech patterns with respect to how a certain statement was said.
• E.g., a spoken sentence sounds different when a person is angry than when calm.
• An application of this research could be a lie detector.
Speech Analysis
Speech analysis is of strong interest for multimedia systems: together with speech synthesis, different media transformations can be implemented.
The primary goal of speech analysis is to correctly determine individual words; a word is recognized only with a certain probability (at most 1).
Environmental noise, room acoustics, and the speaker's physical and psychological condition all play an important role here.
Speech Recognition
[Figure: components of speech recognition and understanding: acoustic and phonetic analysis (drawing on sound patterns and word models) yields speech units, syntactical analysis (drawing on syntax) yields recognized speech, and semantic analysis (drawing on semantics) yields understood speech]
Speech Recognition
• In the first step, the principle is applied to a sound pattern and/or word model: an acoustical and phonetical analysis is performed.
• In the second step, certain speech units go through syntactical analysis; thereby, errors of the previous step can be recognized.
• Very often, no unambiguous decision can be made during the first step. In this case, syntactical analysis provides additional decision help, and the result is recognized speech.
• The third step deals with the semantics of the previously recognized language: here, decision errors of the previous steps can be recognized and corrected with other analysis methods.
• Even today, this step is non-trivial to implement with current methods known from artificial intelligence and neural network research.
• The result of this step is understood speech. The toy sketch below illustrates the division of labor between the steps.
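A purely illustrative sketch of the step-wise narrowing described above; all words, probabilities and grammar constraints here are hypothetical:

```python
# Acoustic step: hypothetical word candidates with acoustic probabilities.
candidates = {"write": 0.41, "right": 0.39, "rite": 0.20}

def syntactic_filter(cands, allowed):
    """Keep only candidates the grammar allows at this sentence position."""
    return {w: p for w, p in cands.items() if w in allowed}

# Suppose the (hypothetical) grammar expects a verb in this position.
surviving = syntactic_filter(candidates, allowed={"write"})
print(surviving)  # {'write': 0.41} -> the recognized word
```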
Speech Transmission
[Figure: components of a speech transmission system: analog speech signal -> A/D converter -> speech analysis -> coded speech -> reconstruction -> D/A converter -> analog speech signal]
Speech Transmission
The area of speech transmission deals with efficient coding of the speech signal, to allow speech/sound transmission at low transmission rates over networks.
The goal is to provide the receiver with the same speech/sound quality as was generated at the sender side.
Some principles connected to speech generation and recognition:
• Signal form coding
• Source coding
• Recognition/synthesis methods
• Achieved quality
Speech Transmission
Signal Form Coding
• This kind of coding considers no speech-specific properties or parameters; the goal is the most efficient coding of the audio signal.
• The data rate of a PCM-coded stereo audio signal with CD-quality requirements is:
 The data rate of a PCM coded stereo audio signal with CD
quality requirements is
44100 16 bit
Rate = 2* * = 1,411,200 bits/s
2 8 bit / byte
• Telephone quality, in comparison to CD quality, needs only 64 kbit/s.
• With differential pulse code modulation (DPCM), the data rate can be lowered to 56 kbit/s without a loss of quality.
• Adaptive differential pulse code modulation (ADPCM) allows a further reduction to 32 kbit/s. (The quoted rates are checked in the sketch below.)
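A quick arithmetic check of the rates quoted above (all values from this section):

```python
# CD-quality stereo PCM: 2 channels x 44,100 samples/s x 16 bits/sample.
cd_rate = 2 * 44100 * 16
print(cd_rate)   # 1411200 bits/s, i.e. 176,400 bytes/s

# Telephone quality: 8,000 samples/s x 8 bits/sample (G.711).
print(8000 * 8)  # 64000 bits/s
```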
Speech Transmission
Source Coding
• Parameterized systems work with source coding algorithms: specific speech characteristics are used for data rate reduction.
Recognition/Synthesis Methods
• There have been attempts to reduce the transmission rate using pure recognition/synthesis methods: speech analysis (recognition) is performed on the sender side of the transmission system and speech synthesis (generation) on the receiver side.
Achieved Quality
• The question is how to achieve the minimal data rate for a given transmission quality.
• One can assume that for telephone quality, a data rate of 8 kbit/s is sufficient.
