
SP - 3301PPT

Uploaded by

nnaidu2
Copyright
© All Rights Reserved

SPEECH PROCESSING
DR K.KALYAN BABU
SYLLABUS
MODULE 1

• The purpose of speech is communication. Speech can be represented in terms of its message content, or information.
• The signal that carries the message is the acoustic waveform.
• The message information passes through several forms on its way to that waveform.
• The message is first converted into a set of neural signals that control the articulatory mechanism (lips, tongue and vocal cords).
• The information in speech is intrinsically discrete in nature.
• The symbols from which every sound can be classified are called phonemes.
DIGITAL SPEECH PROCESSING

Band-limited speech signals can be represented by samples taken periodically in time.

The figure above shows the discrete-time form of a speech signal.

Digital representations fall into two groups: waveform representations and parametric representations (for example, excitation parameters).
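Periodic sampling can be sketched in a few lines. This is only an illustration: the 200 Hz sine stands in for a band-limited signal, the 8000 samples/second rate is the one quoted later in these notes, and `sample_signal` is a hypothetical helper name.

```python
import math

def sample_signal(signal, fs, duration):
    """Return samples x(n) = signal(n / fs) for n = 0 .. N-1, with N = fs * duration."""
    n_samples = int(round(fs * duration))
    return [signal(n / fs) for n in range(n_samples)]

# Assumed test signal: a 200 Hz sine, well below fs / 2 = 4000 Hz.
def tone(t):
    return math.sin(2 * math.pi * 200 * t)

x = sample_signal(tone, fs=8000, duration=0.01)
print(len(x))  # 80 samples cover 10 ms at 8000 samples/second
```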
APPLICATIONS OF DIGITAL SPEECH PROCESSING
DIGITAL TRANSMISSION AND STORAGE

• One application of speech processing is the vocoder, i.e. voice coder.
• The purpose of a vocoder is to reduce the bandwidth required to transmit the speech signal.
• Bandwidth conservation is still needed in spite of the increased bandwidth offered by satellite, microwave and optical communication.
SPEECH SYNTHESIS SYSTEMS
SPEAKER VERIFICATION AND IDENTIFICATION
• These techniques authenticate or identify a speaker.
• Verification decides whether a speaker is the person he or she claims to be.
• Identification decides which speaker, among an ensemble of speakers, produced a given utterance.
• It is used in forensic applications.
SPEECH RECOGNITION
• Speech recognition is the conversion from an acoustic waveform to a written equivalent of the message.
• Combining a speech recognizer with a speech synthesis system allows communication at a very low bit rate.
AIDS TO THE HANDICAPPED
• The speech signal is processed so that its information is available in a form better matched to a handicapped person than is normally available.
• Variable-rate playback of prerecorded tapes lets a blind listener move through the available speech material at any desired pace.
• Sensory aids and visual displays of speech information can be designed to help deaf people.
ENHANCEMENT OF SPEECH QUALITY
DIGITAL MODELS OF SPEECH:

• Speech is composed of a sequence of sounds.
• These sounds, and the transitions between them, serve as a symbolic representation of information.
• The arrangement of these sounds is governed by the rules of language.
• The study of these rules and their implications for human communication is called linguistics.
• The study of the sounds of speech is called phonetics.
HUMAN VOCAL TRACT:
The vocal tract begins at the opening between the vocal cords (the glottis) and ends at the lips.
The pharynx is the connection between the esophagus and the mouth. The mouth is also called the oral cavity.
DIGITAL MODELS OF SPEECH
• The vocal tract consists of the pharynx and the mouth, or oral cavity.
• The length of the vocal tract is about 17 cm.
• The cross-sectional area of the vocal tract is up to about 20 cm².
• The nasal tract starts at the velum and ends at the nostrils.
• When the velum is lowered, the nasal tract is acoustically coupled to the vocal tract to produce the nasal sounds of speech.
DIGITAL MODELS OF SPEECH PROCESSING:
DIAGRAM OF VOCAL SYSTEM:
• The sub-glottal system in the diagram above consists of the lungs, bronchi and trachea.
• The sub-glottal system acts as the source of energy for the production of speech.
• Speech is the acoustic wave radiated from this system when air is expelled from the lungs and the resulting air flow is perturbed by a constriction in the vocal tract.
SPECTROGRAM OF “SHOULD WE CHASE”
VISIBLE SPEECH
ACOUSTIC PHONETICS:

• Most languages, including English, can be described in terms of a set of distinctive sounds called phonemes.
• American English has about 42 phonemes, which include vowels, diphthongs, semivowels and consonants.
• The phonemes of American English can be grouped into phoneme classes.
• The four major classes are vowels, diphthongs, semivowels and consonants.
• These are further divided into subclasses.
SOUNDS ARE OF 2 TYPES:
A. CONTINUANT SOUNDS
B. NON-CONTINUANT SOUNDS

• Continuant sounds are produced by a fixed (non-time-varying) vocal tract configuration excited by the appropriate source.
• They comprise vowels, fricatives (both voiced and unvoiced) and nasals.
• The remaining sounds are produced by a changing vocal tract configuration.
• They include semivowels, diphthongs and the remaining consonants.
VOWEL TRIANGLE
FORMANT FREQUENCIES FOR MALE SPEAKERS
AMERICAN ENGLISH VOWELS
WAVE FORMS
DIPHTHONGS

Although there is disagreement about what is and what is not a diphthong, a reasonable definition is a gliding monosyllabic sound that starts at or near the articulatory position of one vowel and moves toward the position of another.

By this definition there are six diphthongs in American English.

A diphthong is produced by varying the vocal tract smoothly between vowel configurations.
DIPHTHONG CHARACTERIZATION

• Diphthongs are characterized by the time variation of the first formant frequency F1 and the second formant frequency F2.
• The variation is shown as a trajectory in the F1-F2 plane.
SEMIVOWELS
The semivowels /w/, /l/, /r/ and /y/ are very difficult to characterize.
A semivowel is a vowel-like sound characterized by a gliding transition in the vocal tract configuration between adjacent phonemes.
NASALS
NASAL CHARACTERIZATION
UNVOICED FRICATIVES
VOICED FRICATIVES
VOICED STOPS
• Voiced stops are transient, non-continuant sounds produced by building up pressure behind a total constriction in the oral cavity and then suddenly releasing it.
UNVOICED STOPS

• CHARACTERIZATION
• DEFINITION
SOUND PROPAGATION – ACOUSTIC THEORY
SPEECH MODELLING
SPEECH ACOUSTIC
PRODUCTION
UNIFORM LOSSLESS TUBE
• EQUATIONS
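As a rough illustration of the uniform lossless tube model, the sketch below evaluates the quarter-wavelength resonances F_k = (2k − 1)·c / (4·l) of a tube closed at one end (the glottis) and open at the other (the lips). The 17 cm length is the figure quoted earlier in these notes. The speed of sound c = 340 m/s and the helper name `tube_resonances` are assumptions made for this example.

```python
def tube_resonances(length_m, count, c=340.0):
    """Resonances F_k = (2k - 1) * c / (4 * l) of a uniform lossless tube,
    closed at one end (glottis) and open at the other (lips)."""
    return [(2 * k - 1) * c / (4.0 * length_m) for k in range(1, count + 1)]

# A 17 cm tube with c = 340 m/s (assumed) resonates near 500, 1500 and 2500 Hz.
print(tube_resonances(0.17, 3))
```

The resonances are odd multiples of the lowest one, which is why a neutral vowel shows roughly evenly spaced formants.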
DIGITAL MODELS OF SPEECH

• VOCAL TRACT
• RADIATION
• EXCITATION
• COMPLETE MODEL
VOCAL TRACT
RADIATION-EXCITATION
COMPLETE MODEL
AUDITORY PERCEPTION
EAR BIOLOGY:
MIDDLE EAR
INNER EAR:
SEM OF EAR: HAIR CELLS
AUDITORY PERCEPTION:
AMPLITUDE VARIATIONS
BAND WIDTH
AUDIBILITY RANGE:
HEARING PERCEPTION
UNIT II

• TIME DEPENDENT FORM OF SPEECH PROCESSING


ADAPTIVE QUANTIZER
TIME DOMAIN PROCESSING OF SPEECH

• The speech signal is a sequence of samples, here taken at 8000 samples/second.
• The properties of the speech signal vary with time.
• The excitation alternates between voiced and unvoiced speech.
• There is considerable variation in peak amplitude, and considerable variation in fundamental frequency within voiced regions.
• Time-domain processing is used to estimate pitch, intensity, excitation mode and formant frequencies.
TIME DOMAIN PROCESSING OF SPEECH

• Time-domain processing assumes that the properties of the speech signal change relatively slowly with time.
• This leads to short-time processing.
• Short segments of the signal are isolated and processed separately.
• These segments are called analysis frames, and they usually overlap each other.
• The result of processing each frame can be a single number or a set of numbers.
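Splitting a signal into overlapping analysis frames might look like the sketch below. The helper name `split_into_frames`, the frame length and the hop size are all illustrative choices, not values from these notes.

```python
def split_into_frames(x, frame_len, hop):
    """Split x into frames of frame_len samples whose starts are hop samples apart.
    Frames overlap when hop < frame_len. A trailing partial frame is dropped."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [x[start:start + frame_len]
            for start in range(0, len(x) - frame_len + 1, hop)]

frames = split_into_frames(list(range(10)), frame_len=4, hop=2)
print(frames)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

With a hop of half the frame length, each sample (apart from the edges) falls into two frames.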
TIME DOMAIN PROCESSING OF SPEECH

• Processing the frames yields a new time-dependent sequence that can serve as a representation of the speech signal.
TIME DOMAIN PROCESSING OF SPEECH
• T[ ] is the transformation applied to the speech samples.
• w(n-m) is the window sequence, positioned at sample index n.
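The two symbols above fit the usual general form of a short-time representation, written out here as a sketch. The names $Q_n$ for the result and $x(m)$ for the speech samples are assumed for this illustration.

```latex
Q_n = \sum_{m=-\infty}^{\infty} T[x(m)] \, w(n - m)
```

Choosing $T[x(m)] = x^2(m)$ gives a short-time energy, and $T[x(m)] = |x(m)|$ gives a short-time average magnitude.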
TIME DOMAIN PROCESSING
SHORT TIME ENERGY AND MAGNITUDE

• The amplitude of unvoiced segments is generally much lower than the amplitude of voiced segments.
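A minimal sketch of short-time energy and average magnitude, assuming the common definitions E_n = Σ [x(m)·w(n−m)]² and M_n = Σ |x(m)|·w(n−m) with a window w(k) defined for k = 0 .. N−1. The function names and the toy signal are illustrative only.

```python
def short_time_energy(x, n, window):
    """E_n = sum over m of (x[m] * w[n - m])**2, with w[k] defined for k = 0 .. N-1."""
    N = len(window)
    first, last = max(0, n - N + 1), min(n, len(x) - 1)
    return sum((x[m] * window[n - m]) ** 2 for m in range(first, last + 1))

def short_time_magnitude(x, n, window):
    """M_n = sum over m of |x[m]| * w[n - m], with w[k] defined for k = 0 .. N-1."""
    N = len(window)
    first, last = max(0, n - N + 1), min(n, len(x) - 1)
    return sum(abs(x[m]) * window[n - m] for m in range(first, last + 1))

# Toy signal: a loud (voiced-like) stretch followed by a quiet (unvoiced-like) stretch.
x = [0.5, -0.5, 0.5, -0.5, 0.05, -0.05, 0.05, -0.05]
rect = [1.0] * 4  # rectangular window of length 4
print(short_time_energy(x, 3, rect), short_time_energy(x, 7, rect))
print(short_time_magnitude(x, 3, rect), short_time_magnitude(x, 7, rect))
```

In this toy case the energies differ by a factor of about 100 while the magnitudes differ by about 10, which shows why the average magnitude is less sensitive to large signal levels.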
WINDOWING RESPONSES:
RESPONSES OF RECTANGULAR WINDOW OF VARIOUS LENGTHS:
HAMMING WINDOW RESPONSE:
AVERAGE MAGNITUDE OF RECTANGULAR WINDOWS:
HAMMING WINDOW: AVERAGE MAGNITUDE:
ZERO CROSSINGS:
BLOCK DIAGRAM OF ZERO CROSSINGS:
ZERO CROSSINGS OF UNVOICED AND VOICED SPEECH:
ZERO CROSSINGS OF 3 DIFFERENT UTTERANCES:
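Counting zero crossings in a frame can be sketched as follows. A crossing is counted whenever consecutive samples have different signs. The two short frames are made-up stand-ins for low-frequency (voiced-like) and high-frequency (unvoiced-like) content, and the function names are illustrative.

```python
def sign(v):
    """sgn[v] = 1 for v >= 0, and -1 otherwise."""
    return 1 if v >= 0 else -1

def zero_crossing_count(frame):
    """Number of sign changes between consecutive samples of the frame."""
    return sum(abs(sign(frame[i]) - sign(frame[i - 1])) // 2
               for i in range(1, len(frame)))

low = [1, 1, 1, 1, -1, -1, -1, -1]    # stand-in for a voiced-like frame
high = [1, -1, 1, -1, 1, -1, 1, -1]   # stand-in for an unvoiced-like frame
print(zero_crossing_count(low), zero_crossing_count(high))  # 1 7
```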
PITCH PERIOD ESTIMATION:
RESPONSES OF PPE:
PPE RESULTS:
AUTOCORRELATION FUNCTION:
AUTOCORRELATION OF VOICED AND UNVOICED SPEECH:
HAMMING WINDOW AUTOCORRELATION:
AUTOCORRELATION OF RECTANGULAR WINDOW OF DIFFERENT LENGTHS:
AMDF
AMDF PLOTS:
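The AMDF (average magnitude difference function) can be sketched as below, assuming the usual definition as the mean of |x(m) − x(m−k)| over a frame for each lag k. It dips toward zero at lags equal to the period. The function name and the period-4 test frame are illustrative.

```python
def amdf(frame, max_lag):
    """AMDF for each lag k = 0 .. max_lag: mean of |x[m] - x[m - k]| over the valid m.
    Requires max_lag < len(frame)."""
    values = []
    for k in range(max_lag + 1):
        diffs = [abs(frame[m] - frame[m - k]) for m in range(k, len(frame))]
        values.append(sum(diffs) / len(diffs))
    return values

frame = [0, 1, 0, -1] * 4           # toy periodic frame with a period of 4 samples
values = amdf(frame, max_lag=6)
period = min(range(1, len(values)), key=lambda k: values[k])
print(period)  # 4
```

Because it uses only subtractions and magnitudes, the AMDF avoids the multiplications needed by the autocorrelation function.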
PPE USING AUTOCORRELATION

• A major limitation of the autocorrelation representation is that it retains too much of the information in the speech signal.
• As a result, the autocorrelation function has many peaks.
• Most of these peaks come from the damped oscillations of the vocal tract response.
• In the example shown, the peak at k = 15 is greater than the peak at k = 72.
• k is the autocorrelation lag, in samples.
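A minimal sketch of pitch period estimation from the short-time autocorrelation, assuming R(k) = Σ x(m)·x(m+k) over one frame and picking the largest peak beyond a minimum lag. The function names, the `min_lag` parameter and the toy frame are assumptions for this example.

```python
def autocorrelation(frame, max_lag):
    """R(k) = sum of x[m] * x[m + k] for m = 0 .. N-1-k, for k = 0 .. max_lag."""
    N = len(frame)
    return [sum(frame[m] * frame[m + k] for m in range(N - k))
            for k in range(max_lag + 1)]

def pitch_lag(r, min_lag):
    """Lag of the largest autocorrelation value at or beyond min_lag."""
    return max(range(min_lag, len(r)), key=lambda k: r[k])

frame = [0, 1, 0, -1] * 4            # toy periodic frame with a period of 4 samples
r = autocorrelation(frame, max_lag=6)
print(r)                # [8, 0, -7, 0, 6, 0, -5]
print(pitch_lag(r, 1))  # 4
```

On real speech the search is not this clean: formant-related peaks at short lags can exceed the peak at the true pitch period, which is the problem described above.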


PPE USING AUTOCORRELATION

• To avoid this problem, the speech signal is first spectrally flattened, which makes the periodicity more prominent and gives a more useful pitch detector.
• One such flattening technique is center clipping.
CENTER CLIPPING IN PPE WITH AUTOCORRELATION
CENTER CLIPPING CONTD.
PLOTS OF CENTER CLIPPING
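Center clipping can be sketched as below. Samples whose magnitude is below a threshold C are set to zero, and the remaining samples are shifted toward zero by C. Setting the threshold to a fixed fraction of the frame's peak magnitude is an assumption for this example, as is the function name.

```python
def center_clip(frame, fraction=0.3):
    """Center clipping with threshold C = fraction * max|x|:
    y = x - C if x > C, y = x + C if x < -C, and y = 0 otherwise."""
    c = fraction * max(abs(v) for v in frame)
    return [v - c if v > c else v + c if v < -c else 0.0 for v in frame]

# With fraction = 0.5 the threshold is 0.5, so only 0.8 and -1.0 survive.
clipped = center_clip([0.1, 0.8, -0.2, -1.0, 0.3], fraction=0.5)
print(clipped)
```

The autocorrelation of the clipped frame then has far fewer distracting peaks, because the low-level formant ripple has been removed.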
DIGITAL REPRESENTATION OF SPEECH
INSTANTANEOUS QUANTIZATION
3 BIT QUANTIZER

• Bit rate: I = B·Fs, where B is the number of bits per sample and Fs is the sampling rate.
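The bit rate formula and a 3-bit uniform quantizer can be illustrated together. The mid-rise characteristic, the amplitude range `x_max` and the function names are assumptions for this sketch. The 8000 samples/second rate is the one used elsewhere in these notes.

```python
import math

def bit_rate(bits_per_sample, fs):
    """I = B * Fs, in bits per second."""
    return bits_per_sample * fs

def quantize_uniform(x, bits, x_max):
    """Mid-rise uniform quantizer with 2**bits levels over [-x_max, x_max].
    Returns the reconstructed (quantized) amplitude."""
    levels = 2 ** bits
    step = 2.0 * x_max / levels
    index = int(math.floor(x / step))
    index = max(-levels // 2, min(levels // 2 - 1, index))  # clip to the available levels
    return (index + 0.5) * step

print(bit_rate(3, 8000))                 # 24000 bits per second
print(quantize_uniform(0.3, 3, 1.0))     # 0.375
print(quantize_uniform(2.0, 3, 1.0))     # 0.875 (overload: clipped to the top level)
```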
INSTANTANEOUS COMPANDING
y(n) = ln|x(n)|
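A pure logarithm is undefined at zero amplitude, so practical companders use an approximately logarithmic curve. The sketch below uses μ-law compression as one such curve. The choice of μ-law, the value μ = 255 and the function names are assumptions here and are not taken from the slide.

```python
import math

def mu_law_compress(x, mu=255.0, x_max=1.0):
    """y = x_max * ln(1 + mu*|x|/x_max) / ln(1 + mu), with the sign of x."""
    magnitude = x_max * math.log1p(mu * abs(x) / x_max) / math.log1p(mu)
    return math.copysign(magnitude, x)

def mu_law_expand(y, mu=255.0, x_max=1.0):
    """Inverse of mu_law_compress."""
    magnitude = (x_max / mu) * math.expm1(abs(y) * math.log1p(mu) / x_max)
    return math.copysign(magnitude, y)

y = mu_law_compress(0.01)
print(y)                 # small amplitudes are boosted before uniform quantization
print(mu_law_expand(y))  # expanding recovers approximately 0.01
```

Quantizing the compressed value uniformly gives finer steps to small amplitudes and coarser steps to large ones.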
ADAPTIVE QUANTIZER
FEED FORWARD QUANTIZER
FEED FORWARD ADAPTIVE QUANTIZER
FEEDBACK ADAPTATION
3 BIT ADAPTIVE QUANTIZER
DIFFERENTIAL QUANTIZER
DIFFERENTIAL QUANTIZATION
DELTA MODULATION
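Linear delta modulation can be sketched as a 1-bit differential quantizer with a fixed step size. The function name, the step size and the starting estimate of zero are assumptions for this example.

```python
def delta_modulate(x, step):
    """Linear delta modulation: emit 1 if the input is at or above the running
    approximation, else 0, then move the approximation by +step or -step."""
    bits, approx, estimate = [], [], 0.0
    for sample in x:
        bit = 1 if sample >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
        approx.append(estimate)
    return bits, approx

bits, approx = delta_modulate([0.25, 0.5, 0.75, 0.75, 0.75, 0.75], step=0.25)
print(bits)    # [1, 1, 1, 1, 0, 1]
print(approx)  # [0.25, 0.5, 0.75, 1.0, 0.75, 1.0]
```

Once the input levels off, the approximation alternates around it (granular noise). A step that is too small for a fast-rising input would instead cause slope overload.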
DPCM
ADAPTIVE DPCM
LINEAR FILTERING
FILTER BANK
HOMOMORPHIC SPEECH PROCESSING
SHORT TIME CEPSTRA
APPLICATIONS TO PITCH DETECTION
