Tied-State HMMs + Introduction To NN-based AMs
Lecture 7
CS 753
Instructor: Preethi Jyothi
Recall: Acoustic Model
[Figure: ASR decoding pipeline. Acoustic models map acoustic indices to triphones; the context transducer maps triphones to monophones; the pronunciation model maps monophones to words; the language model scores the resulting word sequence. H, the acoustic-model transducer, is built as the union + closure of the per-triphone FSTs (b/a_b, …, x/y_z), giving the resulting FST.]
Triphone HMM Models
• Each phone is modelled in the context of its left and right neighbour phones
• If each triphone HMM has 3 states and each state generates an m-component GMM (m ≈ 64) over d-dimensional acoustic feature vectors (d ≈ 40), with each component's full covariance Σ having d² parameters, then every triphone state needs roughly m(d + d²) parameters, and there is one such state for every state of every triphone
• Insufficient data to learn all triphone models reliably. What do we do? Share parameters across triphone models! (A back-of-the-envelope count follows.)
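To make the blow-up concrete, here is a back-of-the-envelope count in Python; the phone inventory size of ~50 is an assumed, illustrative figure:

# Rough parameter count for untied triphone GMM-HMMs.
# Assumed figures (illustrative): ~50 base phones, full-covariance Gaussians.
n_phones = 50
n_triphones = n_phones ** 3        # every left/right context: 125,000 models
states_per_hmm = 3                 # 3-state HMMs
m = 64                             # mixture components per state
d = 40                             # acoustic feature dimension

# Each component has a mean (d), a full covariance (d^2), and a weight.
params_per_component = d + d * d + 1
total = n_triphones * states_per_hmm * m * params_per_component

print(f"{total:.2e} parameters")   # ~3.9e+10: far too many to estimate reliably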
Parameter Sharing
• Sharing of parameters (also referred to as “parameter tying”) can be
done at any level:
• Parameters in HMMs corresponding to two triphones are said to be
tied if they are identical
[Figure: two triphone HMMs with transition probabilities t1…t5 and t′1…t′5; the transition probabilities are tied, i.e., t′i = ti.]
[Figure: toy decision tree over vegetables, with questions such as Shape? (Leafy / Cylindrical / Oval), Color? (Green), and Taste?, leading to leaves such as Spinach.]
How do we build these phone DTs?
1. What questions are used?
Linguistically-inspired binary questions: “Does the left or right phone come
from a broad class of phones such as vowels, stops, etc.?” “Is the left or
right phone [k] or [m]?”
2. What is the training data for each phone state, pj? (root node of DT)
Training data for DT nodes
• Align training data xi = (xi1, …, xiTi), i = 1…N, where xit ∈ ℝd, against a set of triphone HMMs
• Use Viterbi algorithm to find the best HMM state sequence
corresponding to each xi
• Tag each xit with ID of current phone along with left-context
and right-context
[Figure: an utterance's frames xit aligned against the triphone sequence sil/b/aa, b/aa/g, aa/g/sil.]
xit is tagged with ID aa2[b/g] i.e. xit is aligned with the second state
of the 3-state HMM corresponding to the triphone b/aa/g
• For a state j in phone p, collect all xit's that are tagged with ID pj[?/?] (a sketch of this step follows)
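A minimal sketch of this collection step, assuming the Viterbi alignment has already produced a (phone, state, left-context, right-context) tag per frame; the data structures here are illustrative, not from any toolkit:

from collections import defaultdict

def collect_training_data(aligned_frames):
    """Group aligned frames by (phone, state): this pool is the root-node
    training data for the DT of each phone state p_j.
    `aligned_frames` is assumed to be a list of (feature_vector, tag) pairs,
    where a tag like ("aa", 2, "b", "g") means the frame aligned with
    state 2 of the triphone b/aa/g."""
    root_data = defaultdict(list)
    for x, (phone, state, left, right) in aligned_frames:
        # Pool over all contexts (pj[?/?]), but keep the contexts with
        # each frame: the DT questions will ask about them.
        root_data[(phone, state)].append((x, left, right))
    return root_data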
2. What is the training data for each phone state, pj? (root node of DT)
All speech frames that align with the jth state of every triphone HMM that
has p as the middle phone
3. What criterion is used at each node to find the best question to split the
data on?
Find the question which partitions the states in the parent node so as to
give the maximum increase in log likelihood
Likelihood of a cluster of states
• For a question q that splits S into Syes and Sno, compute the gain in log likelihood (L(S) is computed in the sketch below):
Δq = L(Syes) + L(Sno) − L(S)
Image from: Young et al., “Tree-based state tying for high accuracy acoustic modeling”, ACL-HLT, 1994
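Here L(S) is the log likelihood of all frames in cluster S under a single Gaussian fit to those frames (Young et al. use this single-Gaussian approximation). A sketch with diagonal covariances, assuming frames are rows of a numpy array:

import numpy as np

def cluster_log_likelihood(X):
    """L(S): log likelihood of the frames in S (rows of X, shape n x d)
    under a single maximum-likelihood Gaussian with diagonal covariance:
    L(S) = -(n/2) * (d*log(2*pi) + sum_k log var_k + d)."""
    n, d = X.shape
    var = X.var(axis=0) + 1e-8        # ML variances, floored for stability
    return -0.5 * n * (d * np.log(2 * np.pi) + np.log(var).sum() + d)

def split_gain(X, answers):
    """Delta_q = L(S_yes) + L(S_no) - L(S), where the boolean array
    `answers` gives each frame's yes/no answer to candidate question q
    (both sides assumed non-empty)."""
    return (cluster_log_likelihood(X[answers])
            + cluster_log_likelihood(X[~answers])
            - cluster_log_likelihood(X))

# At each node: evaluate split_gain for every candidate question, split on
# the argmax, and stop when the best gain falls below a threshold.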
Example: Phonetic Decision Tree (DT)
One tree is constructed for each state of each phone to cluster all the
corresponding triphone states
[Figure: DT for the center state of [ow]. The head node uses all training data tagged as ow2[?/?] (aa/ow2/f, aa/ow2/s, aa/ow2/d, h/ow2/p, aa/ow2/n, aa/ow2/g, …). Internal nodes ask questions such as "Is left ctxt a vowel?", "Is right ctxt a fricative?", "Is right ctxt nasal?", and "Is right ctxt a glide?". The leaves hold the resulting clusters of tied states, e.g., Leaf A = {aa/ow2/f, aa/ow2/s, …}, Leaf B = {aa/ow2/d, aa/ow2/g, …}, Leaf C = {h/ow2/l, b/ow2/r, …}, Leaf D = {h/ow2/p, b/ow2/k, …}, Leaf E = {aa/ow2/n, aa/ow2/m, …}.]
For an unseen triphone at test time
• Transition Matrix:
• All triphones of a given phoneme share a single transition matrix; use that common matrix
• State observation densities:
• Use the triphone identity to traverse all the way to a leaf of the decision tree
• Use the state observation probabilities associated with that leaf (a lookup sketch follows)
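A sketch of this test-time lookup with a hypothetical tree representation; the Node class and tied_state function are illustrative:

class Node:
    """Hypothetical DT node: internal nodes hold a question over the
    left/right contexts; leaves hold a tied-state (senone) id."""
    def __init__(self, question=None, yes=None, no=None, senone=None):
        self.question, self.yes, self.no, self.senone = question, yes, no, senone

def tied_state(tree, left, right):
    """Traverse the DT with the unseen triphone's contexts to a leaf."""
    node = tree
    while node.senone is None:
        node = node.yes if node.question(left, right) else node.no
    return node.senone

# e.g., for an unseen triphone z/ow/f, look up state 2 of [ow]:
#   senone = tied_state(trees[("ow", 2)], left="z", right="f")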
That’s a wrap on HMM-based acoustic models
[Figure: the ASR pipeline again, now with the acoustic model realized as one 3-state HMM for each tied-state triphone, whose parameters are estimated using the Baum-Welch algorithm; as before, H is the union + closure of the per-triphone FSTs (b/a_b, …, x/y_z).]
DNN-based acoustic models?
[Figure: the ASR pipeline once more, with the acoustic model scores replaced by phone posteriors from a deep neural network.]

Can we use deep neural networks instead of HMMs to learn mappings between acoustics and phones?

[Excerpt from Dahl et al. (2012), shown on the slide: the DNN estimates senone (tied triphone state) posterior probabilities; dividing by the prior probability of each senone, estimated from the training set, gives scaled likelihoods (p(x) is independent of the word sequence and can be ignored). The authors found this scaling important for alleviating the label bias problem, especially when training utterances contain long silence segments. CD-DNN-HMMs can be trained using the embedded Viterbi algorithm, reusing the triphone tying structure and the HMMs of the CD-GMM-HMM system; logical triphones that are effectively equivalent are clustered and represented by a physical triphone. Fig. 1 of the paper shows the hybrid architecture: the HMM models the sequential property of the speech signal, the DNN models the scaled observation likelihood of all the senones, and the same DNN is replicated over different points in time.]
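A minimal sketch of the posterior-to-scaled-likelihood conversion described in the excerpt, in the log domain; the array shapes are assumptions:

import numpy as np

def scaled_log_likelihoods(log_posteriors, log_priors):
    """Hybrid DNN-HMM decoding: convert senone posteriors p(s|x) to
    scaled likelihoods p(x|s) proportional to p(s|x) / p(s); p(x) does
    not depend on the word sequence, so it can be dropped.
    log_posteriors: (T, n_senones) per-frame DNN outputs (log softmax).
    log_priors: (n_senones,) senone log priors counted from the
    training alignments."""
    return log_posteriors - log_priors   # broadcasts over the T frames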
Brief Introduction to Neural Networks
Feed-forward Neural Network
[Figure: a feed-forward network with an input layer, a hidden layer, and an output layer.]
Brain Metaphor
[Figure: a single neuron: inputs xi are weighted by wi, summed, and passed through an activation function g to produce the output yi.]
yi = g(Σi wi xi)
[Figure: a small feed-forward network: inputs x1, x2 at nodes 1 and 2, hidden nodes 3 and 4, output node 5, with weights w13, w14, w23, w24, w35, w45.]
a5 = g(w35 ⋅ a3 + w45 ⋅ a4)
   = g(w35 ⋅ g(w13 ⋅ a1 + w23 ⋅ a2) + w45 ⋅ g(w14 ⋅ a1 + w24 ⋅ a2))
Parameters of the network: all wij (and biases, not shown here)
The simplest neural network is the perceptron:
Perceptron(x) = xW + b
A 1-layer feedforward neural network has the form:
MLP(x) = g(xW1 + b1) W2 + b2
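A direct numpy transcription of the 1-layer feedforward form above; the layer sizes below are arbitrary, chosen only for illustration:

import numpy as np

def mlp(x, W1, b1, W2, b2, g=np.tanh):
    """MLP(x) = g(x W1 + b1) W2 + b2, with row-vector inputs."""
    return g(x @ W1 + b1) @ W2 + b2

# Illustrative shapes: 40-dim input, 128 hidden units, 10 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(40, 128)), np.zeros(128)
W2, b2 = rng.normal(size=(128, 10)), np.zeros(10)
y = mlp(rng.normal(size=(1, 40)), W1, b1, W2, b2)   # shape (1, 10)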
Common Activation Functions (g)
Sigmoid: σ(x) = 1/(1 + e^(−x))
Hyperbolic tangent (tanh): tanh(x) = (e^(2x) − 1)/(e^(2x) + 1)
Rectified Linear Unit (ReLU): ReLU(x) = max(0, x)
[Figure: plots of the three nonlinear activation functions (sigmoid, tanh, ReLU) for x ∈ [−10, 10].]
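The three activations transcribed into code (the tanh formula matches the slide; in practice np.tanh avoids overflow for large |x|):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))               # output in (0, 1)

def tanh(x):                                      # output in (-1, 1)
    return (np.exp(2*x) - 1) / (np.exp(2*x) + 1)  # np.tanh is the stable version

def relu(x):
    return np.maximum(0.0, x)                     # identity for x > 0, else 0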
Optimization Problem
SGD Algorithm
Inputs: function NN(x; θ), training examples x1 … xn with outputs y1 … yn, and loss function L.
Return: θ (the update loop is sketched below)
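The update loop itself, which the slide elides, is the standard SGD recipe; a sketch assuming a black-box grad(θ, x, y) that returns ∂L(NN(x; θ), y)/∂θ (e.g., computed by backpropagation, covered next):

import random

def sgd(grad, theta, xs, ys, lr=0.01, epochs=10):
    """Standard stochastic gradient descent.
    grad(theta, x, y) is assumed to return the gradient of
    L(NN(x; theta), y) with respect to theta (a numpy array)."""
    data = list(zip(xs, ys))
    for _ in range(epochs):
        random.shuffle(data)                         # visit examples in random order
        for x, y in data:
            theta = theta - lr * grad(theta, x, y)   # step against the gradient
    return theta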
Training a Neural Network
Backpropagation
L Forward Pass
Base case: ∂L/∂L = 1
First, in a forward
For each u (top to pass, compute
bottom): v values of all nodes
For each v ∈ Γ(u): given an input
Inductively, have
u (The values of each node
will be needed during
computed ∂L/∂v backprop)
Directly compute ∂v/∂u
Compute ∂L/∂u
Compute ∂L/∂w
Where values computed in the
where ∂L/∂w = ∂L/∂u ⋅ ∂u/∂w forward pass may be needed
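A worked example of both passes on the small 2-2-1 network from earlier, with g = tanh and a toy loss L = ½a5² (both choices are assumptions for illustration); note how the backward pass reuses the cached forward-pass activations:

import numpy as np

g, dg = np.tanh, lambda a: 1 - a ** 2   # dg takes the *activation* a = g(z)

def forward_backward(x1, x2, w13, w23, w14, w24, w35, w45):
    # Forward pass: compute and cache every node's value.
    a1, a2 = x1, x2
    a3 = g(w13 * a1 + w23 * a2)
    a4 = g(w14 * a1 + w24 * a2)
    a5 = g(w35 * a3 + w45 * a4)
    L = 0.5 * a5 ** 2                    # toy loss, so dL/da5 = a5

    # Backward pass, top to bottom (base case dL/dL = 1).
    dL_da5 = a5
    dL_dz5 = dL_da5 * dg(a5)             # z5 = w35*a3 + w45*a4
    dL_dw35, dL_dw45 = dL_dz5 * a3, dL_dz5 * a4
    dL_da3, dL_da4 = dL_dz5 * w35, dL_dz5 * w45
    dL_dz3, dL_dz4 = dL_da3 * dg(a3), dL_da4 * dg(a4)
    dL_dw13, dL_dw23 = dL_dz3 * a1, dL_dz3 * a2
    dL_dw14, dL_dw24 = dL_dz4 * a1, dL_dz4 * a2
    return L, (dL_dw13, dL_dw23, dL_dw14, dL_dw24, dL_dw35, dL_dw45)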
History of Neural Networks in ASR
[M09] A. Mohamed, G. Dahl, and G. Hinton, “Deep belief networks for phone recognition,” NIPS Workshop on Deep Learning for Speech Recognition, 2009.
[D11] G. Dahl, D. Yu, L. Deng, and A. Acero, “Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition,” IEEE TASLP, 20(1):30–42, 2012.
[H12] G. Hinton, et al., “Deep Neural Networks for Acoustic Modeling in Speech Recognition”, IEEE Signal Processing Magazine, 2012.
What’s new?
• Important developments
• Vast quantities of data available for ASR training
• Fast GPU-based training
• Improvements in optimization/initialization techniques
• Deeper networks enabled by fast training
• Larger output spaces enabled by fast training and
availability of data