DEEP LEARNING FOR SPEECH RECOGNITION
Anantharaman Palacode Narayana Iyer
JNResearch
ananth@jnresearch.com
15 April 2016
AGENDA
• Types of speech recognition and applications
• Traditional implementation pipeline
• Deep learning for speech recognition
• Future directions
SPEECH APPLICATIONS
• Speech recognition:
    • Hands-free operation in a car
    • Commands for personal assistants, e.g. Siri
    • Gaming
• Conversational agents
    • E.g. an agent for flight schedule enquiries, bookings, etc.
• Speaker identification
    • E.g. forensics
• Extracting emotions and social meanings
• Text to speech
TYPES OF RECOGNITION TASKS
• Isolated word recognition
• Connected word recognition
• Continuous speech recognition (LVCSR)
• The above can be realized as:
    • Speaker-independent implementation
    • Speaker-dependent implementation
SPEECH RECOGNITION IS PROBABILISTIC
Steps:
• Train the system
• Cross-validate and fine-tune
• Test
• Deploy
[Figure: Speech signal -> Speech recognizer (ASR) -> probabilistic match between the input and a set of words]
ISOLATED WORD RECOGNITION
• Generate features from the audio signal. MFCCs or filter banks are common choices
• Perform any additional pre-processing
• Using a codebook of a given size, convert these features into discrete symbols. This is vector quantization, which can be implemented with k-means clustering
• Train the HMMs using the Baum-Welch algorithm
    • For each word in the vocabulary, instantiate an HMM
    • Choose the number of states by intuition
    • The symbol set consists of all valid codebook indices
• Use the HMMs to predict unseen input
[Figure: the observations O are scored by HMM 1, HMM 2, ..., HMM n; the predicted word is the one whose model λ maximizes P(O|λ)]
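The recognition step on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the author's implementation: it assumes the codebook and the per-word HMM parameters have already been trained, and the names `quantize`, `forward_log_likelihood`, `recognize`, and the toy "yes"/"no" models are hypothetical. Each frame is mapped to its nearest codeword, each word's HMM scores the symbol sequence with the forward algorithm, and the word with the highest P(O|λ) wins.

```python
import math

def quantize(frame, codebook):
    """Vector quantization: return the index of the codeword nearest to a frame."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))

def forward_log_likelihood(obs, hmm):
    """Return log P(O | lambda) for a discrete HMM using the scaled forward algorithm."""
    pi, A, B = hmm["pi"], hmm["A"], hmm["B"]
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob

def recognize(frames, codebook, word_hmms):
    """Pick the word whose HMM gives the observation sequence the highest likelihood."""
    obs = [quantize(f, codebook) for f in frames]
    return max(word_hmms, key=lambda w: forward_log_likelihood(obs, word_hmms[w]))

# Toy, hand-set parameters (assumed, not trained): two codewords, two 2-state
# left-to-right word models.
codebook = [[0.0, 0.0], [1.0, 1.0]]
transitions = [[0.5, 0.5], [0.0, 1.0]]
word_hmms = {
    "yes": {"pi": [1.0, 0.0], "A": transitions, "B": [[0.9, 0.1], [0.1, 0.9]]},
    "no":  {"pi": [1.0, 0.0], "A": transitions, "B": [[0.1, 0.9], [0.9, 0.1]]},
}
frames = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.1], [1.0, 0.8]]
predicted = recognize(frames, codebook, word_hmms)
```

In a real system the codebook would come from k-means over training features and the HMM parameters from Baum-Welch, as the slide describes; only the scoring and argmax are shown here.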
CONTINUOUS SPEECH RECOGNITION
• ASR for continuous speech is traditionally built using HMMs with Gaussian Mixture Model (GMM) emissions
• The emission probability table used for discrete symbols is replaced by a GMM over the continuous feature vectors
• The parameters of this model are learnt during training with the Baum-Welch procedure
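To make the replacement of the emission table concrete, the following sketch evaluates a GMM emission density for one HMM state. It assumes diagonal covariances (a common simplification, not something the slide states) and uses hypothetical names and toy parameters; in practice the weights, means, and variances would be estimated by Baum-Welch.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed for simplicity)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, weights, means, variances):
    """log b_j(x) = log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Toy one-dimensional, two-component mixture for a single state (assumed values).
state_weights = [0.5, 0.5]
state_means = [[0.0], [4.0]]
state_variances = [[1.0], [1.0]]
emission_log_prob = gmm_log_likelihood([0.0], state_weights, state_means, state_variances)
```

This value takes the place of the table lookup B[state][symbol] in the discrete case, so no vector quantization step is needed.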
KNOWLEDGE INTEGRATION FOR SPEECH RECOGNITION
[Figure: processing pipeline. Speech -> Feature Analysis -> Unit Matching System -> Lexical Hypothesis -> Syntactic Hypothesis -> Semantic Hypothesis -> Utterance Verifier -> Recognized utterance. Knowledge sources shown: inventory of speech recognition units, word dictionary, grammar, task model.]
SOME CHALLENGES
• We don’t know the number of words
• We don’t know the word boundaries
    • They are fuzzy and non-unique
• For V word reference patterns and L positions, the number of possible word sequences grows exponentially (on the order of V^L)
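A quick count shows how fast the search space grows. The sketch below only counts word sequences and ignores the unknown boundaries, which multiply the possibilities further; the function name and the example sizes are illustrative assumptions.

```python
def count_word_sequences(vocab_size, max_words):
    """Count distinct word sequences of length 1..max_words over a vocabulary.

    The number of words in the utterance is unknown, so every length is included.
    """
    return sum(vocab_size ** length for length in range(1, max_words + 1))

small_case = count_word_sequences(10, 3)     # 10 + 100 + 1000
large_case = count_word_sequences(1000, 5)   # dominated by 1000**5
```

Even a 1000-word vocabulary and utterances of at most five words give more than 10^15 candidates, which is why exhaustive matching is not an option.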
USING DEEP NETWORKS FOR ASR
• Replace the GMM with a Deep Neural Network (DNN) that directly provides the likelihood estimates
• Interface the DNN with an HMM decoder
• Issues:
    • We still need the HMM, with its underlying assumptions, for tractable computation
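One common way to interface a DNN with an HMM decoder, assumed here since the slide does not spell it out, is the hybrid approach: the network outputs posteriors P(state | frame), and dividing by the state priors gives likelihoods up to a constant, by Bayes' rule. The function names, logits, and priors below are illustrative only.

```python
import math

def softmax(logits):
    """Turn network outputs into state posteriors P(s | o)."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_log_likelihoods(logits, state_priors):
    """log p(o | s) up to a constant: log P(s | o) - log P(s)."""
    posteriors = softmax(logits)
    return [math.log(p) - math.log(prior)
            for p, prior in zip(posteriors, state_priors)]

# Toy values for one frame and three HMM states (assumed, not from the slides).
frame_logits = [2.0, 1.0, 0.0]
state_priors = [0.7, 0.2, 0.1]   # e.g. relative state frequencies in training alignments
posteriors = softmax(frame_logits)
emission_scores = scaled_log_likelihoods(frame_logits, state_priors)
```

These scores are used wherever the GMM log-likelihood was used before. In this toy case the most probable state by posterior (state 0) differs from the best state by scaled likelihood (state 1), because state 0 is very frequent a priori.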
EMERGING TRENDS
• HMM-free ASRs
    • Avoid phoneme prediction, and hence the need for a phoneme database
    • Active area of research
• The current state of the art adopted by industry uses DNN-HMM
• Future ASRs are likely to be fully neural-network based

