Speech Processing 15-492/18-492: Speech Recognition Template Matching

Template matching is a simple speech recognition technique that compares an input audio sample against stored templates. Dynamic time warping (DTW) allows the templates and samples to be of different durations by warping them for alignment. DTW works well for small vocabularies (<20 words) but larger vocabularies require extending the template model, such as stringing phoneme templates together. Reliability can be improved by averaging over multiple template examples and using distance metrics like Mahalanobis that account for variance.

Uploaded by Shobhit Pradhan

Speech Processing 15-492/18-492

Speech Recognition: Template Matching

Speech Recognition by Templates

A little history
Matching templates
DTW (Dynamic Time Warping)
Beyond template matching

Radio Rex (1922)

Toys always lead technology
Call "Rex" and he comes out of his kennel

(Crystalradio.com and Rhys Jones)

Toy ASR Tricks

Radio Rex
Responds to the formants of the vowel "EH" (as in "Rex")

Voice-activated toy train
Multilingual: stop/go (English), tomare/hashire (Japanese)

Toys and pets don't need perfect ASR

Template Matching
Record templates from the user
Store them in a library
Record an ASR example
Compare it against each library template
Select the closest example

For example, on a voice dialing system

Voice Dialing System
Library: Mom, Dad, Bob, Mario's Pizza

Let's Go Bus Information System

Matching in Time Domain

Duration alone will discriminate some examples, but "Mom", "Bob" and "Dad" will be confused because they have similar durations.

What about spectral properties?

Matching in Frequency Domain

[Figure: spectrograms of "Mom" and "Bob"]
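Matching in the frequency domain means comparing short-time spectra instead of raw waveforms. The sketch below turns a waveform into one log-magnitude spectrum per frame, which is the kind of feature sequence the later matching steps compare. The frame length, hop, Hamming window and naive DFT are our assumptions (the slides do not specify a front end; practical systems use an FFT and often MFCCs).

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames (a trailing partial frame is dropped)."""
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]

def log_magnitude_spectrum(frame):
    """Hamming-window one frame and return log |DFT| for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)))
                for k, x in enumerate(frame)]
    spectrum = []
    for f in range(n // 2 + 1):
        # Naive O(N^2) DFT: fine for a sketch, use an FFT in practice.
        acc = sum(x * cmath.exp(-2j * math.pi * f * k / n)
                  for k, x in enumerate(windowed))
        spectrum.append(math.log(abs(acc) + 1e-10))  # small offset avoids log(0)
    return spectrum

def spectral_features(samples, frame_len=256, hop=128):
    """One log-spectrum vector per frame: the sequence that templates are built from."""
    return [log_magnitude_spectrum(fr) for fr in frame_signal(samples, frame_len, hop)]
```

For a pure tone whose frequency falls exactly on a DFT bin, the largest value in each frame's vector sits at that bin, which is a quick sanity check on the front end.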

Different deliveries
We change durations
Two utterances are never the same
When it fails, we change our delivery
We become more articulate (clearer)

Dynamic Time Warping

[Figure: template and sample speech, aligned by warping the time axis]

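DTW algorithm
[Figure: grid with the template (index i) along one axis and the sample (index j) along the other; square (i, j) is reached from (i-1, j), (i, j-1) or (i-1, j-1)]

For each square, compute the accumulated cost
D(i, j) = Dist(template[i], sample[j]) + min(D(i-1, j), D(i, j-1), D(i-1, j-1))
where Dist is the local distance between two frames and D is the best total cost of reaching that square. Remember which choice you took, so the path and its length can be recovered.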
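The recurrence above can be sketched in Python as follows. This is a minimal illustration assuming each utterance is a list of feature vectors (one per frame) and a Euclidean local distance; the name dtw mirrors the slides' pseudocode, but the signature is our own. Instead of storing back-pointers it counts the path length, which is all the score normalization needs.

```python
import math

def dtw(template, sample, dist=math.dist):
    """Return (accumulated cost, path length) of the best DTW alignment.

    template and sample are lists of equal-length feature vectors (frames).
    Row/column 0 of the table are padding, so table[i][j] holds the best
    (cost, steps) for aligning template[:i] with sample[:j].
    """
    n, m = len(template), len(sample)
    table = [[(math.inf, 0)] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Smallest accumulated cost among the three predecessor squares;
            # ties are broken in favor of the shorter path.
            prev_cost, prev_steps = min(table[i - 1][j],      # (i-1, j)
                                        table[i][j - 1],      # (i, j-1)
                                        table[i - 1][j - 1])  # (i-1, j-1)
            local = dist(template[i - 1], sample[j - 1])
            table[i][j] = (prev_cost + local, prev_steps + 1)
    return table[n][m]
```

For example, dtw([[0], [1]], [[0], [5]]) returns (4.0, 2): frame 0 matches frame 0 at cost 0, and frame 1 matches frame 5 at cost 4.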

Multiple Templates
Compare against each
Find the closest
Need to normalize scores (divide by the length of the match)

Matching Templates
[Figure: sample compared against template library entries Word0, Word1, Word2]

BestScore = infinity
For Word in Templates:
    Score = dtw(Template[Word], Sample)
    if (Score < BestScore):
        BestScore = Score
        BestWord = Word
DoAction(Action[BestWord])
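The matching loop can be made runnable as below. It is a minimal sketch assuming each template and the sample are lists of feature vectors; recognize, library and the ACTIONS table are hypothetical names standing in for Template, BestWord and Action[...]. It also applies length normalization, dividing each DTW cost by its path length so that long and short templates compete fairly.

```python
import math

def dtw(template, sample):
    """(accumulated cost, path length) of the best alignment, Euclidean frames."""
    n, m = len(template), len(sample)
    table = [[(math.inf, 0)] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, steps = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
            table[i][j] = (cost + math.dist(template[i - 1], sample[j - 1]), steps + 1)
    return table[n][m]

def recognize(templates, sample):
    """Return (best_word, normalized_score) for a library {word: frames}."""
    best_word, best_score = None, math.inf  # BestScore starts at +infinity
    for word, template in templates.items():
        cost, steps = dtw(template, sample)
        score = cost / steps  # normalize by path length so scores are comparable
        if score < best_score:
            best_word, best_score = word, score
    return best_word, best_score

# Hypothetical action table standing in for Action[...] / DoAction(...).
ACTIONS = {
    "mom": lambda: "dialing Mom",
    "dad": lambda: "dialing Dad",
}

# Hypothetical one-dimensional "features", just to exercise the code.
library = {"mom": [[0], [1], [0]], "dad": [[5], [6], [5]]}
word, score = recognize(library, [[0], [0], [1], [0]])
print(ACTIONS[word]())  # dialing Mom
```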

DTW issues
What happens when nothing matches?
We need to handle "none of the above"
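One common way to handle "none of the above" is to reject the input when even the best normalized score is poor. The sketch below assumes that approach; threshold is a hypothetical parameter that would have to be tuned on real recordings, and recognize_or_reject is our own name.

```python
import math

def dtw(template, sample):
    """(accumulated cost, path length) of the best alignment, Euclidean frames."""
    n, m = len(template), len(sample)
    table = [[(math.inf, 0)] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, steps = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
            table[i][j] = (cost + math.dist(template[i - 1], sample[j - 1]), steps + 1)
    return table[n][m]

def recognize_or_reject(templates, sample, threshold):
    """Closest word, or None ("none of the above") if the best match is too poor.

    threshold is a hypothetical tuning parameter on the length-normalized score.
    """
    best_word, best_score = None, math.inf
    for word, template in templates.items():
        cost, steps = dtw(template, sample)
        if cost / steps < best_score:
            best_word, best_score = word, cost / steps
    if best_score > threshold:
        return None, best_score
    return best_word, best_score
```

Setting the threshold trades false accepts (acting on noise or out-of-vocabulary words) against false rejects (ignoring valid commands).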

What happens with more templates?

It becomes harder to choose between them once the variance among examples of the same word exceeds the differences between words

Choose templates that are very different
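To check whether a template set is "very different", one option is to DTW-match every template against every other and look at the most similar pair. This is a minimal sketch of that idea under our own assumptions (length-normalized Euclidean DTW; closest_pair is a hypothetical helper, not something from the slides).

```python
import itertools
import math

def dtw(template, sample):
    """(accumulated cost, path length) of the best alignment, Euclidean frames."""
    n, m = len(template), len(sample)
    table = [[(math.inf, 0)] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, steps = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
            table[i][j] = (cost + math.dist(template[i - 1], sample[j - 1]), steps + 1)
    return table[n][m]

def closest_pair(templates):
    """Return (score, word_a, word_b) for the most confusable pair of templates."""
    best_score, best_pair = math.inf, (None, None)
    for a, b in itertools.combinations(templates, 2):
        cost, steps = dtw(templates[a], templates[b])
        if cost / steps < best_score:
            best_score, best_pair = cost / steps, (a, b)
    return (best_score, *best_pair)
```

A low score for the closest pair warns that those two entries are likely to be confused, so one of them might be re-recorded or renamed.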

DTW/Template Applications
Voice dialer
Simple command and control
Speaker ID

Speaker ID
[Figure: sample compared against template library entries Speaker0, Speaker1, Speaker2]

BestScore = infinity
For Speaker in Templates:
    Score = dtw(Template[Speaker], Sample)
    if (Score < BestScore):
        BestScore = Score
        BestSpeaker = Speaker

DTW
Advantages
Works well for a small number of templates (<20)
Language independent
Speaker specific
Easy to train (the end user controls it)

Disadvantages
Limited number of templates
Speaker specific
Needs actual training examples

More reliable matching

Distance metric: Euclidean

But some distances are bigger than others

Silence frames are quite similar to one another; fricative frames differ much more
A longer fricative might give a large score; a longer vowel might give a smaller score

More reliable matching

Have multiple template examples
Match against each individually, or average them together

DTW-align all of the examples and collect statistics as a Gaussian

Mean and standard deviation for each coefficient
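Collecting Gaussian statistics can be sketched as follows: align every example to a reference (here, the first example), gather the frames that land on each reference frame, and compute a per-coefficient mean and standard deviation. Choosing the first example as the reference and the names dtw_path and template_statistics are our assumptions; the slides do not say how the reference is picked.

```python
import math

def dtw_path(ref, ex):
    """Align ex to ref with Euclidean DTW; return the (i, j) index pairs on the best path."""
    n, m = len(ref), len(ex)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = math.dist(ref[i - 1], ex[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min(((i - 1, j - 1), (i - 1, j), (i, j - 1)),
                   key=lambda p: D[p[0]][p[1]])
    return path[::-1]

def template_statistics(examples):
    """DTW-align every example to the first; return per-frame (means, stds)."""
    ref = examples[0]
    aligned = [[list(f)] for f in ref]  # frames collected for each reference frame
    for ex in examples[1:]:
        for i, j in dtw_path(ref, ex):
            aligned[i].append(ex[j])
    means, stds = [], []
    for frames in aligned:
        dims = len(frames[0])
        mu = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]
        var = [sum((f[d] - mu[d]) ** 2 for f in frames) / len(frames)
               for d in range(dims)]
        means.append(mu)
        stds.append([math.sqrt(v) for v in var])
    return means, stds
```

The result is one averaged template (the means) plus a spread for every coefficient, which is what a variance-aware distance needs.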

More reliable distances

Instead of Euclidean distance, which doesn't care about the standard deviation,

use Mahalanobis distance, which cares about both the means and the standard deviations
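The contrast between the two distances can be sketched per frame. This minimal version assumes a diagonal covariance (each coefficient treated independently), which is our simplification, and uses a hypothetical variance floor so that coefficients with zero observed spread do not cause a division by zero.

```python
import math

def euclidean(x, mean):
    """Plain Euclidean distance: every coefficient counts equally."""
    return math.sqrt(sum((a - m) ** 2 for a, m in zip(x, mean)))

def mahalanobis_diag(x, mean, std, floor=1e-3):
    """Mahalanobis distance assuming a diagonal covariance.

    Each deviation is divided by that coefficient's standard deviation, so
    coefficients that naturally vary a lot count less than stable ones.
    floor is a hypothetical minimum standard deviation.
    """
    return math.sqrt(sum(((a - m) / max(s, floor)) ** 2
                         for a, m, s in zip(x, mean, std)))
```

With mean [0, 0] and standard deviations [1, 10], the frames [0, 5] and [5, 0] are both at Euclidean distance 5, but their Mahalanobis distances are 0.5 and 5: a deviation along the high-variance coefficient is much less surprising.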

Extending Template Matching

String word templates together
Need to find the word segmentation
[Figure: sample segmented into Word0, Word1, Word2]

But there are many words
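A brute-force way to string word templates together is to concatenate the templates of every candidate word sequence and DTW-match the result against the whole sample, so the word boundaries fall out of the alignment. This sketch (best_word_sequence and max_words are hypothetical) also shows why "there are many words" is a problem: with V words and up to N words per utterance, the number of candidates grows roughly as V to the power N.

```python
import itertools
import math

def dtw(template, sample):
    """(accumulated cost, path length) of the best alignment, Euclidean frames."""
    n, m = len(template), len(sample)
    table = [[(math.inf, 0)] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, steps = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
            table[i][j] = (cost + math.dist(template[i - 1], sample[j - 1]), steps + 1)
    return table[n][m]

def best_word_sequence(templates, sample, max_words=3):
    """Try every word string up to max_words long; return (sequence, normalized score)."""
    best_seq, best_score = None, math.inf
    for count in range(1, max_words + 1):
        for seq in itertools.product(templates, repeat=count):
            joined = [frame for w in seq for frame in templates[w]]  # concatenated template
            cost, steps = dtw(joined, sample)
            if cost / steps < best_score:
                best_seq, best_score = seq, cost / steps
    return best_seq, best_score
```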

Extending the Template Model

String phoneme templates together
A template model for each phoneme
[Figure: sample "k ae t" matched against phoneme templates Phone0, Phone1, Phone2]
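Stringing phoneme templates together can be sketched with a pronunciation lexicon: a word's template is the concatenation of its phonemes' templates, so adding a word needs only a lexicon entry rather than a new recording. The phoneme feature values, the lexicon entries and the name word_template below are all hypothetical.

```python
# Hypothetical phoneme templates: a short sequence of 2-D feature frames per phoneme.
PHONE_TEMPLATES = {
    "k": [[0.9, 0.1]],
    "ae": [[0.2, 0.8], [0.3, 0.7]],
    "t": [[0.8, 0.2]],
}

# Hypothetical pronunciation lexicon: word -> phoneme string.
LEXICON = {"cat": ["k", "ae", "t"], "at": ["ae", "t"], "tack": ["t", "ae", "k"]}

def word_template(word, lexicon=LEXICON, phones=PHONE_TEMPLATES):
    """Build a word template by concatenating its phoneme templates in order."""
    return [frame for p in lexicon[word] for frame in phones[p]]
```

The resulting word templates can be fed to the same DTW matcher as recorded word templates.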

Summary
Speech recognition by templates
Good for simple, small-vocabulary tasks

Dynamic Time Warping (DTW)
Can match examples of different durations

Averaging over multiple models
Distance metrics: Euclidean vs Mahalanobis
