Speech Processing

This document provides an overview and introduction to a course on speech processing. It outlines the following key points:
- The course will cover fundamental concepts in speech production, perception, analysis, recognition, synthesis and modification. It will introduce mathematical foundations and computational methods for processing speech signals.
- Students will learn to analyze, visualize and manipulate speech signals, as well as build a complete speech recognition system.
- The course will meet three times a week and include homework assignments, exams and a group project. Grading will be based on homework, a project, a midterm and a final exam.
- Topics will include speech analysis techniques, speech and speaker recognition using Hidden Markov Models, and speech synthesis.


L1: Course introduction

Course introduction
Course logistics
Course contents

Introduction to Speech Processing | Ricardo Gutierrez-Osuna | CSE@TAMU

Course introduction
What is speech processing?
The study of speech signals and their processing methods
Speech processing encompasses a number of related areas

Speech recognition: extracting the linguistic content of the speech signal


Speaker recognition: recognizing the identity of speakers by their voice
Speech coding: compression of speech signals for telecommunication
Speech synthesis: computer-generated speech (e.g., from text)
Speech enhancement: improving intelligibility or perceptual quality of
speech signals
[Figure: phonetic transcription of the utterance "The music carried on until after midnight and then the drummers became tired and the dancers became cold."]


Applications of speech processing

Human computer interfaces (e.g., speech I/O, affective)


Telecommunication (e.g., speech enhancement, translation)
Assistive technologies (e.g., blindness/deafness, language learning)
Audio mining (e.g., diarization, tagging)
Security (e.g., biometrics, forensics)

Related disciplines

Digital signal processing


Natural language processing
Machine learning
Phonetics
Human computer interaction
Perceptual psychology


The course objectives are to familiarize students with


Fundamental concepts of speech production and speech perception
Mathematical foundations of signal processing and pattern
recognition
Computational methods for speech analysis, recognition, synthesis,
and modification

As outcomes, students will be able to


Manipulate, visualize, and analyze speech signals
Perform various decompositions, codifications, and modifications of
speech signals
Build a complete speech recognition system using state-of-the-art tools


Course logistics
Class meetings
MWF 9:10-10:00am
HRBB 126

Course prerequisites
ECEN 314 or equivalent, or permission of the instructor
Basic knowledge of signals and systems, linear algebra, and probability
and statistics
Programming experience in a high-level language is required

Textbook
The course will not have an official textbook and instead will be based
on lecture slides developed by the instructor from several sources
Additional course materials may be found on the course website
http://courses.cs.tamu.edu/rgutier/csce689_s11/


Recommended references
J. Holmes & W. Holmes, Speech Synthesis and Recognition, 2nd Ed,
CRC Press, 2001 (available online at TAMU libraries)
P. Taylor, Text-to-speech synthesis, Cambridge University Press, 2009
L. R. Rabiner and R. W. Schafer, Introduction to Digital Speech
Processing, Foundations and Trends in Signal Processing 1(1-2), 2007
B. Gold and N. Morgan, Speech and Audio Signal Processing:
Processing and perception of speech and music, Wiley, 2000
T. Dutoit and F. Marques, Applied signal processing, a Matlab-based
proof-of-concept, Springer, 2009
J. Benesty, M. M. Sondhi, and Y. Huang (Eds.), Springer Handbook of
Speech Processing, 2008 (available online at TAMU libraries)
X. Huang, A. Acero and H.-W. Hon, Spoken Language Processing,
Prentice Hall, 2001


Grading
Homework assignments
Three assignments, roughly every 2-3 weeks
Emphasis on implementation of material presented in class
Must be done individually

Tests
Midterm and final exam
Closed-book, closed-notes (cheat sheet allowed)

Project
Team-based, in groups of up to 3 people
Three types: application of existing tools, development of new tools,
design of new algorithms
Component     Weight (%)
Homework      40
Project       30
Midterm       15
Final Exam    15
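As a rough illustration of how the weights above combine, the sketch below computes a course score as a weighted average. It assumes each component is scored on a 0-100 scale and that no curve or other adjustment is applied; the slides state neither, and the function name and example scores are hypothetical.

```python
# Weights (%) come from the grading table; everything else is illustrative.
WEIGHTS = {"homework": 40, "project": 30, "midterm": 15, "final": 15}


def course_score(scores):
    """Return the weighted average of component scores (each assumed 0-100)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly: " + ", ".join(WEIGHTS))
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())


# Hypothetical component scores, not taken from the course.
example = {"homework": 90, "project": 85, "midterm": 70, "final": 80}
print(course_score(example))  # 84.0
```

Because homework and the project together carry 70% of the weight, they dominate the result under this assumption.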


Course contents

Introduction (3 lectures)
Course introduction
Speech production and perception
Organization of speech sounds

Mathematical foundations (4 lectures)
Signals and transforms
Digital filters
Probability, statistics and estimation theory
Pattern recognition principles

Speech analysis and coding (4 lectures)
Short-time Fourier analysis and synthesis
Linear prediction of speech
Source estimation
Cepstral analysis

Speech and speaker recognition (6 lectures)
Template matching
Hidden Markov models
Refinements for HMMs
Large vocabulary continuous speech recognition
The HTK speech recognition system
Speaker recognition

Speech synthesis and modification (4 lectures)
Text-to-speech front-end
Text-to-speech back-end
Prosodic modification of speech
Voice conversion


Tentative schedule*

Week  Date  Classroom meeting
1     1/17  No class (MLK day)
1     1/19  Course introduction
2     1/24  Speech production and perception
2     1/26  Organization of speech sounds
3     1/31  Signals and transforms
3     2/2   Digital filters
4     2/7   Short-time Fourier analysis and synthesis
4     2/9   Linear prediction of speech
5     2/14  Source estimation
5     2/16  Cepstral analysis
6     2/21  Probability, statistics, and estimation theory
6     2/23  Pattern recognition principles
7     2/28  Template matching
7     3/2   Hidden Markov models
8     3/7   Review/catch-up day
8     3/9   Midterm exam
9     3/14  Spring Break
9     3/16  Spring Break
10    3/21  Refinements for HMMs
10    3/23  Large vocabulary continuous speech recognition
11    3/28  HTK speech recognition system
11    3/30  Speaker recognition
12    4/4   Speech synthesis (front-end)
12    4/6   Speech synthesis (back-end)
13    4/11  Review/catch-up day
13    4/13  Proposal presentations
14    4/18  Prosodic modification of speech
14    4/20  Voice conversion
15    4/25  Review/catch-up day
15    4/27  Final exam
16    5/2   Prep day (no class)
16    5/4   Reading day (no class)
17    5/9   Project presentations (8:00AM - 10:00PM)


Materials due
HW1 assigned
HW1 due
HW2 assigned
HW2 due
HW3 assigned
HW3 due
Project proposal
Project report

*This timeline assumes MW meeting times
