
Speaker Recognition System (SRS)

The document discusses speaker recognition systems. It begins by defining speaker recognition as the process of automatically identifying or verifying a speaker based on information in their speech waves. It then discusses the objectives of extracting, characterizing, and recognizing speaker identity from speech signals. The document outlines the basic steps of a speaker recognition system as voice recording, feature extraction, pattern matching, and decision making. It focuses on Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and Gaussian Mixture Models (GMM) for pattern matching. The document also provides details about its own experimental methodology using the TIMIT database, MFCC features, and GMM modeling in Matlab.

Uploaded by

Levko Dovgan

asadthomas@gmail.com

SPEAKER RECOGNITION
SYSTEM (SRS)
MD. ASAD
RIYA BHADRA
IASNLP-2015, IIIT Hyderabad
Introduction

• Speaker Recognition: the process of automatically recognizing
(identifying and verifying) who is speaking on the basis of individual
information that exists in speech waves.
Objectives and aims
• To extract, characterize, and recognize the information about a
speaker's identity.

• To build a robust system that identifies and verifies a speaker
accurately.

Automatically extract the information transmitted
in the speech signal
Application of speaker recognition

• Uses of SR include voice dialling, banking by telephone, telephone
shopping, database access services, information services,
voice mail, security control for confidential information
areas, and remote access to computers.

• Some systems use "anti-speaker" techniques such as
cohort models.
Development of Speaker Recognition Systems

• The first type of speaker recognition machine, which used
spectrograms of voices, was invented in the 1960s. It was
called voiceprint analysis or visible speech.

• Since the mid-1980s, the field has steadily matured:
commercial applications of SR have been increasing, and
many companies currently offer this technology.
Speech processing taxonomy
Principles of Speaker Recognition

Two applications:

• Speaker Identification
• Speaker Verification
There are two types of speaker recognition:

• Text dependent (restrained)
• Text independent (unrestrained)

Text-dependent recognition performs better for subjects
who cooperate, but text-independent recognition is more
flexible in that it can be used for non-cooperating individuals.
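• Closed set (the speaker is assumed to be one of the enrolled speakers)
• Open set (the speaker may be unknown to the system)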
Speaker Recognition

• Identification or authentication using speaker
recognition consists of four basic steps:

1. Voice Recording
2. Feature Extraction
3. Pattern Matching
4. Decision (accept / reject)
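
The decision step differs between the two applications named earlier. The sketch below is only an illustration, assuming that pattern matching has already produced one score per enrolled speaker (higher means more similar). The function names, speaker labels, scores, and threshold are hypothetical and are not taken from the slides.

```python
from typing import Dict, Optional


def identify(scores: Dict[str, float]) -> str:
    """Closed-set identification: pick the enrolled speaker with the best score."""
    return max(scores, key=lambda speaker: scores[speaker])


def verify(claimed_score: float, threshold: float) -> bool:
    """Verification: accept the claimed identity only if its score reaches the threshold."""
    return claimed_score >= threshold


def identify_open_set(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Open-set identification: best speaker, or None if even the best score is too low."""
    best = identify(scores)
    return best if scores[best] >= threshold else None


# Hypothetical matching scores (for example, average log-likelihoods).
example_scores = {"spk_a": -42.0, "spk_b": -37.5}
decision = identify(example_scores)                        # best-scoring speaker
accepted = verify(example_scores["spk_b"], threshold=-40.0)  # accept / reject
```

In practice the threshold would be tuned on held-out data; the value used here is arbitrary.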
Feature Extraction

• Feature extraction converts the speech waveform to
some type of parametric representation. This sub-process
is the key part of front-end processing, and it is often
treated as a stand-in for front-end processing as a whole.
• Features commonly used include LPCCs, MFCCs, etc.
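
Features such as MFCCs are computed on short overlapping frames of the waveform rather than on the whole recording. The following is a minimal sketch of that framing step, assuming a Hamming window; the frame length and hop size in the usage lines are illustrative values (25 ms and 10 ms at 16 kHz), not settings reported in these slides.

```python
import math
from typing import List, Sequence


def hamming(n: int) -> List[float]:
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a waveform into overlapping, windowed frames (incomplete tail is dropped)."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


# Illustrative usage: 1000 samples, 400-sample frames, 160-sample hop.
toy_signal = [1.0] * 1000
toy_frames = frame_signal(toy_signal, frame_len=400, hop=160)
```

Each resulting frame would then be passed to the spectral analysis that produces one feature vector per frame.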
Pattern Matching

• Pattern matching is the actual comparison of the extracted
frames with known speaker models (or templates). This results
in a matching score which quantifies the similarity between
the voice recording and a known speaker model. Pattern
matching is often based on Hidden Markov Models (HMMs),
statistical models which take into account the underlying
variations and temporal changes of the acoustic pattern.
• Models used for pattern matching include VQ, NN,
HMM, GMM, etc.
Speaker Recognition

• Database used = TIMIT
• Feature extraction = MFCCs
• Pattern matching = GMM
• Tool used = MATLAB
WHY MFCCs?

Mel-Frequency Cepstral Coefficients:

• To date, Mel-frequency cepstral coefficients (MFCCs) are the best
known and most commonly used features, not only for speech
recognition but for speaker recognition as well. The computation of
MFCCs is based on short-term analysis and is similar to the
computation of cepstral coefficients. The significant difference lies
in the use of a bank of critical-band filters to realize mel-frequency
warping. The variation of the critical bandwidths with frequency is
based on the human ear's perception.
• A mel is a unit of measure based on the human ear's perceived
frequency.
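
As an illustration of mel-frequency warping, the sketch below uses one widely used analytic approximation of the mel scale, mel = 2595 · log10(1 + f/700). The slides do not say which formula was used, so this choice is an assumption, and the filter count and band edges in the usage line are hypothetical.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Convert frequency in Hz to mels (common 2595 * log10(1 + f/700) approximation)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_centers(n_filters: int, f_low: float, f_high: float) -> List[float]:
    """Centre frequencies (Hz) of filters spaced uniformly on the mel scale."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]


# Hypothetical filter bank: 20 filters between 0 Hz and 8000 Hz.
centers = mel_filter_centers(20, 0.0, 8000.0)
```

With this formula 1000 Hz maps to roughly 1000 mels, and the filter centres come out closely spaced at low frequencies and widely spaced at high frequencies, mirroring the ear's resolution.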
Introduction to GMM

• Gaussian: "a characteristic symmetric 'bell curve' shape that quickly
falls off towards 0 (practically)"

• Mixture model: "a probabilistic model which assumes the underlying
data to belong to a mixture distribution"
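
Putting the two definitions together, a GMM scores a feature vector as a weighted sum of Gaussian densities. The sketch below is a minimal illustration, assuming diagonal covariances (a common choice for speaker models, not one stated in these slides) and working in the log domain for numerical stability. All function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances) -> float:
    """log sum_k w_k N(x | mean_k, var_k), computed with the log-sum-exp trick."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def average_log_likelihood(frames, weights, means, variances) -> float:
    """Average per-frame log-likelihood of an utterance under one speaker model."""
    total = sum(gmm_log_likelihood(f, weights, means, variances) for f in frames)
    return total / len(frames)


# Toy two-component model in one dimension (made-up parameters).
toy_score = average_log_likelihood(
    frames=[[0.0], [1.0]],
    weights=[0.5, 0.5],
    means=[[0.0], [1.0]],
    variances=[[1.0], [1.0]],
)
```

In a system of this kind, one such model would be trained per speaker, and a test utterance would be scored against each model with a function like `average_log_likelihood`.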
Why GMM?

• The classification paradigms used in SRS during the past 20
years are VQ, NN, HMM and GMM, which stand for Vector
Quantization, Neural Network, Hidden Markov Model and
Gaussian Mixture Model respectively. A continuous ergodic
HMM method is superior to a discrete ergodic HMM
method, and a continuous ergodic HMM method is as
robust as a VQ-based method when enough training data is
available. However, when little data is available, the VQ-based
method is more robust than a continuous HMM method.
EXPERIMENTAL METHODOLOGY

Dataset Description
• TIMIT database
• Total number of speakers = 98
• Female speakers = 48
• Male speakers = 50
• Sentences per speaker = 10
• Training data = 8 sentences for each speaker
• Testing data = 2 sentences for each speaker
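
The figures above imply the following utterance counts. This is a small arithmetic check only; the helper name is hypothetical, and the slides do not say which 8 of the 10 sentences were used for training.

```python
from typing import Tuple


def split_counts(n_speakers: int, train_per_speaker: int, test_per_speaker: int) -> Tuple[int, int]:
    """Total number of training and testing utterances across all speakers."""
    return n_speakers * train_per_speaker, n_speakers * test_per_speaker


n_female, n_male = 48, 50
n_speakers = n_female + n_male             # 98 speakers in total
n_train, n_test = split_counts(n_speakers, 8, 2)
# n_train = 784 training utterances, n_test = 196 testing utterances
```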
Analysis Tool
• MATLAB
Result
References:

1. Reynolds, D. A. and Rose, R. C. 1995. "Robust Text-Independent
Speaker Identification Using Gaussian Mixture Speaker Models",
IEEE Trans. on Speech and Audio Processing, vol. 3, no. 1, pp. 72-83.
2. Panda, A. K. and Sahoo, A. K. 2011. "Study of Speaker Recognition
System", Thesis, NIT Rourkela.
3. Feng, L. 2004. "Speaker Recognition", Kgs. Lyngby.
Questions?
