0% found this document useful (0 votes)

60 views

Speech Recognition Using A DSP: Lunds Universitet

This document describes a student project to implement speech recognition on a digital signal processor (DSP) in two parts. The first part developed and tested the speech recognition algorithm (SRA) in MATLAB. The second part translated the MATLAB code to C and implemented it on the DSP. The SRA divides speech into 20ms segments, calculates reflection coefficients using the Schur algorithm to model each segment, and compares models to a database to identify words. Environmental noise is reduced using a combined high-pass and pre-emphasis filter on the sampled speech signal before modeling. The DSP has limited memory and processing power, requiring the SRA to be optimized for real-time usage with minimal resources.

Uploaded by

Tint Swe Oo

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

60 views

Speech Recognition Using A DSP: Lunds Universitet

Uploaded by

Tint Swe Oo

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 12

Lunds universitet

Algorithms in Signal Processors

ETIN80

Speech Recognition using a DSP

Authors
Johannes Koch, elt13jko
Olle Ferling, tna12ofe
Johan Persson, elt13jpe
Johan Östholm, elt13jos
Date
2017-03-05
Contents
1 Introduction 2

2 Theory 2
2.1 Basic Speech Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.2 Unvoiced and voiced speech . . . . . . . . . . . . . . . . . . . . . . . . 2
2.3 Reflection coefficients and the Schur Algorithm . . . . . . . . . . . . . 2
2.4 DSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.5 Filtering the sampled signal . . . . . . . . . . . . . . . . . . . . . . . . 3

3 Method 6
3.1 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.2 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.3.1 C-Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

4 Result 10

5 Discussion and conclusion 11

Koch, Ferling, Persson, Östholm

1 Introduction
The purpose of this project was to research, design and implement a speech recogni-
tion algorithm on a ADSP-21262 SHARC digital signal processor (DSP). The original
goal was to be able to distinguish two spoken words from each other and to match the
recordings with a member of a previously built database. Later on, the database was
extended to contain four words instead of two. Another objective was to validate if a
recorded word is a good enough match with the library. Since the DSP had hardware
limitations and an unfamiliar programming platform, the project was split into two
main parts.

The first part consisted of implementing the speech recognition algorithm, hereby de-
noted as SRA, in Matlab. Compared to the DSP, Matlab allows the use of seemingly
infinite memory and processing power. Matlab also gives the benefit of having access
to the entire signal that needs processing at all times. The DSP, on the other hand,
runs in real-time and has limited memory, meaning no data can be stored. Part one of
the project consisted of making sure the SRA design worked when being simulated in
Matlab.

Part two consisted of translating the Matlab code into C code and making sure the
algorithms still worked despite the limitations of the DSP.

2 Theory

2.1 Basic Speech Recognition

A basic speech recognition algorithm can be split into three states: listening, processing,
and matching. In the listening phase, the DSP analyses the present audio signal to
determine if speech is present. When speech is detected, the DSP starts to process
the information to describe the speech in a compact way. A common way to describe
speech is to divide into subsets of 20 ms and build mathematical models of each subset.
When the processing is done, the DSP continues to the last phase, which compares the
speech model to a pre-defined database. Each of these phases will be more thoroughly
described later on in this report.

2.2 Unvoiced and voiced speech

Speech can be categorized into unvoiced and voiced speech. Unvoiced speech is the
noise-like parts of a word containing consonants, and the tonal voiced speech containing
the vowels. As an example, take the word "ask", where "a" is tonal, and "sk" is noisy.
In this project we assume that the reflection coefficients for the voiced speech will result
in dominating reflection coefficients.

2.3 Reflection coefficients and the Schur Algorithm

As mentioned above, a common way to describe speech is to split the speech in subsets
of 20 ms each and mathematically describe each subset. For a short time as 20 ms,
speech can be viewed as a wide-sense stationary stochastic process and can thus be
fairly well described with an direct-form FIR-model (AR-model). This is a process that
looks like

yt = −a1 yt−1 − a2 yt−2 − ... − ap yt−p + et (1)

Page 2
Koch, Ferling, Persson, Östholm

where et is white Gaussian noise, the filter koefficients are represented by ai , i =

1, 2, . . . , p and yt is the output at time t. Of course, more coefficients means better
abilities to describe the speech-subset. However, a more crude model with fewer coeffi-
cients will describe only the vital parts of the word. This makes matching speech from
another person easier, since the vital parts describing a spoken word is similar from
person to person.
Another way of describing the direct-form FIR-model is with a set of reflection coeffi-
cients {K} of the lattice filter. These reflection coefficients are closely related to the
AR-coefficients, and are calculated recursively. Linear prediction is used to calculate
the optimal model describing the speech. The reflection coefficients can be calculated
directly from the autocorrelation of the subset-signal using the Schur algorithm. The
Schur algorithm is fast and will produce exactly the same result as the Levinson-Durbin
algorithm, but without having to compute the AR-coefficients1 .
The resulting N reflection coefficients describe the speech-subset. Doing this for each
subset will produce a matrix that describe the most vital parts of the spoken word, and
can be used to compare to other matrices in order to match and validate what word
was spoken.

2.4 DSP

A Digital Signal Processor (DSP) is a purpose-built device specialized in processing data

which is continuously supplied. The DSP used in this project was a Sharc ADSP-21262
fitted on a custom made PCB. This DSP has two parallel buses on which messages
can be sent, allowing for high-speed processing. The DSP has hardware support for
addition, subtraction and multiplication. Division is emulated and therefore substan-
tially slower than the other mathematical operations. When possible, multiplication by
a constant, e.g. scale = 1/q should be used instead of dividing repeatedly. The total
memory of the DSP is limited to 4 × 128 kB all in all. This puts restrictions on the
algorithms that should run on the device, as these cannot use too much memory. An
average of all reflection coefficients describing a word is stored in a 10 × 10 array on the
DSP in order to avoid noise having a too large effect on the database.

2.5 Filtering the sampled signal

Digital filters are applied to boost and cut certain characteristics of the sampled signal
to make for better mathematical models. To deal with this, two different filters are
needed. Environmental noise causing mic bias, i.e. a door being shut, are low in
frequency but high in energy and are dealt with by a high pass filter. Much of the
information in human speech lies in the higher frequencies. In order to boost the lower
energy levels of high frequencies present in the human speech, a pre-emphasis filter is
used. A pre-emphasis filter boosts the high frequencies while attenuating the lower
ones, which flattens the spectrum. It just so happens that the filter characteristics of
the high pass and the pre-emphasis filter combined can be reproduced using only one
filter in order to conserve resources. As displayed in figures 1 to 3 below, combining the
characteristics of the magnitude response of the high pass filter and the pre-emphasis
filter will result in the combined filter shown in figure 3 . The combined filter have
a damping effect on low frequencies of the high pass filter, and the boosting effect of
high frequencies from the pre-emphasis filter. Figure 4 below display the magnitude
response of the used filter, and it’s easy to see the similarities to the combined filter.
1
J. Proakis, Digital Signal Processing. Principles, Algorithms and Applications, Pearson Education
Limited, United States of America, 2014, p. 859-862

Page 3
Koch, Ferling, Persson, Östholm

Figure 1: Magnitude response of the high pass filter

Figure 2: Magnitude response of the pre-emphasis filter

Figure 3: Magnitude response of the two filters above combined

Page 4
Koch, Ferling, Persson, Östholm

Figure 4: Magnitude response of the used filter

Page 5
Koch, Ferling, Persson, Östholm

3 Method

3.1 Design

In figure 5, the general design structure of the speech recognition algorithm is shown.
First of all, the recorded audio needs to be analysed to determine if speech is present.
The algorithm that determines if speech is present listens on the background noise on
the channel and sets a threshold for when the level is breached. To avoid spikes in the
audio channel originating from e.g. a cup of coffee that is placed on the same table
as the microphone, the algorithm requires the signal to maintain its energy for a short
period of time. To avoid microphone bias from e.g. a pressure wave originating from
a door being closed, a high pass filter was implemented before the speech-detection
algorithm.
Reflection coefficients are always calculated, independent of whether speech has been
detected or not. The difference is that if no speech is detected, the reflection coefficients
of the three previous audio blocks are stored in a circular buffer. When finally speech is
detected, the buffers are stored as the first columns in a coefficient matrix. This is done
to avoid relevant data being lost due to a recording not starting in the beginning of a
word. When speech has been detected, there is an algorithm that checks whether or not
the speech has ended. This is done in a similar fashion as when detecting speech, with
some additional features that prevents the algorithm to cut in a "natural pause" in a
word. The DSP then continues to calculate reflection coefficients and storing them in
the coefficient matrix. Ten coefficients are calculated, which is a common number that
will describe the vital parts of the speech well enough without the risk of over-fitting.
Finally, when speech has ended, the following three coefficient vectors are stored in
the matrix. Three coefficient vectors in the beginning and end is intended both as a
safeguard to catch all the speech, but also as simple windowing to apply weight on
the middle parts. Now the DSP will compress the matrix from a 10 × N matrix to
a 10 × 10 matrix. This is done to allow for comparison between the recorded speech
and the entries of the database. The compression-function was implemented in such
way that it for the first 10 − u, u = mod(N, 10) columns in the compressed matrix
N
calculates the row-wise average of K columns, where K = f loor( 10 ), in the original
matrix. These average values are then saved in the new, compressed, matrix. The
remaining u columns in the compressed matrix are calculated as the row-wise average
of K + 1 columns in the original 10 × N matrix.
The compressed matrix is compared to the 10×10 matrices in the database by evaluating
X
k 2
ek = (Rij − Dij ) (2)
i,j

where Dk , k = 1, 2, 3, 4 represent the different database matrices, and where R repre-

sents the recorded and compressed matrix at hand. The matrix Dk that results in the
smallest ek will be regarded as the best match.

Figure 5: Flow chart of the general design. y = yes, n = no

Page 6
Koch, Ferling, Persson, Östholm

3.2 Simulation

To verify that the previous mentioned design worked as intended, simulations in Matlab
were made. As a first step the reflection coefficients for a pre-recorded speech sample
was calculated using Matlab’s built-in xcorr- and Schurrc-functions. It should be men-
tioned that the xcorr-function returns the auto-correlation sequence of the supplied
signal and that this sequence is used in the Schur algorithm. It was suitable to use 10
reflection coefficients for each block of data, each block being 20 ms of speech. With a
sample frequency of 16 kHz each block became 320 samples. Next step was to record
an audio sample. Since Matlab supports audio recording, a microphone was used to
record speech. The recording was then divided into N blocks, each block of 320 values,
using Matlab’s buffer-function.

For each block of data, a pre-emphasis filter was applied. This filter was a FIR-filter
with coefficients 1 and −1, corresponding to a high-pass filter. Matlab’s filter-function
was used to implement this. A vector of 10 reflections coefficients for each block was
then calculated. This was done for all N blocks of data. Each vector of coefficients was
then saved in a 10×N matrix, where N is determined by the length of the recorded audio
signal. To make the implementation in C easier, Matlab’s built in Schurrc-function was
replaced by a self-written function which had the same functionality as the built-in one.

Since recordings of speech will have some silence before and after the speech, a function
that detects the start and the end of the spoken word and removes the silent parts,
was written. This function was then split into two separate functions, one that removes
the "silent" samples before the speech and one that removes the "silent" samples after
the speech. The reason that this is important is that the "silent" samples contains no
information that is of interest when the reflections coefficients are calculated .
When the samples containing no information have been removed, the data is divided
and reflection coefficients calculated as described in the first paragraph. To make com-
parison between two matrices possible, all the matrices have to be compressed to the
same size. This was achieved by implementing a function that compresses a 10 × N -
matrix to a 10 × 10-matrix, more thoroughly described in the design section of this
report.

After this was done, a database of three words was created; this was changed to
four words during the implementation on the DSP. The words of the database, dur-
ing the simulation, were: "Mikael", "Swartling" and "hundratjugosju". At first, only
one recording of each word was used to build the database. This was to test that the
recording worked as intended. Later this was replaced with the average of 12 recordings
per word, 3 recordings per group member. The averaging was done on the compressed
10 × 10 matrices generated at every recording. The average was then saved in the
database. If only one database was to be used, averaging gave better result for multiple
persons than just a database built from one person.

3.3 Implementation

3.3.1 C-Code

The DSP was programmed using the C language. A DSP library was supplied, in which
functions to facilitate the set-up and use of the DSP were included. Some macros that
can be considered part of the design were also part of the library. Notably, the sampling
frequency was set to Fs = 16000 Hz and the number of samples to be included in each
data block was picked as 320. In this way, each block of data will correspond to 20 ms
of speech, for reasons explained previously in this report.
To avoid overflowing the limited memory of the DSP (which would cause the speech
recognition algorithm to crash), only static memory allocation was used in the imple-

Page 7
Koch, Ferling, Persson, Östholm

mentation. This is a feasible approach because the nature of the program design. All
matrices (multi-dimensional arrays) that the SRA uses are of pre-defined sizes. The
only thing one cannot be certain of at compilation time, is how long the spoken word
that should be recognized will be. Thus, the matrix containing the reflection coefficients
of the recorded word is of unknown size. One solution to this problem is to allocate
a two-dimensional array which is sufficiently large for any possible word. The imple-
mentation described in this report makes the assumption that no word will be longer
than 6 seconds, implying that the array containing the speech recording should be a
10 × 300 array of float type. Thus, all memory in use will be known at compilation time
and static memory allocation can be used. The advantage of this approach is that no
program crash due to memory overflow is possible when the SRA runs. If the program
tries to use more memory than available, this will be recognized during compilation
and produce errors. This is preferable to the case of dynamic memory allocation, where
overflowing the memory will not be caught at compilation time.
The database was built using the averaging technique described previously in this re-
port. Every member of the project group was allowed to make three recording of the
same word (e.g. "Mikael"). The average of the twelve resulting reflection coefficient
matrices was then saved in a database.c-file as a 10 × 10 matrix. To be able to use the
database in the main program, the corresponding header file was simply included in
the main program.
The DSP has four different buttons, which can be used to send interrupts to the DSP.
In this project, the buttons were used to
1. Set the DSP in different speech recognition modes
2. Record a database
In this project, the four buttons were used to set the DSP in one of four pre-defined
recognition modes. These modes were
BUTTON1 Validation ON/OFF.
BUTTON2 Use Johannes’ personal database.
BUTTON3 Use Johan Persson’s personal database.
BUTTON4 Use the common (averaged) database. Standard case.
As shown above, the DSP can either run with validation mode on or off. If, for instance,
button 2 has been pressed (and the DSP is thus running with the database of Johannes)
and button 1 is pressed, the DSP will continue to run with Johannes’ database but with
a new validation state. When the DSP boots up, it will use the averaged database and
no validation as default.
When building the database, the DSP was set to start listening for speech only when
any of the keys on the DSP was pressed. This was done in order to re-initialize the
DSP between the recordings described in the design section above. The need for a key-
press introduced a debouncing phenomenon. When debouncing occurs, one key-press
is interpreted as numerous by the processor. This in turn gives numerous interrupts
if all keyboard interrupts are handled directly in the keyboard interrupt function. A
feasible solution to this problem is to set a timer in the keyboard interrupt handler and
handle the key-press in the timer interrupt handler once the timer expires. By using
this approach, any debouncing in mechanical buttons will be allowed to die out before
the timer expires and only one keyboard interrupt will be handled. The length of the
timer, tdelay , has to be chosen depending on what hardware is being used. Admittedly,
this approach introduces a delay in the total processing time equalling the length of
the timer interrupt. However, the delay is introduced before any recording takes place,
implying that the speaker has to wait for tdelay seconds. The actual processing time is
not affected. If the timer is set to a reasonably low value (e.g. tdelay = 100 ms), the
damage is quite small. The person using the application will most likely not be able to
notice the small delay.

Page 8
Koch, Ferling, Persson, Östholm

The DSP used in this project has six LEDs, which are used to indicate different functions
and results to the user. Basic functionalities that the DSP indicates include
LED1 DSP status. LED on indicates DSP on and listening for speech.
LED2 DSP status. Speech detected. Recording.
LED3 Word match. Detected word is "Mikael".
LED4 Word match. Detected word is "Swartling".
LED5 Word match. Detected word is "Screen".
LED6 Word match. Detected word is "Black-board".
To compare two matrices, equation 2 was used to match each word in the database with
the recording. In the case where no validation was used, the word with the smallest
error was considered as the correct word and therefore considered to be a match. When
validation was used, the best match was also required to have a sum of squared errors
less than a pre-defined value for the word to be considered a match. If the threshold
was violated, the DSP output "no match" instead.

Page 9
Koch, Ferling, Persson, Östholm

4 Result
In Matlab the algorithms worked just as intended, and the program was able to distin-
guish the right word most of the times. It seemed to match roughly the same amount of
times regardless of which of the four recorded words in the database that was spoken.
The amount of matches seemed to be the same no matter whether or not the person
speaking was part of creating the database. However, the smallest value of the error
described by equation 2, which is used to determine what word was being said, tended
to vary. The smallest recorded error was around 0.9, and the largest error still yielding
a correct match was somewhere around 7. The big difference made validation difficult,
even though the program still managed to find a correct match.
In a conducted test, three of the group members said each member-word of the database
ten times and the match frequency was noted. They can be seen in the table 1 below.

Mikael Swartling Screen Blackboard Total

Correct matches 22/30 24/30 29/30 30/30 105/120
Incorrect match 8/30 6/30 1/30 0/30 15/120

Table 1: The amount of correct matches/tries without validation

Further on, a test where a group member using his own database and validation mode
ON was performed. In this test, the four words of the database was repeated ten times
each. Also, the words "Jacket" and "DSP" was repeated ten times to test the validation
threshold. The results are shown in table 2 below.

Mikael Swartling Screen Blackboard Jacket DSP

Correct matches 10/10 4/10 4/10 10/10 0/10 0/10
Incorrect match 0/10 0/10 0/10 0/10 0/10 0/10
No match 0/10 6/10 6/10 0/10 10/10 10/10

Table 2: The amount of correct matches/tries using validation on personal database.

Page 10
Koch, Ferling, Persson, Östholm

5 Discussion and conclusion

The result of the project was very satisfying; we managed to accomplish more than the
basic requirements. These were to make a program that could distinguish between two
words for multiple speakers, our speech recognition program can distinguish between
four words for multiple speakers with satisfying results.
As can be seen in table 2, we never get an incorrect match, which is the most important
thing in speech recognition. However, by choosing a threshold giving raise to this
result will cause some of the words in the database to not be recognized as a word in
the database. Different words give different magnitudes of errors, making it hard to
implement a common threshold level. Setting a separate threshold for each word can
be done with some more fine tuning. Of course, there might be a better way to set a
threshold than just a error magnitude threshold.
It is also possible that different matrix compression techniques would generate a better
result, or that different sizes of the compressed matrix would work better for our pur-
poses. The reason that we chose the previous mentioned method to compress a 10 × N
matrix was because of its simplicity. The size of the compressed matrix was chosen to
10 × 10 since it was desirable to use as little memory as possible, and the size 10 × 10
was considered a reasonable size. This might however not be the optimal size, nor the
method to compress the 10 × N matrix.
Also worth mentioning is that at first, we only had three words in the database. This
amount was later increased to four words, the reason being that one of the three words
was quite hard to identify and often resulted in incorrect matches being displayed. The
reason behind the problems might the word length, or that the structure of the words
are too similar. In any case, the word was changed and two other words were entered
into the database instead. This change resulted in better matching even though the
size of the database was increased.
When discussing the words that were used in the database, it should be mentioned that
the SRA relies on two parameters for correct identification of a recording. These are the
vowels of the detected word as well as on the position of these vowels within the word.
The reflection coefficients are very much depending on what vowels are being spoken.
Consonants contains less information, and reflection coefficients describing consonants
are less prominent than reflection coefficients of vowels. Perhaps using other words as
our database would have increased the hit rate using validation methods, but this was
nothing we investigated further due to lack of time. We’ve also come to the conclusion
that accentuation matters. Since the vowels are the information containing part of a
word, and the vowel is spoken differently depending on your accent, it is quite hard to
build a database that works for a common speaker. This was especially apparent when
a group member tried to use the database of another person, as this resulted in more
errors by the SRA.

Page 11

DM-MICA RENTOMOJO Shivram Kerkar
No ratings yet
DM-MICA RENTOMOJO Shivram Kerkar
7 pages
Wind Analysis For Open Structure (Solar Panels) Based On ASCE 7-16 Input Data
No ratings yet
Wind Analysis For Open Structure (Solar Panels) Based On ASCE 7-16 Input Data
8 pages
Fundamentals of Communication Systems
From Everand
Fundamentals of Communication Systems
Janak Sodha
No ratings yet
Multimedia Programming Using Max/MSP and TouchDesigner
From Everand
Multimedia Programming Using Max/MSP and TouchDesigner
Patrik Lechner
5/5 (3)
Fourier Analysis-A Signal Processing Approach PDF
100% (3)
Fourier Analysis-A Signal Processing Approach PDF
365 pages
Catalogues Kerger 2019
100% (1)
Catalogues Kerger 2019
257 pages
The Modern Law Firm How To Thrive in An Era of Rapid Technological Change (Heinan Landa) English - 2020 (Z-Library)
No ratings yet
The Modern Law Firm How To Thrive in An Era of Rapid Technological Change (Heinan Landa) English - 2020 (Z-Library)
118 pages
Rabiner & Juang - Fundamentals of Speech Recognition
100% (2)
Rabiner & Juang - Fundamentals of Speech Recognition
277 pages
Discrete-Time Processing of Speech Signals (IEEE Press Classic Reissue) PDF
No ratings yet
Discrete-Time Processing of Speech Signals (IEEE Press Classic Reissue) PDF
919 pages
Project Definitions
No ratings yet
Project Definitions
29 pages
DAFX: Digital Audio Effects
From Everand
DAFX: Digital Audio Effects
Udo Zölzer
3.5/5 (2)
The Mystical Aspect of Colors and The Spheres
No ratings yet
The Mystical Aspect of Colors and The Spheres
8 pages
Speech Recognition Using Matrix Comparison: Vishnupriya Gupta
No ratings yet
Speech Recognition Using Matrix Comparison: Vishnupriya Gupta
3 pages
Good Matter
No ratings yet
Good Matter
57 pages
Audio Signal Processing Audio Signal Processing
No ratings yet
Audio Signal Processing Audio Signal Processing
31 pages
Assamese Numeral Corpus For Speech Recognition Using ANN: Master of Science
No ratings yet
Assamese Numeral Corpus For Speech Recognition Using ANN: Master of Science
58 pages
Lecture 1
No ratings yet
Lecture 1
48 pages
Final Thesis Speech Recognition
No ratings yet
Final Thesis Speech Recognition
45 pages
Speech Recognition (Dr. M. Sabarimalai Manikandan
No ratings yet
Speech Recognition (Dr. M. Sabarimalai Manikandan
2 pages
Reconocimiento de Voz - MATLAB
No ratings yet
Reconocimiento de Voz - MATLAB
5 pages
Digital Signal Processing "Speech Recognition": Paper Presentation On
No ratings yet
Digital Signal Processing "Speech Recognition": Paper Presentation On
12 pages
Speaker Recognition System
No ratings yet
Speaker Recognition System
7 pages
Theory and Application of Digital Speech Processing by L. R. Rabiner and R. W. Schafer
No ratings yet
Theory and Application of Digital Speech Processing by L. R. Rabiner and R. W. Schafer
35 pages
Speech Recognition Using DSP PDF
No ratings yet
Speech Recognition Using DSP PDF
32 pages
Speaker Recognition
100% (1)
Speaker Recognition
15 pages
The Voice Recognition (
No ratings yet
The Voice Recognition (
6 pages
DSP Final Softcopy 444
No ratings yet
DSP Final Softcopy 444
3 pages
Discrete Time Processing of Speech Signa
No ratings yet
Discrete Time Processing of Speech Signa
12 pages
Super Listener: 2. Signal Processing
No ratings yet
Super Listener: 2. Signal Processing
4 pages
CCS369 - TSS-Unit 5
No ratings yet
CCS369 - TSS-Unit 5
23 pages
Word Recognition Device: C.K. Liang & Oliver Tsai ECE 345 Final Project TA: Inseop Lee Project Number: 22
No ratings yet
Word Recognition Device: C.K. Liang & Oliver Tsai ECE 345 Final Project TA: Inseop Lee Project Number: 22
21 pages
EE264 Final Project Report: Echai@stanford - Edu
No ratings yet
EE264 Final Project Report: Echai@stanford - Edu
17 pages
Term Paper ECE-300 Topic: - Speech Recognition
No ratings yet
Term Paper ECE-300 Topic: - Speech Recognition
14 pages
Jarvis Digital Life Assistant IJERTV2IS1237 PDF
No ratings yet
Jarvis Digital Life Assistant IJERTV2IS1237 PDF
6 pages
Speech Recognition Project
No ratings yet
Speech Recognition Project
33 pages
Performance Evaluation of MLP For Speech Recognition in Noisy Environments Using MFCC & Wavelets
No ratings yet
Performance Evaluation of MLP For Speech Recognition in Noisy Environments Using MFCC & Wavelets
5 pages
Application of Adaptive Filters
No ratings yet
Application of Adaptive Filters
12 pages
Automatic Speech Recognition
No ratings yet
Automatic Speech Recognition
35 pages
Speech Recognition With Artificial Neural Networks
No ratings yet
Speech Recognition With Artificial Neural Networks
6 pages
Implementation of Adaptive Filtering Algorithm For Speech Signal On FPGA
No ratings yet
Implementation of Adaptive Filtering Algorithm For Speech Signal On FPGA
5 pages
Speech Recognition Seminar
No ratings yet
Speech Recognition Seminar
19 pages
Artificial Intelligence For Speech Recognition
No ratings yet
Artificial Intelligence For Speech Recognition
9 pages
11-Speech Encryption and Decryption
No ratings yet
11-Speech Encryption and Decryption
13 pages
Basic Course Material Winter 2015
100% (1)
Basic Course Material Winter 2015
19 pages
Application of Deep Learning-based Speech Signal p
No ratings yet
Application of Deep Learning-based Speech Signal p
6 pages
Study On Speech Recognition Method of Artificial Intelligence Deep Learning
No ratings yet
Study On Speech Recognition Method of Artificial Intelligence Deep Learning
6 pages
Speech Recognition1
No ratings yet
Speech Recognition1
24 pages
Is 2016 7737405
No ratings yet
Is 2016 7737405
6 pages
Speech Recognition Algo
No ratings yet
Speech Recognition Algo
17 pages
Speech Recognition: A Complete Perspective: Ashok Kumar, Vikas Mittal
No ratings yet
Speech Recognition: A Complete Perspective: Ashok Kumar, Vikas Mittal
6 pages
Speech Recognition UTHM
No ratings yet
Speech Recognition UTHM
30 pages
Speech Generation
No ratings yet
Speech Generation
11 pages
Artificial Intelligence and Its Applicat
No ratings yet
Artificial Intelligence and Its Applicat
4 pages
Self Learning Speaker Identification A System For PDF
No ratings yet
Self Learning Speaker Identification A System For PDF
185 pages
Seminar Presentation: Topic: Speech Recognition
No ratings yet
Seminar Presentation: Topic: Speech Recognition
26 pages
Digital Signal Processing for Audio Applications: Volume 2 - Code
From Everand
Digital Signal Processing for Audio Applications: Volume 2 - Code
Anton R Kamenov
5/5 (1)
Digital Signal Processing for Audio Applications: Volume 1 - Formulae
From Everand
Digital Signal Processing for Audio Applications: Volume 1 - Formulae
Anton R Kamenov
No ratings yet
Human Visual System Model: Understanding Perception and Processing
From Everand
Human Visual System Model: Understanding Perception and Processing
Fouad Sabry
No ratings yet
Handbook of Time Series Analysis: Recent Theoretical Developments and Applications
From Everand
Handbook of Time Series Analysis: Recent Theoretical Developments and Applications
Björn Schelter
No ratings yet
Programming Concepts in Java
From Everand
Programming Concepts in Java
Robert Burns
No ratings yet
Audio Visual Speech Recognition: Advancements, Applications, and Insights
From Everand
Audio Visual Speech Recognition: Advancements, Applications, and Insights
Fouad Sabry
No ratings yet
Programming Concepts in Python
From Everand
Programming Concepts in Python
Robert Burns
No ratings yet
Handbook of Holographic Interferometry: Optical and Digital Methods
From Everand
Handbook of Holographic Interferometry: Optical and Digital Methods
Thomas Kreis
No ratings yet
The Satisfiability Problem: Algorithms and Analyses
From Everand
The Satisfiability Problem: Algorithms and Analyses
Uwe Schöning
No ratings yet
Advanced Plasma Technology
From Everand
Advanced Plasma Technology
Riccardo d'Agostino
No ratings yet
Hugging Face Transformers Essentials: From Fine-Tuning to Deployment
From Everand
Hugging Face Transformers Essentials: From Fine-Tuning to Deployment
Robert Johnson
No ratings yet
သီတင္းကြၽတ္လျပည့္ေန႔ PDF
No ratings yet
သီတင္းကြၽတ္လျပည့္ေန႔ PDF
76 pages
Advanced Answering A Question - A Great Accomplishment PDF
No ratings yet
Advanced Answering A Question - A Great Accomplishment PDF
1 page
အထင္မွား အျမင္မွား PDF
100% (1)
အထင္မွား အျမင္မွား PDF
38 pages
015-ေဗာဇၩဂၤသုတ္ အႏွစ္ခ်ဳပ္ PDF
No ratings yet
015-ေဗာဇၩဂၤသုတ္ အႏွစ္ခ်ဳပ္ PDF
73 pages
(Judy L. Hasday) Aung San Suu Kyi (Modern Peacemak PDF
No ratings yet
(Judy L. Hasday) Aung San Suu Kyi (Modern Peacemak PDF
129 pages
016 ဝဋ္ေၾကြးမ်ား PDF
No ratings yet
016 ဝဋ္ေၾကြးမ်ား PDF
68 pages
006 အနတၱလကၡဏသုတ္အႏွစ္ခ်ဳပ္ PDF
0% (1)
006 အနတၱလကၡဏသုတ္အႏွစ္ခ်ဳပ္ PDF
53 pages
020 ဓမၼေၾကးမုံ PDF
100% (1)
020 ဓမၼေၾကးမုံ PDF
85 pages
012 ဘဝဝန္ထုပ္ဝန္ပိုးမ်ား PDF
No ratings yet
012 ဘဝဝန္ထုပ္ဝန္ပိုးမ်ား PDF
48 pages
015-ေဗာဇၩဂၤသုတ္ အႏွစ္ခ်ဳပ္
100% (1)
015-ေဗာဇၩဂၤသုတ္ အႏွစ္ခ်ဳပ္
73 pages
041-ေမတၱာႏွလုံး အစဥ္သုံး
No ratings yet
041-ေမတၱာႏွလုံး အစဥ္သုံး
48 pages
024-ဟိုဘက္ကမ္းက ၿငိမ္းခ်မ္းတယ္
No ratings yet
024-ဟိုဘက္ကမ္းက ၿငိမ္းခ်မ္းတယ္
60 pages
အထင္မွား အျမင္မွား
No ratings yet
အထင္မွား အျမင္မွား
76 pages
Voice Controlled Home Automation
No ratings yet
Voice Controlled Home Automation
4 pages
Future Tense Exercise 16
No ratings yet
Future Tense Exercise 16
1 page
Speech To Text
No ratings yet
Speech To Text
6 pages
Ra 10575
No ratings yet
Ra 10575
8 pages
Yu Cheats Bot Nitro Type
No ratings yet
Yu Cheats Bot Nitro Type
32 pages
ECB - Financial Stability Review May 2015
No ratings yet
ECB - Financial Stability Review May 2015
173 pages
Amot Uk Iso 9001
No ratings yet
Amot Uk Iso 9001
1 page
Motivation
No ratings yet
Motivation
27 pages
Trigana
No ratings yet
Trigana
2 pages
HORIZON BQ470 - e
No ratings yet
HORIZON BQ470 - e
5 pages
Bayabas
No ratings yet
Bayabas
5 pages
MSDS CR 3S 鋁觸媒
No ratings yet
MSDS CR 3S 鋁觸媒
5 pages
Boston Consulting Group: Portfolio Analysis Matrix
No ratings yet
Boston Consulting Group: Portfolio Analysis Matrix
16 pages
An Overview of Chelioscopy & Palatoscopy: An Aid in Forensic Investigation
No ratings yet
An Overview of Chelioscopy & Palatoscopy: An Aid in Forensic Investigation
7 pages
CSS 11 - Module 4 - OH&S Guidelines Wiring Diagram - No Answer
No ratings yet
CSS 11 - Module 4 - OH&S Guidelines Wiring Diagram - No Answer
13 pages
Final Math Problem Set
No ratings yet
Final Math Problem Set
12 pages
LO ECP English-Version
No ratings yet
LO ECP English-Version
2 pages
Supermarket Management Design Report
No ratings yet
Supermarket Management Design Report
37 pages
Forouzan6e ch01 PPTs Accessible
No ratings yet
Forouzan6e ch01 PPTs Accessible
77 pages
Day 5 7 Reiki Recipes Ebook
No ratings yet
Day 5 7 Reiki Recipes Ebook
8 pages
DLL Els Quarter 1 Week 1
No ratings yet
DLL Els Quarter 1 Week 1
4 pages
Introduction of Blu-Ray Disc
No ratings yet
Introduction of Blu-Ray Disc
17 pages
Aluminium Powder
No ratings yet
Aluminium Powder
5 pages
Probability Distribution PDF
No ratings yet
Probability Distribution PDF
44 pages
TY Sem 5 & 6 Provisional
No ratings yet
TY Sem 5 & 6 Provisional
1 page
The Concept of Love in Islam in The Quran
80% (5)
The Concept of Love in Islam in The Quran
13 pages
Edwards. Shondrell MGT 6685 Phizer Case Analysis
100% (1)
Edwards. Shondrell MGT 6685 Phizer Case Analysis
19 pages
TCOM 111 The Teacher and The Community, School Culture
No ratings yet
TCOM 111 The Teacher and The Community, School Culture
31 pages

Speech Recognition Using A DSP: Lunds Universitet

Uploaded by

Speech Recognition Using A DSP: Lunds Universitet

Uploaded by

Lunds universitet

Algorithms in Signal Processors

Speech Recognition using a DSP

5 Discussion and conclusion 11

2.1 Basic Speech Recognition

2.2 Unvoiced and voiced speech

2.3 Reflection coefficients and the Schur Algorithm

yt = −a1 yt−1 − a2 yt−2 − ... − ap yt−p + et (1)

where et is white Gaussian noise, the filter koefficients are represented by ai , i =

A Digital Signal Processor (DSP) is a purpose-built device specialized in processing data

2.5 Filtering the sampled signal

Figure 1: Magnitude response of the high pass filter

Figure 2: Magnitude response of the pre-emphasis filter

Figure 3: Magnitude response of the two filters above combined

Figure 4: Magnitude response of the used filter

where Dk , k = 1, 2, 3, 4 represent the different database matrices, and where R repre-

Figure 5: Flow chart of the general design. y = yes, n = no

Mikael Swartling Screen Blackboard Total

Table 1: The amount of correct matches/tries without validation

Mikael Swartling Screen Blackboard Jacket DSP

Table 2: The amount of correct matches/tries using validation on personal database.

5 Discussion and conclusion

You might also like