Proj_Audio
CERTIFICATE
ACKNOWLEDGEMENTS
I would also like to thank all the people at the A.K.Choudhury School of
Information Technology for accommodating me and for their support during the
tenure of my project.
ABSTRACT
This project aims to develop novel algorithms and efficient hardware
implementations of signal processing front ends for automatic speech
recognition. The initial phase of the project has been completed. The
architectures of various field-programmable devices were studied, and the
architecture of Xilinx Field Programmable Gate Arrays (FPGAs) was
analyzed in detail. Having learnt to use the Xilinx System Generator Tool
for DSP, I am currently implementing existing signal processing algorithms on
an FPGA. The hardware description language being used is VHDL. In the future,
we plan to implement several novel feature extraction algorithms in
hardware and determine their advantages.
TABLE OF CONTENTS
Introduction
Programmable Logic
Architecture of FPGAs
Special FPGA Functions
Xilinx System Generator Tool for DSP
Speech Recognition Frontend
Proposed Feature Extraction Algorithm
Hardware Implementation of Audio Processing Algorithms
Conclusion and Future Work
References
INTRODUCTION
Audio Processing has been an active area of research for many years. Audio
Processing, especially speech recognition, finds widespread applications in
voice dialing on cell phones, robotics and desktop software packages. Although
commercial speech recognizers are still extremely limited in their abilities, the
very best research systems are now approaching their ultimate goal—large-
vocabulary, continuous, speaker independent, real-time recognition. These
systems are large vocabulary in that they can handle on the order of 60,000
words; continuous in that they recognize natural human speech, spoken without
deliberate pauses; and speaker-independent in that no user-specific training is
required [1]. Unfortunately, these systems are also extremely computationally
intensive, requiring the full processing resources of a modern desktop machine
in order to run in real-time. This completely rules out high-quality speech
recognition for many of the applications where one might want it most—in
particular, for mobile applications.
Programmable Logic
Programmable logic can be defined as an integrated circuit that can be
programmed or reprogrammed to implement digital logic of a certain level of
complexity. The concept originated in the late 1970s and the field has grown
steadily since. Some of the advantages that programmable logic offers are:
Reconfigurable
Flexible to changes
Processors: instruction flexibility; 90% area overhead (cache, predictions)
FPGA: device-wide flexibility; 99% area overhead (configuration)
ASIC: no flexibility; 20% area overhead (testing)
CPLD — a more Complex PLD that consists of an arrangement of multiple
SPLD-like blocks on a single chip. Alternative names sometimes adopted for
this style of chip are Enhanced PLD (EPLD), Super PAL, Mega PAL, and
others.
PLDs could be programmed either in batches in a factory or in the field
(field programmable). However, one of their disadvantages was that the
programmable logic was hard-wired between logic gates.
Architecture of FPGAs
The field programmable gate arrays are mainly composed of:
[Figure: FPGA logic block comprising an N-input LUT, a MUX and a D flip-flop
(SET, CLR), with signals a, b, c, d, clk, rst, y and q.]
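A behavioural sketch in Python can make the role of each element of such a
logic block concrete. It models a single cell with a 4-input LUT, a D flip-flop
with a synchronous reset, and a MUX that selects between the combinational and
the registered value. The class name LogicCell, the 4-input width, the
synchronous reset and the position of the MUX are all assumptions made for
illustration, not details of any particular Xilinx device.

```python
class LogicCell:
    """Hypothetical model of one logic cell: 4-input LUT + D flip-flop + MUX."""

    def __init__(self, truth_table, registered):
        # truth_table[i] is the LUT output bit for input pattern i,
        # where a is the most significant bit and d the least significant.
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs 16 configuration bits")
        self.truth_table = list(truth_table)
        self.registered = registered  # MUX select: True -> flip-flop output
        self.q_reg = 0                # state held by the D flip-flop

    def lut(self, a, b, c, d):
        """Combinational lookup: the inputs form an index into the table."""
        index = (a << 3) | (b << 2) | (c << 1) | d
        return self.truth_table[index]

    def clock(self, a, b, c, d, rst=0):
        """Apply one rising clock edge and return (y, q).

        y is the combinational LUT output; q is either the flip-flop
        output or y, depending on the MUX setting (assumed wiring).
        """
        y = self.lut(a, b, c, d)
        self.q_reg = 0 if rst else y  # synchronous reset (assumption)
        q = self.q_reg if self.registered else y
        return y, q


# Configure the LUT as a 4-input XOR (odd parity of the input bits).
xor_table = [bin(i).count("1") & 1 for i in range(16)]
cell = LogicCell(xor_table, registered=True)
```

Changing only the 16 configuration bits turns the same cell into any other
4-input Boolean function, which is the sense in which the logic is
reconfigurable.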
Special FPGA Functions
Modern FPGAs often include higher level functionality embedded into silicon.
Some of the specialized functions include:
Internal SRAM
Embedded CPUs
PLLs
These functions are directly implemented (not through LUTs, as is the case with
normal combinational circuits).
The DSP block is a highly useful function that is used extensively in my
project. Signal processing operations such as the Fast Fourier Transform,
filtering and multiply-accumulate (MAC) can be implemented easily and
efficiently with this block.
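As an illustration of the multiply-accumulate (MAC) operation, the following
Python sketch computes a FIR filter in fixed-point arithmetic, one MAC per
tap. The Q15 number format and the names to_q15 and fir_mac are assumptions
chosen for the example. It models the arithmetic only and does not describe
how a Xilinx DSP block is configured.

```python
def to_q15(x):
    """Quantise a float in [-1, 1) to a signed 16-bit Q15 integer (assumed format)."""
    return max(-32768, min(32767, int(round(x * 32768))))


def fir_mac(samples_q15, coeffs_q15):
    """FIR filter built from multiply-accumulate steps.

    Inputs and coefficients are Q15 integers; each output is the sum of
    products over the taps, rescaled from Q30 back to Q15.
    """
    delay = [0] * len(coeffs_q15)      # delay line, newest sample first
    out = []
    for s in samples_q15:
        delay = [s] + delay[:-1]       # shift the new sample in
        acc = 0                        # wide accumulator, no overflow in Python
        for x, h in zip(delay, coeffs_q15):
            acc += x * h               # one multiply-accumulate per tap
        out.append(acc >> 15)          # Q30 -> Q15 by arithmetic right shift
    return out


# Two-tap moving average applied to a constant input of 0.5.
coeffs = [to_q15(0.5), to_q15(0.5)]
samples = [to_q15(0.5)] * 3
filtered = fir_mac(samples, coeffs)
```

The first output is half the steady-state value because the delay line starts
empty. In hardware the inner loop would map to one MAC unit reused over several
cycles, or to several MAC units working in parallel.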
Xilinx System Generator Tool for DSP
One of the most useful tools for the implementation of DSP algorithms on
Xilinx FPGAs is the Xilinx System Generator Tool. It allows the designer to:
Develop highly parallel systems with the industry’s most advanced FPGAs
Provide system modeling and automatic code generation from Simulink® and MATLAB
Integrate RTL, embedded, IP, MATLAB and hardware components of a DSP system
Some of its key features are:
DSP modelling
Bit and cycle accurate floating and fixed-point implementation
Automatic code generation of VHDL or Verilog from Simulink
Hardware co-simulation
Xilinx Power Analyzer (XPA) Integration
Hardware / software co-design of embedded systems
Hardware Implementation of Audio Processing Algorithms
The acoustic front end consists entirely of DSP operations, so algorithms such
as the FFT, DCT, filtering and windowing are encountered often. The System
Generator tool is well suited to these tasks and is therefore used extensively.
The behavioural, structural and dataflow modelling styles of the VHSIC
Hardware Description Language (VHDL) are often combined in the various
designs. Several improvements have been suggested and work on them is
underway. These improvements include:
1) Use of the same FFT block for calculating both the FFT (needed for MFCC)
and the autocorrelation coefficients (needed for LPC) by employing the
Wiener–Khinchin theorem, thus saving a considerable amount of hardware.
3) Use of a rectangular filter bank for MFCC, which completely avoids
multiplication and requires less memory than the existing designs.
4) Use of a weighted sum of MFCC and LPCC for template matching using
HMM / DTW / ANN. Results from the software implementation of the design
in MATLAB show that the maximum accuracy is achieved at approximately
0.6*MFCC + 0.4*LPCC. Therefore, this particular weighted sum of the
coefficients is implemented using a carry-save adder.
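Improvements 1, 3 and 4 can be sketched in software as follows. This is a
minimal Python model under stated assumptions, not the hardware design. The
frame is zero-padded to twice its length so that the circular autocorrelation
given by the Wiener–Khinchin theorem equals the linear one. The band edges
passed to rectangular_bank are hypothetical (a real MFCC front end would place
them on the mel scale). All function names are invented for the example.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrum(frame):
    """|FFT|^2 of the frame, zero-padded to twice its length (assumption)."""
    padded = list(frame) + [0.0] * len(frame)
    return [abs(c) ** 2 for c in fft(padded)]


def autocorrelation(power, max_lag):
    """Wiener-Khinchin: autocorrelation is the inverse FFT of the power spectrum.

    The power spectrum of a real signal is real and even, so a forward FFT
    scaled by 1/n gives the same result and the same FFT routine is reused.
    """
    n = len(power)
    r = fft(power)
    return [r[k].real / n for k in range(max_lag + 1)]


def rectangular_bank(power, edges):
    """Sum the power bins in each band [edges[i], edges[i+1]); additions only."""
    return [sum(power[lo:hi]) for lo, hi in zip(edges, edges[1:])]


def fuse_features(mfcc, lpcc, w=0.6):
    """Weighted sum w*MFCC + (1 - w)*LPCC; w = 0.6 is the value quoted above."""
    return [w * m + (1.0 - w) * l for m, l in zip(mfcc, lpcc)]


frame = [1.0, 2.0, 3.0, 4.0]
power = power_spectrum(frame)            # computed once
lags = autocorrelation(power, 3)         # LPC path
bands = rectangular_bank(power, [0, 2, 4])  # MFCC path, hypothetical edges
```

For the frame [1, 2, 3, 4] the lags come out at approximately 30, 20, 11 and 4,
which matches the direct time-domain sums. Both feature paths start from the
same power spectrum, which is the source of the hardware saving claimed in
item 1.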
Conclusion and Future Work
This project on the hardware implementation of audio processing algorithms,
though still in its initial stage, has helped me a lot in understanding the
internal architecture of FPGAs and the implementation of various signal
processing algorithms on them. I have also learnt the basics of VHDL and have
become familiar with the Xilinx System Generator Tool for DSP. In the future, I
intend to implement in hardware several novel algorithms related to speech
feature extraction that I have developed, and to determine their advantages
over the existing architectures.
References
[1] Lin, Edward C., Kai Yu, Rob A. Rutenbar, and Tsuhan Chen. "Moving
speech recognition from software to silicon: the in silico vox project." In
INTERSPEECH. 2006.
[3] Bhasker, Jayaram. A VHDL Primer. Prentice Hall PTR, 1999.
[4] Brown, Stephen, and Jonathan Rose. "FPGA and CPLD architectures: A
tutorial." Design & Test of Computers, IEEE 13, no. 2 (1996): 42-57.
[5] Xilinx System Generator for DSP User Guide, r10.1.1, April 2008