Project in DSP Using Python
Recognition of words
Searching by voice will sound familiar to anyone who has owned a smartphone in
the last decade. I can’t remember the last time I typed out an entire query on
Google Search. I simply ask about the weather aloud, and Google lays out the
entire forecast for me.
It saves me a ton of time and I can quickly glance at my screen and get back
to work. A win-win for everyone! But how does Google understand what I’m
saying? And how does Google’s system convert my query into text on my
phone’s screen?
This is where the beauty of speech-to-text models comes in. Google uses a
mix of deep learning and Natural Language Processing (NLP) techniques to
parse our query, retrieve the answer, and present it as both audio and
text.
The same speech-to-text concept is used in all the other popular speech
recognition technologies out there, such as Amazon’s Alexa, Apple’s Siri, and
so on. The implementation details might vary from company to company, but the
overall idea remains the same.
So in this article, I will walk you through the basics of speech recognition
systems (in effect, an introduction to signal processing). We will then build
on these basics to implement our own speech-to-text model from scratch in
Python.