Work 3
GROUP NUMBER 15
• History of voice recognition
• Voice recognition is the process of converting a voice into digital data. The
technology first appeared about 50 years ago, but it has become really popular in
recent years. In this article, we will look at what this technology is and how it works.
We will describe how it is used in some industries and introduce you to some
well-known voice and speech recognition solutions.
• Voice recognition technology has advanced considerably over the past five decades.
In 1976, computers could understand only slightly more than 1,000
words. That total jumped to roughly 20,000 in the 1980s as IBM continued to
develop voice recognition technology.
• During the past decade, several other technology leaders have developed
more sophisticated voice recognition software, such as Amazon Alexa.
Released in 2014, Amazon Alexa also acts as a personal assistant
that responds to voice commands. Currently, voice recognition software is
available for Windows, Mac, Android, iOS and Windows Phone devices.
What is voice recognition
• Voice or speaker recognition is the ability of a program to identify a person based on their unique
voiceprint. It works by scanning the speech and establishing a match with the desired voiceprint. The
development of AI opened up extensive opportunities for this subfield of computer science. It enables
us to interact with machines without touching them. It is growing rapidly, and developers are finding
more and more ways to apply it in various fields.
• In a broader sense, voice recognition is also the ability of a machine or program to receive and interpret
dictation or to understand and perform spoken commands.
• Voice recognition systems let consumers interact with technology simply by speaking to it, enabling
hands-free requests, reminders and other simple tasks.
• Voice recognition can identify and distinguish voices using automatic speech recognition (ASR)
software programs. Some ASR programs require users to first train the program to recognize their voice
for a more accurate speech-to-text conversion. Voice recognition systems evaluate a voice's
frequency, accent and flow of speech.
IS THERE ANY DIFFERENCE BETWEEN VOICE RECOGNITION AND SPEECH RECOGNITION
• It is essential to understand the differences between these two things. The purpose of voice recognition
is to identify the voice owner. Speech recognition's purpose is to identify the words of the speaker. In
the first case, the program needs a unique voiceprint of the speaker for comparison. In the second case,
the program needs a huge dictionary to identify the speaker's words.
• While speech recognition translates anyone’s voice, voice recognition is a biometric system that
recognizes and authenticates a specific user’s voice.
• It analyzes the unique features of a person’s voice, including pitch, tone, and rhythm, to create a unique
voiceprint for identification.
• This technology is often used for security purposes, such as unlocking mobile devices or accessing
systems.
• Although voice recognition and speech recognition are referred to interchangeably, they aren't the same,
and a critical distinction must be made. Voice recognition identifies the speaker, whereas speech
recognition evaluates what is said.
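The voice recognition side of this distinction can be sketched as a comparison between a stored voiceprint and a new sample. This is only an illustration: the feature vectors, the helper names `cosine_similarity` and `is_same_speaker`, and the 0.8 threshold are assumptions, and real systems derive voiceprints from audio with trained models.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no information
    return dot / (norm_a * norm_b)

def is_same_speaker(enrolled, sample, threshold=0.8):
    """Accept the sample if it is close enough to the enrolled voiceprint."""
    return cosine_similarity(enrolled, sample) >= threshold

# Hypothetical voiceprints; the numbers are made up for illustration.
enrolled_voiceprint = [0.9, 0.1, 0.4, 0.7]
same_person = [0.85, 0.15, 0.38, 0.72]
other_person = [0.1, 0.9, 0.8, 0.05]

print(is_same_speaker(enrolled_voiceprint, same_person))
print(is_same_speaker(enrolled_voiceprint, other_person))
```

Speech recognition would ignore who is speaking and instead map the audio to words, so it needs a vocabulary rather than a stored voiceprint.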
TYPES OF VOICE RECOGNITION SYSTEMS
• Voice recognition systems fall into two categories:
Text-Dependent — The system is trained to recognize predetermined voice
passphrases spoken by the speaker;
Text Independent — It doesn't require predetermined passphrases. The
subject of the analysis is conversational speech.
TYPES OF SPEECH RECOGNITION SYSTEMS
We can classify Automatic Speech Recognition (ASR) into different categories.
The first classification is based on the speaker. Two types are distinguished:
Speaker Dependent — The program is trained to recognize a specific voice, similar to voice
recognition. The speaker must “talk” to the program and give it the ability to analyze the voice.
Such systems are easier to implement. They provide high accuracy in speech recognition;
Speaker Independent — This type of speech recognition software has wider usage. It doesn't
require training to analyze the voice. The emphasis is on the speaker's word recognition.
Typical examples of such programs are IVR systems.
The other method of categorization is based on how the user speaks. Those categories are:
Discrete Speech Recognition — ASR applications have used this method since the early
versions. The speaker must pronounce each word separately, inserting pauses between them.
Such programs are more difficult to work with, because it isn't easy to keep an even pace
between spoken words;
Continuous Speech Recognition — This is a relatively new method of ASR and requires more
effort to develop. The speaker's speech rate is close to normal in this case.
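Discrete recognition relies on the pauses between words, which can be sketched as a simple amplitude check. This is a rough sketch under assumptions: the function name `split_on_pauses`, the `silence_threshold` and `min_pause` values, and the sample data are all hypothetical, and real systems use more robust voice activity detection.

```python
def split_on_pauses(samples, silence_threshold=0.05, min_pause=3):
    """Return (start, end) index pairs of segments separated by pauses.

    A pause is a run of at least `min_pause` consecutive samples whose
    absolute amplitude is below `silence_threshold`. `end` is exclusive.
    """
    segments = []
    start = None      # start index of the current word, if any
    quiet_run = 0     # consecutive quiet samples seen inside the word
    for i, s in enumerate(samples):
        if abs(s) >= silence_threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_pause:
                # The word ended where the quiet run began.
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # The recording ended before a full pause was seen.
        segments.append((start, len(samples) - quiet_run))
    return segments

# Two "words" separated by a pause of four quiet samples.
signal = [0, 0, 0.5, 0.6, 0.4, 0, 0, 0, 0, 0.7, 0.8, 0, 0]
print(split_on_pauses(signal))
```

Continuous recognition cannot rely on such pauses, which is one reason it takes more effort to develop.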
• How does voice recognition work?
Voice recognition uses technology to evaluate the biometrics of your voice.
That includes the frequency and flow of your voice, as well as your accent.
Every word you speak is broken up into segments of several tones. This is then
digitised and translated to create your own unique voice template.
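One of the features mentioned above, the frequency of a voice, can be estimated in a very simple way by counting zero crossings. This is a minimal sketch, not how production systems measure pitch: the function name `estimate_frequency` and the synthetic 200 Hz test tone are assumptions made for illustration.

```python
import math

def estimate_frequency(samples, sample_rate):
    """Estimate frequency in Hz by counting upward zero crossings."""
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        if prev < 0 <= cur:
            crossings += 1
    duration = len(samples) / sample_rate
    return crossings / duration if duration else 0.0

# One second of a synthetic 200 Hz tone sampled at 8 kHz.
# The small phase offset keeps samples away from exact zero.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate + 0.5) for n in range(rate)]
print(estimate_frequency(tone, rate))
```

Real speech is far less regular than a pure tone, so practical systems combine several such measurements into the voice template.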
Artificial intelligence, deep learning, and machine learning are the forces
behind speech recognition. Artificial intelligence is used to understand the
colloquialisms, abbreviations, and acronyms we use. Machine learning then
pieces together the patterns and develops from this data using neural networks.
Voice recognition software on computers requires analog audio to be
converted into digital signals, known as analog-to-digital (A/D) conversion.
For a computer to decipher a signal, it must have a digital database of words or
syllables as well as a quick process for comparing this data to signals.
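The A/D conversion step can be sketched as sampling followed by quantization. This is an illustration under assumptions: a sine tone stands in for the analog signal, and the function name `sample_and_quantize`, the 8 kHz sample rate and the 16-bit depth are example choices, not values from any particular product.

```python
import math

def sample_and_quantize(freq_hz, duration_s, sample_rate=8000, bits=16):
    """Sample a sine tone and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1              # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate)
    digital = []
    for n in range(n_samples):
        t = n / sample_rate                       # time of the n-th sample
        analog = math.sin(2 * math.pi * freq_hz * t)   # value in [-1, 1]
        digital.append(round(analog * max_level))      # nearest integer level
    return digital

pcm = sample_and_quantize(freq_hz=440, duration_s=0.01)
print(len(pcm), pcm[:5])
```

A higher sample rate captures higher frequencies, and more bits per sample reduce the rounding error introduced by quantization.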
• A voice recognition program runs many times faster if the entire vocabulary can be loaded into RAM
compared to searching the hard drive for some of the matches. Processing speed is critical, as it affects
how fast the computer can search the RAM for matches.
• Audio also must be processed for clarity, so some devices may filter out background noise. In some
voice recognition systems, certain frequencies in the audio are emphasized so the device can recognize
a voice better.
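One common way to emphasize certain frequencies is a pre-emphasis filter, which boosts high frequencies relative to low ones. The text above does not say which method a given device uses, so this is only an assumed example; the function name `pre_emphasis` and the coefficient 0.97 are illustrative choices.

```python
def pre_emphasis(samples, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1]; the first sample is kept as is."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - coeff * samples[n - 1])
    return out

slow = pre_emphasis([1.0, 1.0, 1.0, 1.0])     # constant (low-frequency) input
fast = pre_emphasis([1.0, -1.0, 1.0, -1.0])   # rapidly alternating input
print(slow)
print(fast)
```

The constant input is almost cancelled after the first sample, while the rapidly alternating input is amplified, which is the emphasis effect.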
• Voice recognition systems analyze speech through one of two kinds of model: the hidden Markov model or
neural networks. The hidden Markov model breaks down spoken words into their phonemes, while
recurrent neural networks use the output from previous steps to influence the input to the current step.
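How a hidden Markov model maps sounds to phonemes can be sketched with the Viterbi algorithm, a standard way to find the most probable sequence of hidden states. Everything in the toy model is an assumption made for illustration: the phoneme states, the observation labels and all probabilities are invented, not taken from a real recognizer.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))

# Toy model for the word "cat" (hypothetical numbers).
states = ("k", "ae", "t")
start_p = {"k": 0.8, "ae": 0.1, "t": 0.1}
trans_p = {
    "k": {"k": 0.3, "ae": 0.6, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.4, "t": 0.5},
    "t": {"k": 0.1, "ae": 0.1, "t": 0.8},
}
emit_p = {
    "k": {"burst": 0.7, "vowel": 0.1, "closure": 0.2},
    "ae": {"burst": 0.1, "vowel": 0.8, "closure": 0.1},
    "t": {"burst": 0.3, "vowel": 0.1, "closure": 0.6},
}
heard = ["burst", "vowel", "vowel", "closure"]
print(viterbi(heard, states, start_p, trans_p, emit_p))
```

The vowel state is repeated because the vowel sound spans two observation steps; collapsing such repeats gives the phoneme sequence for the word.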
• As uses for voice recognition technology grow and more users interact with it, the organizations
implementing voice recognition software will have more data and information to feed into
neural networks for voice recognition systems. This improves the capabilities and accuracy of voice
recognition products.
• The popularity of smartphones opened up the opportunity to add voice recognition technology into
consumer pockets, while home devices -- such as Google Home and Amazon Echo -- brought voice
recognition technology into living rooms and kitchens.
Voice recognition uses
• The uses for voice recognition have grown quickly as AI,
machine learning and consumer acceptance have matured. Examples
of how voice recognition is used include the following:
Virtual assistants. Siri, Alexa and Google virtual assistants all
implement voice recognition software to interact with users. The way
consumers use voice recognition technology varies depending on the
product. But they can use it to transcribe voice to text, set up
reminders, search the internet and respond to simple questions and
requests, such as play music or share weather or traffic information.
Smart devices. Users can control their smart homes -- including smart
thermostats and smart speakers -- using voice recognition software.
Automated phone systems. Organizations use voice recognition with
their phone systems to direct callers to the corresponding department when
the caller says a specific number.
Conferencing. Voice recognition is used in live captioning a speaker
so others can follow what is said in real time as text.
Bluetooth. Bluetooth systems in modern cars support voice recognition to help
drivers keep their eyes on the road. Drivers can use voice recognition to perform
commands such as "call my office."
Dictation and voice recognition software. These tools can help users dictate and
transcribe documents without having to enter text using a physical keyboard or
mouse.
Government. The National Security Agency has used voice recognition systems
dating back to 2006 to identify terrorists and spies or to verify the audio of anyone
speaking.
Voice recognition advantages and disadvantages
Voice recognition offers numerous benefits:
Consumers can multitask by speaking directly to their voice assistant
or other voice recognition technology.
Users who have trouble with sight can still interact with their devices.
Machine learning and sophisticated algorithms help voice recognition
technology quickly turn spoken words into written text.
This technology can capture speech faster than some users can type.
This makes tasks like taking notes or setting reminders faster and
more convenient.
It increases the productivity of businesses.
It automates the interaction between businesses and customers.
It adds an extra level of security.
It helps people with disabilities.
Speech recognition typically combines three models:
i. Acoustic modeling represents the relationship between the audio signal and the
phonemes (units of sound). The Hidden Markov Model (HMM) is a common acoustic modeling
approach. Other approaches use deep neural networks, convolutional neural networks, etc.;
ii. The pronunciation model defines how phonemes can be combined to make words;
iii. Language modeling helps distinguish between words and phrases that
sound the same.
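The role of language modeling can be sketched with a tiny bigram model that chooses between words that sound the same. This is a simplified illustration: the `BIGRAM_COUNTS` table, its numbers and the function name `pick_homophone` are hypothetical, and real language models are trained on large text corpora.

```python
# Hypothetical bigram counts, as if gathered from a text corpus.
BIGRAM_COUNTS = {
    ("over", "there"): 50,
    ("over", "their"): 1,
    ("their", "house"): 40,
    ("there", "house"): 1,
}

def pick_homophone(previous_word, candidates, next_word=None):
    """Choose the candidate whose neighbouring bigrams are most frequent."""
    def score(word):
        total = BIGRAM_COUNTS.get((previous_word, word), 0)
        if next_word is not None:
            total += BIGRAM_COUNTS.get((word, next_word), 0)
        return total
    return max(candidates, key=score)

print(pick_homophone("over", ["their", "there"]))
print(pick_homophone("in", ["their", "there"], next_word="house"))
```

The acoustic model alone cannot separate "their" from "there"; the surrounding words decide.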
• After the speech is recorded, the noise is removed and the useful signal is filtered from the
recording. The recording is divided into small fragments. After that, each fragment is passed
through the acoustic model. These fragments are compared with the phonemes using a statistical
model, built in advance, that describes the pronunciation of each sound in speech. Based on these
matches, words are assembled from phonemes. The efficiency of finding words strongly depends
on the size of the pre-prepared phoneme database.
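The step of dividing a recording into small fragments can be sketched as cutting the samples into overlapping frames. This is a minimal sketch: the function name `split_into_frames` is hypothetical, and the 25 ms frame with a 10 ms step at 16 kHz is a commonly used setting rather than something specified in the text above.

```python
def split_into_frames(samples, frame_size, hop_size):
    """Cut a sample list into overlapping frames of `frame_size` samples."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        frames.append(samples[start:start + frame_size])
    return frames

# Assumed settings: 16 kHz audio, 25 ms frames, 10 ms step.
sample_rate = 16000
frame_size = sample_rate * 25 // 1000   # 400 samples
hop_size = sample_rate * 10 // 1000     # 160 samples

recording = [0.0] * sample_rate         # one second of placeholder audio
frames = split_into_frames(recording, frame_size, hop_size)
print(len(frames), len(frames[0]))
```

Each frame would then be passed to the acoustic model, which scores how well it matches each phoneme.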
Challenges of Voice Recognition Technology
• Noise-filtering methods are not foolproof and may work effectively only in
some situations. Therefore, it is best to use speech recognition technology in
a controlled and quiet environment to ensure optimal performance.
Language and Accent Barriers
• While speech recognition systems have come a long way in accurately recognizing
spoken language, they still struggle to understand accents and dialects that deviate
significantly from the standard language models they were trained on.
Privacy and Security Concerns
• One of the primary privacy concerns with speech recognition is data collection and storage.
Voice recordings may contain sensitive information, and the storage and use of these
recordings can pose a risk to user privacy if not handled correctly.
• Moreover, speech recognition technology may also face security challenges related to
malicious attacks or breaches that could compromise sensitive data.
• For instance, a hacker could gain access to a voice-controlled device or system and use it to
gather information, such as login credentials or financial information.
• To address these challenges, developers of speech recognition technology must incorporate
privacy and security features in their products, such as encryption, secure data storage, and
user control over data collection and deletion.
THANKS