
Work 3

The document discusses the history and development of voice recognition technology over the past 50 years. It provides details on early voice recognition systems from the 1950s and discusses major advances and applications through today. The document also explains the differences between voice recognition and speech recognition as well as how voice recognition systems work.

Uploaded by mwarubwaj000
Copyright © All Rights Reserved

GROUP ASSIGNMENT NO 3

GROUP NUMBER 15
History of voice recognition
• Voice recognition is the process of converting a voice into digital data that a computer can analyze. The technology first appeared in the early 1950s, but it has become widely popular only in recent years. In this presentation, we look at what this technology is and how it works, how it is used in several industries, and some well-known voice and speech recognition solutions.
• Voice recognition technology has grown rapidly since then. In 1976, computers could understand only slightly more than 1,000 words. That total jumped to roughly 20,000 in the 1980s as IBM continued to develop voice recognition technology.

• In 1952, Bell Laboratories built AUDREY (the Automatic Digit Recognizer), which could understand only the digits zero through nine. In the early to mid-1970s, the U.S. Department of Defense began contributing to speech recognition development by funding the Defense Advanced Research Projects Agency (DARPA) Speech Understanding Research program. Harpy, developed at Carnegie Mellon University under that program, was another voice recognition system of the time and could recognize up to 1,011 words.
• In 1990, Dragon Systems launched Dragon Dictate, the first speech recognition product for consumers. It was later succeeded by Dragon NaturallySpeaking, which is now developed by Nuance Communications. In 1997, IBM introduced IBM ViaVoice, one of the first voice recognition products that could recognize continuous speech.
• Apple introduced Siri in 2011, and it remains a prominent voice assistant. In 2016, Google launched Google Assistant for phones. Voice recognition systems can be found in devices including phones, smart speakers, laptops, desktops and tablets, as well as in software such as Dragon Professional and Philips SpeechLive.

• During the past decade, several other technology leaders have developed more sophisticated voice recognition software, such as Amazon Alexa. Released in 2014, Alexa acts as a personal assistant that responds to voice commands. Today, voice recognition software is available for Windows, Mac, Android and iOS devices, and it was also available on the now-discontinued Windows Phone platform.
What is voice recognition?

• Voice or speaker recognition is the ability of a program to identify a person based on their unique voiceprint. It works by analyzing the speech and checking whether it matches a stored voiceprint. The development of AI has opened up extensive opportunities for this subfield of computer science. It enables us to interact with machines without touching them. The field is growing rapidly, and developers are finding more and more ways to apply it in various fields.
• In everyday usage, the term voice recognition is also applied more broadly to the ability of a machine or program to receive and interpret dictation or to understand and perform spoken commands.
• Voice recognition systems let consumers interact with technology simply by speaking to it, enabling
hands-free requests, reminders and other simple tasks.
• Voice recognition can identify and distinguish voices using automatic speech recognition (ASR) software programs. Some ASR programs require users to first train the program to recognize their voice for a more accurate speech-to-text conversion. Voice recognition systems evaluate a voice's frequency, accent and flow of speech.
IS THERE ANY DIFFERENCE BETWEEN VOICE RECOGNITION AND SPEECH RECOGNITION?
• It is essential to understand the differences between these two things. The purpose of voice recognition
is to identify the voice owner. Speech recognition's purpose is to identify the words of the speaker. In
the first case, the program needs a unique voiceprint of the speaker for comparison. In the second case,
the program needs a huge dictionary to identify the speaker's words.

• While speech recognition translates anyone’s voice, voice recognition is a biometric system that
recognizes and authenticates a specific user’s voice.

• It analyzes the unique features of a person’s voice, including pitch, tone, and rhythm, to create a unique
voiceprint for identification.

• This technology is often used for security purposes, such as unlocking mobile devices or accessing
systems.

• Although the terms voice recognition and speech recognition are often used interchangeably, they aren't the same, and a critical distinction must be made. Voice recognition identifies the speaker, whereas speech recognition evaluates what is said.
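The voiceprint comparison described above can be made concrete with a minimal Python sketch of speaker verification. It assumes each voiceprint has already been reduced to a fixed-length feature vector. The vectors, the function names and the 0.85 acceptance threshold are all hypothetical. Cosine similarity is only one common way to score how close two voiceprints are. The code never looks at which words were spoken, and that is exactly what separates voice recognition from speech recognition.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def verify_speaker(enrolled_voiceprint, sample_features, threshold=0.85):
    """Hypothetical check: accept the claimed identity if the sample is close enough."""
    score = cosine_similarity(enrolled_voiceprint, sample_features)
    return score >= threshold, score

# Made-up 4-dimensional voiceprints; real systems use much longer vectors.
alice = [0.9, 0.1, 0.4, 0.2]
same_speaker = [0.85, 0.15, 0.42, 0.18]
other_speaker = [0.1, 0.9, 0.2, 0.7]

print(verify_speaker(alice, same_speaker))   # accepted (True, score close to 1)
print(verify_speaker(alice, other_speaker))  # rejected (False, low score)
```

In practice, the threshold is tuned to trade false accepts (impostors let in) against false rejects (genuine users turned away).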
TYPES OF VOICE RECOGNITION SYSTEMS
• Voice recognition systems fall into two categories:
◦ Text-dependent: the system is trained to recognize the speaker saying predetermined passphrases;
◦ Text-independent: no predetermined passphrases are required. The system analyzes conversational speech.
TYPES OF SPEECH RECOGNITION SYSTEMS
Automatic speech recognition (ASR) systems can be classified in several ways. The first classification is based on the speaker, and there are two types:
◦ Speaker-dependent: the program is trained to recognize a specific voice, similar to voice recognition. The speaker must “talk” to the program so that it can analyze the voice. Such systems are easier to implement and provide high speech recognition accuracy;
◦ Speaker-independent: this type of speech recognition software is more widely used. It doesn't require training on a particular voice, and the emphasis is on recognizing the speaker's words. Typical examples are interactive voice response (IVR) systems.
The second classification is based on how the user speaks:
◦ Discrete speech recognition: ASR applications have used this method since the earliest versions. The speaker must pronounce each word separately, pausing between words. Such programs are more difficult to work with because the speaker cannot keep a natural speaking rate;
◦ Continuous speech recognition: this is a newer ASR method and requires more effort to develop. In this case, the speaker can talk at close to a normal rate.
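Discrete speech recognition depends on the pauses between words. A simplified way to find those pauses is to cut the digitized signal into short frames and treat a run of low-energy frames as a pause. The sketch below works under assumptions: the function name is hypothetical, and the frame size, energy threshold and minimum pause length are made-up toy values. Continuous speech offers no such clean pauses, which is one reason it is harder to recognize.

```python
def split_on_pauses(samples, frame_size=4, energy_threshold=0.01, min_pause_frames=2):
    """Return (start, end) sample ranges of words separated by low-energy pauses."""
    segments = []
    word_start = None   # sample index where the current word began
    pause_start = None  # sample index where the current quiet stretch began
    quiet_frames = 0
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        energy = sum(s * s for s in frame) / len(frame)  # mean power of this frame
        if energy >= energy_threshold:
            if word_start is None:
                word_start = i
            pause_start, quiet_frames = None, 0   # a short dip was not a real pause
        elif word_start is not None:
            if pause_start is None:
                pause_start = i
            quiet_frames += 1
            if quiet_frames >= min_pause_frames:  # long enough: the word has ended
                segments.append((word_start, pause_start))
                word_start, pause_start, quiet_frames = None, None, 0
    if word_start is not None:  # the signal ended in the middle of a word
        segments.append((word_start, pause_start if pause_start is not None else len(samples)))
    return segments

# Toy signal: word, pause, word, trailing silence.
signal = [0.5] * 8 + [0.0] * 8 + [0.4] * 4 + [0.0] * 4
print(split_on_pauses(signal))  # [(0, 8), (16, 20)]
```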
How does voice recognition work?
Voice recognition uses technology to evaluate the biometrics of your voice, including its frequency, flow and accent. Every word you speak is broken up into segments made up of several tones. These segments are then digitized and translated into your own unique voice template.
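Breaking speech into segments and turning them into a voice template can be sketched as follows, under simplifying assumptions. Each segment is described by just two toy features: average energy and zero-crossing rate. The template is the average of those features over several enrollment recordings. The function names are hypothetical. Production systems use much richer features, such as MFCCs or neural-network embeddings.

```python
def frame_signal(samples, frame_size, hop):
    """Cut a digitized signal into fixed-size, possibly overlapping segments."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, hop)]

def frame_features(frame):
    """Two toy features per segment: mean energy and zero-crossing rate."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return [energy, crossings / (len(frame) - 1)]

def build_voice_template(recordings, frame_size=4, hop=2):
    """Average the per-segment features of all enrollment recordings."""
    features = [frame_features(f)
                for rec in recordings
                for f in frame_signal(rec, frame_size, hop)]
    return [sum(column) / len(features) for column in zip(*features)]

# Two short, made-up enrollment recordings of the same speaker.
recordings = [[0.2, 0.6, -0.4, -0.1, 0.3, 0.5],
              [0.1, 0.5, -0.3, -0.2, 0.4, 0.4]]
print(build_voice_template(recordings))
```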
Artificial intelligence, deep learning, and machine learning are the forces
behind speech recognition. Artificial intelligence is used to understand the
colloquialisms, abbreviations, and acronyms we use. Machine learning then
pieces together the patterns and learns from this data using neural networks.
Voice recognition software on computers requires analog audio to be
converted into digital signals, a process known as analog-to-digital (A/D) conversion.
For a computer to decipher a signal, it must have a digital database of words or
syllables as well as a quick process for comparing this data to signals.
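The A/D conversion step can be simulated in a few lines. The sketch assumes a 16 kHz sample rate and 16-bit samples. These are common choices for speech audio but are parameters here, not requirements, and the function name is hypothetical. Sampling measures the signal at discrete instants. Quantization then rounds each measurement to an integer level.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate=16_000, bits=16):
    """Simulate A/D conversion of `signal`, a function of time returning values in [-1, 1]."""
    max_level = 2 ** (bits - 1) - 1              # 32767 for 16-bit samples
    n_samples = int(duration_s * sample_rate)
    digital = []
    for n in range(n_samples):
        t = n / sample_rate                      # sampling: measure at discrete instants
        value = max(-1.0, min(1.0, signal(t)))   # clip anything outside the input range
        digital.append(round(value * max_level)) # quantization: nearest integer level
    return digital

# A 440 Hz tone stands in for the analog microphone signal.
def tone(t):
    return 0.5 * math.sin(2 * math.pi * 440 * t)

samples = sample_and_quantize(tone, duration_s=0.01)
print(len(samples), samples[:5])  # 160 samples for 10 ms at 16 kHz
```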
• A voice recognition program runs many times faster if the entire vocabulary can be loaded into RAM
compared to searching the hard drive for some of the matches. Processing speed is critical, as it affects
how fast the computer can search the RAM for matches.
• Audio also must be processed for clarity, so some devices may filter out background noise. In some
voice recognition systems, certain frequencies in the audio are emphasized so the device can recognize
a voice better.
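One widely used way to emphasize certain frequencies is a pre-emphasis filter. It boosts high frequencies relative to low ones before features are extracted. The sketch below implements the standard first-order form y[n] = x[n] - alpha * x[n-1]. The value alpha = 0.97 is a typical choice assumed here, not taken from any particular product.

```python
def pre_emphasis(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample passes through unchanged."""
    if not samples:
        return []
    emphasized = [samples[0]]
    for n in range(1, len(samples)):
        emphasized.append(samples[n] - alpha * samples[n - 1])
    return emphasized

low = [1.0] * 6          # constant signal: the lowest possible frequency
high = [1.0, -1.0] * 3   # alternating signal: the highest frequency at this sample rate
print(pre_emphasis(low))   # after the first sample, values shrink to about 0.03
print(pre_emphasis(high))  # after the first sample, magnitudes grow to about 1.97
```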

• Voice recognition systems typically analyze speech with one of two kinds of model: the hidden Markov model (HMM) or neural networks. The hidden Markov model breaks spoken words down into their phonemes, while recurrent neural networks feed information from previous steps into the computation for the current step.

• As uses for voice recognition technology grow and more users interact with it, the organizations
implementing voice recognition software will have more data and information to feed into
neural networks for voice recognition systems. This improves the capabilities and accuracy of voice
recognition products.
• The popularity of smartphones opened up the opportunity to add voice recognition technology into
consumer pockets, while home devices -- such as Google Home and Amazon Echo -- brought voice
recognition technology into living rooms and kitchens.
Voice recognition uses
• The uses for voice recognition have grown quickly as AI,
machine learning and consumer acceptance have matured. Examples
of how voice recognition is used include the following:
◦ Virtual assistants. Siri, Alexa and Google Assistant all use voice recognition software to interact with users. How consumers use voice recognition technology varies by product, but in general they can use it to transcribe voice to text, set reminders, search the internet, and get answers to simple questions and requests, such as playing music or sharing weather or traffic information.
◦ Smart devices. Users can control their smart homes, including smart thermostats and smart speakers, using voice recognition software.
◦ Automated phone systems. Organizations use voice recognition in their phone systems to direct callers to the corresponding department when the caller says a specific number.
◦ Conferencing. Voice recognition is used to caption a speaker live, so others can follow what is said as text in real time.
◦ Bluetooth. Bluetooth systems in modern cars support voice recognition to help drivers keep their eyes on the road. Drivers can use voice recognition to perform commands such as "call my office."
◦ Dictation and voice recognition software. These tools help users dictate and transcribe documents without having to enter text using a physical keyboard or mouse.
◦ Government. The National Security Agency has used voice recognition systems since 2006 to identify terrorists and spies, or to verify who is speaking in audio recordings.
Voice recognition advantages and disadvantages
Voice recognition offers numerous benefits:
◦ Consumers can multitask by speaking directly to their voice assistant or other voice recognition technology.
◦ Users with visual impairments can still interact with their devices.
◦ Machine learning and sophisticated algorithms help voice recognition technology quickly turn spoken words into written text.
◦ The technology can capture speech faster than some users can type, which makes tasks like taking notes or setting reminders faster and more convenient.
◦ It can increase business productivity.
◦ It automates interactions between businesses and their customers.
◦ It adds an extra level of security.
◦ It helps people with disabilities.
◦ It helps users control their home devices.
◦ It assists drivers through in-car ASR systems, among other uses.


Some disadvantages of the technology include the following:
◦ Background noise can produce false input.
◦ While accuracy rates are improving, all voice recognition systems and programs still make errors.
◦ Words that sound alike but are spelled differently and have different meanings, such as "hear" and "here", cause problems. This issue can be largely overcome using stored contextual information, but that requires more RAM and faster processors.
◦ Systems can't fully recognize speech if the speaker talks quickly or unclearly.
◦ Large vocabularies are required to improve recognition accuracy.
◦ Each language requires separate training for ASR.
◦ Businesses may collect and use a user's voice data without their permission.
◦ Time and financial costs are high.
◦ ASR software can consume a lot of memory, especially RAM.
Modern ASR systems are based on three models: acoustic, pronunciation, and language.

i. Acoustic modeling maps the voice signal to phonemes (the basic units of sound). The hidden Markov model (HMM) is a common acoustic modeling approach; other approaches use deep neural networks, convolutional neural networks and similar methods;
ii. The pronunciation model defines how phonemes can be combined to make words;
iii. Language modeling estimates how likely a sequence of words is, which helps distinguish between words and phrases that sound the same.
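Language modeling is what lets a recognizer choose between words that sound the same, such as "hear" and "here". The minimal sketch below assumes a bigram model with add-one smoothing, trained on a tiny made-up corpus. The candidate that makes the surrounding word pairs most probable wins. The function names are hypothetical. Real systems train on vastly larger corpora and typically use better smoothing or neural language models.

```python
from collections import Counter

def train_bigrams(corpus):
    """Count word pairs, how often each word is followed by another, and the vocabulary."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        vocab.update(words[1:])
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams, vocab

def bigram_prob(prev, word, model):
    """P(word | prev) with add-one (Laplace) smoothing."""
    contexts, bigrams, vocab = model
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

def pick_homophone(prev_word, next_word, candidates, model):
    """Choose the candidate that best fits between its neighbouring words."""
    return max(candidates,
               key=lambda w: bigram_prob(prev_word, w, model) * bigram_prob(w, next_word, model))

corpus = ["i can hear you", "can you hear me", "come here now", "sit over here now"]
model = train_bigrams(corpus)
print(pick_homophone("can", "you", ["hear", "here"], model))   # hear
print(pick_homophone("come", "now", ["hear", "here"], model))  # here
```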
• After the speech is recorded, noise is removed and the useful signal is filtered out of the recording. The recording is divided into small fragments, and each fragment is passed through the acoustic model. The fragments are compared against phoneme models: a previously built statistical model that describes how each sound in speech is pronounced. Based on these matches, words are assembled from phonemes. The efficiency of finding words depends strongly on the size of the pre-built phoneme database.
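The fragment-to-phoneme-to-word pipeline above can be sketched as follows. The acoustic step is simplified to nearest-template matching on two-number feature vectors. The phoneme templates and the two-word pronunciation dictionary are hypothetical. Noise removal and the language model are left out to keep the example short.

```python
import math

# Hypothetical 2-D acoustic templates; real acoustic models are statistical
# (for example HMMs or neural networks) over many more features.
PHONEME_TEMPLATES = {"k": [0.9, 0.1], "ae": [0.2, 0.8], "t": [0.8, 0.3]}

# Hypothetical pronunciation dictionary: phoneme sequence -> word.
LEXICON = {("k", "ae", "t"): "cat", ("t", "ae", "k"): "tack"}

def closest_phoneme(features):
    """Acoustic step: label one fragment with the nearest phoneme template."""
    return min(PHONEME_TEMPLATES,
               key=lambda p: math.dist(features, PHONEME_TEMPLATES[p]))

def decode(fragments):
    labels = [closest_phoneme(f) for f in fragments]
    # Consecutive fragments with the same label belong to one phoneme.
    phonemes = [p for i, p in enumerate(labels) if i == 0 or p != labels[i - 1]]
    # Pronunciation step: look the phoneme sequence up in the dictionary.
    return LEXICON.get(tuple(phonemes), "<unknown>")

fragments = [[0.92, 0.08], [0.88, 0.12], [0.25, 0.75], [0.18, 0.82], [0.79, 0.31]]
print(decode(fragments))  # cat
```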
Challenges of Voice Recognition Technology

Accuracy and Precision

• Voice recognition faces challenges in both accuracy and precision.
• Accuracy refers to how well the software recognizes spoken words and transcribes them correctly. In contrast, precision refers to how well the software can distinguish between similar-sounding words or phrases.

• For example, “there” and “their” sound the same, so when someone says one of them, the software must recognize the correct word based on the context of the sentence. This requires a high level of precision.
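Accuracy is commonly quantified with the word error rate (WER). WER is the minimum number of word substitutions, deletions and insertions needed to turn the system's transcript into the correct one, divided by the number of words in the correct transcript. Below is a minimal sketch using the standard edit-distance recurrence. It assumes a non-empty reference transcript, and the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edits needed to turn the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("put their coats over there", "put there coats over there"))  # 0.2
```

Here the single "their"/"there" confusion counts as one substitution out of five reference words.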
Noise and Disturbances
• Background noise, such as traffic, construction work, or conversations in the
vicinity, can interfere with the user’s voice signal, making it difficult for the
software to distinguish the spoken words.

• Similarly, disturbances in the environment, such as a sudden loud noise, can cause errors in the speech recognition process.

• To overcome these challenges, speech recognition software uses various techniques, such as noise cancellation algorithms, to filter out background noise and improve the clarity of the user’s voice signal.

• However, these methods are not foolproof and may work effectively only in some situations. Therefore, where possible, speech recognition technology should be used in a controlled, quiet environment for optimal performance.
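Classic spectral subtraction is one example of a noise-reduction algorithm. It estimates the noise spectrum from a stretch of audio that contains only noise and subtracts that magnitude from each frame of speech. The sketch below is deliberately simple, and the function names are illustrative:

- It uses a naive O(n^2) discrete Fourier transform instead of an FFT.
- It processes a single frame without windowing or overlap.
- It assumes a noise-only frame of the same length is available.

The demo removes a synthetic hum from a synthetic tone.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for short demo frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT, keeping only the real part of the result."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def spectral_subtraction(frame, noise_frame):
    """Subtract the noise magnitude spectrum from a frame, keeping the frame's phase."""
    noise_mag = [abs(c) for c in dft(noise_frame)]
    cleaned = []
    for c, nm in zip(dft(frame), noise_mag):
        mag = max(abs(c) - nm, 0.0)  # clamp: a magnitude cannot be negative
        cleaned.append(cmath.rect(mag, cmath.phase(c)))
    return idft(cleaned)

# Synthetic demo: a tone (frequency bin 2) corrupted by a hum (frequency bin 5).
N = 16
tone = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
hum = [0.5 * math.cos(2 * math.pi * 5 * t / N) for t in range(N)]
noisy = [s + h for s, h in zip(tone, hum)]
noise_only = [0.5 * math.sin(2 * math.pi * 5 * t / N) for t in range(N)]  # same hum, other phase

cleaned = spectral_subtraction(noisy, noise_only)
print(max(abs(a - b) for a, b in zip(cleaned, tone)))  # prints a value very close to 0
```

The noise estimate has a different phase from the actual hum, and the method still removes it. This works because only magnitudes are subtracted.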
Language and Accent Barriers
• While speech recognition systems have come a long way in accurately recognizing spoken language, they still struggle with accents and dialects that deviate significantly from the standard language models they were trained on.

• This can be particularly problematic in multicultural or multilingual environments where different accents and dialects are prevalent.

• For example, an English speech recognition system trained on American English may have difficulty accurately recognizing speakers with accents from other English-speaking countries, such as the United Kingdom, Australia, or India.
• In addition, speech recognition systems may also struggle with languages that have
unique phonetic features or use tonal distinctions, such as Mandarin or Cantonese.
• These languages require more advanced language models and algorithms to recognize
spoken words and phrases accurately.
Privacy and Security
• Speech recognition systems often process sensitive and personal information, such as
passwords, credit card numbers, and private conversations. Therefore, protecting users’ data
privacy and preventing unauthorized access is crucial.

• One of the primary privacy concerns with speech recognition is data collection and storage.
Voice recordings may contain sensitive information, and the storage and use of these
recordings can pose a risk to user privacy if not handled correctly.

• Moreover, speech recognition technology may also face security challenges related to
malicious attacks or breaches that could compromise sensitive data.

• For instance, a hacker could gain access to a voice-controlled device or system and use it to
gather information, such as login credentials or financial information.
• To address these challenges, developers of speech recognition technology must incorporate
privacy and security features in their products, such as encryption, secure data storage, and
user control over data collection and deletion.
THANKS
