sem5_synopsis

The document is a synopsis report for a mini project titled 'VoiceMate: AI-powered personal assistant' submitted by students of Shivajirao S. Jondhale College of Engineering. It covers the project's objectives, literature survey, proposed system, and the significance of voice assistants in enhancing user interaction through AI and NLP technologies. The report also highlights the limitations of existing voice assistant systems and the potential for future improvements.

Uploaded by

Shivraj Chavan

Synopsis Report On

VoiceMate: AI-Powered Personal Assistant

Submitted in partial fulfillment of the requirements of

T.E. ARTIFICIAL INTELLIGENCE & MACHINE LEARNING ENGINEERING
By

Snehal Barkale 02
Shivraj Chavan 06
Omkar Chendge 07

Name of the Mentor


Prof. Anita Shirture

Department of Artificial Intelligence & Machine Learning
Shivajirao S. Jondhale College of Engineering, Dombivli (E)

University of Mumbai
(AY 2024-25)
CERTIFICATE
This is to certify that the Synopsis on Mini Project entitled “VoiceMate: AI-Powered
Personal Assistant” is a bonafide work of Snehal Barkale (02), Shivraj Chavan (06)
and Omkar Chendge (07), submitted to the University of Mumbai in partial
fulfillment of the requirements for TE (Artificial Intelligence & Machine Learning
Engineering), Semester V, during the academic year 2024-25 as prescribed by the
University of Mumbai.

Mentor
Prof. Anita Shirture

Prof. Anita Shirture, Project Coordinator
Dr. Renuka Deshpande, Head of Department
Dr. Pramod Rodge, Principal


Mini Project Approval

This Synopsis on Mini Project entitled “VoiceMate: AI-Powered Personal Assistant”
by Snehal Barkale (02), Shivraj Chavan (06) and Omkar Chendge (07) is approved
for T.E. (Artificial Intelligence & Machine Learning Engineering) for the academic
year 2024-25.

Examiners

1………………………………………
(Internal Examiner Name & Sign)

2…………………………………………
(External Examiner name & Sign)

Date:

Place:
Contents
Abstract
Acknowledgment
List of Figures
List of Abbreviations

1 Introduction
1.1 Introduction
1.2 Motivation
1.3 Problem Statement
1.4 Organization of the Report

2 Literature Survey
2.1 Survey of Existing System/SRS
2.2 Limitations of Existing Systems
2.3 Mini Project Contribution

3 Proposed System
3.1 Introduction
3.2 Architecture/Framework
3.3 Algorithm and Process Design
3.4 Details of Hardware & Software
3.5 Experiment and Results for Validation and Verification
3.6 Analysis
3.7 Conclusion and Future Work

4 References
Abstract

A voice assistant is a type of artificial intelligence (AI) software application, or
virtual assistant, that is designed to respond to voice commands and interact with
users using natural language processing (NLP). Voice assistants are typically
integrated into devices and platforms such as smartphones, smart speakers,
tablets, and even certain appliances, giving users hands-free access to
information, the ability to perform tasks, and control over connected devices. The
rise of voice assistants represents a significant advancement in artificial
intelligence and human-computer interaction. Virtual assistants are designed to
mimic human interaction, enabling users to hold natural conversations with these
digital entities. They can perform a wide range of tasks, including setting
reminders, scheduling appointments, answering questions, managing emails, and
controlling smart home devices. Their adaptability and versatility make them a
valuable tool for both individual users and businesses, and they are increasingly
found in chatbots and other conversational platforms as well. By learning user
preferences, they can tailor responses to specific needs, which creates a more
user-centric experience. This report discusses the technologies that power virtual
assistants, including machine learning, deep learning, and data analytics, which
enable them to continuously improve their performance and expand their
capabilities.
Acknowledgment

We sincerely thank our project guide, Prof. Anita Shirture, whose encouraging and
inspiring guidance helped us make our project a success. Her expert guidance,
kind advice and timely motivation helped us shape our project.

We would like to thank our project coordinator, Prof. Anita Shirture, for all the
support she provided for our project.

We also express our deepest thanks to our HOD, Dr. Renuka Deshpande, whose
benevolent help in making the computer facilities of our laboratory available to us
made the project a true success. Without her kind and keen co-operation, our
project would have come to a standstill.

Lastly, we would like to thank our college principal, Dr. Pramod Rodge, for
providing lab facilities and permitting us to carry out our project. We would also
like to thank our colleagues who helped us directly or indirectly during our project.
List of Figures

Figure No.  Title

3.2  Architecture

3.3  Algorithm

3.5  GUI of Voice Assistant

List of Abbreviations

1. AI  Artificial Intelligence
2. NLP  Natural Language Processing
3. IoT  Internet of Things
4. WI  Web Intelligence
5. IROS  Intelligent Robots and Systems
6. GUI  Graphical User Interface
7. ICASSP  International Conference on Acoustics, Speech and Signal Processing
8. ASR  Automatic Speech Recognition
9. NLG  Natural Language Generation
10. STT  Speech-to-Text
11. NLTK  Natural Language Toolkit
12. SMTP  Simple Mail Transfer Protocol
13. TTS  Text-to-Speech
14. NER  Named Entity Recognition

1. Introduction

1.1 Introduction
A voice assistant is a type of artificial intelligence (AI) software application, or
virtual assistant, that users interact with through speech. In today's fast-paced
world, the demand for efficiency and convenience has led to the rise of virtual
assistants, revolutionizing the way we interact with technology and manage our
daily tasks. A virtual assistant is a computer program or application that uses
artificial intelligence (AI) and natural language processing (NLP) to provide users
with a wide range of services and
support, often mimicking the role of a human personal assistant. These digital
companions have transformed the way we work, stay organized, and access
information. The concept of a virtual assistant can be traced back to the advent of
speech recognition and text-to-speech technology. Over the years, advancements
in machine learning, data analytics, and AI have allowed virtual assistants to
become increasingly sophisticated and versatile. These digital helpers are now
integrated into various devices and platforms, including smartphones, smart
speakers, smartwatches, and even cars, making them accessible to a wide range
of users.

Virtual assistants come in various forms and are often tailored to specific
applications and ecosystems. Some of the most popular virtual assistants include
Apple's Siri, Amazon's Alexa, Google Assistant, and Microsoft's Cortana. These
platforms can perform a multitude of tasks, such as answering questions, setting
reminders, sending messages, playing music, providing directions, and
controlling smart home devices.

The future of virtual assistants is promising. As AI technology
continues to evolve, virtual assistants are expected to become more personalized
and context-aware, providing users with increasingly tailored and proactive
assistance. They are likely to play a pivotal role in the development of smart cities,
healthcare, education, and various other sectors, making our lives more efficient
and convenient.
1.2 Motivation
Voice assistants are increasingly popular due to the convenience and efficiency
they offer to users. They allow hands-free operation, making it easy to perform
tasks like setting reminders, sending messages, or controlling smart home devices
using just voice commands. This convenience is especially valuable for
multitasking, as users can interact with technology while cooking, driving, or
working. Additionally, voice assistants enhance accessibility for people with
disabilities, such as those with limited mobility or vision impairments, enabling
them to interact with devices more easily.

For many, voice assistants also offer a personalized experience, adapting to
individual preferences by suggesting content, music, or news based on user
behaviour. Their integration with smart home devices further drives their appeal,
as they act as a central hub for controlling lights, thermostats, or security systems.

From the perspective of developers and companies, voice assistants provide a
powerful platform to engage users. They help businesses foster greater brand
loyalty by embedding services into users' daily routines, while also collecting
valuable data on user preferences and behaviours. This data can be used to improve
products or offer personalized services. Additionally, voice assistants open new
monetization avenues, such as voice commerce and subscription-based services.
As a frontier of natural user interfaces, voice technology is widely seen as a key
direction for human-computer interaction, offering intuitive, seamless engagement
and driving advancements in artificial intelligence.

1.3 Problem Statement

Design and develop a basic voice assistant that can recognize user commands and
perform simple tasks using natural language processing (NLP). The system should
be able to understand spoken commands, process them, and provide appropriate
responses or actions.
1.4 Organization of Report
This report is organized so that it flows logically from background to design. It
begins with the front matter: the title page, the certificate and approval pages, a
Table of Contents listing all sections and subsections, the Abstract, the
Acknowledgment, the List of Figures and the List of Abbreviations.
Chapter 1, Introduction, provides background information on voice assistant
technology, explains its growing relevance in today’s digital world, presents the
motivation for the project, and states the Problem Statement, detailing what the
voice assistant is intended to achieve.
Chapter 2, Literature Survey, compares existing voice assistants and related
research, identifies the limitations of existing systems, and describes the
contribution of this mini project.
Chapter 3, Proposed System, presents the architecture of the voice assistant, the
algorithm and process design, and the hardware and software requirements. We
have also included flowcharts to visually represent processes like voice
recognition, task execution, and response generation. The chapter then describes
the expected results, analyses the Python modules used, and closes with the
conclusion.
Chapter 4, References, lists the sources, research, and technical materials used
throughout the report.
2. Literature Survey

2.1 Survey of Existing System/SRS

A survey of existing voice assistant systems focuses on the comparison of
popular voice assistants such as Amazon Alexa, Google Assistant, Apple Siri,
and Microsoft Cortana. These systems are evaluated based on their
technological foundations, user experience, conversational intelligence, and
applications across different domains. The key components of these systems
include natural language processing (NLP), speech recognition, and machine
learning algorithms, all of which enable them to interpret and respond to user
queries effectively.
The comparative analysis of these systems shows that each voice assistant
excels in different domains. Amazon Alexa dominates the smart home market,
offering the largest number of third-party integrations and custom skills. Google
Assistant leads in conversational intelligence and contextual understanding,
making it superior for general knowledge queries and dynamic conversation
handling. Apple Siri stands out for its focus on user privacy and device
integration, while Microsoft Cortana has refocused on enterprise solutions
rather than consumer voice services.
Google Assistant generally outperforms its competitors with superior natural
language understanding and contextual awareness. However, all systems face
challenges in handling conversational depth and multi-turn interactions. Privacy
and security are crucial concerns, especially for Alexa and Google Assistant,
which listen continuously for wake words and collect extensive user data,
prompting ongoing debates about ethical data usage. Overall, each voice assistant
serves unique user needs,
with Alexa leading in smart home automation, Google Assistant in productivity,
Siri in privacy, and Cortana in enterprise solutions.
Sr. No. | Author(s) | Paper Title | Description

1 | Petukhova, Volha; Bunt, Harry; McGlashan, Scott; Sitter, Nathaniel; Alexandersson, Jan | "Conversational Agents: Goals, Technologies, and Challenges" | This paper discusses the challenges faced in natural language processing (NLP) and human-computer interaction (HCI).

2 | Ramírez-Alcaraz, Marcos; Berrocal, Jesús; Merino, Patricia; Canal, Carlos | "Personal Assistant for the Internet of Things Era" | The paper examines voice assistants such as Amazon Alexa and Google Assistant.

3 | Fang, Yiling; Ma, Kevin; Ng, Andrew | "Natural Language Processing and Machine Learning in the Development of Virtual Assistants" | This paper explores deep learning techniques used in NLP to build more sophisticated virtual assistants.

4 | Hoy, Matthew B. | "The Anatomy of Voice-Based Assistants: User Interaction and Machine Learning Perspectives" | This paper delves into improving the interaction between users and systems like Siri, Alexa, and Cortana.

5 | Moore, Robert J.; Arar, R. | "Voice User Interface Design: A Comparative Analysis of Amazon Alexa, Google Assistant, and Apple Siri" | The paper provides a comparative analysis of the popular voice assistants Alexa, Google Assistant, and Siri.

6 | Arora, Manan; Shah, Jignesh | "Improving User Experience in Voice-Activated AI Systems through Personalization" | This study focuses on improving the user experience in voice-activated AI systems by incorporating personalization features.

7 | Lau, Josephine; Zimmerman, Benjamin; Schaub, Florian | "Privacy in Voice-Activated Digital Assistants" | This paper addresses privacy concerns related to the use of voice assistants, evaluating potential risks like continuous listening and data usage.

8 | Parker, Jeffrey; Harper, Richard | "Ethical Issues in the Use of Voice Assistants: Case Study of Alexa" | This paper investigates the ethical issues surrounding the use of voice assistants, using Amazon Alexa as a case study.

9 | Singh, Karan; Ramesh, V. | "Advances in Voice Assistants for Healthcare Applications" | The paper explores how voice assistants are being integrated into healthcare systems and are improving accessibility for individuals with disabilities.

10 | Xu, Qianli; John, Ramesh | "Multimodal Interaction in Voice Assistants: Enhancing User Experience through Visual Feedback" | This research focuses on how visual feedback, combined with voice commands, enhances the user experience.

2.2 Limitations of existing systems

1. Privacy and Security Concerns

• Always Listening: Voice assistants listen continuously for wake words, which
raises privacy concerns. Users are concerned about unintended recordings or
potential data leaks.
• Data Collection: These systems often send voice data to cloud servers for
processing, and some users worry about how their personal information is
stored, shared, and used by companies.

2. Lack of Personalization

• Generic Responses: Although some voice assistants attempt to personalize
interactions by recognizing user preferences or patterns, they are still largely
designed to offer one-size-fits-all responses.
• Limited Adaptation to User Behaviour: Voice assistants generally do not learn
and adapt significantly from user interactions. Personalization remains
superficial, often confined to reminders and app preferences, rather than building
a deep understanding of the user’s lifestyle.
3. High Resource Consumption

• Processing Power: Voice assistants require significant computing power, both
locally (on devices like smartphones and smart speakers) and in the cloud. This
demand increases when performing complex tasks like voice recognition, natural
language understanding, and real-time responses.
• Battery Drain on Devices: Devices with voice assistants, especially
smartphones, can experience higher battery usage due to the constant listening
mode, which keeps microphones active in the background, waiting for activation
commands.

4. Limited Customization and Control

• Inflexibility in Commands: Most voice assistants offer a set of predefined
commands and responses. Users cannot extensively customize how the assistant
responds to specific queries or alter its behaviour beyond basic settings.
• Lack of User-Controlled Features: Advanced customization options are often
locked behind developer tools, making it difficult for average users to tailor the
voice assistant to their exact needs. For example, users may not be able to change
how tasks are performed or create custom workflows without significant
technical know-how.

5. Cost Barrier for Advanced Capabilities

• Premium Features Locked Behind Paywalls: Some advanced functionalities, such
as integrating voice assistants with home automation systems, require purchasing
premium hardware or software. For instance, unlocking certain smart home
capabilities may require expensive hub devices or subscription services.
• High Initial Setup Costs: Setting up a comprehensive smart home with voice
assistant integration can be costly. Smart speakers, lights, thermostats, security
systems, and other devices that are compatible with voice assistants often come
with a high price tag.

2.3 Mini project contribution

The voice assistant is being developed for educational, business and personal use.
Within the objectives and scope stated in this project, it aims to achieve the
following:

• It will have a proper Graphical User Interface (GUI).
• It can open Chrome, YouTube, Wikipedia and all Windows applications to search
for information, and read two or three lines from Wikipedia to the user.
• It can open PowerPoint presentations.
• It can tell the current time.
• It can send emails and SMS messages.
• It can make phone calls.
• It can play online music.
• It can report the weather forecast.
• It will have a chat history feature.
• It will have a face authentication system that allows the program to run only
when it detects a face.
3. Proposed system

3.1 Introduction

A virtual assistant is a software program that helps users with their day-to-day
tasks, such as showing weather forecasts, playing music, etc. It can take
commands as voice or text. A voice-based intelligent assistant needs an invoking
word, or wake word, to activate the listener, followed by the command. For our
project, the wake word is “SOFIA”. Our voice assistant is designed to be used
efficiently by all users. This personal assistant software improves the user’s
productivity by managing day-to-day tasks and providing information from online
sources.
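The wake-word check described above can be sketched in Python, the language used for this project. This is a minimal sketch, assuming the speech-to-text step already returns a plain transcript string; the function name extract_command and the regular-expression matching are illustrative assumptions, not the project's actual code.

```python
import re
from typing import Optional

# Assumption: the wake word is matched case-insensitively as a whole word,
# and everything spoken after it is treated as the command.
WAKE_WORD = "sofia"


def extract_command(transcript: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the command that follows the wake word, or None if it is absent."""
    pattern = rf"\b{re.escape(wake_word)}\b[\s,]*(.*)"
    match = re.search(pattern, transcript, re.IGNORECASE)
    if match is None:
        return None  # wake word not heard: keep listening, execute nothing
    return match.group(1).strip()


print(extract_command("Sofia, what time is it"))  # → what time is it
print(extract_command("what time is it"))         # → None
```

Transcribing all audio just to look for the wake word keeps the sketch simple, but it is costly; commercial assistants typically run a small on-device keyword-spotting model instead, which also relates to the privacy and battery concerns listed in Section 2.2.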

3.2 Architecture / Framework

Creating a voice assistant involves several components, including speech
recognition, natural language processing, and interaction design. Here is a
high-level architecture for a voice assistant project:

• Audio Input/Output:
o Microphone: To capture user voice commands.
o Speaker: To provide audio responses.

• Speech Recognition:
o Use a Speech-to-Text engine (ASR, Automatic Speech Recognition) to convert
spoken words into text. Popular ASR engines include Google's Speech-to-Text,
Microsoft Azure Speech Service, and open-source solutions like Mozilla
DeepSpeech.

• Natural Language Processing (NLP):
o Intent Recognition: Identify the user's intent from the transcribed text. This
involves understanding what the user wants to do or know.
o Named Entity Recognition (NER): Identify important entities like dates,
locations, and proper nouns in the user's command.
o Dialog Management: Keep track of the conversation context, including the
user's previous requests and responses.

• Knowledge Base and Data Sources:
o Store information or connect to external data sources to provide answers to
user queries. This can include APIs, databases, or web scraping.

• Response Generation:
o Use the NLP results and the context to generate a meaningful response.
o Use pre-defined templates for common responses, or generate responses
dynamically using NLG (Natural Language Generation) techniques.

• Text-to-Speech (TTS):
o Convert the generated text response into speech using a Text-to-Speech
engine. Examples include Amazon Polly, Google Text-to-Speech, or open-source
TTS solutions.

• User Interface:
o Choose the platform for the voice assistant. It can be a mobile app, a web-based
interface, a smart speaker, or a custom hardware device.
The diagram shows the main process flow of the voice assistant.
Fig 3.2 Voice Assistant Framework
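To make the data flow between these components concrete, the sketch below wires the stages together as plain Python functions. It is a hedged illustration: the class VoicePipeline, the substring-based intent matching, and the fake speech-to-text and text-to-speech stand-ins are hypothetical, chosen so that the example runs without a microphone or network access. A real build would plug in an ASR engine and a TTS engine such as those listed above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Each architecture stage is a plain callable so real engines can be swapped in.
SpeechToText = Callable[[bytes], str]   # ASR: audio -> transcript
TextToSpeech = Callable[[str], None]    # TTS: response text -> audio output
Handler = Callable[[str], str]          # response generation for one intent

FALLBACK = "Sorry, I did not understand."


@dataclass
class VoicePipeline:
    stt: SpeechToText
    tts: TextToSpeech
    handlers: Dict[str, Handler]
    # Dialog management: (request, response) turns kept as conversation context.
    history: List[Tuple[str, str]] = field(default_factory=list)

    def recognize_intent(self, text: str) -> str:
        # Toy intent recognition: the first intent name that appears in the text.
        for intent in self.handlers:
            if intent in text.lower():
                return intent
        return "unknown"

    def handle(self, audio: bytes) -> str:
        text = self.stt(audio)                               # Speech Recognition
        intent = self.recognize_intent(text)                 # NLP
        handler = self.handlers.get(intent, lambda _text: FALLBACK)
        response = handler(text)                             # Response Generation
        self.history.append((text, response))                # Dialog Management
        self.tts(response)                                   # Text-to-Speech
        return response


# Usage with stand-ins: the "audio" is UTF-8 text and speech output is collected.
spoken: List[str] = []
pipeline = VoicePipeline(
    stt=lambda audio: audio.decode("utf-8"),
    tts=spoken.append,
    handlers={"time": lambda _t: "It is 10:30.", "hello": lambda _t: "Hello!"},
)
print(pipeline.handle(b"hello there"))  # → Hello!
```

Keeping every stage behind a simple function signature means, for example, that a cloud ASR service could later be replaced by an offline one without touching the NLP or response code.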

3.3 Algorithm and Process Design

• Module 1: Speech to text

In this module, the user’s commands are converted from speech to text using the
Google Cloud Speech-to-Text API. The API transcribes the recorded speech using
deep neural network models for automatic speech recognition (ASR) and returns
the text statement. It is one of the simplest ways to add speech recognition to an
application, and a single synchronous request can transcribe up to one minute of
audio.
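Because a single synchronous request is limited to about one minute of audio, a longer recording has to be split before it is sent. The sketch below does this with Python's built-in wave module; it assumes uncompressed WAV input and a 60-second limit, and the function name split_wav is hypothetical. Sending each chunk to the recognizer is left out, since that step depends on the client library used.

```python
import io
import wave
from typing import List

MAX_SECONDS = 60  # assumed per-request limit for synchronous recognition


def split_wav(wav_bytes: bytes, max_seconds: int = MAX_SECONDS) -> List[bytes]:
    """Split a WAV recording into WAV chunks of at most max_seconds each."""
    chunks: List[bytes] = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * max_seconds
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setparams(params)    # same channels, sample width and rate
                dst.writeframes(frames)  # header frame count is corrected on write
            chunks.append(buf.getvalue())
    return chunks


# Usage: 150 s of silence, using a deliberately tiny 100 Hz sample rate.
demo = io.BytesIO()
with wave.open(demo, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(100)
    w.writeframes(b"\x00\x00" * 100 * 150)

durations = []
for chunk in split_wav(demo.getvalue()):
    with wave.open(io.BytesIO(chunk), "rb") as r:
        durations.append(r.getnframes() / r.getframerate())
print(durations)  # → [60.0, 60.0, 30.0]
```

Splitting at fixed boundaries can cut a word in half; splitting at silent gaps would be more robust, at the cost of extra code.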

• Module 2: Text analysis and classification

This module is responsible for understanding the correct command from the text
generated by the Google API and then confirming it with the user before executing
the desired action. Because of the ambiguity of human language, it is extremely
challenging to create software that correctly determines the text’s intended
meaning, so NLP is used in this module to process and interpret the text. NLP
breaks the text down into small units (tokens) to help the computer understand the
input text. Several libraries are available for NLP, such as the Natural Language
Toolkit (NLTK).
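A minimal sketch of this module is shown below, assuming a small fixed set of intents. It uses a simple regular-expression tokenizer as a stand-in for NLTK's tokenizer (so it runs without downloading NLTK data) and keyword overlap as a stand-in for a trained classifier; the intent names, the keyword sets and the confirm helper are hypothetical.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical keyword sets; a real system would use NLTK features or a trained model.
INTENT_KEYWORDS: Dict[str, Set[str]] = {
    "open_website": {"open", "launch"},
    "tell_time": {"time", "clock"},
    "search_wikipedia": {"wikipedia", "search", "who"},
    "send_email": {"email", "mail"},
}
CONFIRM_WORDS = {"yes", "yeah", "sure"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word tokens (stand-in for NLTK)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def classify(text: str) -> Optional[str]:
    """Return the intent whose keywords overlap most with the tokens, if any."""
    tokens = set(tokenize(text))
    scores = {intent: len(tokens & words) for intent, words in INTENT_KEYWORDS.items()}
    best = max(scores, key=lambda intent: scores[intent])
    return best if scores[best] > 0 else None


def confirm(answer: str) -> bool:
    """Treat the user's reply as a confirmation if it starts with a 'yes' word."""
    words = tokenize(answer)
    return bool(words) and words[0] in CONFIRM_WORDS


intent = classify("Sofia, what time is it?")
print(intent)                  # → tell_time
print(confirm("Yes, please"))  # → True
```

Asking for confirmation only before actions with side effects (sending an email, making a call) would keep quick queries such as the time from becoming tedious.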
• Module 3: Command execution

Command execution in a voice assistant follows a series of steps, starting with
voice input capture, where the user speaks a command into a microphone-equipped
device (e.g., "Turn on the living room lights"). This spoken input is then converted
into text through a Speech-to-Text (STT) engine. Once the speech is converted to
text, the system processes it using Natural Language Processing (NLP). The NLP
system performs intent recognition to understand the action the user wants to
perform (e.g., turning on a device) and entity extraction to identify any specific
details or parameters related to the command (e.g., identifying "living room" and
"lights" as key details). Finally, the recognized intent and entities are mapped to
the corresponding action, which the assistant executes before responding to the
user.

Fig 3.3 Voice Assistant Flowchart
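The steps of Module 3 can be illustrated with the document's own example, "Turn on the living room lights". The sketch below is a deliberately small toy built on assumptions: the room and device vocabularies, the regular expression for the intent, and the handler table are hypothetical, and the handlers only return a message where a real system would call a smart-home API.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical vocabularies of rooms and devices the sketch can recognise.
ROOMS = ("living room", "bedroom", "kitchen")
DEVICES = ("lights", "fan", "tv")

Handler = Callable[[str, str], str]


def parse(text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Intent recognition plus entity extraction for simple device commands."""
    text = text.lower()
    match = re.search(r"\bturn (on|off)\b", text)
    intent = f"turn_{match.group(1)}" if match else None
    entities: Dict[str, str] = {}
    for room in ROOMS:
        if room in text:
            entities["room"] = room
    for device in DEVICES:
        if re.search(rf"\b{device}\b", text):
            entities["device"] = device
    return intent, entities


def execute(text: str, handlers: Dict[str, Handler]) -> str:
    """Map the recognised intent to its handler and run it with the entities."""
    intent, entities = parse(text)
    handler = handlers.get(intent) if intent else None
    if handler is None or "device" not in entities:
        return "Sorry, I can't do that yet."
    return handler(entities.get("room", "house"), entities["device"])


# Stand-in handlers: a real assistant would call a smart-home API here.
HANDLERS: Dict[str, Handler] = {
    "turn_on": lambda room, device: f"Turning on the {room} {device}.",
    "turn_off": lambda room, device: f"Turning off the {room} {device}.",
}

print(execute("Turn on the living room lights", HANDLERS))
# → Turning on the living room lights.
```

A dispatch table keeps command execution open for extension: supporting a new command means adding one entry rather than another branch in a long if/elif chain.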

3.4 Details of Hardware and Software

Software Requirements:

• Operating System
• Speech Recognition Software
• Text-to-Speech (TTS) Software
• Local Databases
• Development Environment

Hardware Requirements:

• Microphone
• Speakers
• Processing unit (CPU/GPU)
• Storage (HDD/SSD)
• Memory (RAM)
• Power supply

3.5 Experiment and actual results

The expected result of our project is a voice assistant that will be useful for
educational purposes, business, personal use, etc.

• It can open Google.
• It can open YouTube.
• It can search Wikipedia and read two lines to the user.
• It can play music.
• It can send emails.
• It can tell the current time.
• It can open our presentation.
• It will have a proper Graphical User Interface (GUI).
• It can open all Windows applications.
• It can make phone calls.
• It can report the weather forecast.
• It will have a chat history feature.
• It will have a face authentication system.
Fig 3.5 GUI of Voice Assistant
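Several of these features reduce to small functions over Python's standard library. The sketch below, under stated assumptions, covers telling the time with datetime, opening sites with webbrowser, and reading two sentences from Wikipedia. The site table, the filler words removed from the query, and the function names are hypothetical; wikipedia.summary comes from the third-party wikipedia package and is imported only when that function is called. The opener parameter lets the example run without actually launching a browser.

```python
import datetime
import webbrowser
from typing import Callable, List, Optional

# Hypothetical lookup table for "open google / youtube / wikipedia" commands.
SITES = {
    "google": "https://www.google.com",
    "youtube": "https://www.youtube.com",
    "wikipedia": "https://www.wikipedia.org",
}
# Hypothetical filler words removed before a Wikipedia search.
FILLER = {"sofia", "search", "wikipedia", "on", "for"}


def tell_time(now: Optional[datetime.datetime] = None) -> str:
    """Format the current (or given) time as a spoken-style sentence."""
    now = now or datetime.datetime.now()
    return now.strftime("The time is %I:%M %p")


def open_site(command: str, opener: Callable[[str], object] = webbrowser.open) -> Optional[str]:
    """Open the first known site named in the command; return its URL or None."""
    for name, url in SITES.items():
        if name in command.lower():
            opener(url)
            return url
    return None


def wikipedia_query(command: str) -> str:
    """Strip the wake word and filler words, leaving the search topic."""
    return " ".join(w for w in command.split() if w.lower().strip(",") not in FILLER)


def read_from_wikipedia(command: str) -> str:
    import wikipedia  # third-party package, installed with pip
    # Read two sentences of the article summary, as in the feature list above.
    return wikipedia.summary(wikipedia_query(command), sentences=2)


# Usage: record the URL instead of launching a browser.
opened: List[str] = []
print(open_site("Sofia open YouTube", opener=opened.append))  # → https://www.youtube.com
print(wikipedia_query("Sofia search Wikipedia for Alan Turing"))  # → Alan Turing
```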

3.6 Analysis

Python is well suited to scripting and rapid development, which makes it a
suitable language for this project. The assistant’s query handling can be adapted
to the user’s needs.

Modules needed

• Pyttsx3: This module converts text to speech within a program and works
offline. It can be installed with pip from the terminal.
• Wikipedia: Wikipedia is a great source of knowledge, and this module is used to
get information from Wikipedia or to perform a Wikipedia search. It can be
installed with pip from the terminal.
• Speech Recognition: Since we are building a voice assistant application, one of
the most important requirements is that the assistant recognizes the user’s voice
(i.e., what the user wants to say or ask). This module can be installed with pip
from the terminal.
• Web browser: Used to perform web searches. This module (webbrowser) comes
built-in with Python.
• Datetime: Used to show the date and time. This module comes built-in with
Python.
• Smtplib: The Simple Mail Transfer Protocol (SMTP) is an application-layer
protocol used to route emails between email servers, allowing users to send mail
to one another. Python’s built-in smtplib module uses this protocol to handle email
transfer.
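Sending an email with smtplib can be sketched as below, assuming an SMTP server that supports STARTTLS on port 587 and username/password login. The server, addresses and helper names are placeholders chosen for illustration; the usage only builds the message, so it runs without network access.

```python
import smtplib
from email.message import EmailMessage


def build_email(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text email with the standard headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_email(msg: EmailMessage, host: str, user: str, password: str, port: int = 587) -> None:
    """Send msg through an SMTP server, upgrading the connection with STARTTLS."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()            # encrypt the session before sending credentials
        server.login(user, password)
        server.send_message(msg)


# Usage: build the message only; calling send_email needs a real server and account.
msg = build_email("assistant@example.com", "friend@example.com",
                  "Reminder", "Project review at 10 AM.")
print(msg["Subject"])  # → Reminder
```

Separating message construction from sending keeps the part that needs credentials small, and lets the assistant show the user the composed email for confirmation before anything leaves the machine.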

We set our text-to-speech engine to pyttsx3 with the sapi5 driver. SAPI5 is
Microsoft’s Speech Application Programming Interface, which we use for the
text-to-speech function.

You can change the voice id index to 0 for the male voice while using the assistant;
here we use the female voice, i.e., index 1, for all text-to-speech output.
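The engine setup described above can be sketched as follows. It assumes Windows (the sapi5 driver exists only there) and the common pyttsx3 pattern of reading the installed voices and setting one by id. Which index is male or female depends on the voices installed on the machine, so the hypothetical helper pick_voice_id falls back to the first voice when the requested index does not exist. pyttsx3 is imported inside make_engine so that the helper can be tried without it.

```python
from types import SimpleNamespace
from typing import Any, Sequence


def pick_voice_id(voices: Sequence[Any], index: int = 1) -> str:
    """Return voices[index].id, falling back to the first installed voice."""
    if not voices:
        raise RuntimeError("no text-to-speech voices are installed")
    voice = voices[index] if 0 <= index < len(voices) else voices[0]
    return voice.id


def make_engine(index: int = 1):
    """Create a pyttsx3 engine on the Windows SAPI5 driver with the chosen voice."""
    import pyttsx3  # third-party package, installed with pip

    engine = pyttsx3.init("sapi5")  # the sapi5 driver is available only on Windows
    voices = engine.getProperty("voices")
    engine.setProperty("voice", pick_voice_id(voices, index))
    return engine


def speak(engine, text: str) -> None:
    """Queue the text and block until it has been spoken."""
    engine.say(text)
    engine.runAndWait()


# Usage without pyttsx3: fake voice objects that only carry an id.
fake_voices = [SimpleNamespace(id="male-voice"), SimpleNamespace(id="female-voice")]
print(pick_voice_id(fake_voices))  # → female-voice
```

On a Windows machine with pyttsx3 installed, the assistant would call speak(make_engine(), "Hello"), passing index 0 to make_engine to switch to the male voice.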

3.7 Conclusion

In conclusion, voice assistants represent a transformative technology that has
reshaped the way we interact with devices, access information, and accomplish
tasks. Their significance lies in their ability to provide accessibility, convenience,
and efficiency in a wide range of applications. From simplifying daily tasks and
enhancing productivity to improving accessibility for individuals with disabilities,
voice assistants have become an integral part of our lives. As voice assistant
technology continues to advance, we can expect even more innovative
applications and improvements in natural language understanding, making these
virtual assistants increasingly valuable in our homes, workplaces, and
communities. Voice assistants are not merely a technological convenience; they
are a powerful tool that fosters accessibility, personalization, and safety while
driving innovation in AI and human-computer interaction. Their continued
evolution promises a future where seamless voice-powered interactions will
further enrich our lives and redefine how we interact with the digital world.
References

[1] D. O'Shaughnessy, Interacting with Computers by Voice: Automatic Speech
Recognition and Synthesis, Proceedings of the IEEE, vol. 91, no. 9, September
2003.

[2] Patrick Nguyen, Georg Heigold, Geoffrey Zweig, Speech Recognition with Flat
Direct Models, IEEE Journal of Selected Topics in Signal Processing, 2010.

[3] David L. Poole, Alan K. Mackworth, Python code for voice assistant:
Foundations of Computational Agents, 2019-2020.

[4] Nil Goksel Canbek, Mehmet Emin Mutlu, On the Track of Artificial
Intelligence: Learning with Intelligent Personal Assistant, International Journal
of Human Sciences, 2016.

[5] Keerthana S, Meghana H, Priyanka K, Sahana V. Rao, Ashwini B, Smart Home
Using Internet of Things, Perspectives in Communication, Embedded-systems and
Signal-processing, 2017.

[6] Sutar Shekhar, P. Sameer, Kamad Neha, Prof. Devkate Laxman, An Intelligent
Voice Assistant Using Android Platform, IJARCSMS, ISSN: 232-7782, 2017.

[7] Rishabh Shah, Siddhant Lahoti, Prof. Lavanya K., An Intelligent Chatbot using
Natural Language Processing, International Journal of Engineering Research,
vol. 6, pp. 281-286, 2017.

[8] Luis Javier Rodríguez-Fuentes, Mikel Peñagarikano, Amparo Varona, Germán
Bordel, GTTS-EHU Systems for the Albayzin 2018 Search on Speech Evaluation.
