sem5_synopsis
Snehal Barkale 02
Shivraj Chavan 06
Omkar Chendge 07
University of Mumbai
(AY 2024-25)
CERTIFICATE
This is to certify that the Synopsis on Mini Project entitled “VoiceMate: AI-powered
personal assistant” is a bonafide work of Snehal Barkale (02), Shivraj
Chavan (06), Omkar Chendge (07) submitted to the University of Mumbai in
partial fulfillment of the requirements for TE (Artificial Intelligence & Machine
Learning Engineering) semester V during the academic year 2024-25, as prescribed
by the University of Mumbai.
Mentor
Prof. Anita Shirture
Examiners
1………………………………………
(Internal Examiner Name & Sign)
2…………………………………………
(External Examiner Name & Sign)
Date:
Place:
Contents
Abstract
Acknowledgments
1 Introduction
1.1 Introduction
1.2 Motivation
1.3 Problem Statement & Objectives
1.4 Organization of the Report
2 Literature Survey
3 Proposed System
3.1 Introduction
3.2 Architecture/Framework
3.3 Algorithm and Process Design
3.4 Details of Hardware & Software
3.5 Experiment and Results for Validation and Verification
3.6 Analysis
3.7 Conclusion and Future Work
4 References
Abstract
Acknowledgments
We sincerely thank our project guide, Prof. Anita Shirture, whose encouraging and
inspiring guidance helped us make our project a success. Her expert guidance,
kind advice, and timely motivation helped us shape and carry through our project.
We would like to thank our project coordinator, Prof. Anita Shirture, for all the
support we needed from her for our project.
We also express our deepest thanks to our HOD, Dr. Renuka Deshpande, whose
kind help in making the computer facilities in our laboratory available to us
made the project a true success. Without her kind and keen co-operation
our project would have come to a standstill.
Lastly, we would like to thank our college principal, Dr. Pramod Rodge, for
providing lab facilities and permitting us to go on with our project. We would also
like to thank our colleagues who helped us directly or indirectly during our project.
List of figures
3.2 Architecture
3.3 Algorithm
List of Abbreviations
1.1 Introduction
A voice assistant is a type of virtual assistant: an artificial intelligence (AI)
software application that the user operates by voice. In today's fast-paced world,
the demand for efficiency and convenience has led to the rise of virtual assistants,
changing the way we interact with technology and manage our daily tasks. A virtual
assistant is a computer program or application that uses AI and natural
language processing (NLP) to provide users with a wide range of services and
support, often mimicking the role of a human personal assistant. These digital
companions have transformed the way we work, stay organized, and access
information. The concept of a virtual assistant can be traced back to the advent of
speech recognition and text-to-speech technology. Over the years, advancements
in machine learning, data analytics, and AI have allowed virtual assistants to
become increasingly sophisticated and versatile. These digital helpers are now
integrated into various devices and platforms, including smartphones, smart
speakers, smartwatches, and even cars, making them accessible to a wide range
of users.
Virtual assistants come in various forms and are often tailored to specific
applications and ecosystems. Some of the most popular virtual assistants include
Apple's Siri, Amazon's Alexa, Google Assistant, and Microsoft's Cortana. These
platforms can perform a multitude of tasks, such as answering questions, setting
reminders, sending messages, playing music, providing directions, and
controlling smart home devices.
2. Lack of Personalization
The voice assistant is being developed for educational, business, and personal
use. The project aims to achieve the following objectives:
• It will have a proper Graphical User Interface (GUI).
• It can open Chrome, YouTube, Wikipedia, Windows applications, etc. to search
for information, and read two or three lines from Wikipedia to the user.
• It can open PowerPoint presentations.
• It can tell the current time.
• It can send emails and SMS messages.
• It can make phone calls.
• It can play music online.
• It can report the weather forecast.
• It will keep a chat history.
• It will have a face authentication system that allows the program to run only
when it detects a face.
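A few of the objectives above can be sketched as a small command handler. This is only an illustration under stated assumptions: the function name `handle_command`, the `SITES` table, and the reply wording are hypothetical and not part of the project's code. The handler returns a description of the action instead of performing it, so it can be checked without a browser or a speaker.

```python
from datetime import datetime

# Hypothetical table of sites the assistant can open (names taken from the objectives).
SITES = {
    "youtube": "https://www.youtube.com",
    "wikipedia": "https://www.wikipedia.org",
}


def handle_command(command, now=None):
    """Map a text command to an (action, payload) pair.

    The action is "say" (text to speak) or "open_url" (address to open).
    `now` can be passed in so the time reply is reproducible.
    """
    text = command.lower()
    # Objective: tell the current time.
    if "time" in text:
        now = now or datetime.now()
        return ("say", now.strftime("The time is %H:%M"))
    # Objective: open YouTube, Wikipedia, and similar sites.
    for name, url in SITES.items():
        if f"open {name}" in text:
            return ("open_url", url)
    # Anything else is reported back as not understood.
    return ("say", "Sorry, I did not understand that.")
```

In a full assistant, an "open_url" result could be passed to Python's standard `webbrowser.open`, and a "say" result to the text-to-speech engine.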
3. Proposed system
3.1 Introduction
A virtual assistant is a software program that eases day-to-day tasks, such as
showing the weather forecast or playing music. It can take commands as voice or
text. A voice-based intelligent assistant needs an invoking word, or wake word,
to activate the listener, followed by the command. For our project, the wake
word is “SOFIA”. Our voice assistant is designed to be used efficiently by all
users. This personal assistant software improves the user’s productivity by managing
day-to-day tasks and providing information from online sources.
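The wake-word step described above can be sketched as follows. This is a minimal illustration that assumes the speech has already been transcribed to text; the function name `extract_command` is hypothetical, and a real assistant may detect the wake word on the audio itself.

```python
WAKE_WORD = "sofia"  # wake word named in the text, compared case-insensitively


def extract_command(transcript, wake_word=WAKE_WORD):
    """Return the words that follow the wake word, or None if it is absent."""
    # Lower-case the transcript and strip simple punctuation from each word.
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    if wake_word not in words:
        return None  # the listener stays inactive
    position = words.index(wake_word)
    # Everything after the wake word is treated as the command.
    return " ".join(words[position + 1:])
```

For example, `extract_command("Sofia, play music")` gives `"play music"`, while a transcript without the wake word gives `None`, so the assistant ignores it.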
• Audio Input/Output:
• Speech Recognition:
o Choose the platform for your voice assistant. It can be a mobile app,
a web-based interface, a smart speaker, or a custom hardware device.
The diagram shows the main process flow of how the voice assistant works.
Fig 3.2 Voice Assistant Framework
In this module, the user’s commands are converted from speech to text using the
Google Cloud Speech-to-Text API. The API transcribes the speech file using deep
neural network models for automatic speech recognition (ASR) and returns the
text statement. It is one of the simplest methods for recognizing speech
and can analyse up to 1 min of voice data.
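Because the text above gives a limit of 1 min of voice data, a longer recording would have to be sent in pieces. The sketch below shows one way to do that, assuming the limit applies to each request. The names `chunk_spans` and `transcribe_long` are hypothetical, and `recognize` is a stand-in for whatever call sends one span of audio to the speech API; it is not a real Google API function.

```python
MAX_SECONDS = 60  # assumed per-request limit, taken from "up to 1 min" in the text


def chunk_spans(total_seconds, max_seconds=MAX_SECONDS):
    """Split a recording into (start, end) spans no longer than max_seconds."""
    if total_seconds < 0 or max_seconds <= 0:
        raise ValueError("durations must be non-negative and the limit positive")
    total = float(total_seconds)
    spans = []
    start = 0.0
    while start < total:
        end = min(start + max_seconds, total)
        spans.append((start, end))
        start = end
    return spans


def transcribe_long(total_seconds, recognize):
    """Transcribe a long recording span by span and join the text.

    `recognize(start, end)` is a hypothetical callable that returns the
    transcript for one span of the audio.
    """
    parts = (recognize(start, end) for start, end in chunk_spans(total_seconds))
    return " ".join(parts)
```

Cutting at fixed times can split a word in two; a practical version would cut at pauses in speech instead.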
This module is responsible for understanding the correct command from the text
generated by the Google API and then confirming it with the user before executing
the desired action. Because human language is ambiguous, it is extremely
challenging to create software that correctly ascertains the text’s intended meaning,
so NLP is used in this module for processing and recognizing the text. NLP
breaks the text into small units (tokens) to help the computer understand the
ingested text. Different libraries and algorithms are available for NLP, such as the
Natural Language Toolkit (NLTK).
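The idea of breaking text into small units and picking out the intended command can be sketched as below. To stay dependency-free, this sketch tokenizes with a regular expression from the standard library instead of NLTK, and it matches intents by simple keyword overlap. The intent names, the keyword sets, and the function names are all hypothetical.

```python
import re

# Hypothetical intents and the keywords that suggest each one.
INTENT_KEYWORDS = {
    "play_music": {"play", "music", "song"},
    "get_time": {"time", "clock"},
    "get_weather": {"weather", "forecast"},
}


def tokenize(text):
    """Break text into lower-case word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def detect_intent(text):
    """Return the intent whose keywords overlap most with the text, or None."""
    tokens = set(tokenize(text))
    best_intent, best_score = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(tokens & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


def is_confirmed(answer):
    """Interpret the user's reply to a confirmation question."""
    return bool(set(tokenize(answer)) & {"yes", "yeah", "ok", "okay", "sure"})
```

A command is executed only when `detect_intent` returns an intent and the user's reply passes `is_confirmed`, which mirrors the confirmation step described above.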
Module 3: Command execution
Software Requirements:
• Operating System
• Speech Recognition Software
• Text-to-Speech (TTS) Software
• Local Databases
• Development Environment
Hardware Requirements:
• Microphone
• Speakers
• Processing unit (CPU/GPU)
• Storage (HDD/SSD)
• Memory (RAM)
• Power supply
The expected result of our project is a voice assistant that will be useful for
educational, business, and personal purposes.
3.6 Analysis
Modules needed
We will use pyttsx3, a text-to-speech library for Python, as our engine, with the
sapi5 driver. SAPI5 is the Microsoft Speech Application Programming Interface,
which we will use for the text-to-speech function.
The voice ID can be changed to “0” for the male voice; here we use the female
voice, i.e. “1”, for all text-to-speech output.
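The engine setup described above can be sketched as follows. This is a minimal illustration, assuming pyttsx3 is installed and the program runs on Windows, where the sapi5 driver is available. The helper names `pick_voice_id` and `speak` are hypothetical, and which index is a male or female voice depends on the voices installed on the machine.

```python
def pick_voice_id(voices, index=1):
    """Return the id of voices[index], falling back to the first voice."""
    if not voices:
        raise ValueError("no voices are installed")
    if 0 <= index < len(voices):
        return voices[index].id
    return voices[0].id


def speak(text, voice_index=1):
    """Speak text aloud with pyttsx3 using the sapi5 driver (Windows only)."""
    import pyttsx3  # third-party library, imported here so the helper above works without it

    engine = pyttsx3.init("sapi5")
    voices = engine.getProperty("voices")
    # Index 1 is the female voice in the text; index 0 would select the male voice.
    engine.setProperty("voice", pick_voice_id(voices, voice_index))
    engine.say(text)
    engine.runAndWait()
```

Calling `speak("Hello, I am your assistant.")` would speak the sentence with the voice at index 1, and `speak("Hello", voice_index=0)` would use the voice at index 0.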
3.7 Conclusion