b.e-cse-batchno-52
SCHOOL OF COMPUTING
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC | 12B Status by UGC | Approved by
AICTE
JEPPIAAR NAGAR, RAJIV GANDHI SALAI,
CHENNAI - 600119
APRIL - 2023
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of Sri Meenakshi Pandey (Reg. No. 39110967) and S Sindu (Reg. No. 39110948), who carried out the Project Phase-2 entitled "AI DESKTOP ASSISTANT" under my supervision from January 2023 to April 2023.
Internal Guide
Ms. C A Daphine Desona Clemency, M.E.
I, Sri Meenakshi Pandey (Reg.No- 39110967), hereby declare that the Project Phase-2
Report entitled "AI DESKTOP ASSISTANT" done by me under the guidance of Ms. C
A Daphine Desona Clemency, M.E. is submitted in partial fulfillment of the
requirements for the award of Bachelor of Engineering degree in Computer Science and
Engineering.
DATE:20.4.2023
I convey my thanks to Dr. T. Sasikala, M.E., Ph.D., Dean, School of Computing, and Dr. L. Lakshmanan, M.E., Ph.D., Head of the Department of Computer Science and Engineering, for providing me the necessary support and details at the right time during the progressive reviews.
I would like to express my sincere and deep sense of gratitude to my Project Guide, Ms. C A Daphine Desona Clemency, M.E., whose valuable guidance, suggestions, and constant encouragement paved the way for the successful completion of my Phase-2 project work.
I wish to express my thanks to all Teaching and Non-teaching staff members of the
Department of Computer Science and Engineering who were helpful in many ways for the
completion of the project.
ABSTRACT
Python is a rapidly growing language, which makes it easy to write a script for a voice assistant in Python. The instructions for the assistant can be tailored to the requirements of the user. Speech recognition is the process of converting speech into text; it is commonly used in voice assistants such as Alexa and Siri. Python provides the SpeechRecognition library, which allows us to convert speech into text. Building my own assistant was an interesting task: it became easier to send emails without typing a word, to search on Google without opening a browser, and to perform many other daily tasks, such as playing music or opening a favorite IDE, with a single voice command. In the current scenario, technology has advanced to the point where machines can perform many tasks as effectively as, or more effectively than, humans. While building this project, I realized that applying AI in every field reduces human effort and saves time. The AI Desktop Assistant project aims to develop a
cutting-edge software application that leverages the latest advancements in Artificial
Intelligence to provide users with a personalized, intuitive, and efficient desktop assistant. The
assistant will use natural language processing to interpret user commands, generate personalized
responses, and perform various tasks, including but not limited to scheduling appointments,
sending emails, managing files, and providing recommendations based on user preferences. The
system will also learn from user interactions, continually improving its accuracy and
functionality over time. The project will utilize machine learning techniques such as deep neural
networks to develop robust models that can accurately understand and respond to user input.
The end goal is to create a user-friendly, efficient, and reliable AI desktop assistant that can
improve productivity, simplify tasks, and enhance the overall user experience.
TABLE OF CONTENTS

Chapter No.   TITLE
              ABSTRACT
1             INTRODUCTION
2             LITERATURE SURVEY
2.1           Inferences from Literature Survey
2.2           Open Problems in Existing System
3             REQUIREMENTS ANALYSIS
3.1           Feasibility Studies/Risk Analysis of the Project
3.2           Software Requirements Specification Document
3.3           System Use Case
4             DESCRIPTION OF PROPOSED SYSTEM
4.1           Selected Methodology or Process Model
4.2           Architecture / Overall Design of Proposed System
4.3           Description of Software for Implementation and Testing Plan of the Proposed Model/System
CHAPTER 1
INTRODUCTION
Artificial Intelligence, when applied to machines, gives them the capability to think in ways that resemble humans. Here, a computer system is designed to handle work that would typically require human interaction. Python is a rapidly growing language, which makes it easy to write a script for a voice assistant in Python, and the instructions for the assistant can be tailored to the requirements of the user. Speech recognition is the process of converting speech into text and is used in assistants such as Alexa and Siri; Python provides the SpeechRecognition library for this purpose. Building my own assistant was an interesting task: it became easier to send emails without typing a word, to search on Google without opening a browser, and to perform many other daily tasks, such as playing music or opening a favorite IDE, with a single voice command. In the current scenario, technology has advanced to the point where machines can perform many tasks as effectively as, or more effectively than, humans.
While building this project, I realized that applying AI in every field reduces human effort and saves time. Because the voice assistant uses Artificial Intelligence, the results it provides are highly accurate and efficient. The assistant helps reduce human effort and saves time when performing a task: it removes the need for typing entirely and behaves like another individual whom we talk to and ask to perform tasks. It is no less capable than a human assistant and, for many tasks, is more effective and efficient. The libraries and packages used to build the assistant were chosen with time complexity in mind so that responses stay fast. Its functionalities include sending emails, reading PDFs, sending WhatsApp messages, opening the command prompt, a favorite IDE, or Notepad, playing music, performing Wikipedia searches, opening websites such as Google and YouTube in a web browser, giving weather forecasts, setting desktop reminders of your choice, and holding some basic conversation. The project was built in the PyCharm IDE, where all the .py files were created. The following modules and libraries were used: pyttsx3, SpeechRecognition, datetime, wikipedia, smtplib, pywhatkit, pyjokes, PyPDF2, pyautogui, PyQt5, etc. A live GUI was created for interacting with the Assistant, giving the conversation an attractive and interesting look.
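As a rough illustration of the listen-and-respond loop the assistant is built around, the following is a minimal sketch using the SpeechRecognition and pyttsx3 packages named above; it assumes a working microphone (PyAudio installed) and an internet connection for Google's recognizer, and is not the project's exact code.

import pyttsx3
import speech_recognition as sr

engine = pyttsx3.init()          # text-to-speech engine
recognizer = sr.Recognizer()

with sr.Microphone() as source:  # listen on the default microphone
    print("Listening...")
    audio = recognizer.listen(source)

try:
    query = recognizer.recognize_google(audio, language="en-in")  # speech to text
    print("You said:", query)
    engine.say("You said " + query)                                # text to speech
    engine.runAndWait()
except sr.UnknownValueError:
    print("Sorry, I did not catch that.")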
An AI desktop virtual assistant is a revolutionary technology that has transformed the way we
interact with our devices. The technology is designed to mimic human conversation, allowing
users to communicate with their devices using natural
language. It has numerous applications and is being used in a variety of industries, from
healthcare to finance and entertainment.
AI desktop virtual assistants are software programs that use artificial intelligence to
understand and respond to user queries. They are designed to interact with users in a
conversational manner, using natural language processing and machine learning algorithms to
understand user input and provide relevant responses. They can be accessed through a range
of devices, including desktops, laptops, and smartphones.
The use of AI desktop virtual assistants has become increasingly popular in recent years, as
users seek to simplify their daily tasks and enhance their productivity. They can be used for a
variety of purposes, such as scheduling appointments, setting reminders, searching the web,
sending emails, and controlling smart home devices. This makes them an essential tool for
both personal and professional use.
One of the key benefits of AI desktop virtual assistants is their ability to learn from user
interactions. They use machine learning algorithms to analyze user data and improve their
performance over time. This means that the more users interact with the assistant, the better it
becomes at understanding their needs and providing relevant responses. As a result, the
technology is becoming increasingly sophisticated, with new features and capabilities being
added all the time.
Another benefit of AI desktop virtual assistants is their ability to integrate with other
technologies. They can be used in conjunction with smart home devices, such as smart
speakers and thermostats, to control the environment and perform other tasks. They can also
be integrated with other software programs, such as productivity tools and customer
relationship management (CRM) systems, to streamline workflows and improve efficiency.
In the healthcare industry, AI desktop virtual assistants are being used to improve patient care
and reduce administrative burdens. They can be used to schedule appointments, send
reminders, and provide patients with personalized healthcare information. They can also be
used to analyze patient data and provide insights into patient health, allowing healthcare
providers to make more informed decisions.
In the finance industry, AI desktop virtual assistants are being used to improve customer
service and streamline workflows. They can be used to provide customers with personalized
financial advice, analyze customer data to identify trends and opportunities, and perform
routine tasks such as account balance inquiries and transaction histories.
In the entertainment industry, AI desktop virtual assistants are being used to enhance the user
experience. They can be used to recommend movies and TV shows, provide personalized
entertainment news and updates, and even create personalized playlists based on user
preferences.
Despite the many benefits of AI desktop virtual assistants, there are also some concerns about
their use. One of the key concerns is privacy and security. As these assistants are designed to
collect and analyze user data, there are concerns about how this data is being used and who
has access to it. There are also concerns about the accuracy and reliability of the technology,
particularly in industries such as
healthcare where accuracy is critical.
To address these concerns, it is important to ensure that AI desktop virtual assistants are
designed with privacy and security in mind. This means implementing robust security
measures to protect user data, such as encryption and multi-factor authentication. It also
means implementing transparency and accountability measures, such as clear privacy policies
and user consent requirements.
Overall, AI desktop virtual assistants are a transformative technology that has the potential to
revolutionize the way we interact with our devices. They offer numerous benefits, including
improved productivity, personalized experiences, and enhanced customer service. As the
technology continues to evolve, it is likely that we will see even more applications and
benefits emerge, making them an essential tool for both personal and professional use.
CHAPTER 2
LITERATURE SURVEY
The field of digital assistants with speech recognition has seen some major advancements and inventions. This is largely due to demand in devices such as smart watches, fitness bands, speakers, Bluetooth earphones, mobile phones, laptops and desktops, televisions, and so forth. Almost all digital devices on the market today come with voice assistants that help control the device through speech recognition. New techniques are constantly being developed to improve the performance of automated voice search. With voice assistants, tasks can be automated easily: the user simply gives the command to the machine in speech form, and the assistant carries out everything, from converting the speech into text, to extracting keywords from that text, to executing the query and presenting the results to the user. This has been one of the most useful advances in technology. Before AI, we were the ones who had to adapt technology to perform each task; now the machine itself is able to take on new tasks and solve them without people having to adapt it.
One surveyed work describes a computer-based approach for performing a command via a voice user interface on a subset of objects. The subset is selected from a set of objects, each having an object type; at least one taggable field is associated with the object type and has a corresponding value. The set of objects is stored in computer memory. An utterance is received from the user and contains a command, an object type selection, a taggable field selection, and a value for the taggable field. In response to the utterance, at least one object is retrieved from the set: an object of the type selected by the user whose value in the selected taggable field matches the value obtained from the user, and the command is performed on that object. The object includes text that is converted to voice output. The authors envisioned that someday computers will understand natural language, anticipate what we need, when and where we need it, and proactively complete tasks on our behalf. Since then, speech recognition and machine learning have continued to be refined, and structured data served through applications and content providers has emerged; this trend will only strengthen as computers become smaller and more ubiquitous, e.g., wearables and the Internet of Things (IoT). The recognizer in such systems is designed to convert a verbal utterance from an individual into another form of data (e.g., text). Another work discloses a handheld personal assistant comprising a voice recognizer and a natural language processor; the piece of information retrieved can be a to-do list, data from the individual's calendar, or data from the individual's address book, such as a telephone number. The best-known example on the iPhone is Siri, which lets the end user operate the mobile device by voice and also responds to the user's voice commands. It is described as a Personal Assistant with Voice Recognition Intelligence, which takes the user's input in the form of voice or text, processes it, and returns the output in various forms, such as an action performed or an item presented to the end user. Such a framework can change the way end users interact with their mobile phones.
Open Data is currently gathering attention for innovative service creation, mainly in the areas of government, bioscience, and smart enterprise. However, to promote its application further for consumer services, a search engine for Open Data that reveals what kind of data is available would be of assistance. The Virtual Personal Assistant (VPA) is the next generation of carrier services for mobile users. The VPA is believed to be the intelligent evolution of services to meet the ever-increasing demand of mobile professionals for mobility and connectivity, and it will enable the user to efficiently handle a growing volume of phone calls, messages, meetings, and other activities. Even so, most people do not use such assistants regularly; in particular, significant concerns have been raised around security, monetization, data permanency, and transparency. Drawing on these findings, the key challenges discussed include designing for interruptibility, re-examining the human metaphor, and addressing issues of trust and data ownership. As virtual assistants become more intelligent and the IVA ecosystem of services and devices expands, there is a growing need to understand the security and privacy risks of this emerging technology. Better diagnostic testing can uncover such vulnerabilities and lead to more trustworthy systems. One such system enables its target users to interact with PCs and web-based services with a wide array of functionality built on various web services and social media. There are four standard components of the system: the voice recognition module, the natural language processing module, the conversational agent, and the content extraction module. Existing screen-reader software is not well suited for accessing the Internet because of the minimal support it provides for web content and its lack of voice recognition. The virtual assistant software available in the market is not equally accessible to everyone, and some users may still face issues.
One paper presents a usability study of four voice-based and contextual-text virtual assistants (Google Assistant, Cortana, Siri, Alexa). Cortana can read your messages, track your location, watch your browsing history, check your contact list, keep an eye on your calendar, and put this information together to suggest useful information, provided you enable it. You can also type your questions or requests if you prefer not to speak out loud; it is primarily a desktop-based virtual assistant. Siri has been an integral part of iOS since the launch of iOS 5 in 2011. It started with the basics, such as weather and messaging, but has expanded significantly since then to support more third-party integration and macOS. While Siri's jokes are legendary, the virtual assistant is becoming more capable all the time: you can now ask it to call people, send messages, schedule meetings, launch applications and games, play music, answer questions, set reminders, and give weather forecasts. Google Assistant (which has absorbed capabilities from the older Google Now, as Now is being phased out) is different from Cortana and Siri. This strongly conversational assistant is capable of interpreting basic vernacular and understanding the intent behind subtly complex requests such as "What should we have for dinner?" It can also recognize up to six distinct voices for couples and families, each mapped to different calendar events and preferences, an advantage unique to Assistant and ideal in a setting where everyone uses the voice assistant on a single device. Alexa, while sharing many features with the other virtual assistants, is in a class of its own: Amazon's voice assistant is not aimed at mobile or PC use, but rather at the standalone Amazon Echo speaker and a limited number of Amazon Fire devices, with a greater focus on whole-house management and services as opposed to PC-oriented tasks.
Every entrepreneur, side hustler, and multitasking professional would love to have a virtual assistant to take on some of the dull everyday chores that come with living in the digital era. As with any developing technology, however, it can be hard to separate the hype from the facts. There are four major players competing for attention: Amazon (Alexa), Apple (Siri), Google (Google Assistant), and Microsoft (Cortana). The authors of one comparison spent hours testing each of the four assistants by asking questions and giving commands that many business users would use. During the testing process, they noted how well the AI responded, as well as other factors a prospective user might care about, such as ease of setup, general ability to recognize the speaker's voice, and contextual understanding. Nearly every phone and PC available today has a smart assistant trapped inside, like a helpful ghost, but how do they stack up against each other? While it may seem that Siri, Cortana, and Google Assistant are all just variations of the same virtual assistant, each has its own peculiarities, flaws, and strengths. So which one is best for users? That is not a simple question to answer, because they are similar enough that it is hard to compare them without diving deep into their capabilities.
Artificial Intelligence (AI) refers to any task performed by a programmed machine that would otherwise require human intelligence to accomplish. It is the science and engineering of making machines demonstrate intelligence, especially visual perception, speech recognition, decision-making, and translation between languages, as human beings do. AI is the simulation
of human intelligence processes by machines, especially computer systems. Being a new technology, there is a huge shortage of skilled manpower with data analytics and data science skills who could be deployed to get the maximum output from artificial intelligence. As AI advances, businesses lack skilled professionals who can match the requirements and work with this technology. Business owners need to train their professionals to be able to leverage the benefits of this technology. Artificial neural networks
allow modeling of nonlinear processes and become a useful tool for solving many problems
such as classification, clustering, dimension reduction, regression, structured prediction,
machine translation, anomaly detection, pattern recognition, decision-making, computer
vision, visualization, and others. This wide range of abilities makes it possible to use
artificial neural networks in many areas. Recent developments in AI techniques, complemented by the availability of high computational capacity at increasingly accessible costs, the wide availability of labeled data, and improvements in learning techniques, have opened up a wide application domain for AI. AI improves the lives of human beings by assisting in driving, taking personal care of aged or handicapped people, executing arduous and dangerous tasks, assisting in making informed
decisions, rationally managing huge amounts of data that would otherwise be difficult to
interpret, assisting in translating, and communicating multilingual while not knowing the
language of our interlocutors and many more. Artificial intelligence is already everywhere
and is widely used in ways that are obvious. The long-term economic effects of AI are
uncertain. A survey of economists showed disagreement about whether the increasing use of
robots and AI will cause a substantial increase in long-term unemployment, but they
generally agree that it could be a net benefit, if productivity gains are redistributed. A 2017
study by PricewaterhouseCoopers sees the People's Republic of China gaining economically the most from AI, with a boost of 26.1% of GDP by 2030. A February 2020 European Union white
paper on artificial intelligence advocated for artificial intelligence for economic benefits,
including "improving healthcare (e.g. making
diagnosis more precise, enabling better prevention of diseases), increasing the efficiency of
farming, contributing to climate change mitigation and adaptation, improving the efficiency
of production systems through predictive maintenance", while acknowledging potential risks.
Concern over risk from artificial intelligence has led to some high-profile donations and
investments. A group of prominent tech titans including Peter Thiel, Amazon Web Services
and Musk have committed $1 billion to OpenAI, a nonprofit company aimed at championing
responsible AI development. The opinion of experts within the field of artificial intelligence
is mixed, with sizable fractions both concerned and unconcerned by risk from eventual super
humanly capable AI. Other technology industry leaders believe that artificial intelligence is
helpful in its current form and will continue to assist humans. Oracle CEO Mark Hurd has
stated that "AI will actually create more jobs, not less jobs" as humans will be needed to manage AI systems. Facebook CEO Mark Zuckerberg believes AI will "unlock a huge amount of positive things," such as curing disease and increasing the safety of autonomous
cars.
There are three identified challenges that vendors address in the voice-recognition domain:
first improving speech recognition and command processing; second offering support for
different languages, different accents, and bilingual users; and third understanding
conversational contexts and establishing rapport. Vendors have recognized these challenges,
and some proposals and efforts are promising. Deep learning algorithms, for instance, have
enabled tremendous advances in speech recognition. These three challenges are well
recognized and the community is working towards mitigating them.
CHAPTER 3
REQUIREMENTS ANALYSIS
A feasibility study helps determine whether or not to proceed with the project. It is essential to evaluate the cost and benefit of the proposed system. Five types of feasibility study are taken into consideration.
1. Technical feasibility:
This includes identifying the technologies required for the project, both hardware and software. For the virtual assistant, the user must have a microphone to convey commands and a speaker to hear the system when it speaks. These are very cheap nowadays, and most people already possess them. Besides this, the system needs an internet connection: while using the Assistant, make sure you have a steady connection. This is also not an issue in an era where almost every home or office has Wi-Fi.
2. Operational feasibility:
This concerns the ease and simplicity of operating the proposed system. The system does not require any special skill set for users to operate it; in fact, it is designed to be usable by almost everyone. Children who cannot yet write can read out their problems to the system and get answers.
3. Economic feasibility:
Here, we weigh the total cost and benefit of the proposed system against the current system. For this project, the main cost is documentation. The user would also have to pay for a microphone and speakers, which are again cheap and widely available. As far as maintenance is concerned, the Assistant will not cost much.
4. Organizational feasibility:
This concerns the management and organizational structure of the project. This project is not built by a team; the management tasks are all carried out by a single person. This does not create any management issues and increases the feasibility of the project.
5. Cultural feasibility:
This concerns the compatibility of the project with the cultural environment. The virtual assistant is built in accordance with the general culture. The project is named Assistant so as to represent Indian culture without undermining local beliefs. The project is technically feasible with no external hardware requirements. It is also simple to operate and does not require training or repairs. The overall feasibility study reveals that the goals of the proposed system are achievable, and the decision was taken to proceed with the project.
3.2.1. PYCHARM: PyCharm is an Integrated Development Environment (IDE) with many features: it supports scientific tools (such as matplotlib, numpy, and scipy) and web frameworks (such as Django, web2py, and Flask), and it provides refactoring for Python, an integrated Python debugger, code completion, and code and project navigation. It also offers data science support when used with Anaconda.
3.2.2. PYQT5 FOR LIVE GUI: PyQt5 is a set of Python bindings for the Qt GUI toolkit. It provides a rich set of GUI widgets and includes modules such as QtWidgets, QtCore, QtGui, and QtDesigner.
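A minimal PyQt5 sketch is shown below as an assumed starting point for the assistant's live GUI; the actual project layout comes from a designer-built .ui file, so this is only illustrative.

import sys
from PyQt5.QtWidgets import QApplication, QMainWindow

app = QApplication(sys.argv)
window = QMainWindow()
window.setWindowTitle("AI Desktop Assistant")  # placeholder title
window.show()
sys.exit(app.exec_())                          # start the Qt event loop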
pywhatkit: A Python library for sending WhatsApp messages at a particular time, with some additional features.
Datetime: Provides the current date and time.
Wikipedia: A Python module for searching for anything on Wikipedia.
Smtplib: Implements the Simple Mail Transfer Protocol, which allows us to send mails and to route mails between mail servers.
pyPDF2: A Python module that can read, split, and merge PDFs.
sys: Gives access to variables and functions that interact closely with the Python interpreter.
Main Flow:
1. The customer visits the company's website and clicks on the chatbot icon.
2. The chatbot greets the customer and asks how it can assist them.
3. The customer provides a brief description of their issue or question.
4. The chatbot analyzes the customer's message and offers relevant responses or actions.
5. If the chatbot is unable to resolve the issue or if the customer requests additional
assistance, the chatbot escalates the conversation to a human support agent.
6. The chatbot logs the conversation and any actions taken, then ends the conversation
with the customer.
Alternative Flows:
● If the customer requests to speak with a human support agent at any point, the chatbot
immediately escalates the conversation to an available agent.
● If the chatbot is unable to understand the customer's message, it asks the customer to
clarify and repeats the step of analyzing the message.
Post-Conditions:
● The customer's issue has been resolved, or the customer has been escalated to a
human support agent.
● The conversation has been logged and can be used for future reference or analysis.
3.3.1 ER DIAGRAM
Initially, the system is in idle mode. When it receives a wake-up call, it begins execution. The received command is identified as either a question or a task to be performed, and the specific action is taken accordingly. After the question has been answered or the task has been performed, the system waits for another command. This loop continues until it receives a quit command, at which point it goes back to sleep.
The User class has two attributes: the command it sends as audio and the response it receives, which is also audio. It performs functions to listen to the user command, interpret it, and then reply or send back a response accordingly. The Question class holds the command in string form, as produced by the interpreter; it forwards it to the general, about, or search function based on its identification. The Task class also holds the interpreted command in string format and has various functions such as reminder, note, mimic, research, and reader.
In this project there is only one user. The user issues a command to the system, the system interprets it and fetches the answer, and the response is sent back to the user.
The sequence diagram above shows how an answer requested by the user is fetched from the internet. The audio query is interpreted and sent to the web scraper; the web scraper searches for and finds the answer, which is then sent to the speaker, which speaks it to the user.
The main component here is the Virtual Assistant. It provides two specific services: executing a task or answering a question.
CHAPTER 4
DESCRIPTION OF PROPOSED SYSTEM
We are familiar with many existing voice assistants such as Alexa, Siri, Google Assistant, and Cortana, which use language processing and voice recognition. They listen to the command given by the user and perform the requested function in a very efficient and effective manner. Because these voice assistants use Artificial Intelligence, the results they provide are highly accurate and efficient. They help reduce human effort and save time when performing a task: they remove the need for typing entirely and behave like another individual whom we talk to and ask to perform tasks. These assistants are no less capable than a human assistant and, for many tasks, are more effective and efficient. The algorithms behind these assistants focus on time complexity and keep responses fast. However, to use them one must have an account (such as a Google account for Google Assistant or a Microsoft account for Cortana) and an internet connection, because these assistants work only with internet connectivity. They are integrated with many devices such as phones, laptops, and speakers.
Building my own assistant was an interesting task: it became easier to send emails without typing a word, to search on Google without opening a browser, and to perform many other daily tasks, such as playing music or opening a favorite IDE, with a single voice command. This Assistant differs from traditional voice assistants in that it is specific to the desktop, the user does not need to create an account to use it, and it does not require an internet connection while receiving the instructions to perform a specific task. The IDE used in this project is PyCharm; all the Python files were created in PyCharm, and all the necessary packages were easily installable from within the IDE. The following modules and libraries were used: pyttsx3, SpeechRecognition, datetime, wikipedia, smtplib, pywhatkit, pyjokes, PyPDF2, pyautogui, PyQt5, etc. A live GUI was created for interacting with the Assistant, giving the conversation an attractive and interesting look. With these advancements, the Assistant can perform many tasks as effectively as, or more effectively than, a person. While building this project, I realized that applying AI in every field reduces human effort and saves time. The functionalities of this project include sending emails, reading PDFs, sending WhatsApp messages, opening the command prompt, a favorite IDE, or Notepad, playing music, performing Wikipedia searches, opening websites such as Google and YouTube in a web browser, giving weather forecasts, setting desktop reminders of your choice, and holding some basic conversation.
The system is designed using the concept of Artificial Intelligence and with the help of the necessary Python packages. Python provides many libraries and packages to perform the tasks; for example, pyPDF2 can be used to read PDFs. The data in this project is simply the user input: whatever the user says, the assistant performs the task accordingly. The user input is nothing more than the list of tasks a user wants performed, expressed in human language, i.e., English.
Fig 4.1: A Representation of an AI Assistant receiving commands from
User
The system architecture diagram of the proposed system is shown in the figure above.
4.3.1.3. pywhatkit: pywhatkit is a Python library for sending WhatsApp messages at a particular time, and it has several other features as well. Python offers numerous libraries to ease our work, and pywhatkit is one of them. Some features of the pywhatkit module are:
● Send WhatsApp messages.
● Play a YouTube video.
● Perform a Google Search.
● Get information on a particular topic.
The pywhatkit module can also be used for converting text into handwritten text images.
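A short illustrative sketch of these pywhatkit calls follows; the phone number, message, and search topics are placeholders, and sending a WhatsApp message assumes WhatsApp Web is logged in on the machine.

import pywhatkit

# Send a WhatsApp message at 13:30 (24-hour clock) to a placeholder number.
pywhatkit.sendwhatmsg("+911234567890", "Hello from the assistant", 13, 30)

# Play the first YouTube result for a query.
pywhatkit.playonyt("lo-fi study music")

# Open a Google search in the default browser.
pywhatkit.search("Python speech recognition")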
4.3.1.4. Datetime: This library provides the current date and time. The datetime module comes built into Python, so there is no need to install it externally. It supplies classes for working with dates and times, and these classes provide a number of functions to deal with dates, times, and time intervals. Date and datetime values are objects in Python, so when you manipulate them you are actually manipulating objects, not strings or timestamps.
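A small sketch of how the assistant can read the current date and time with this module (the formatting strings are just examples):

import datetime

now = datetime.datetime.now()
print(now.strftime("%A, %d %B %Y"))   # e.g. the current weekday and date
print(now.strftime("%H:%M:%S"))       # current time
hour = now.hour                        # used to choose Good Morning/Afternoon/Evening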
4.3.1.6. Smtplib: The Simple Mail Transfer Protocol (SMTP) is the protocol used to handle email transfer from Python. It is used to route emails between email servers and is an application-layer protocol that allows one user to send mail to another. The receiver retrieves email using the protocols POP (Post Office Protocol) and IMAP (Internet Message Access Protocol). The client typically opens a TCP connection to the mail server on port 587 and then upgrades it to an encrypted session. Python provides the smtplib module, which defines an SMTP client session object that can be used to send emails to any machine on the internet. For this purpose, we have to import the smtplib module using the import statement.
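A hedged sketch of a sendEmail()-style helper built on smtplib follows; the server address, credentials, and recipient are placeholders (a Gmail account would need an app password), and the real project's function may differ.

import smtplib

def send_email(to, content):
    server = smtplib.SMTP("smtp.gmail.com", 587)   # placeholder SMTP server
    server.ehlo()
    server.starttls()                               # upgrade to an encrypted connection
    server.login("your_email@gmail.com", "your_app_password")
    server.sendmail("your_email@gmail.com", to, content)
    server.close()

# Example (placeholder recipient):
# send_email("friend@example.com", "Hello from the assistant")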
4.3.1.7. pyPDF2: PyPDF2 is a free and open-source, pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs. Typical operations supported by PyPDF2 include:
• Extracting text and metadata from existing PDFs;
• Creating new PDF documents by combining pages from existing ones;
• Editing existing PDFs by adding, removing, replacing, or reordering pages;
• Modifying existing PDFs by rotating pages, merging in watermarks, and encrypting or decrypting files.
Because it is pure Python, PyPDF2 has no native dependencies, and its memory footprint is small.
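A sketch of a pdf_reader()-style helper is shown below using the older PyPDF2 interface (class and method names differ slightly in newer PyPDF2 releases); the file name is a placeholder.

import PyPDF2

with open("sample.pdf", "rb") as book:          # placeholder file name
    pdf_reader = PyPDF2.PdfFileReader(book)
    pages = pdf_reader.numPages
    print("Total pages:", pages)
    text = pdf_reader.getPage(0).extractText()  # text of the first page
    print(text)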
4.3.1.8. Pyjokes: Python supports the creation of random jokes using one of its libraries. Pyjokes is a Python library used to produce one-line jokes for programmers; informally, it can be described as a fun Python library that is pretty simple to use.
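A one-line sketch of fetching a joke for the assistant to speak:

import pyjokes

joke = pyjokes.get_joke(language="en", category="neutral")
print(joke)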
4.3.1.10. Pyautogui: The pyautogui library is an automation library that allows mouse and keyboard control. In other words, it lets us automate the movement of the mouse and keyboard to interact with other applications from a Python script. It provides many features, a few of which are given below.
• We can move the mouse and click in another application's window.
• We can send keystrokes to other applications, for example to fill out a form or type a search query into a browser.
• We can take screenshots and save them as image files.
• It allows us to locate an application window and move, maximize, minimize, resize, or close it.
• It can display alert and message boxes.
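A brief sketch of typical pyautogui calls for this kind of desktop control; coordinates and text are placeholders and should be adapted to the screen being automated.

import pyautogui

pyautogui.moveTo(200, 300, duration=0.5)    # move the mouse to (200, 300)
pyautogui.click()                            # click at the current position
pyautogui.write("hello", interval=0.05)      # type into the focused window
pyautogui.press("enter")
pyautogui.screenshot("capture.png")          # save a screenshot to a file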
4.3.1.11. os: The Python os module provides the facility to establish interaction between the user and the operating system. It offers many useful functions that are used to perform OS-level tasks and get information about the operating system. The os module is part of Python's standard utility modules and offers a portable way of using operating-system-dependent functionality.
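A small sketch of how the assistant can launch local applications and inspect directories with the os module (the Notepad path is a placeholder and os.startfile is Windows-only):

import os

os.startfile("C:\\Windows\\System32\\notepad.exe")   # launch an application (Windows)
print(os.getcwd())                                     # current working directory
print(os.listdir("."))                                 # files in a directory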
4.3.1.12. sys: The Python sys module provides functions and variables that are used to manipulate different parts of the Python runtime environment. It lets us access system-specific parameters and functions. The sys module comes packaged with Python, which means you do not need to download and install it separately using the pip package manager; to start using it and its various functions, you simply import it.
4.3.1.13. subprocess: The subprocess module, present in both Python 2.x and 3.x, is used to run new applications or programs from Python code by creating new processes. It also helps to obtain the input/output/error pipes as well as the exit codes of various commands.
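A short sketch of running an external command and capturing its exit code and output (the command itself is just an example):

import subprocess

result = subprocess.run(["python", "--version"], capture_output=True, text=True)
print("exit code:", result.returncode)
print("output:", result.stdout or result.stderr)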
4.3.1.14. pygame: Game programming is very rewarding nowadays, and it can also be used in advertising and as a teaching tool. Game development involves mathematics, logic, physics, AI, and much more, and it can be amazingly fun. In Python, game programming is done with pygame, one of the best modules for the purpose; in this project its mixer is used for music playback.
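A minimal sketch of a play_music()-style use of pygame's mixer (the file name is a placeholder):

import pygame

pygame.mixer.init()
pygame.mixer.music.load("song.mp3")   # placeholder audio file
pygame.mixer.music.play()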
4.3.1.15. keyboard: Python provides a library named keyboard which is used to take full control of the keyboard. It is a small Python library that can hook global events, register hotkeys, simulate key presses, and much more.
• It helps to enter keys, record keyboard activity, block keys until a specified key is entered, and simulate key presses.
• It captures all keys; even on-screen keyboard events are captured.
• The keyboard module supports complex hotkeys.
• Using this module we can listen for and send keyboard events.
• It works on both Windows and Linux operating systems.
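A small sketch of the keyboard module in use (the hotkey combination and text are placeholders; on Linux the library typically needs root privileges):

import keyboard

keyboard.write("hello from the assistant")            # type text into the focused window
keyboard.press_and_release("ctrl+s")                   # simulate a shortcut
keyboard.add_hotkey("ctrl+alt+q", lambda: print("quit hotkey pressed"))
keyboard.wait("esc")                                    # block until Esc is pressed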
4.3.2. FUNCTIONS
4.3.2.1. takeCommand(): This function takes the command as input through the user's microphone and returns the recognized text as a string.
4.3.2.2. wishMe(): This function greets the user according to the time of day: Good Morning, Good Afternoon, or Good Evening.
4.3.2.3. taskExecution(): This function contains all the task-execution definitions, such as sendEmail(), pdf_reader(), and news(), along with many conditions in an if/elif chain, such as "open google", "open notepad", "search on Wikipedia", "play music", and "open command prompt".
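A hedged sketch of how these three functions can fit together is given below; the command set, the speak() helper, and the error handling are simplified and may differ from the project's actual code.

import datetime
import webbrowser
import pyttsx3
import speech_recognition as sr

engine = pyttsx3.init()

def speak(text):
    # Convert text to speech.
    engine.say(text)
    engine.runAndWait()

def takeCommand():
    # Listen on the microphone and return the recognized command as a string.
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)
    try:
        return r.recognize_google(audio, language="en-in").lower()
    except Exception:
        return ""

def wishMe():
    # Greet the user according to the current hour.
    hour = datetime.datetime.now().hour
    if hour < 12:
        speak("Good Morning")
    elif hour < 18:
        speak("Good Afternoon")
    else:
        speak("Good Evening")

def taskExecution():
    # Dispatch a spoken command to the matching task.
    query = takeCommand()
    if "open google" in query:
        webbrowser.open("https://www.google.com")
    elif "play music" in query:
        speak("Playing music")   # delegate to a play_music() helper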
System testing is done on the fully integrated system to check whether the requirements are met. The system testing for the desktop assistant focuses on the following four parameters:
4.3.3. FUNCTIONALITY
Here we check whether the system performs the tasks it was intended to do. To check the functionality, each function was run; if it executed the required task correctly, the system passed that particular functionality test. For example, to check whether the Assistant can search on Google, the user said "Open Google", the Assistant asked "What should I search on Google?", the user said "What is Python", and the Assistant opened Google and searched for the required input.
4.3.4. USABILITY
The usability of a system is measured by how easy the software is to use, how user-friendly it is, and how it responds to each query asked by the user. The assistant makes it easier to complete tasks because it performs them automatically using the essential Python modules and libraries, through a conversational interaction. When users instruct it to perform a task, it therefore feels like giving the task to a human assistant, because the input is given and the desired output is received through conversation. The desktop assistant is responsive: it understands the context provided by the user and responds in the same human-understandable language, English, so the user finds its reactions informed and smart. Its main strength is its ability to handle one task after another: it keeps asking for instructions and listening to the user's responses, without needing a trigger phrase for each command, until the user tells it to quit.
4.3.5. SECURITY
Security testing mainly focuses on vulnerabilities and risks. Because the Assistant is a local desktop application, there is no risk of data breaches through remote access. The software is dedicated to a specific system, so it is activated only when the user logs in.
4.3.6. STABILITY
The stability of a system depends on its output: if the output is bounded and specific to a bounded input, the system is said to be stable. If the system behaves correctly across all of its functionality, it is considered stable.
The planning phase involves defining the scope of the project, setting objectives, and
identifying the key stakeholders. In the case of an AI desktop virtual assistant, the project
team would need to determine the specific features and functionalities that the assistant would
need to perform and identify the user groups and target audience for the assistant. This would
involve conducting a thorough analysis of the market and the needs of potential users.
The project team would also need to define the project schedule, including key milestones
and deadlines, and identify the resources required for the project. This would include
determining the necessary hardware and software infrastructure, as well as the personnel
needed to develop and maintain the AI assistant.
The execution phase involves the actual development and implementation of the AI desktop
virtual assistant. This would involve a range of activities, such as designing the user interface
and programming the underlying AI algorithms. The project team would also need to conduct
extensive testing and quality assurance to ensure that the assistant is functioning correctly and
providing accurate and useful responses to user queries.
In addition, the project team would need to establish a process for collecting and analyzing
user feedback. This would involve developing a system for tracking user interactions with the
assistant, and using this data to continually improve the performance and functionality of the
assistant.
The monitoring and control phase involves ongoing monitoring of the project progress and
making adjustments as necessary to ensure that the project is meeting its objectives. This
would involve regular project status updates, as well as ongoing performance monitoring of
the AI assistant to identify and address any issues or areas for improvement.
Overall, a project management plan for an AI desktop virtual assistant would need to be
comprehensive and flexible, considering the unique challenges and complexities of
developing and implementing this type of technology. It would need to be focused on
delivering a high-quality product that meets the needs of users, while also ensuring that the
project stays on schedule and within budget. With careful planning and execution, an AI
desktop virtual assistant can provide significant benefits to users across a range of industries
and revolutionize the way we interact with our devices.
The project titled "A.I. DESKTOP ASSISTANT" was designed by our team. Everything from installing the packages, importing them, and creating all the necessary functions, to designing the GUI in PyQt and connecting that live GUI with the backend, was done by us. We carried out the research before building the project, prepared the requirement documents for the requirements and functionalities, wrote the synopsis and all the documentation and code, and built the project so that it is deliverable at each stage. We created the front end (the .ui file) of the project using Qt Designer; the front end comprises a live GUI and is connected with the .py file, which contains all the classes and packages for the .ui file. The live GUI consists of moving GIFs, which make the front end attractive and user-friendly. We wrote the complete code in Python in the PyCharm IDE, where it was very easy to install the packages and libraries. We created the functions takeCommand(), wishMe(), and taskExecution(): takeCommand() takes the command as input through the user's microphone and returns it as a string; wishMe() greets the user according to the time of day (Good Morning, Good Afternoon, or Good Evening); and taskExecution() contains all the task-execution definitions, such as sendEmail(), pdf_reader(), and news(), along with many conditions such as "open Google", "open notepad", "search on Wikipedia", "play music", and "open command prompt". While making this project we realized that, with these advancements, the Assistant can perform many tasks as effectively as, or more effectively than, a person, and that applying AI in every field reduces human effort and saves time. The functionalities of this project include sending emails, reading PDFs, sending WhatsApp messages, opening the command prompt, a favorite IDE, or Notepad, playing music, performing Wikipedia searches, opening websites such as Google and YouTube in a web browser, giving weather forecasts, setting desktop reminders of your choice, and holding some basic conversation. Finally, we updated our report and completed it by attaching all the necessary screen captures of inputs and outputs and describing the limitations and future scope of this project.
Deployment strategy: This outlines the approach that will be used to deploy the AI desktop
assistant model, including the infrastructure required, software dependencies, and any third-
party services that may be needed.
Testing and validation: This is a critical component of the T/S2O plan, as it ensures that the
AI desktop assistant model is functioning as expected and meets the performance and
accuracy requirements.
Monitoring and maintenance: Once the AI desktop assistant model is deployed, it needs to
be monitored and maintained to ensure that it continues to perform optimally. This includes
regular updates, bug fixes, and enhancements.
Security and compliance: Security and compliance are also important considerations in the
T/S2O plan, as the AI desktop assistant model may handle sensitive data and needs to be
compliant with relevant regulations and standards.
User support and training: Finally, the T/S2O plan should include provisions for user
support and training, as the AI desktop assistant model may be used by individuals
with varying levels of technical expertise.
Integration with Existing Systems: The AI desktop assistant needs to be integrated with
other systems and applications already in use by the organization to ensure that it functions
effectively within the organization's technology infrastructure.
Training and Support: End-users need to be trained on how to use the AI desktop assistant
effectively, and ongoing support should be provided to help users address any issues that may
arise during use.
By having a transition plan in place, organizations can ensure that the AI desktop assistant is
implemented effectively and efficiently, and that end-users are adequately supported
throughout the process. Overall, a T/S2O plan is an essential component of any AI desktop
assistant project, as it helps to ensure that the model is deployed and operated effectively and
efficiently and provides a framework for ongoing improvement and optimization.
CHAPTER 5
IMPLEMENTATION DETAILS
The code provides a Python implementation of an AI Desktop Assistant project that uses
speech recognition to take user commands, perform various tasks, and provide verbal
responses. The program imports several necessary packages for speech recognition, text-to-
speech conversion, music playback, opening files and web pages, and accessing information
from Wikipedia. The AI assistant can perform tasks like opening word, powerpoint, excel,
zoom, notepad, and chrome applications, searching Wikipedia, telling jokes, opening web
pages such as YouTube, Google, and Stack Overflow, and playing music. Additionally, the
assistant can write notes, speak the current date, and greet the user based on the current time.
Overall, the project demonstrates the potential of AI-powered desktop assistants in
automating routine tasks and enhancing user productivity.
The development setup for a Python-based virtual assistant would typically involve using an
integrated development environment (IDE) such as PyCharm, VSCode or Spyder. These
IDEs provide tools for coding, debugging, and testing Python programs. The virtual assistant
would also require the use of various Python libraries and packages, such as SpeechRecognition, pyttsx3, wikipedia, webbrowser, and pygame, to name a few.
The deployment setup would depend on how the virtual assistant is intended to be used. If it
is a personal project meant to be run on a local machine, the deployment process would
involve installing any necessary packages on the target machine and running the program on
that machine. However, if the virtual assistant is intended to be used by others or integrated
into a larger application, more involved deployment processes would be necessary.
1. Create an executable file for the virtual assistant using tools such as Pyinstaller,
cx_Freeze, or Py2Exe, which can package the Python program and its dependencies
into a single executable file. This makes it easier to distribute the program and ensures
that users don't need to install any additional packages.
2. Use cloud platforms like AWS, Azure or Google Cloud for deployment if the virtual
assistant is intended to be used by multiple users over the internet. In this case, the
program would be hosted on a cloud server and accessed through a web interface or
API.
3. Use containers like Docker or Kubernetes for deployment if the virtual assistant is
meant to be deployed on multiple machines. This makes it easier to manage and scale
the deployment.
4. Set up a continuous integration/continuous deployment (CI/CD) pipeline to automate
the deployment process. This ensures that any changes made to the code are
automatically tested, built, and deployed to the target environment.
Overall, developing and deploying a virtual assistant requires careful planning and
consideration of the intended use case and target audience.
5.2 ALGORITHMS
Neural Networks: Neural networks are a type of machine learning algorithm inspired by the
structure of the human brain. They are composed of multiple layers of interconnected nodes
and are used for a variety of tasks such as image and speech recognition, natural language
processing, and predictive modeling.
These are just a few examples of the many algorithms used in the development and
deployment of AI applications. The choice of algorithm will depend on the specific task at
hand, the size and complexity of the dataset, and other factors such as computational
resources and time constraints.
5.2.1.1 K-Means Clustering: K-means is a popular clustering algorithm used to group data points into k clusters. In the code provided, K-means is used to cluster the RGB values of each pixel in the image into a specified number of clusters (n_colors). Here K defines the number of pre-defined clusters to be created in the process: if K=2 there will be two clusters, if K=3 there will be three clusters, and so on. It allows us to group the data into different clusters and is a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of distances between each data point and its corresponding cluster centroid. The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until it finds the best clusters; the value of k must be chosen in advance. It determines the best positions for the K centroids through an iterative process and assigns each data point to its closest centroid; the data points near a particular centroid together form a cluster.
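A hedged sketch of this clustering step with scikit-learn's KMeans follows; the pixel array here is random stand-in data rather than the project's actual image, and n_colors is an assumed parameter name.

import numpy as np
from sklearn.cluster import KMeans

pixels = np.random.randint(0, 256, size=(1000, 3))   # stand-in for image RGB values
n_colors = 8
kmeans = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(pixels)
print(kmeans.cluster_centers_)   # one representative colour per cluster
print(kmeans.labels_[:10])       # cluster index assigned to each pixel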
5.2.1.2 Principal Component Analysis (PCA): PCA is a technique used to reduce the dimensionality of a dataset by transforming it into a lower-dimensional space that still contains most of the information in the original dataset. Principal Component Analysis is an unsupervised learning algorithm used for dimensionality reduction in machine learning. It is a statistical process that converts observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation; these new transformed features are called the principal components. It is one of the popular tools used for exploratory data analysis and predictive modeling, and it draws out the strong patterns in a dataset by reducing its dimensionality.
PCA tries to find a lower-dimensional surface onto which to project the high-dimensional data. It works by considering the variance of each attribute, because an attribute with high variance gives a good separation between the classes, and this is what guides the dimensionality reduction.
Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing power allocation across communication channels. It is a feature-extraction technique, so it keeps the important variables and drops the least important ones. In the code provided, PCA is used to further reduce the dimensionality of the clustered RGB values.
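A small sketch of this reduction step with scikit-learn's PCA; the input matrix is stand-in data rather than the project's clustered RGB values.

import numpy as np
from sklearn.decomposition import PCA

data = np.random.rand(1000, 3)          # stand-in for the RGB feature matrix
pca = PCA(n_components=2)
reduced = pca.fit_transform(data)        # lower-dimensional representation
print(pca.explained_variance_ratio_)     # variance captured by each component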
5.2.1.3 Mini-Batch K-Means: Standard K-means must visit the entire dataset in every
iteration, which becomes slow and memory-hungry on large datasets. Mini-batch K-means
addresses this issue by processing only a small subset of the data, called a mini-batch, in each
iteration. The mini-batch is randomly sampled from the dataset, and the algorithm updates
the cluster centroids based on the data in the mini-batch. This allows the algorithm to
converge faster and use less memory than traditional K-means. In the code provided,
mini-batch K-means is used to cluster the reduced data generated by PCA.
Mini-batch K-means is faster but gives slightly different results than normal (full-batch)
K-means. Here we cluster a set of data, first with K-means and then with mini-batch
K-means, and plot the results. We also plot the points that are labelled differently by the
two algorithms.
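The following sketch, continuing the assumed variables from the PCA example, compares ordinary
K-means with mini-batch K-means on the same reduced data; the values of k and batch_size are
illustrative.

from sklearn.cluster import KMeans, MiniBatchKMeans

# Illustrative comparison: fit both variants on the same reduced data.
k = 8
full_kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(reduced)
mb_kmeans = MiniBatchKMeans(n_clusters=k, batch_size=256, random_state=0).fit(reduced)

# Points that the two algorithms assign to different clusters
# (cluster indices are arbitrary, so in practice the labels would be matched first).
differing = (full_kmeans.labels_ != mb_kmeans.labels_).sum()
print(f"{differing} of {len(reduced)} points labelled differently")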
5.2.1.4 Nearest Neighbors: Nearest Neighbours is a machine learning algorithm used for
classification and regression. K-NN is a non-parametric algorithm, meaning it makes no
assumptions about the underlying data. It is also called a lazy learner because it does not
build a model from the training set immediately; instead it stores the dataset and only acts on
it at classification time. During training, the K-NN algorithm simply stores the dataset, and
when new data arrives it assigns that data to the category it is most similar to.
The same nearest-neighbour idea is also used as a greedy heuristic for routing problems such
as the travelling salesman problem. In that setting the algorithm is easy to implement and
executes quickly, but it can sometimes miss shorter routes which are easily noticed with
human insight, due to its "greedy" nature. As a general guide, if the last few stages of the
tour are comparable in length to the first stages, then the tour is reasonable; if they are much
greater, then it is likely that much better tours exist. Another check is to use an algorithm
such as a lower-bound algorithm to estimate whether the tour is good enough.
In the worst case, the algorithm results in a tour that is much longer than the optimal tour. To
be precise, for every constant r there is an instance of the traveling salesman problem such
that the length of the tour computed by the nearest neighbor algorithm is greater than r times
the length of the optimal tour. Moreover, for each number of cities there is an assignment of
distances between the cities for which the nearest
neighbor heuristic produces the unique worst possible tour. (If the algorithm is applied on
every vertex as the starting vertex, the best path found will be better than at least N/2-1 other
tours, where N is the number of vertices.)
The nearest neighbor algorithm may not find a feasible tour at all, even when one exists.
In the code provided, the Nearest Neighbors algorithm is used to find the nearest cluster
centers to each of the reduced data points.
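A minimal sketch of this step, reusing the reduced data and the mini-batch K-means model assumed
in the earlier examples:

from sklearn.neighbors import NearestNeighbors

# Illustrative sketch: for every reduced data point, find the closest cluster centre.
centres = mb_kmeans.cluster_centers_            # centres from the mini-batch K-means sketch
nn = NearestNeighbors(n_neighbors=1).fit(centres)
distances, indices = nn.kneighbors(reduced)     # indices[i] = nearest centre of point i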
5.3 TESTING
This code is a Python script that creates a virtual assistant which listens to voice commands
and performs various tasks based on the commands received.
The virtual assistant can perform tasks such as opening files, searching on Wikipedia,
opening web pages, telling jokes, playing music, and taking notes.
The code uses various Python libraries such as SpeechRecognition, pyttsx3, datetime, os,
wikipedia, webbrowser, pygame, subprocess, and pyjokes.
The speak() function is used to convert the text to speech using the pyttsx3 library, and the
takeCommand() function is used to listen to the user's voice commands using the
SpeechRecognition library.
The wishMe() function greets the user based on the current time, and the play_music()
function plays a song using the pygame library. The note() function takes a text input from
the user and creates a note file using the subprocess library.
The code uses various conditional statements to execute different tasks based on the user's
voice commands. The code is executed in a while loop that continuously listens to the user's
voice commands until the user terminates the program.
This code is a Python script for a virtual assistant that can take voice commands from the user
and perform various tasks, such as searching Wikipedia, opening websites, playing music,
and more. Here is a brief summary of what the code does:
1. Importing necessary packages: The code starts by importing the necessary libraries for
the project, including the speech_recognition, pyttsx3, datetime, os, wikipedia,
webbrowser, pygame, time, subprocess, and pyjokes libraries.
2. Initializing the speech engine: The code initializes the pyttsx3 speech engine and sets
the voice property to the first voice in the list of available voices.
3. Function to speak out the text: The speak() function is defined to speak out the text
passed to it using the pyttsx3 speech engine.
4. Function to wish according to the time: The wishMe() function is defined to wish the
user according to the time of day. The function uses the datetime library to get the
current hour and then speaks a greeting based on that hour.
5. Function to take command from the user: The takeCommand() function is defined to
take user input using the speech_recognition library. The function listens to the user's
voice input and returns the recognized text.
6. Function to play music: The play_music() function is defined to play music using the
pygame library. The function loads an MP3 file and plays it using the mixer module
of the pygame library.
7. Function to create a note: The note() function is defined to create a note by opening
the notepad.exe application and saving the text entered by the user in a text file.
8. Function to get the current date: The date() function is defined to get the current date
and speak it out using the pyttsx3 speech engine. The function uses the datetime
library to get the current month and day and then speaks out the month name and day
number.
9. The main function: The main function contains the logic to execute various tasks
based on user input. The function uses an infinite loop to keep listening to the user's
voice input and then performs the appropriate task based on the input.
For example, if the user says "open youtube", the function will open the YouTube website in
the default web browser. Similarly, if the user says "play music", the function will call the
play_music() function to play music using the pygame library.
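As a rough illustration of this dispatch logic (the exact conditions and wording in the project
source may differ), the main loop can be sketched as follows, using the speak(), takeCommand()
and play_music() helpers described above:

import webbrowser

while True:
    query = takeCommand()            # recognised text from the user's voice input

    if 'open youtube' in query:
        speak("Opening YouTube")
        webbrowser.open("https://www.youtube.com")
    elif 'play music' in query:
        play_music()                 # defined elsewhere in the script
    elif 'exit' in query or 'quit' in query:   # assumed exit phrase for the sketch
        speak("Have a good day")
        break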
Overall, the code is a good starting point for creating a voice-controlled virtual assistant in
Python. It can be expanded upon by adding more features and improving the accuracy of the
speech recognition.
CHAPTER 6
This proposed concept is an effective way of implementing a personal voice assistant. The
Speech Recognition library has many built-in functions that let the assistant understand the
command given by the user, and the response is sent back to the user as voice using
text-to-speech functions. When the assistant captures the voice command given by the user,
the underlying algorithms convert the voice into text. According to the keywords present in
that text, the respective action is performed by the assistant. This is made possible by the
functions present in different libraries. The assistant also achieves several functionalities
with the help of external APIs, which we used for tasks such as performing calculations and
extracting news from web sources: we send a request and receive the respective output
through the API. APIs like WolframAlpha are very helpful for calculations and small web
searches. For getting data from the web, not every API can convert raw JSON data into text,
so we used the json library, which helps in parsing the JSON data coming from websites into
string format. In this way we are able to extract news from web sources and pass it as input
to a function for further processing. We also used libraries such as random and many others,
each corresponding to a different purpose. The os library implements operating-system
related functionality such as shutting down or restarting the system, pyautogui is used for
capturing screenshots, and psutil is used for checking the battery status.
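As a rough sketch of how some of these libraries can be combined (the helper names and the
screenshot file name are illustrative, and speak() is the assistant's text-to-speech helper
described in this chapter):

import os
import pyautogui
import psutil

def take_screenshot():
    # Capture the screen and save it to an image file (file name is illustrative).
    pyautogui.screenshot().save("screenshot.png")

def battery_status():
    # Report the current battery percentage (sensors_battery() is None on machines without a battery).
    battery = psutil.sensors_battery()
    if battery is not None:
        speak(f"The battery is at {battery.percent} percent")

def shutdown_system():
    # Windows shutdown command; /s shuts down, /t 1 waits one second.
    os.system("shutdown /s /t 1")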
The programming language used in this project is Python, which is known for its versatility
and wide range of libraries. For programming the virtual assistant we used the Microsoft
Visual Studio Code IDE, which supports the Python programming language. The
SpeechRecognition library for Python provides several built-in functions. Initially, we define
a function for converting text to speech; for that we use the pyttsx3 library, initialise a library
instance in a variable, and call its say() method with the text as an argument, which produces
a voice reply. For recognising the voice command given by the user, another function is
defined. In that function we define the microphone as the audio source and, within its scope,
call the respective recognition functions and store the output in a variable. Several services
are available for this step, such as the Google Speech Recognition engine, the Microsoft Bing
Voice Recognition engine, and products of other companies like IBM and Houndify. For this
project we chose Google's Speech Recognition engine, which converts the spoken voice
command into digital text. That text is passed as input to the assistant, which searches it for
keywords. If the input command contains a matching word, the respective function is called
and the corresponding action is performed, such as telling the time or date, reporting the
battery status, taking a screenshot, saving a short note, and many more. The main advantage
of this personal virtual assistant is that it saves a lot of time, and it can handle queries from
people with different voices. There is no rule that one has to give an exact specified command
to trigger a particular action; the user has the flexibility to give commands in natural
language. The programming language used to design this voice-enabled personal assistant for
PC is Python 3.8.3, and the IDE (Integrated Development Environment) we used is Microsoft
Visual Studio Code.
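A minimal sketch of these two helper functions, assuming the pyttsx3 and SpeechRecognition
libraries (the project's own definitions may differ in detail):

import pyttsx3
import speech_recognition as sr

engine = pyttsx3.init()
engine.setProperty('voice', engine.getProperty('voices')[0].id)  # first available voice

def speak(text):
    # Convert the given text to speech.
    engine.say(text)
    engine.runAndWait()

def takeCommand():
    # Listen on the microphone and return the recognised text (lower-cased).
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)
    try:
        return r.recognize_google(audio).lower()   # Google Speech Recognition engine
    except sr.UnknownValueError:
        speak("Please say that again")
        return ""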
This assistant consists of three modules. The first accepts voice input from the user. The
second analyses the input given by the user and maps it to the respective intent and function.
The third returns the result to the user along with a voice response.
Initially, the assistant starts accepting the user input. After receiving the input, it converts the
spoken voice input to digital text. If the assistant is not able to convert the voice into text, it
asks the user for the input again. If the conversion succeeds, it analyses the input and maps it
to a particular function. The output is then given to the user as a voice response.
On starting, the assistant waits for input from the user. When the user gives a voice
command, the assistant captures it and searches for the keywords present in the command. If
it finds a keyword, it performs the corresponding task and returns the output to the user in
voice; if not, it goes back to waiting for the user's input. Each of these functionalities has its
own importance in the working of the whole system.
● User Input—The assistant will wait for the user to give voice command for further
processing.
● Introducing to user—If the user asks the assistant to introduce itself, the assistant will
display and speak a short introduction.
● Reading out news—If the user asks the assistant to read out some news, the assistant
will display the news line by line and it will also read it out.
● Taking a sample note—If the user has a small note to be taken, he can ask the
assistant to do so, and the assistant will take the notes and save it in a notepad file.
● Showing Note—If the user asks the assistant to display the note, and to speak out the
note, the assistant will do so.
● YouTube searches—If the user asks the assistant to do some YouTube searches, the
assistant will do that. It will ask the user, what to search in YouTube. After receiving
the input, it will open the YouTube page with that respective search.
● Web Searches—If the user asks the assistant to do some web searches, the assistant
will also do that. It will ask the user what to search for, and it will open a Google
search in a new browser tab (a sketch of these search handlers follows this list).
● Opening Applications—If the user asks the assistant to open an application, like MS
Word or any other, the assistant will do so immediately, and it will also announce that
it is opening the application.
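As a rough sketch of the YouTube and web search behaviours above (the prompt wording is
illustrative; the query URLs are the standard YouTube and Google search formats, and speak()
and takeCommand() are the assistant's helpers):

import webbrowser
import urllib.parse

def youtube_search():
    speak("What should I search on YouTube?")
    topic = takeCommand()
    webbrowser.open("https://www.youtube.com/results?search_query=" + urllib.parse.quote(topic))

def web_search():
    speak("What should I search for?")
    topic = takeCommand()
    webbrowser.open("https://www.google.com/search?q=" + urllib.parse.quote(topic))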
An AI desktop assistant model is a program that uses artificial intelligence and natural
language processing to perform tasks and answer questions for the user. The assistant can
perform tasks such as opening applications, playing music, setting reminders, making calls,
sending messages, and more.
The model uses speech recognition to understand the user's commands and convert them into
text, which is then processed to perform the relevant task. It can also access
the internet and use web scraping to provide the user with relevant information from websites
such as Wikipedia or news portals.
The AI desktop assistant model can be customized to suit the user's needs, preferences, and
behavior. It can learn from the user's interactions and improve its responses and functionality
over time. The model can also be integrated with other devices and platforms such as
smartphones, smart home devices, and email.
Overall, an AI desktop assistant model can be a powerful tool for increasing productivity,
managing tasks, and simplifying daily life. It can provide a hands-free and intuitive
experience for the user, allowing them to perform tasks with ease and efficiency.
CHAPTER 7
CONCLUSION
7.1 CONCLUSION
This report presents a comprehensive overview of the design and development of a
voice-enabled personal assistant for PC using the Python programming language. In today's
lifestyle, such a voice-enabled personal assistant is far more effective at saving time than
earlier approaches. The personal assistant has been designed with ease of use as its main
feature, and it performs the tasks given by the user reliably.
Furthermore, there are many things that this assistant is capable of doing, like turning our PC
off, or restarting it, or reciting some latest news, with just one voice command. In conclusion,
an AI desktop assistant can be a very useful tool for streamlining daily tasks, improving
productivity, and providing personalized assistance to users. With advancements in natural
language processing, machine learning, and speech recognition technologies, these assistants
are becoming increasingly sophisticated and capable of handling a wide range of tasks.
● Security: AI desktop assistants can help improve security by providing users with
secure logins and authentication processes. They can also help prevent data breaches
by identifying and flagging suspicious activity.
Overall, AI desktop assistants have the potential to transform the way we interact with our
computers and technology. As the technology continues to evolve, we can expect more
advanced and personalized features that will further enhance their utility and
usefulness.
The future of AI desktop assistants is quite exciting, as their potential applications are
limitless. The future work of AI desktop assistants will likely focus on improving their
natural language processing capabilities and expanding their functionality to perform more
complex tasks. Some possible future developments include:
2. Deeper integration with other systems: AI assistants will become more deeply
integrated with other systems and devices, allowing them to control a wider range of
tasks and functions.
8. Improved security: As AI desktop assistants become more integrated into our daily
lives, ensuring their security and protecting user data will be critical. Future
development could focus on enhancing security measures to prevent hacking and
protect user privacy.
There are several research issues that are currently being explored in the development of AI
desktop assistants. Here are a few:
4. Privacy and Security: As with any AI technology that collects data, privacy and
security are major concerns. Researchers must develop robust security protocols to
protect user data, as well as strategies for addressing potential ethical issues that may
arise as these systems become more advanced.
6. Integration with Other Systems: Finally, researchers are exploring ways to integrate
AI desktop assistants with other systems, such as smart homes, cars, and other
devices. This requires developing standardized protocols for communication between
different systems and ensuring that the assistant can operate seamlessly in different
environments.
7. Ethics: As AI desktop assistants become more advanced, they raise ethical issues
around their use. For example, what happens when an AI desktop assistant makes a
mistake that harms a user? Researchers are working on developing ethical frameworks
that can guide the development and use of AI desktop assistants.
Data privacy and security: AI assistants require access to a user's personal data, such as
contacts, calendar, and email. Ensuring the privacy and security of this data is a major
concern and requires careful implementation and monitoring.
Compatibility with different operating systems and devices: AI assistants need to be
designed to work seamlessly with different operating systems and devices, including desktop
computers, laptops, tablets, and smartphones.
Natural Language Processing (NLP): One of the key features of an AI desktop assistant is
its ability to understand natural language commands and questions. However, NLP is a
complex and evolving field, and developing accurate and efficient NLP algorithms for an
assistant can be challenging.
Natural language processing (NLP) accuracy: The accuracy of NLP algorithms used in AI
desktop assistants is crucial to their effectiveness. However, NLP algorithms can be complex
and require a large amount of data to train the models. Developers must ensure that the
algorithms are accurate enough to understand and interpret user requests and respond
appropriately.
User training and education: AI desktop assistants can only be effective if users know how
to use them. Developers must provide adequate training and education resources to help users
understand how to interact with the assistant and how to use its various features.
These are just a few of the implementation issues that can arise while developing an AI
desktop assistant. Addressing these challenges requires careful planning, design, and
development to ensure that the assistant is effective, user-friendly, and secure. Overall,
implementing an AI desktop assistant requires a multidisciplinary approach, including
expertise in machine learning, NLP, data privacy and security, software engineering, and user
experience design.
REFERENCES: -
1. Elizabeth Sucupira Furtado, Virgilio Almedia And Vasco Furtado, ―Personal Digital
Assistants: The Need Of Governance‖
2. Zecheng Zhan, Virgilio Almedia, And Meina Song, ―Table-To-Dialog: Building Dialog
Assistants To Chat With People On Behalf Of You‖
3. Yusuf Ugurlu, Murat Karabulut, Islam Mayda ―A Smart Virtual Assistant Answering
Questions About COVID-19‖ Mathangi Sri ―NLP In Virtual Assistants‖
4. Anxo Pérez, Paula Lopez-Otero, Javier Parapar. ―Designing An Open-Source Virtual
Assistant‖
5. C K Gomathy And V Geetha. Article: A Real Time Analysis Of Service Based Using
Mobile Phone Controlled Vehicle Using DTMF For Accident Prevention. International
Journal Of Computer Applications 138(2):11-13, March 2016. Published By Foundation Of
Computer Science (FCS), NY, USA,ISSN No: 0975-8887
6. Dr.C.K.Gomathy , K. Bindhu Sravya , P. Swetha , S.Chandrika Article: A Location Based
Value Prediction For Quality Of Web Service, Published By International Journal Of
Advanced Engineering Research And Science (IJAERS), Vol-3, Issue-4 , April- 2016] ISSN:
2349-6495
7. C.K.Gomathy And Dr.S.Rajalakshmi.(2014),"A Business Intelligence Network Design
For Service Oriented Architecture", International Journal Of Engineering Trends And
Technology (IJETT) ,Volume IX, Issue III, March 2014, P.No:151-154, ISSN:2231-5381.
8. ―VIRTUAL PERSONAL ASSISTANT (VPA) FOR MOBILE USERS‖
9. D. SOMESHWAR, DHARMIK BHANUSHALI, SWATI NADKARNI,
―IMPLEMENTATION OF VIRTUAL ASSISTANT WITH SIGN LANGUAGE USING
DEEP LEARNING AND TENSORFLOW‖
10. C.K.Gomathy.(2010),"Cloud Computing: Business Management For Effective Service
Oriented Architecture" International Journal Of Power Control Signal And Computation
(IJPCSC), Volume 1, Issue IV, Oct - Dec 2010, P.No:22-27, ISSN: 0976- 268X.
11. Dr.C K Gomathy, Article: A Semantic Quality Of Web Service Information Retrieval
Techniques Using Bin Rank, International Journal Of Scientific Research In Computer
Science Engineering And Information Technology ( IJSRCSEIT ) Volume 3 | Issue 1 | ISSN :
2456-3307, P.No:1563-1578, February-2018
12. Dr.C K Gomathy, Article: A Scheme Of ADHOC Communication Using Mobile Device
Networks, International Journal Of Emerging Technologies And Innovative Research (
JETIR ) Volume 5 | Issue 11 | ISSN : 2349-5162, P.No:320-326, Nov-2018.
13. Dr.C K Gomathy, Article: Supply Chain-Impact Of Importance And Technology In
Software Release Management, International Journal Of Scientific Research In Computer
Science Engineering And Information Technology ( IJSRCSEIT ) Volume 3
| Issue 6 | ISSN : 2456-3307, P.No:1-4, July-2018
14. Hemalatha, C.K. and N. Ahmed Nisar (2011), Explored Teachers' Commitment In Self
Financing Engineering Colleges, International Journal Of Enterprise Innovation Management
Studies (IJEIMS), Vol. 2, No. 2, July-Dec 2011, ISSN: 0976-2698. Retrieved from
www.ijcns.com
15. Dr.C K Gomathy, Article: The Efficient Automatic Water Control Level Management
Using Ultrasonic Sensor, International Journal Of Computer Applications (0975 – 8887)
Volume 176 – No. 39, July 2020.
16. G. O. Young, ―Synthetic structure of industrial plastics (Book style with paper title and
editor),‖ in Plastics, 2nd ed. vol. 3, J. Peters, Ed. New York: McGraw-Hill, 1964, pp. 15–64.
17. W.-K. Chen, Linear Networks and Systems (Book style). Belmont, CA: Wadsworth,
1993, pp. 123–135.
18. H. Poor, An Introduction to Signal Detection and Estimation. New York: Springer-
Verlag, 1985, ch. 4.
19. B. Smith, "An approach to graphs of linear forms (Unpublished work style)," unpublished.
20. E. H. Miller, "A note on reflector arrays (Periodical style—Accepted for publication),"
IEEE Trans. Antennas Propagat., to be published.
21. Ardissono, L., Boella. And Lesmo, L. (2000) ―A Plan-Based AgentArchitecture for
Interpreting Natural Language Dialogue‖, International Journal of Human-Computer Studies.
22. Nguyen, A. and Wobcke, W. (2005), ―An Agent-Based Approach to Dialogue
Management in Personal Assistant‖, Proceedings of the 2005 International Conference on
Intelligent User Interfaces.
23. Jurafsky & Martin. Speech and Language Processing – An Introduction to Natural
Language Processing, Computational Linguistics, and Speech Recognition. Prentice- Hall
Inc., New Jersey,2000.
24. Wobcke, W., Ho. V., Nguyen, A. and Krzywicki, A. (2005), ― A BDI Agent
Architecture for Dialogue Modeling and Coordination in a Smart Personal Assistant‖,
Proceedings of the 2005 IEEE/WIC /ACM International Conference on Intelligent Agent
Technology.
25. Knote, R., Janson, A., Eigenbrod, L. and Söllner, M., 2018. The What and How of Smart
Personal Assistants: Principles and Application Domains for IS Research.
26. Feng, H., Fawaz, K. and Shin, K.G., 2017, October. Continuous authentication for
voice assistants. In Proceedings of the 23rd Annual International Conference on Mobile
Computing and Networking (pp. 343- 355). ACM.
27. Canbek, N.G. and Mutlu, M.E., 2016. On the track of artificial intelligence: Learning
with intelligent personal assistants. Journal of Human Sciences, 13(1), pp.592-601.
28. Hwang, I., Jung, J., Kim, J., Shin, Y. and Seol, J.S., 2017, March. Architecture for
Automatic Generation of User Interaction Guides with Intelligent Assistant. In Advanced
Information Networking and Applications Workshops (WAINA), 2017 31st International
Conference on (pp. 352-355). IEEE.
29. Buck, J.W., Perugini, S. and Nguyen, T.V., 2018, January. Natural Language, Mixed-
initiative Personal Assistant Agents. In Proceedings of the 12th International Conference on
Ubiquitous.
APPENDIX
A. Source Code:-
# Excerpt from the assistant's source code; speak() and the main loop's
# 'query' variable are defined elsewhere in the full script.
import datetime
import subprocess
import pygame

def play_music():
    pygame.init()
    pygame.mixer.init()
    # Load the MP3 file
    pygame.mixer.music.load("C:/Users/srpandey/Music/Kesariya - Brahmastra.mp3")
    # Play the audio
    pygame.mixer.music.play()
    # Wait for the audio to finish playing
    while pygame.mixer.music.get_busy():
        pygame.time.wait(1)
    # Close the mixer and pygame
    pygame.mixer.music.stop()
    pygame.mixer.quit()
    pygame.quit()

def note(text):
    date = datetime.datetime.now()
    file_name = str(date).replace(":", "-") + "-note.txt"
    # Save the dictated text, then open it in Notepad
    with open(file_name, "w") as f:
        f.write(text)
    subprocess.Popen(["notepad.exe", file_name])

def date():
    now = datetime.datetime.now()
    month_name = now.month
    day_name = now.day
    month_names = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August',
                   'September', 'October', 'November', 'December']
    ordinalnames = ['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th',
                    '11th', '12th', '13th', '14th', '15th', '16th', '17th', '18th', '19th', '20th',
                    '21st', '22nd', '23rd', '24th', '25th', '26th', '27th', '28th', '29th', '30th', '31st']
    # Speak out the month name and day number
    speak("Today is " + month_names[month_name - 1] + " " + ordinalnames[day_name - 1] + ".")

# From the main command-dispatch loop:
elif 'who are you' in query or 'what can you do' in query:
    speak('I am your personal assistant. I am programmed to do minor tasks like opening '
          'youtube, google chrome, gmail and search wikipedia etcetra')
elif "who made you" in query or "who created you" in query or "who discovered you" in query:
    speak("I was built by Sri Meenakshi Pandey & S Sindu")
● User Input—The assistant will wait for the user to give a voice command for further
processing, and if it doesn't understand, it asks the user to say it again.
● Introducing to user—If the user asks the assistant to introduce itself, the assistant will
display and speak a short introduction.
● Reading out news—If the user asks the assistant to read out some news, the assistant
will display the news line by line and it will also read it out.
● Opening Applications—If the user asks the assistant to open an application, like MS
Word, or any other, the assistant will do so immediately. And also, it will speak that it
opens the application.
● Taking a sample note—If the user has a small note to be taken, he can ask the
assistant to do so, and the assistant will take the notes and save it in a notepad file.
● Telling a joke—If the user asks the assistant to tell a joke, it speaks a random joke.
● Searching in Wikipedia—If the user asks the assistant to search for a certain topic
in Wikipedia, the assistant searches for it and speaks and displays two lines about it.
● Opening a file—If the user asks the assistant to open a certain file, the assistant opens
and displays it.
● Playing music—If the user asks the assistant to play music, the assistant plays the
music file at the specified path in the background.
● Quitting the application—If the user asks the assistant to quit, the assistant greets the
user to have a good day and stops the application.
C. Research Paper
Abstract
Artificial Intelligence has been fast emerging as a noteworthy technology that has the capability to
revolutionize the cognitive behaviour of humans by simulating their intelligence for the betterment
of mankind. AI consists of multi-functional technologies which play a significant role in our
everyday lives, from home automation, where the computer is controlled and multiple tasks are
performed using voice commands, to remote monitoring and control activities. This study is aimed
at designing an AI based virtual assistant that acts as a human language interface through
automation and voice-recognition based interaction, built on the Python language. Instructions for
the voice assistant are implemented in accordance with user requirements. The most successful
speech recognition software, like Alexa, Siri, etc., has been the brainchild of AI technology. The
Speech Recognition API in
python converts speech into text thereby sending and receiving the emails without typing, searching
the keywords in Google without even opening the browser along with carrying out several tasks like
playing music, scheduling meetings, checking mails etc., has been made possible through the help of
this AI based virtual Assistant software. In the present scenario, innovation in digital technologies has
resulted in increased effectiveness and accurateness of several tasks that would have required large
amount of human effort and resources. Through utilization of AI in every domain, remarkable
transformations have resulted in reduced time and labour. Thus, AI based voice assistant software
offers a highly accurate and efficient solution that imitates a human assistant, minimising human
effort and time while carrying out any particular task. Multi-functional aspects like voice
commands, sending emails, reading PDFs, sending text on WhatsApp, opening a command prompt
or IDE, playing music, performing keyword searches in Wikipedia, giving weather forecasts,
desktop reminders of your choice etc. are some of the major operations that can be performed by
the developed AI based virtual assistant, which also possesses certain basic conversational abilities.
Multiple Python libraries and speech recognition tools have been utilised for the project. A live
GUI has been designed
for interacting with the AI virtual Assistant as it presents an elegant design framework to carry out the
necessary conversation.
Source: www.Slanglabs.com
Siri has been acquired by Apple and is unequivocally considered to be the most well-known and
extremely efficient voice assistant. It can perform a wide range of operations like sending text
messages, scheduling meetings, dialling phone calls, activating battery power optimising mode,
enabling DND etc. Siri has the ability to respond to user queries, transmit electronic
communications through mails, activate alarms, carry out reservations in restaurants, and provide
directions to places through its interpretation of natural languages. In spite of all the benefits that
Siri has been offering, it has its own drawbacks: it can operate only on Apple devices, it requires an
active internet connection to operate, and while Siri works well with English commands, they must
be spoken clearly to be understood; conversing in a rapid manner or using strong accents cannot be
identified by Siri, as its learning abilities may get reduced and this may impact its understanding of
the user's query. Moreover, constraints like inaudible noise, the presence of background
disturbances and poor quality acoustics from headsets can also limit the voice assistant functions.
Siri thus requires a seamless Wi-Fi connection for its effective operational abilities.

ALEXA (2014)

Amazon Echo was launched by Amazon; it is a smart speaker that provides users with the voice
assistant called Alexa, which has been designed as an internal strategy to help Amazon focus on
and enhance its customer base and further increase revenue by facilitating the online shopping
experience. The main benefits are its easy operation, nonstop music, shopping, timers etc., but the
limitations are its mishearing and slower response rates.

Along with the most popular and widely sought-after voice assistants discussed above, there are
still more smart voice assistants like Facebook's M and Microsoft's Cortana. As people become
more integrated with modern technologies, and with the rapid proliferation of IoT, AI technology
based evolution in the form of voice assistants is unambiguously sure to take this innovative
technology to subsequent levels.

Our AI based voice assistant has been designed with the following objectives in mind:

• To design effective personal assistant software that uses semantic data sources available on the
internet, user generated content and knowledge acquired from knowledge databases.
• To efficiently answer questions posed by users with respect to various domains like business
environment and website details, together with an appropriate chat interface.
• To efficiently save time and effort by presenting a systematic understanding of several pieces of
information through detailed research and then making the report in terms of our understanding.
• By presenting a rapid voice search mechanism where more time can be saved.

The organization of the research is as follows: the initial section presents a detailed introduction to
voice assistant technology along with recent techniques available in the market; the second section
offers a systematic review of various AI based voice assistants and their benefits and limitations.
Section 3 is dedicated to our proposed methodology, followed by results discussion and analysis in
section 4. The study concludes by presenting a summary of our research in the conclusion and
future enhancements section.
2. Literature Survey

Technological companies like Microsoft, Google, Apple and Amazon have been making use of
NLP to design and develop their AI voice assistants. The major techniques employed by these
software companies involve several processes, from transforming their workflow to enhancing the
performance of their personal assistants so as to be compatible with their devices, taking into
account compatibility and complexity. Google has worked towards improving its voice assistant's
capabilities by making use of deep learning methodologies, focusing more on dialogue systems.
Microsoft employs ML tools and other neural-network based facilities to improve Cortana's
language processing abilities. Amazon uses speech recognition technology to convert speech to
text, together with user feedback on different dialects, tones and other nuances of speech, thereby
letting researchers and developers design voice assistants that enhance the customer's experience
and facilitate realistic dialogue exchange.

The majority of voice assistants possess a feminine tone, even though users can modify the voice in
accordance with their preferences. As voice assistants allow us to inquire about almost anything, be
it location, weather or entertainment options, they also let us access translated information in over
100 languages. This feature of Google Assistant helps in home automation, controlling the home
from remote places and playing a favourite playlist, and all these functions can be carried out from
our smart phones using the hands-free mode of speech identification.

Cortana is perhaps considered to be the leading archetypal assistant, running on devices that
comprise multiple sensors to sense the environmental surroundings. As a part of the Windows
Shell, Cortana has special abilities in scheduling and assigning meetings, together with a Bot
Framework to build the skills needed to engage in conversation with other digital assistants. It also
learns about us over time so as to be useful in offering suitable answers and completing basic tasks.

Alexa is the voice service behind Amazon devices like the Echo Dot and Show; it enables
customers to enjoy a personalised experience by offering facilities to realise their skilfulness.
Companies like Uber, Capital One, Starbucks etc. make use of Alexa-enabled gadgets to enhance
their businesses. The following are some of the common tasks performed by voice assistants:

• Setting reminders and alarms
• Sending and receiving messages
• Creating calendar entries
• Email briefing
• Scheduling meetings
• Play music
• Entertainment
• Gaming
• Weather forecast
• Voice based home automation
• Multi-language answering abilities
• Location information
• Maps
• Cloud and other online services

The voice assistants discussed above have certain limitations: most of the time is consumed in
entering the entries rather than in the actual work getting done, and often they do not possess or
manage a detailed knowledge database of their own; their perceptive comprehension mainly arises
from data acquired from the domain as well as from data models.

3. Proposed System Architecture

The majority of the famous and most widely used existing voice assistants use NLP and speech
recognition technologies to accomplish accurate recognition. By listening to the directives issued
by users, the requirements are understood and the specific function is performed in an efficient
manner. Artificial Intelligence has been used to generate accurate results and reduce the overall
labour and time while carrying out the specific task. Conventional typing has been reduced
completely, and this assistant has been designed to imitate a human assistant in facilitating the
operation in hand. The algorithm used focuses on time complexity and reduces time. In order to use
virtual voice assistants it is mandatory to have accounts, such as a Google account for Google
Assistant, a Microsoft account for Cortana etc., and they can be used only with an internet
connection. Our software is versatile and can be integrated with several devices like mobile phones,
laptops and speakers.

[Figure caption fragment: requesting to carry out the conversation in a particular language, i.e., French.]
• Sending emails
• Reads PDF
• Sends text on WhatsApp
• Opens command prompt
• Opens our favourite IDE, notepad
• Plays music
• Does Wikipedia searches
• Opens websites in a web browser
• Gives weather reports
• Choice of setting up Desktop reminders.
• Performs basic conversation.
Operational efficiency:
Serviceability:
Reliability:
Durability: