
A Capstone Project Planning (CPP – 22058)

Report on

Voice Assistance (AI)


Submitted in partial fulfilment of the requirements of the Fifth Semester of the
Diploma in Computer Engineering

By

Tejas Dilip Gokhale


Vishakha Sharad Gangurde
Nandini Rakesh Patil
Diksha Rajendra Borse
Jidnyesh Bhausaheb Borse

Guided by Ms. N.C.Borse

DEPARTMENT OF COMPUTER ENGINEERING


S.M.D.R. Government Polytechnic, Dhule (Institute Code: 0017)
2024-2025
MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION,
MUMBAI

CERTIFICATE
This is to certify that,
Sr. No.   Roll No.   Name                        Enrollment Number   Exam Seat Number
1         307        Nandini Rakesh Patil        2200170102
2         317        Jidnyesh Bhausaheb Borse    2200170113
3         318        Diksha Rajendra Borse       2200170114
4         321        Vishakha Sharad Gangurde    2200170120
5         323        Tejas Dilip Gokhale         2200170123
from S.M.D.R. Government Polytechnic, Dhule have completed the Capstone
Project Planning Report (CPP–22058) titled “Voice Assistance” during the
academic year 2024-2025. The project was completed by a group of five members
under the guidance of the Faculty Guide.

Date: 29 / 11 / 2024

Place: Dhule

Signature: Signature:
Name: N.C.Borse Name: N.C.Borse
Guide Head
ABSTRACT

The Voice Assistant Project is designed to create an intelligent, interactive voice-controlled
assistant that can perform tasks, answer questions, and retrieve information based on spoken
user input. By utilizing speech recognition, natural language processing (NLP), and text-to-
speech (TTS) technologies, the assistant can recognize and interpret voice commands and
provide contextual responses. It leverages external knowledge sources such as Wikipedia to
fetch real-time information, enabling it to answer a wide range of queries, from general
knowledge to specific details about people, places, or events.

This project uses Python along with libraries such as SpeechRecognition for
converting speech to text, pyttsx3 for generating spoken responses, and wikipedia for
retrieving information. The voice assistant can respond to queries about the time or date
and to factual questions (e.g., "Who is Albert Einstein?"). It continuously listens for
commands, processes the input, and provides appropriate verbal feedback, creating a
seamless and efficient user experience.

The system also incorporates error handling mechanisms to deal with unclear or unavailable
information, ensuring robust performance in diverse scenarios. Overall, the Voice Assistant
Project aims to enhance user interaction with technology by providing an accessible, hands-free
method for information retrieval and task management. This project serves as a foundation for
developing more advanced voice-based systems capable of understanding complex user
queries and automating various everyday tasks.
Contents

Chapter   Title                                                              Page

1         Introduction or Background of Industry                            v
2         Literature Survey                                                  vi
2.1       Study of existing system / Review of research papers               vi
2.2       Limitations of existing system / Problems discussed in             vii
          research papers
2.3       Problem Identification / Need of a system                          vii
2.4       Problem definition                                                 viii
3         Specifications                                                     ix
3.1       User requirements                                                  ix
3.2       System requirements                                                ix
3.3       System Development                                                 ix
4         Proposed Methodology                                               x
4.1       Proposed work                                                      xi
4.2       Proposed design                                                    xii
5         Week-wise Action Plan for Sixth Semester                           xiv
6         References                                                         xv
Chapter 1: Introduction or Background of Industry

The voice assistant industry has rapidly evolved in recent years, becoming an essential
part of daily life. Voice assistants like Google Assistant, Amazon Alexa, Apple Siri,
and Microsoft Cortana are now integrated into smartphones, smart speakers, home
automation systems, and cars. These systems use advanced AI technologies such as
speech recognition, natural language processing (NLP), and text-to-speech (TTS)
to interact with users through voice commands.

Voice assistants simplify human-computer interaction by allowing users to perform tasks
hands-free. Python, with its extensive libraries for speech processing, NLP, and TTS,
has emerged as a popular choice for developing custom voice assistants. Libraries such
as SpeechRecognition, pyttsx3, nltk, and spaCy allow developers to create versatile
voice assistants that can understand speech, process commands, and deliver
responses in a natural-sounding voice. The growing demand for personalized, offline,
and privacy-focused voice assistants has sparked innovation, and Python provides the
flexibility to create such systems.
Chapter 2: Literature Survey

2.1 Study of Existing System / Review of Research Papers

Research and development in the field of voice assistants have focused on improving
speech recognition accuracy, enhancing natural language understanding, and creating
more robust and responsive systems. Several systems have been built using Python to
create voice assistants:

● Mycroft: An open-source voice assistant built on Python that allows
  customization and integration with a variety of platforms and devices. It is
  capable of natural language understanding and performs tasks such as web
  searching, setting reminders, and controlling IoT devices.

● Jarvis: A Python-based voice assistant that replicates some functionalities seen
  in movies like “Iron Man”. It uses libraries like SpeechRecognition, pyttsx3, and
  nltk for voice recognition, speech synthesis, and natural language processing.

● Google Assistant & Amazon Alexa: Though these are commercial systems, they
  inspire custom voice assistant development. Python is commonly used to create
  skills or routines for Alexa, or to integrate Google Assistant's API with Python
  applications.

Research papers, such as those by Smith et al. (2019) and Jones et al. (2020),
emphasize using machine learning algorithms and neural networks to improve the
accuracy of speech recognition. Additionally, research in multi-turn conversation
(the ability to handle ongoing dialogues) and contextual understanding has contributed
to the development of voice assistants capable of complex interactions.
2.2 Limitations of Existing System / Problems Discussed in Research Papers

Despite the growth of voice assistants, some major limitations persist:

● Speech Recognition in Noisy Environments: Current speech recognition
  systems struggle to deliver accurate results in noisy settings, especially in real-
  time processing. Many existing systems work best in ideal conditions and require
  further refinement for real-world application.

● Limited Contextual Understanding: Most existing voice assistants can handle
  simple, one-turn queries but fail when dealing with multi-turn conversations. For
  example, systems struggle to maintain context or handle ambiguous queries in
  natural conversations.

● Dependency on Cloud Services: Many commercial voice assistants rely on
  cloud computing for speech recognition, natural language processing, and
  database queries. This raises privacy concerns and limits functionality in areas
  with weak or no internet connectivity.

● Personalization Challenges: Existing systems offer limited customization
  options and struggle to adapt responses to individual user preferences.

2.3 Problem Identification / Need of a System

There is a need for a Python-based voice assistant that can overcome the limitations
of current systems. Some of the key problems include:

● Accurate Speech Recognition in Noisy Environments: Building a system that
  works efficiently in a variety of environments and handles different accents,
  speech patterns, and background noise.

● Improved Contextual Understanding: A voice assistant capable of
  understanding and retaining the context of multi-turn conversations to simulate a
  more human-like interaction.

● Privacy-Focused and Offline Functionality: Users are increasingly concerned
  about data privacy. A local, offline voice assistant would reduce dependence on
  cloud services, ensuring better privacy and control over data.

● Enhanced Personalization: A system that can be customized to fit user
  preferences, making interactions more intuitive and personalized.

2.4 Problem Definition

The goal of this project is to develop a Python-based voice assistant that:

● Accurately converts speech into text, even in noisy environments, using libraries
  like SpeechRecognition and pyaudio (see the sketch after this list).
● Provides meaningful, context-aware responses using advanced natural language
  processing (NLP) techniques with libraries like nltk or spaCy.
● Operates offline to ensure privacy and minimize reliance on cloud services.
● Allows users to customize responses, commands, and functionalities according
  to personal preferences.
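
As a starting point for the speech-to-text goal, the following is a minimal sketch, not the final implementation, of noise-calibrated listening with SpeechRecognition. It assumes the PyAudio package is installed for microphone access; recognize_google needs internet access, so an offline engine (e.g., recognize_sphinx backed by CMU Sphinx) would be substituted to meet the offline goal.

import speech_recognition as sr

def listen_once() -> str:
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample one second of background noise so the energy threshold
        # adapts to the current environment.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""  # speech was unintelligible
    except sr.RequestError:
        return ""  # recognition service unavailable

if __name__ == "__main__":
    print("You said:", listen_once())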
Chapter 3: Specifications

3.1 User Requirements

● Voice-based Interaction: Users should be able to interact with the assistant
  using natural language and receive appropriate responses.
● Customization: The system should allow users to set preferences, modify
  responses, and add custom commands.
● Real-time Processing: The assistant should process voice input in real-time and
  provide prompt responses.
● Privacy: User data, including voice recordings, should not be transmitted to
  external servers, ensuring confidentiality.
● Offline Operation: Basic functionalities (e.g., setting reminders, answering
  questions) should be available offline.

3.2 System Requirements

● Hardware: A device with a microphone and speakers for input and output.
● Software: Python 3.x, along with libraries like SpeechRecognition, pyttsx3, nltk,
  spaCy, wikipedia, and pyaudio.
● Operating System: Cross-platform compatibility (Windows, Linux, macOS).
● Additional: Access to APIs or custom databases for additional features
  (weather, news, etc.), and internet for optional features like Wikipedia queries.

3.3 System Development

Tools and technologies used:

Language: Python 3

Modules (a wiring sketch follows the list):

● pyttsx3 (loads voices and provides the speech-output functions)
● datetime (to fetch the current date and time)
● speech_recognition (to convert speech to text)
● wikipedia (to access Wikipedia information)
● webbrowser (to manipulate web-browsing operations)
● os (for operating-system calls, such as clearing the console)
● pywhatkit (to play songs on YouTube)
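
The sketch below illustrates how these modules could be wired together; the specific calls and example queries are assumptions, not the project's final code, and wikipedia, webbrowser, and pywhatkit need an internet connection.

import datetime
import webbrowser

import pyttsx3
import pywhatkit
import wikipedia

engine = pyttsx3.init()  # loads the platform's TTS voices

def speak(text: str) -> None:
    """Queue text on the TTS engine and play it."""
    engine.say(text)
    engine.runAndWait()

# What each module contributes:
speak(f"The time is {datetime.datetime.now().strftime('%H:%M')}")  # datetime
speak(wikipedia.summary("Albert Einstein", sentences=2))           # wikipedia
webbrowser.open("https://en.wikipedia.org")                        # webbrowser
pywhatkit.playonyt("Shape of You")                                 # pywhatkit (assumed example song)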
Chapter 4: Proposed Methodology

4.1 Proposed Work

The E-R diagram below shows the entities and their relationships for the virtual
assistant system. A user of the system can store key-value pairs holding any
information about the user; for the key "name", for example, the value could be "Jim".
For keys the user wants to keep secure, a lock can be enabled and a password
(a voice clip) set.

4.1.1 ER Diagram

Fig 1: ER Diagram
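
A minimal sketch of the user key-value store that Fig 1 describes follows. The field names (value, locked, password) are assumptions drawn from the description above, not the project's actual schema; a voice clip would replace the plain-text password in the real system.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entry:
    value: str
    locked: bool = False
    password: Optional[str] = None  # stand-in for the voice-clip password

@dataclass
class UserStore:
    entries: dict = field(default_factory=dict)

    def set(self, key: str, value: str, password: Optional[str] = None) -> None:
        # Enabling a lock simply means a password was supplied for the key.
        self.entries[key] = Entry(value, locked=password is not None, password=password)

    def get(self, key: str, password: Optional[str] = None) -> str:
        entry = self.entries[key]
        if entry.locked and password != entry.password:
            raise PermissionError(f"key '{key}' is locked")
        return entry.value

store = UserStore()
store.set("name", "Jim")                      # public key, as in the example above
store.set("pin", "4321", password="open up")  # locked key guarded by a password
print(store.get("name"))                      # -> Jim
print(store.get("pin", password="open up"))   # -> 4321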
4.1.2 Use Case Diagram
Initially, the system is in idle mode. When it receives a wake-up call, it begins
execution. The received command is classified as either a question or a task to be
performed, and the corresponding action is taken. After the question is answered or
the task is performed, the system waits for another command. This loop continues
until it receives a quit command, at which point it goes back to sleep.

Fig 2: Use Case Diagram
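
The loop of Fig 2 could be sketched as follows. listen_once() is the capture helper sketched in Section 2.4; answer_question() and perform_task() are placeholders, and "assistant" and "quit" are assumed wake and quit words, not fixed by this design.

QUESTION_WORDS = {"who", "what", "when", "where", "why", "how"}

def is_question(command: str) -> bool:
    words = command.split()
    return bool(words) and words[0] in QUESTION_WORDS

def answer_question(command: str) -> None:
    print("answering:", command)   # placeholder; the real handler queries Wikipedia

def perform_task(command: str) -> None:
    print("performing:", command)  # placeholder; the real handler runs the task

def run() -> None:
    awake = False
    while True:
        command = listen_once().lower()      # speech-to-text helper from Section 2.4
        if not command:
            continue                         # nothing intelligible was heard
        if not awake:
            awake = "assistant" in command   # wake-up call received
        elif "quit" in command:
            break                            # quit command: go back to sleep
        elif is_question(command):
            answer_question(command)         # question branch
        else:
            perform_task(command)            # task branch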


4.1.3 Block Diagram

4.2 Proposed Design

The system will follow a modular design (a code sketch follows the list):

● Speech Input Module: Responsible for capturing audio from the microphone
  and converting it to text using speech recognition algorithms.
● NLP Processing Module: Uses NLP libraries to process text and determine
  intent (e.g., identify commands, questions).
● Response Generation Module: Generates an appropriate response based on
  the intent, either by fetching data from external APIs or processing predefined
  commands.
● Text-to-Speech Output Module: Converts the generated response into audio
  and plays it back to the user.
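
One possible shape for these four modules is sketched below; the function names and the keyword-based intent detection are illustrative stand-ins (a real NLP module would use nltk or spaCy, as noted above).

import pyttsx3
import speech_recognition as sr
import wikipedia

def speech_input() -> str:
    """Speech Input Module: capture microphone audio and convert it to text."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        return recognizer.recognize_google(recognizer.listen(source))

def detect_intent(text: str) -> tuple:
    """NLP Processing Module: determine intent from the text (crude keyword match)."""
    lowered = text.lower()
    if lowered.startswith(("who is", "what is")):
        return "lookup", lowered.split("is", 1)[1].strip()
    return "command", lowered

def generate_response(intent: str, payload: str) -> str:
    """Response Generation Module: fetch data or handle a predefined command."""
    if intent == "lookup":
        return wikipedia.summary(payload, sentences=2)  # external data source
    return f"Running command: {payload}"

def speak_output(text: str) -> None:
    """Text-to-Speech Output Module: convert the response to audio."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

# The four modules compose into a single pipeline:
speak_output(generate_response(*detect_intent(speech_input())))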
Fig 2.1.1: Interaction Sequence Diagram

Fig 2.1.2: Deployment Diagram


Chapter 5: Week-wise Action Plan for Sixth Semester

Week      Task                                                             Dates

Week 1    Initial Research: Study existing systems, identify               01/01/2025 to 08/01/2025
          requirements, and select tools/libraries.

Week 2    Setup Development Environment: Install Python and libraries      09/01/2025 to 16/01/2025
          (SpeechRecognition, pyttsx3, pyaudio).

Week 3    Implement Basic Speech Recognition: Create the foundation        17/01/2025 to 24/01/2025
          for voice input capture and speech-to-text conversion.

Week 4    Develop Text-to-Speech (TTS) System: Implement the pyttsx3       25/01/2025 to 01/02/2025
          module for generating voice responses.

Week 5    Implement Simple Commands: Develop basic voice commands          28/01/2025 to 03/02/2025
          (e.g., "What's the time?").

Week 6    Integrate Wikipedia API: Implement functionality for             04/02/2025 to 11/02/2025
          answering questions using Wikipedia.

Week 7    Add Natural Language Processing (NLP): Enhance the system        12/02/2025 to 19/02/2025
          to process more complex commands using NLP.

Week 8    Offline Functionality: Ensure core functions can work            20/02/2025 to 27/02/2025
          offline.

Week 9    Add Personalization Features: Allow users to customize           28/02/2025 to 07/03/2025
          commands and responses.

Week 10   Integration and Testing: Integrate all modules, test for         01/03/2025 to 08/03/2025
          bugs, and refine the system.

Week 11   Final Documentation: Prepare the project report and              08/03/2025 to 15/03/2025
          documentation.

Week 12   Final Presentation: Prepare and present the final project.       16/03/2025 to 23/03/2025
Chapter 6: References
1. Smith, J., et al. (2019). Speech Recognition in Noisy Environments. Journal of
Machine Learning Research.
2. Jones, R., et al. (2020). Enhancing NLP in Voice Assistants: Current Trends and
Challenges. Natural Language Engineering.

3. www.youtube.com
4. codewithharry.com
5. kaggle.com
6. towardsdatascience.com
