Project Report: AI Voice Assistant

This report outlines the development of a desktop-based AI voice assistant that uses speech recognition and text-to-speech technology for user interaction, integrating a locally hosted AI model (via Ollama) for enhanced privacy and functionality. It details the system's capabilities, hardware and software requirements, and how the project differentiates itself from existing voice assistants by operating without reliance on cloud services. The project aims to provide a user-friendly interface and perform various tasks while ensuring a responsive and secure user experience.

CHAPTER 1

1. INTRODUCTION

1.1 PROJECT DESCRIPTION:

This project presents Voice Assistant, a desktop-based AI assistant designed to interact with
users through natural voice commands. It uses speech recognition to understand spoken
input and responds using text-to-speech technology, creating a hands-free, intelligent user
experience.

The assistant is capable of performing a variety of tasks such as telling the current date and
time, opening websites like Google and YouTube, conducting Wikipedia searches,
launching system applications like Notepad or File Explorer, monitoring CPU usage, and
telling jokes.

For more complex queries and conversations, the assistant integrates a locally hosted AI
model through Ollama, allowing it to provide smart, AI-generated responses. The project
includes a user-friendly Tkinter-based GUI with an animated background and chat
interface, enhancing both the usability and visual appeal of the assistant.

This system showcases the integration of artificial intelligence, voice processing, and GUI
design into a functional and interactive personal assistant.

CHAPTER 2
2. LITERATURE SURVEY

The development of intelligent virtual assistants has seen significant advancements in
recent years, largely driven by innovations in natural language processing (NLP), machine
learning, and speech recognition technologies. Voice assistants like Amazon Alexa,
Google Assistant, and Apple’s Siri have become mainstream examples of AI-driven
systems capable of performing tasks via voice commands. These assistants leverage
complex NLP models and are integrated with cloud-based services, allowing them to
execute commands such as setting reminders, playing music, and controlling smart
devices. However, most commercial systems rely heavily on internet connectivity and
cloud-based AI models, which can limit their functionality and privacy for users.

In contrast, this project leverages Ollama, a tool for running AI models locally on a user's
system, providing a more privacy-conscious solution for conversational interactions.
Ollama allows for sophisticated AI-powered responses without relying on external servers,
ensuring that the user's data remains secure and private. Furthermore, the project combines
speech recognition (via the SpeechRecognition library) with text-to-speech technology (using
pyttsx3) to create a seamless voice-interactive interface, as illustrated by the sketch below.
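
A minimal sketch of this speech loop, assuming a working microphone and the SpeechRecognition and pyttsx3 packages (the full implementation appears in Chapter 7):

import pyttsx3
import speech_recognition as sr

engine = pyttsx3.init()

def speak(text):
    # Convert text to audible speech
    engine.say(text)
    engine.runAndWait()

def listen():
    # Capture one phrase from the microphone and return it as lowercase text
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.3)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio).lower()
    except sr.UnknownValueError:
        return ""

if __name__ == "__main__":
    speak("How can I help?")
    print("You said:", listen())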

Additionally, the project incorporates system-level operations, such as controlling
applications and checking system performance, using Python libraries like psutil and os.
The Tkinter-based graphical user interface (GUI) offers a modern, intuitive layout,
enhancing the user experience by providing visual feedback and interaction alongside
voice commands. This project builds upon previous research in voice assistants and AI
integration while introducing a unique local, privacy-focused approach with a
comprehensive set of features for both personal and system management tasks.

2.1 Existing System:

Against this background of existing systems, the proposed assistant aims to differentiate
itself by incorporating a local AI model (Ollama) into its backend, allowing users more
privacy and control over their data. By using local processing for conversational responses
and integrating speech recognition and text-to-speech systems, the assistant can perform
complex tasks without depending on cloud services, making it more secure and efficient
for users concerned about privacy. Additionally, the system is equipped with built-in
functionality for executing operating system tasks, such as opening files and applications,
checking CPU usage, and running Google searches, all triggered via voice commands.

However, many existing systems come with constraints:

 Dependency on Backend Services: Most assistants rely on cloud-based NLP and AI
processing, requiring constant internet connectivity.
 Complex Setup: Existing commercial systems are closed-source and hard to
customize. In contrast, this project is open-source, lightweight, and can be easily
modified or extended using Python.
 Limited Accessibility: Traditional assistants are tightly integrated into mobile or smart-
device ecosystems and lack deep integration with desktop OS-level functionalities.
 Security Risks: Cloud-based assistants send voice data to external servers for
processing, raising privacy concerns. This project processes data locally using offline
libraries and the Ollama model, ensuring better security and user data protection.

2.2 Proposed System:

The proposed system, Voice Assistant, is a desktop-based AI assistant that interacts with
users via voice commands. It uses speech recognition, TTS, and a locally hosted AI model
(Ollama) for smart responses. The system performs OS-level operations and web searches,
and provides a responsive GUI using Tkinter. It ensures user privacy, speed, and an
interactive experience without relying on cloud services.

Key improvements and highlights include:

 Local AI processing using Ollama (no cloud API or internet dependency).
 Integrated voice command system with real-time feedback.
 Functional desktop operations like opening apps, files, and system checks.
 Modern Tkinter GUI with animated background and chat-style interaction.

CHAPTER 3
3. HARDWARE AND SOFTWARE REQUIREMENTS

Introduction
HARDWARE: The hardware requirements for the Voice Assistant project are minimal, as
it primarily operates on a standard personal computer.

SOFTWARE: The software environment required for Voice Assistant includes various
tools and libraries that enable voice recognition, AI interaction, and system control.

3.1 HARDWARE REQUIREMENTS:

 Processor: Intel Core (or equivalent)
 Processor Speed: 3.50 GHz
 RAM: Minimum 16 GB
 Hard Disk: Minimum 150 MB (for storing source code files)
 Display: Any modern desktop display with a minimum resolution of 1024x768 that
supports a graphical user interface (GUI) via Tkinter. The system requires a
microphone for speech input and speakers/headphones for audio output.

3.2 SOFTWARE REQUIREMENT SPECIFICATION:

 Language: Python 3.8+

 API Used: Ollama API (for AI conversation)

 Text Editor: Visual Studio Code (VS Code)

 Operating system: Windows / macOS / Linux

 Browser Compatibility: Not applicable, as the system operates as a desktop
application with a Tkinter GUI

 Backend/Database: No backend or database required

CHAPTER 4

4. SOFTWARE REQUIREMENTS SPECIFICATION


Introduction:

The Software Requirements Specification (SRS) for the voice assistant outlines the
functional and non-functional requirements necessary for the development and deployment
of the system. This section provides a detailed description of the software components,
dependencies, and the environment required to run the voice assistant.

The system is designed to function as an intelligent voice assistant capable of performing
a variety of tasks such as web browsing, system monitoring, playing media, and providing
real-time responses to user queries. This specification ensures the system is efficient,
interactive, and can be integrated seamlessly into a desktop environment.

The software environment includes the use of Python, various third-party libraries for
speech recognition, text-to-speech synthesis, and the Ollama API for AI-powered
responses. This document will detail the software architecture, libraries, and dependencies
required for the successful implementation of the project.

4.1 Users of the System:


The system has two user levels.

a) End User (Visitor):

 Access Voice Assistant: The end user interacts with the voice assistant through
voice commands, without the need for registration or login.
 Initiate Conversation: The user can issue voice commands or ask questions, and the
system responds via speech.
 Receive Real-Time Responses: The system provides context-aware, AI-generated
responses powered by the Ollama API.
 Perform Tasks: The user can instruct the assistant to perform tasks such as opening
apps, searching Google, playing YouTube videos, and fetching system information
(e.g., CPU usage, time, date).
 Error Handling: If the system does not understand the input, it provides fallback
responses such as "Sorry, I didn't catch that."
 No Registration or Login: The system is designed for instant use; no user
authentication is required, making it easy for anyone to use the assistant.

b) Voice Assistant (AI):

 Process User Commands: The voice assistant interprets voice commands and converts
them into actions, such as opening applications or performing system operations.
 Generate Responses: The system generates natural language responses, such as telling
the time, providing weather updates, or offering a joke based on the user's request.
 Error Response: If the voice assistant cannot process the command, it provides fallback
responses like "I'm not sure how to respond to that."
 No Memory Between Sessions: The assistant does not retain any user data between
sessions. Each conversation is independent, ensuring user privacy.
 Delayed Interaction: The system may have slight response delays depending on
system performance or voice recognition speed, especially on lower-end hardware.
 Continuous Learning (Future Scope): In the future, the system could be enhanced to
learn from user interactions and to improve its accuracy and responsiveness over time.

General Constraints:
General constraints include the following:

 Hardware Dependency: The system requires a desktop or laptop with a functional
microphone and speakers/headphones for proper voice interaction.

 Internet Connectivity: Some features like Wikipedia search, Google search, and Ollama
responses require an active internet connection.

 Platform Limitation: Currently optimized for Windows OS; compatibility on other
operating systems (Linux/macOS) may require additional configuration.

 Voice Recognition Accuracy: Performance depends on ambient noise conditions and
clarity of user speech; noisy environments may reduce accuracy.

 Local Ollama Server: The AI chat functionality relies on a locally hosted Ollama
model running on port 11434, which must be active before launching the assistant
(see the sketch after this list).

 Single User Support: The system is designed for individual use and does not support
multi-user profiles or authentication.
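
A minimal startup check, assuming a default Ollama installation listening on http://localhost:11434 (the base URL answers with a short status message when the server is up):

import requests

def ollama_is_running(url="http://localhost:11434"):
    # Returns True if the local Ollama server answers on its default port
    try:
        response = requests.get(url, timeout=2)
        return response.ok
    except requests.RequestException:
        return False

if __name__ == "__main__":
    if not ollama_is_running():
        print("Start the Ollama server before launching the assistant.")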

Assumptions and Dependencies:

 The system is assumed to run on a desktop environment with Windows OS (though it
may be compatible with Linux/macOS with some adjustments).

 A working microphone and speakers/headphones are required for voice interaction
(speech recognition and text-to-speech).

 The user has Python 3.8 or higher installed along with the necessary libraries (pyttsx3,
speech_recognition, wikipedia, requests, psutil, tkinter, etc.); see the check sketch after this list.

 The Ollama API (running locally) must be properly set up and active for AI-based
question answering to function.

 A stable internet connection is required for some features (e.g., Wikipedia, Google
search, YouTube access).

 It is assumed the user has basic technical knowledge to launch the Python application.

 Dependencies include third-party libraries and APIs which must be properly installed
and configured.
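
A quick way to verify these dependencies before the first run (a sketch; note that the PyPI package SpeechRecognition is imported as speech_recognition, and tkinter ships with most Python installations):

import importlib

REQUIRED_MODULES = ["pyttsx3", "speech_recognition", "wikipedia",
                    "requests", "psutil", "tkinter"]

for name in REQUIRED_MODULES:
    try:
        importlib.import_module(name)
        print(f"{name}: OK")
    except ImportError:
        print(f"{name}: missing - install it with pip before running the assistant")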

Specific Requirements:

External Interface Requirements:

User Interface:
A user interface is the point of interaction between the user and the system. For the voice
assistant, the UI is kept minimal and user-friendly to ensure smooth communication with
the assistant.

The following requirements were taken into account during the design:

 The system provides a desktop-based graphical user interface (GUI) built using
Tkinter.
 The GUI features an animated GIF background to simulate a futuristic, interactive
assistant environment.

 A scrollable chat area displays ongoing conversations between the user and voice
assistant, updating in real-time.
 Users can interact with assistant using voice commands, and the responses are both
spoken aloud and shown in the chat interface.
 The GUI remains minimal, clean, and functional, avoiding complex navigation for a
simple user experience.
 The interface automatically restarts listening after each interaction, allowing for hands-
free operation.
 Error messages or system notifications (e.g., "Didn't catch that") are displayed in the
chat for better clarity.
 No login or authentication is required, ensuring quick access for the user.

4.2 Functional Requirements:

a) End User (Visitor):

 Interaction with the Assistant: The user can interact with the assistant by issuing voice
commands. The assistant processes user queries and responds via both voice and text in
the chat interface.
 Access System Features: Users can request the assistant to perform various tasks like
opening applications, checking the time and date, searching the web, and providing
weather information.
 Error Handling: If the system fails to understand a command or encounters an issue,
it provides fallback responses (e.g., “I didn't catch that”).
 Real-Time Interaction: The system operates in real-time, responding to commands
almost instantly with a slight delay in some cases.
 Provide Feedback: After a task completion or response, the user can provide
feedback (e.g., thumbs up/thumbs down or text comments).

b) VOICE ASSISTANT (AI Assistant):

 Voice Command Processing: The assistant listens to voice commands, converts them to
text, and processes them accordingly.
 Generate Contextual Responses: Based on the user query, the assistant will generate
appropriate, context-aware responses, either from its built-in logic or by querying
external sources like Wikipedia or Google.

 Perform System Operations: The assistant can execute OS-level operations, such as
opening applications (Notepad, File Explorer) and checking system statistics (e.g.,
CPU usage).
 Handle Errors Gracefully: When unable to process a command, the assistant provides
an error response and offers suggestions for rephrasing or retrying.
 Continuous Learning (Future Scope): In future versions, the assistant can learn from
user feedback to improve its accuracy and responses.

Performance Requirements:

 Voice Response Speed: The assistant should deliver voice responses without noticeable
delays, ensuring a natural flow of conversation.
 Real-Time Response: The system should provide near-instant responses to user
queries, with minimal delays, ensuring a smooth interaction.
 Accuracy of Voice Recognition: The system must recognize voice commands with
over 90% accuracy, even in environments with moderate background noise.
 Responsiveness of the GUI: The user interface should remain responsive and
smoothly update the chat in real-time without lag, even when performing complex
tasks.
 Resource Utilization: The system should consume minimal CPU and memory
resources to run efficiently on most desktop systems.
 Error Handling Performance: The system should quickly process errors and provide
helpful fallback messages within a few seconds.

Design Constraints:

 Platform Dependency: The system is designed primarily for desktop platforms
(Windows/Linux) and may not function on mobile devices without significant
modifications.
 Hardware Requirements: A functioning microphone and speakers/headphones are
necessary for full voice interaction capability.
 Offline Limitations: Some features like Wikipedia search or web browsing require
an active internet connection.
 Ollama Dependency: The AI conversational system depends on the locally running
Ollama model; if it’s not running, AI responses won’t work.
 No Cloud Integration: The system does not currently support cloud-based data
storage or multi-device syncing.
 Tkinter GUI Limitations: The user interface is limited by the capabilities of Tkinter
and may not support advanced animations or modern design components.
 Single-User Design: The system is intended for single-user usage and does not
support multi-user sessions or authentication.
 Fixed Voice Engine: The system relies on pyttsx3 for TTS, which may behave
differently across operating systems or voices.

System Attributes:

The system is developed to offer a lightweight, accessible, and responsive desktop AI
assistant experience, with a focus on real-time interaction, usability, and local processing
through the Ollama model.

1. Performance:
The system performs efficiently under standard desktop environments, processing voice
input and generating AI responses with minimal delay depending on system load and
local model speed.

2. Reliability:
The assistant provides stable and consistent results when the microphone is functional
and the local model is running properly, ensuring dependable interaction during regular
use.

3. User-Friendly Interface:
Built with Tkinter, the GUI is minimal, animated, and intuitive, requiring no technical
expertise or login—users can immediately interact via voice or GUI.

4. Maintainability:
The modular Python code allows developers to easily update commands, add features,
or fix bugs without affecting core functionalities.

5. Portability:
The project runs seamlessly on any desktop operating system (Windows/Linux) with
Python and required dependencies installed, ensuring high portability across platforms.

6. Flexibility:
The assistant can be enhanced in the future with additional voice commands, smart
home integration, or backend database support without major rewrites.

7. Timeliness:
User voice commands are processed in near real-time, and responses are displayed and
spoken back immediately, enabling smooth conversational flow.
4.3 Non-Functional Requirements:

Safety Requirements:

 The system must not execute any critical or destructive operations (e.g., deleting
system files or shutting down unexpectedly) without user confirmation.
 The assistant must avoid misinterpreting voice commands that could lead to unintended
system-level actions.
 The assistant should handle errors gracefully, providing fallback messages like "I
didn't catch that" instead of crashing.
 The microphone and speaker usage should not interfere with any other critical
desktop processes or applications.
Security Requirements:

 The system runs locally and does not require a backend server or database,
minimizing exposure to external threats.
 No personal data, user logs, or conversations are stored unless explicitly enabled by
the user.
 Communication with external services (e.g., Wikipedia, web searches) occurs
securely over standard web protocols.
 The assistant does not collect or share any sensitive user data, ensuring user privacy
and system-level safety.

CHAPTER 5

5. SYSTEM DESIGN

Introduction:

The design process aims to create a detailed blueprint for the Voice Assistant, a desktop-based
AI assistant system, which uses the locally hosted Ollama model to process voice inputs
and deliver intelligent responses. System design defines the architecture, modules,
interfaces, and data flow to fulfill the assistant’s operational goals. The objective is to
modularize the assistant's features for easier development, scalability, and future
enhancements. The system will focus on the following:

 Modules: Dividing the application into key modules such as the voice interface, GUI,
system command engine, AI integration, and TTS engine.
 Specifications: Defining each module’s purpose, inputs, outputs, and how they behave
under different scenarios.
 Interconnections: Mapping how these modules interact and share data, ensuring
seamless communication and real-time performance.

The design approach ensures modularity, reusability, and testability. It supports smooth
integration of new features, optimizes performance, and allows for easy updates in the
future.

5.1 Context Flow Diagram:

The Context Flow Diagram illustrates the assistant system’s high-level interactions with
external entities. It views the system as a single processing unit that receives inputs and
produces outputs via interaction with voice and AI components.

 External Entities:

 User: Interacts with the assistant via microphone or GUI, providing voice commands
and receiving spoken/text responses.
 Ollama LLM (Local AI): Processes user queries and generates intelligent responses
based on natural language understanding.

 System (Voice Assistant):

 Inputs: Voice commands from the user (captured and converted to text).

 Outputs: Spoken responses, on-screen text messages, and triggered system-level
operations (e.g., open apps, fetch info).

5.2 Data Flow diagram:

A Data Flow Diagram (DFD) is a graphical representation of how data moves through a
system. It helps visualize the flow of information, how it is processed, and the interaction
between different components, including users, subsystems, and data stores. DFDs make
complex processes easy to understand by using simple symbols to represent external
entities, processes, data stores, and data flow.

In the Voice Assistant System, the DFD demonstrates:

Processes: Voice input is converted to text. The AI engine (Ollama) processes the text
input. The system converts the AI response into speech and text output.

Data Flows: Flow of voice input, recognized text, AI prompts, and responses. User
interactions and audio/text feedback loops.

External Entities:

 User: Provides voice input and receives spoken/text responses.
 Ollama AI Model: Processes recognized text and returns a context-aware response.
 Microphone/Speakers: Interfaces for input and output communication.
 System Data Stores (optional): For saving logs, chat history, or session data.

Rules regarding DFD construction:

 A process cannot have only outputs.
 A process cannot have only inputs.
 The inputs to a process must be sufficient to produce the outputs from the process.
 All data stores must be connected to at least one process.
 All data stores must be connected to a source or sink.
 A data flow can have only one direction of flow. Multiple data flows to and/or from
the same process and data store must be shown by separate arrows.

5.3 DFD Symbols:

Name | Description
Process | A process transforms incoming data flow into outgoing data flow. Processes are shown by named circles.
Data store | Data stores are repositories of data in the system. They are sometimes also referred to as files.
Data flow | Data flows are pipelines through which packets of information flow. The arrows are labelled with the name of the data that moves through them.
External entity | External entities are objects outside the system with which the system communicates. External entities are sources and destinations of the system's inputs and outputs.
5.3.1 Data Flow diagram

Zero-level DFD (Fig. no. 5.3.1.1): The user sends commands and queries to JARVIS (the
assistant), which exchanges data with the Ollama API (AI model) and a logs database.

First-level DFD (Fig. no. 5.3.1.2): The user's voice command enters command processing,
which routes it to system control (open apps, stats), to the Ollama API (AI model) for an
AI response, and to response display (text/TTS output); events are recorded in the logs
database.

5.4 Entity Relationship Diagram:
A set of primary components is identified for the ER diagram: data objects, attributes,
relationships, and various type indicators. The primary purpose of the ER diagram is to
represent data objects and their relationships.

5.4.1 Data Objects:

A data object is a representation of almost any composite information that must be
understood by software. Composite information refers to something that has a number of
different properties or attributes.

5.4.2 Attributes:
Attributes define the properties of a data object and take on one of three different
characteristics. They can be used to (1) name an instance of the data object, (2) describe the
instance, or (3) make reference to another instance in another table. In addition, one or more
of the attributes must be defined as an identifier; that is, the identifier attribute becomes a
"key" when we want to find an instance of the data object.

5.4.3 Relationships:
Relationships indicate the manner in which data objects are "connected" to one another.

5.4.4 Cardinality:
The data model must be capable of representing the number of occurrences of objects in a
given relationship. Tillman defines the cardinality of an object/relationship pair in the
following manner: "Cardinality is the specification of the number of occurrences of one
[object] that can be related to the number of occurrences of another [object]." Cardinality is
usually expressed as simply one or many; taking into consideration all combinations of one
and many, two objects can be related as one-to-one, one-to-many, or many-to-many.

5.5 The Symbols are shown in below table:

Name | Description
Entity | An object with physical or conceptual existence. It is represented by a rectangle.
Attribute | A property of the entity. It is represented by an ellipse.
Relationship | Whenever an attribute of one entity refers to another entity, some relationship exists. It is represented by a diamond.
Link | Lines link attributes to entity sets and entity sets to relationships.
Derived attribute | An attribute derived from other attributes. A dashed ellipse denotes a derived attribute.
Key attribute | An entity type usually has an attribute whose values are distinct for each individual entry in the entity set. It is represented by an underlined name inside an ellipse.
Multivalued attribute | An attribute that can have different numbers of values for a particular entity. A double ellipse represents a multivalued attribute.
Cardinality ratio | Specifies the maximum number of relationship instances that an entity can participate in. There are four cardinality ratios: 1:1, 1:M, M:1, and M:M.

5.6 ER diagram:

5.7 System Perspective:

The Voice Assistant is a standalone desktop application that integrates with various system
operations to enhance user productivity and interactivity. It offers real-time responses to
voice commands and performs a variety of system tasks. The system uses the Ollama API
for conversational AI and integrates with voice recognition and text-to-speech
technologies.

System Characteristics:

 Simulates natural, real-time human-like conversations through speech and text.
 Fully integrates with system operations, including opening applications,
checking system stats, and browsing the web.
 Can be extended to integrate with additional home automation or smart tools.
 Features a clean, interactive GUI with animated backgrounds and scrolling chat
interface (built using Tkinter).
 Lightweight system that runs with minimal resource usage on desktop
environments.
 Voice-activated through the "Hey J.A.R.V.I.S." wake word system.

Interfaces:

User Interface (UI):

 A desktop-based GUI powered by Tkinter, which includes a speech input field, system
status display, and chat interface for text-based responses.

API Interface (Ollama):

 Handles communication between the frontend and the locally hosted Ollama model for
generating AI-powered responses.

System Interface:

 Manages interactions with the operating system for executing tasks such as launching
apps, browsing files, fetching system statistics (CPU, RAM), and more.

Voice Interface:

 Uses speech recognition to capture commands and text-to-speech (TTS) to provide
spoken feedback to the user.

Dependencies:

 Requires a microphone and speakers/headphones for voice interactions.
 Requires the Ollama model hosted locally (no external API key required).
 Requires Python libraries such as speech_recognition, pyttsx3, tkinter, and
wikipedia.
 Optional internet connection for fetching external data like weather, Wikipedia, and
online searches.

5.8 Context Diagram:

A context diagram provides a high-level overview of how the Voice Assistant system
interacts with users and external services. In text form:

User (voice/GUI) <--> Voice Assistant <--> Ollama LLM (local AI)
Voice Assistant --> OS services (apps, files, system stats) and web services (Google,
YouTube, Wikipedia)

CHAPTER 6

6. DETAILED DESIGN

Introduction:

The purpose of this detailed design document is to present the complete design
specifications of the Voice Assistant. This project is a desktop-based application built
using Python, Tkinter for the GUI, and integrates with the Ollama API for generating
responses to user queries. The design emphasizes user-friendly interaction through speech,
AI-powered responses, and seamless system control functionalities. The system is
lightweight, interactive, and capable of managing voice commands for system operations
such as opening applications, checking system stats, searching the web, and more.

The design outlines the architecture, user interface, interaction with the Ollama model, and
system integration. The goal is to provide an intuitive and efficient interface while ensuring
the system is modular, maintainable, and extendable.

Applicable documents

The following documents and references were used during the design process:

 System Requirements Document
 User Interface Design Sketches / Wireframes (for the Tkinter GUI)
 Ollama API Documentation
 Python Libraries Documentation (e.g., speech_recognition, pyttsx3, tkinter)

Structure of the software package:

The software is divided into several simple and independent components for clarity and
modularity:

Component | Description
UI Component | Handles the graphical user interface (GUI) using Tkinter, including the chat window, animated background, and system status display.
Voice Command Handler | Captures voice input from the user using the speech_recognition library and triggers appropriate actions, such as opening applications, fetching system stats, etc.
System Control Interface | Interacts with the operating system to perform actions like opening applications, checking CPU/RAM usage, launching websites, etc.
Response Display | Receives AI-generated responses from the Ollama model and updates the GUI, either through text or voice (TTS).
Ollama API Integration | Sends user prompts to the locally hosted Ollama model and returns the generated responses to the assistant.
Error Handling | Provides feedback to the user in case of issues like microphone access errors, invalid commands, or API connectivity issues.

6.1 DATABASE DESIGN:

A database is a structured collection of related data, crucial for storing and retrieving
information efficiently. While the current version of Voice Assistant does not depend on
a database for basic operation, future enhancements can incorporate a database to:

 Store chat history and voice commands
 Manage user sessions
 Log system activities
 Record feedback or preferences
 Track commonly used commands or questions

Database Design:

Database design refers to the process of organizing data according to a schema. It ensures
efficient access, integrity, and support for core functionalities like session tracking, chat
history, and logs.

For the assistant system, a local or cloud database (e.g., SQLite, MongoDB, or Firebase)
can be used. Here is a proposed schema with six tables:

 Users: Stores user-specific details for personalization.
 Messages: Records all user queries and assistant responses.
 Command_Sessions: Logs active sessions, start/end times, and user references.
 Feedback: Stores user feedback for continuous improvement.
 FAQ: Stores frequently asked questions and their predefined responses.
 Logs: Maintains a log of system actions, errors, and voice command timestamps.

6.1.1 Database Structure

Structure of Table "users":

Field Name | Field Type | Size | Constraints | Description
user_id | int | - | PRIMARY KEY | Unique identifier for each user
username | varchar | 100 | NOT NULL, UNIQUE | Name of the user
created_at | datetime | - | DEFAULT CURRENT_TIMESTAMP | Date and time of account creation

6.1.2 Structure of Table "messages":

Field Name | Field Type | Size | Constraints | Description
message_id | int | - | PRIMARY KEY | Unique identifier for each message
session_id | int | - | FOREIGN KEY | Links the message to a specific session
sender | text | 10 | CHECK (sender IN ('user', 'bot')) | Indicates who sent the message (user or J.A.R.V.I.S.)
content | text | - | NOT NULL | The actual message text
timestamp | datetime | - | NOT NULL | Time when the message was sent/received
6.1.3 Structure of Table "command_sessions":

Field Name | Field Type | Size | Constraints | Description
session_id | int | - | PRIMARY KEY | Unique ID for each chat session
user_id | int | - | FOREIGN KEY | Links to the user who started the session
started_at | datetime | - | DEFAULT CURRENT_TIMESTAMP | Session start timestamp
ended_at | datetime | - | NULLABLE | Session end timestamp

6.1.4 Structure of Table "feedback":

Field Name | Field Type | Size | Constraints | Description
feedback_id | int | - | PRIMARY KEY | Unique identifier for feedback
user_id | int | - | FOREIGN KEY | User who gave the feedback
message_id | int | - | FOREIGN KEY | Message the feedback is related to
rating | int | - | CHECK (rating BETWEEN 1 AND 5) | User rating (e.g., 1-5)
comment | text | - | NULLABLE | Optional user comment
created_at | datetime | - | NOT NULL | Feedback submission time

6.1.5 Structure of Table "faq":

Field Name | Field Type | Size | Constraints | Description
faq_id | int | - | PRIMARY KEY | Unique identifier for each FAQ
question | text | - | NOT NULL | Common question asked by users
answer | text | - | NOT NULL | Predefined answer for quick reply

6.1.6 Structure of Table "logs":

Field Name | Field Type | Size | Constraints | Description
log_id | int | - | PRIMARY KEY | Log ID
user_id | int | - | NOT NULL | Foreign key from the users table
action_type | text | 100 | - | Type of action (e.g., open_app, search_web)
timestamp | datetime | - | DEFAULT CURRENT_TIMESTAMP | Time of the log entry
status | text | 20 | CHECK (status IN ('success', 'fail')) | Outcome of the action
message | text | - | NULLABLE | Additional context or error message
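
Since this schema is future scope, the following is a minimal sketch of how two of the proposed tables could be created with Python's built-in sqlite3 module (table and column names follow the schema above; the file name jarvis.db is an assumption):

import sqlite3

connection = sqlite3.connect("jarvis.db")  # hypothetical database file
cursor = connection.cursor()

# Create the users table from section 6.1.1
cursor.execute("""
CREATE TABLE IF NOT EXISTS users (
    user_id INTEGER PRIMARY KEY,
    username VARCHAR(100) NOT NULL UNIQUE,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
)
""")

# Create the messages table from section 6.1.2
cursor.execute("""
CREATE TABLE IF NOT EXISTS messages (
    message_id INTEGER PRIMARY KEY,
    session_id INTEGER REFERENCES command_sessions(session_id),
    sender TEXT CHECK (sender IN ('user', 'bot')),
    content TEXT NOT NULL,
    timestamp DATETIME NOT NULL
)
""")

connection.commit()
connection.close()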

CHAPTER 7

7. IMPLEMENTATION

7.1 CODING
PYTHON (Backend) Code:

import pyttsx3
import speech_recognition as sr
import webbrowser
import datetime
import os
import psutil
import wikipedia
import random
import requests

# Initialize the TTS engine globally
engine = pyttsx3.init()
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)
engine.setProperty('rate', 190)  # Slightly faster speech

# -----------------------------------
# JARVIS SPEECH FUNCTIONS
# -----------------------------------
def speak(text):
    print("J.A.R.V.I.S.:", text)
    engine.say(text)
    engine.runAndWait()

def listen(timeout=5, phrase_time_limit=6):
    """Listen to the user's speech and return it as text, quickly."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        recognizer.adjust_for_ambient_noise(source, duration=0.3)
        try:
            audio = recognizer.listen(source, timeout=timeout,
                                      phrase_time_limit=phrase_time_limit)
            print("Recognizing...")
            data = recognizer.recognize_google(audio)
            print("User:", data)
            return data.lower()
        except sr.WaitTimeoutError:
            return ""
        except sr.UnknownValueError:
            speak("Sorry, I didn't catch that.")
            return ""
        except sr.RequestError:
            speak("Connection error. Please check your internet.")
            return ""

# -----------------------------------
# OLLAMA AI CHAT FUNCTION
# -----------------------------------
def ollama_search(query):
    try:
        response = requests.post("http://localhost:11434/api/generate", json={
            "model": "phi",  # Change if using another model
            "prompt": query,
            "stream": False
        })
        data = response.json()
        return data.get("response", "I'm not sure how to respond to that.").strip()
    except Exception as e:
        return f"Error: {str(e)}"

# -----------------------------------
# JARVIS COMMAND HANDLER
# -----------------------------------
def process_command(command):
    if "pause" in command:
        speak("Communication paused. Let me know when to continue.")
        return

    elif "your name" in command:
        speak("My name is J.A.R.V.I.S. Just A Rather Very Intelligent System.")

    elif "age" in command:
        speak("I was activated recently but I learn fast.")

    elif "time" in command:
        now = datetime.datetime.now().strftime("%I:%M %p")
        speak(f"Sir, the time is {now}.")

    elif "date" in command:
        today = datetime.datetime.now().strftime("%A, %B %d, %Y")
        speak(f"Sir, today is {today}.")

    elif "youtube" in command:
        speak("What should I play on YouTube, sir?")
        search_query = listen()
        if search_query:
            webbrowser.open(f"https://www.youtube.com/results?search_query={search_query}")
            speak(f"Searching YouTube for {search_query}")
        else:
            speak("I didn't catch that.")

    elif "notepad" in command:
        os.system("notepad")
        speak("Launching Notepad, sir.")

    elif "file" in command or "open files" in command:
        os.system("explorer")
        speak("Opening File Explorer, sir.")

    elif "cpu usage" in command:
        usage = psutil.cpu_percent()
        speak(f"CPU usage is at {usage} percent, sir.")

    elif "joke" in command:
        speak(random.choice([
            "Why did the scarecrow win an award? Because he was outstanding in his field!",
            "Parallel lines have so much in common. It's a shame they'll never meet.",
            "I'm on a whiskey diet. I've lost three days already."
        ]))

    elif "wikipedia" in command:
        speak("What should I search on Wikipedia, sir?")
        search_query = listen()
        if search_query:
            try:
                summary = wikipedia.summary(search_query, sentences=2)
                speak("Here's what I found:")
                speak(summary)
            except wikipedia.exceptions.DisambiguationError:
                speak("There are multiple results. Please be more specific.")
            except Exception:
                speak("Sorry, I couldn't find that.")

    elif "search google" in command:
        speak("What do you want to search for, sir?")
        query = listen()
        if query:
            webbrowser.open(f"https://www.google.com/search?q={query}")
            speak(f"Here's what I found for {query}")
        else:
            speak("I didn't catch that.")

    elif "exit" in command or "stop" in command or "bye" in command:
        speak("Shutting down systems. Until next time, sir.")
        exit()

    else:
        # Fall back to the local Ollama model for open-ended questions
        result = ollama_search(command)
        speak(result)

# -----------------------------------
# WAKE WORD MODE
# -----------------------------------
def wait_for_wake_word():
    # Keep listening until the activation word ("password") is heard
    while True:
        print("Waiting for the activation word...")
        query = listen()
        if "password" in query:
            speak("Password accepted. JARVIS at your service, sir.")
            return

# -----------------------------------
# MAIN LOOP
# -----------------------------------
if __name__ == "__main__":
    speak("Initializing systems. Hello, I am JARVIS, your personal AI assistant. "
          "Say password to activate.")

    while True:
        wait_for_wake_word()
        while True:
            command = listen()
            if command in ["exit", "stop", "bye"]:
                process_command(command)
                break
            elif command:
                process_command(command)
PYTHON (Frontend) Code:

import tkinter as tk
from tkinter import scrolledtext
from PIL import Image, ImageTk
import threading
from main import speak, listen, process_command, wait_for_wake_word

# Custom class for animated GIFs
class AnimatedGIFLabel(tk.Label):
    def __init__(self, master, gif_path, size=(700, 500), *args, **kwargs):
        super().__init__(master, *args, **kwargs)
        self.gif = Image.open(gif_path)
        self.frames = []

        try:
            while True:
                frame = self.gif.copy().resize(size, Image.Resampling.LANCZOS)
                self.frames.append(ImageTk.PhotoImage(frame))
                self.gif.seek(len(self.frames))  # Move to the next frame
        except EOFError:
            pass  # End of frames

        self.frame_index = 0
        self.update_frame()

    def update_frame(self):
        if self.frames:
            self.config(image=self.frames[self.frame_index])
            self.frame_index = (self.frame_index + 1) % len(self.frames)
        self.after(100, self.update_frame)  # Adjust for desired frame rate

# GUI class
class JarvisGUI:
    def __init__(self, root):
        self.root = root
        self.root.title("J.A.R.V.I.S.")
        self.root.geometry("700x500")
        self.root.resizable(False, False)

        # Set animated GIF as background
        self.bg_label = AnimatedGIFLabel(root, "live_wallpaper.gif", size=(700, 500))
        self.bg_label.place(x=0, y=0, relwidth=1, relheight=1)

        # Chat area
        self.text_area = scrolledtext.ScrolledText(
            root, wrap=tk.WORD,
            font=("Consolas", 10),
            bg="#000000", fg="#00ffff",
            bd=0, insertbackground="#00ffff"
        )
        self.text_area.place(x=400, y=300, width=300, height=200)
        self.text_area.insert(tk.END, "J.A.R.V.I.S. initialized. Listening...\n\n")
        self.text_area.config(state=tk.DISABLED)

        # Start listening
        self.start_listening()

    def start_listening(self):
        # Run listening in a daemon thread so the GUI stays responsive
        threading.Thread(target=self.listen_and_respond, daemon=True).start()

    def listen_and_respond(self):
        self.update_chat("Listening...")
        command = listen()
        if command:
            self.update_chat(f"You: {command}")
            process_command(command)
        else:
            self.update_chat("Didn't catch that.")

        self.start_listening()

    def update_chat(self, text):
        self.text_area.config(state=tk.NORMAL)
        self.text_area.insert(tk.END, text + "\n")
        self.text_area.see(tk.END)
        self.text_area.config(state=tk.DISABLED)

# Run the GUI
if __name__ == "__main__":
    root = tk.Tk()
    app = JarvisGUI(root)
    root.mainloop()
CHAPTER 8

8. SOFTWARE TESTING

Introduction:

Software testing was carried out to ensure that the Voice Assistant performs reliably and
accurately; several types of testing were conducted. These tests validate both the individual
modules (like speech recognition, system control, and AI interaction) and the system as
a whole. Testing ensures that the assistant understands user commands, responds correctly,
performs actions like opening apps, and handles voice input/output smoothly.

The two main types of testing considered were:

1. Black Box Testing: Focused on input/output functionality without knowledge of the
internal code.
2. White Box Testing: Involved detailed internal logic testing, such as function flows,
conditions, and loops.

Testing objectives

 To verify the accuracy of voice-to-text and text-to-speech conversion.
 To ensure the assistant correctly executes commands (e.g., open apps, fetch CPU
usage).
 To validate communication between the GUI and the Ollama model.
 To check system behaviour under normal and edge-case user interactions.
 To confirm graceful handling of invalid input and API failures.

8.1 Testing steps:

8.1.1 Unit Testing:

Each functional module, such as the voice recognition system (voice_recognition.py), the
system command controller (system_control.py), and the AI interface
(ollama_interface.py), was tested separately. This ensured that:

 The assistant accurately detects the wake word ("Hey J.A.R.V.I.S."),
 The system status (CPU usage, battery level, etc.) is retrieved correctly,
 Prompts are sent to the Ollama model, and appropriate responses are received.
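
As an illustration, a minimal unit test sketch using Python's unittest and unittest.mock, assuming the backend module is named main.py as in the frontend import (importing it initializes the TTS engine, so these tests need a machine with audio support; the command strings match the handlers in Chapter 7):

import unittest
from unittest.mock import patch
import main

class TestProcessCommand(unittest.TestCase):
    @patch("main.speak")
    def test_time_command_speaks_response(self, mock_speak):
        # The "time" handler should produce exactly one spoken response
        main.process_command("what is the time")
        mock_speak.assert_called_once()
        self.assertIn("the time is", mock_speak.call_args[0][0])

    @patch("main.ollama_search", return_value="Local AI reply")
    @patch("main.speak")
    def test_unknown_command_falls_back_to_ollama(self, mock_speak, mock_ollama):
        # Unrecognized commands should be routed to the local Ollama model
        main.process_command("tell me something interesting")
        mock_ollama.assert_called_once_with("tell me something interesting")
        mock_speak.assert_called_with("Local AI reply")

if __name__ == "__main__":
    unittest.main()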

8.1.2 Integration Testing:

Integration testing evaluates how well different modules work together as a complete
system. The flow from capturing the user's voice input to generating and displaying the AI
response was tested thoroughly.

 Smooth GUI updates and feedback when the API response is returned.
 That the assistant didn't break when multiple components were working together (e.g.,
listening loop + chat updates + response handler).
 Seamless data transfer between modules.
 Proper synchronization of the voice recognition, text-to-speech (TTS), and GUI
systems.
 Accurate and timely responses from the Ollama model displayed in the chat interface.

8.1.3 Validation Testing:

Validation testing was performed to confirm that the system meets its functional goals,
including:

 Launching system applications through voice commands.
 Fetching and reading summaries from Wikipedia.
 Providing real-time weather updates.
 Engaging in natural conversation using AI.

8.1.4 Output Testing:

Ensured all textual and spoken outputs are:

 Grammatically correct
 Relevant to the input query
 Delivered in a timely and user-friendly way

8.1.5 User Acceptance Testing:

UAT is performed to confirm that the software is ready for real-world use by the end users.
In this phase, the system was tested by non-technical users to gather usability feedback.
Focus areas included:
 Overall user-friendliness and interface intuitiveness.
 Responsiveness and interaction speed.
 Quality and naturalness of conversational flow.

8.2 Command Input and API Communication:

Input | Test Condition | Test Output | Comments
"Hey J.A.R.V.I.S." | Wake word detection | J.A.R.V.I.S. activates and responds | Works consistently
"What is the time?" | Basic AI query | Returns current time | Accurate result
"Open Notepad" | System command execution | Launches Notepad | Passed
"Tell me about Python" | Wikipedia integration | Shows Wikipedia summary | Returns relevant content
"What's the weather in London?" | Online query | Provides weather report | Requires internet
[Noise/Silence] | Invalid input | Responds with "Couldn't hear you" | Handles gracefully
"Who is Elon Musk?" | Ollama-powered AI response | Gives brief profile | Relevant, AI-processed output
8.3 Command Flow Integration:

Test No. | Test Description | Expected Output | Result
1 | Wake word followed by valid query | Recognized, answered via Ollama | Success
2 | Query without wake word | No response | Success
3 | Voice input not detected | No crash; assistant continues listening | Success
4 | Launch GUI and give commands via mic | Chat updates continuously without issues | Success
5 | AI responds to non-system question | Answer generated by Ollama | Success
6 | Invalid command like "Open spaceship" | Fallback response from the AI model | Success
8.4 System testing tables:

Sl. No | Test Condition | Test Report
1 | GUI launches correctly | Successful
2 | Voice input detected and converted accurately | Successful
3 | System status displayed correctly (CPU, RAM) | Successful
4 | System apps open via voice command | Successful
5 | Ollama model responds to AI prompts | Successful

Client-side interaction:

Sl. No | Test Condition | Test Report
6 | Wake word activates assistant | Successful
7 | Assistant remains idle until triggered | Successful
8 | API fails or is not running | Error message displayed / fallback handled
CHAPTER 9
9. USER INTERFACE
Screenshots:

9.1 Environment Interface:

9.2 System Interface:

9.3 Command Interface:

9.4 Working module:

CHAPTER 10
10. CONCLUSION

This project presents the successful development of Voice Assistant, a privacy-focused,
desktop-based AI assistant capable of performing real-time voice interactions and system
operations without relying heavily on cloud services. The assistant uses local speech
recognition, text-to-speech, and a locally hosted Ollama model to process user commands
efficiently. Throughout the project lifecycle, from requirement analysis through system
design, implementation, and thorough testing, each phase contributed to building a
modular, scalable, and user-friendly system that meets the objectives defined at the outset.

The assistant offers a unique combination of features, including natural voice communication,
system-level task execution (like opening apps, fetching CPU usage, and searching the web), and
AI-powered conversational responses. The use of a Tkinter-based GUI with animated
backgrounds further enhances the user experience, making interactions more engaging and
intuitive. Testing results confirmed the system's reliability in handling a variety of voice
commands, robustness against invalid inputs, and efficient communication with the local
AI model.

However, the project also illuminated certain limitations. Performance could degrade
slightly under noisy conditions affecting speech recognition, and the system’s
conversational capabilities remain basic compared to more advanced cloud-driven
assistants. Additionally, while the user interface is responsive and visually appealing,
future versions could benefit from richer visualizations, theming options, and broader OS
compatibility.

The future scope of the assistant is promising. Planned enhancements include integrating
persistent memory for personalized experiences, implementing advanced natural language
understanding, expanding into multimodal interaction via web and mobile interfaces, and
connecting to smart home ecosystems. Enhancements in emotional sentiment detection,
multilingual support, and a plugin marketplace will further solidify the assistant as a powerful
and adaptive personal assistant.

In conclusion, this project demonstrates that a locally operated, AI-powered voice assistant
is not only feasible but also capable of delivering strong functionality while preserving user
privacy. The groundwork laid here positions the assistant to evolve into a highly sophisticated,
intelligent, and indispensable tool for daily desktop interactions, with great potential for
growth and innovation.

CHAPTER 11
11. FUTURE ENHANCEMENTS

While the current Voice Assistant provides a robust set of voice-driven system controls
and AI-powered conversation, there are many avenues to make it smarter, more
flexible, and more deeply integrated into daily workflows:

 Persistent Memory & Personalization
Integrate a database (e.g., SQLite or MongoDB) to store long-term user preferences,
prior conversations, and custom "skills" so the assistant can remember favourite apps,
preferred news sources, or recurring tasks. Add user profiles and settings for a truly
personalized experience.

 Advanced Natural Language Understanding
Implement intent classification and entity extraction to handle more complex,
multi-step commands (e.g., "Schedule a Zoom meeting tomorrow at 3 PM and email
the invite"). Incorporate context-tracking so follow-up questions ("What about
Friday?") refer back to the previous query. A lightweight starting point is sketched
below.
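
One possible starting point for this future enhancement: a rule-based intent classifier with crude regex entity extraction, written as a sketch before adopting a proper NLU library (the intent names and keyword lists are illustrative assumptions):

import re

INTENT_KEYWORDS = {
    "open_app": ["open", "launch"],
    "get_time": ["time"],
    "web_search": ["search", "google"],
}

def classify_intent(command):
    # Return the first intent whose keyword appears in the command,
    # plus any clock time mentioned (a crude form of entity extraction)
    command = command.lower()
    intent = next(
        (name for name, words in INTENT_KEYWORDS.items()
         if any(word in command for word in words)),
        "chat",  # default: hand the query to the local AI model
    )
    time_match = re.search(r"\b(\d{1,2}(:\d{2})?\s?(am|pm))\b", command)
    entities = {"time": time_match.group(1)} if time_match else {}
    return intent, entities

print(classify_intent("search google for python tutorials"))   # ('web_search', {})
print(classify_intent("schedule a meeting tomorrow at 3 pm"))   # ('chat', {'time': '3 pm'})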

 Multimodal Interaction
Extend beyond voice by adding a lightweight web dashboard or mobile companion
app for typing commands, reviewing logs, tweaking settings, or viewing system stats
remotely. Enable screen-overlay notifications on desktop when critical events occur
(e.g., low battery).

 Smart Home & IoT Integration
Hook into popular smart-home platforms (e.g., Home Assistant, Philips Hue, Nest)
to support commands such as "Hey J.A.R.V.I.S., dim the living-room lights to 30
percent." Build a plugin framework to allow community-contributed modules for new
devices and services.

 Calendar & Email Automation
Deeply integrate with Google Calendar, Outlook, or iCal to read out upcoming
events, schedule new appointments by voice, and send reminder emails or SMS.
Provide daily briefings: "Good morning, you have two meetings and it's raining
outside."

Appendix A: BIBLIOGRAPHY

Web references:

1. Python Documentation. Python Software Foundation, Python Language Reference, version 3.10. https://docs.python.org/3/
2. SpeechRecognition Library Documentation. SpeechRecognition 3.8.1. https://pypi.org/project/SpeechRecognition/
3. pyttsx3 Text-to-Speech Documentation. pyttsx3, text-to-speech conversion library in Python. https://pypi.org/project/pyttsx3/
4. Wikipedia API for Python. Wikipedia documentation for developers. https://pypi.org/project/wikipedia/
5. Ollama Model Integration. Ollama documentation, run LLMs locally. https://ollama.com/
6. Tkinter GUI Library Reference. Python Tkinter documentation. https://docs.python.org/3/library/tk.html
7. webbrowser Module Documentation. Python webbrowser module. https://docs.python.org/3/library/webbrowser.html
8. System Commands and OS Module. Python os and psutil modules for system interaction. https://docs.python.org/3/library/os.html and https://pypi.org/project/psutil/
9. Geopy for Distance Calculations. Geopy, geocoding and distance library. https://geopy.readthedocs.io/
10. GitHub Repository Templates & Tutorials. GitHub, open-source voice assistant projects for learning and reference. https://github.com/
