CHAPTER 1
1. INTRODUCTION
This project presents Voice Assistant, a desktop-based AI assistant designed to interact with
users through natural voice commands. It uses speech recognition to understand spoken
input and responds using text-to-speech technology, creating a hands-free, intelligent user
experience.
The assistant is capable of performing a variety of tasks such as telling the current date and
time, opening websites like Google and YouTube, conducting Wikipedia searches,
launching system applications like Notepad or File Explorer, monitoring CPU usage, and
telling jokes.
For more complex queries and conversations, the assistant integrates a locally hosted AI
model through Ollama, allowing it to provide smart, AI-generated responses. The project
includes a user-friendly Tkinter-based GUI with an animated background and chat
interface, enhancing both the usability and visual appeal of the assistant.
This system showcases the integration of artificial intelligence, voice processing, and GUI
design into a functional and interactive personal assistant.
CHAPTER 2
2. LITERATURE SURVEY
In contrast to cloud-based assistants, this project leverages Ollama, an AI model that can run
locally on a user's system, providing a more privacy-conscious solution for conversational
interactions. Ollama allows for sophisticated AI-powered responses without relying on
external servers, ensuring that the user's data remains secure and private. Furthermore, the
project combines speech recognition (via the SpeechRecognition library) with text-to-speech
technology (using pyttsx3) to create a seamless voice-interactive interface. Because the
assistant does not depend on cloud services, it is more secure and efficient for users
concerned about privacy.
Additionally, the system is equipped with built-in functionality for executing operating
system tasks, such as opening files and applications, checking CPU usage, and running
Google searches—all triggered via voice commands.
The proposed system, Voice Assistant, is a desktop-based AI assistant that interacts with
users via voice commands. It uses speech recognition, TTS, and a locally hosted AI model
(Ollama) for smart responses. The system performs OS-level operations, web searches, and
provides a responsive GUI using Tkinter. It ensures user privacy, speed, and an interactive
experience without relying on cloud services.
CHAPTER 3
3. HARDWARE AND SOFTWARE REQUIREMENTS
Introduction
HARDWARE: - The hardware requirements for the Voice Assistant project are minimal, as
it primarily operates on a standard personal computer.
SOFTWARE: - The software environment required for the Voice Assistant includes various
tools and libraries that enable voice recognition, AI interaction, and system control.
CHAPTER 4
4. SOFTWARE REQUIREMENTS SPECIFICATION
The Software Requirements Specification (SRS) for the voice assistant outlines the
functional and non-functional requirements necessary for the development and deployment
of the system. This section provides a detailed description of the software components,
dependencies, and the environment required to run the voice assistant.
The software environment includes the use of Python, various third-party libraries for
speech recognition, text-to-speech synthesis, and the Ollama API for AI-powered
responses. This document will detail the software architecture, libraries, and dependencies
required for the successful implementation of the project.
a) End User:
Access Voice Assistant: The end user interacts with the Voice Assistant system through
voice commands, without the need for registration or login.
Initiate Conversation: The user can issue voice commands or ask questions, and the
system responds via speech.
Receive Real-Time Responses: The system provides context-aware, AI-generated
responses powered by the Ollama API.
Perform Tasks: The user can instruct the assistant to perform tasks such as opening
apps, searching Google, playing YouTube videos, and fetching system information
(e.g., CPU usage, time, date).
Error Handling: If the system does not understand the input, it provides fallback
responses such as "Sorry, I didn't catch that."
No Registration or Login: The system is designed for instant use; no user
authentication is required, making it easy for anyone to use the assistant.
b) Voice Assistant (AI):
Process User Commands: The assistant interprets voice commands and converts
them into actions, such as opening applications or performing system operations.
Generate Responses: The system generates natural language responses, such as telling
the time, providing weather updates, or offering a joke based on the user's request.
Error Response: If the assistant cannot process the command, it provides fallback
responses like "I'm not sure how to respond to that."
No Memory Between Sessions: The assistant does not retain any user data between
sessions. Each conversation is independent, ensuring user privacy.
Delayed Interaction: The system may have slight response delays depending on
system performance or voice recognition speed, especially on lower-end hardware.
Continuous Learning (Future Scope): In the future, the system could be enhanced to
learn from user interactions, improving its accuracy and responsiveness over time.
General Constraints:
General constraints include the following:
Internet Connectivity: Some features like Wikipedia search, Google search, and Ollama
responses require an active internet connection.
Local Ollama Server: The AI chat functionality relies on a locally hosted Ollama
model running on port 11434, which must be active before launching the assistant (a
quick availability check is sketched below this list).
Single User Support: The system is designed for individual use and does not support
multi-user profiles or authentication.
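Because the local Ollama server is the most common launch-time failure among these constraints, the assistant can verify it before starting. The following is a minimal sketch, assuming the default port from above; the function name and timeout are illustrative:

import requests

def ollama_is_running(url="http://localhost:11434", timeout=2):
    """Return True if the local Ollama server answers on its default port."""
    try:
        return requests.get(url, timeout=timeout).ok
    except requests.exceptions.RequestException:
        return False

If this check fails, the assistant can prompt the user to start Ollama instead of failing silently later.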
Assumptions and Dependencies:
The user has Python 3.8 or higher installed along with necessary libraries (pyttsx3,
speech_recognition, wikipedia, requests, psutil, tkinter, etc.).
The Ollama API (running locally) must be properly set up and active for AI-based
question answering to function.
A stable internet connection is required for some features (e.g., Wikipedia, Google
search, YouTube access).
It is assumed the user has basic technical knowledge to launch the Python application.
Dependencies include third-party libraries and APIs which must be properly installed
and configured.
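A start-up check of these dependencies can fail fast with a clear message. The following is a minimal sketch; the module names mirror the list above (tkinter ships with CPython, the rest come from PyPI):

import importlib.util

REQUIRED = ["pyttsx3", "speech_recognition", "wikipedia", "requests", "psutil", "tkinter"]

# Report any modules that cannot be resolved on this system.
missing = [m for m in REQUIRED if importlib.util.find_spec(m) is None]
if missing:
    print("Missing dependencies:", ", ".join(missing))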
Specific Requirements:
User Interface:
A user interface is the point of interaction between the user and the system. For the Voice
Assistant, the UI is kept minimal and user-friendly to ensure smooth communication with
the assistant.
The system provides a desktop-based graphical user interface (GUI) built using
Tkinter.
The GUI features an animated GIF background to simulate a futuristic, interactive
assistant environment.
A scrollable chat area displays ongoing conversations between the user and the
assistant, updating in real time.
Users can interact with the assistant using voice commands, and the responses are both
spoken aloud and shown in the chat interface.
The GUI remains minimal, clean, and functional, avoiding complex navigation for a
simple user experience.
The interface automatically restarts listening after each interaction, allowing for hands-
free operation.
Error messages or system notifications (e.g., "Didn't catch that") are displayed in the
chat for better clarity.
No login or authentication is required, ensuring quick access for the user.
Interaction with Chatbot: The user can interact with the assistant by issuing voice
commands. The assistant processes user queries and responds via both voice and text in
the chat interface.
Access System Features: Users can request the assistant to perform various tasks like
opening applications, checking the time and date, searching the web, and providing
weather information.
Error Handling: If the system fails to understand a command or encounters an issue,
it provides fallback responses (e.g., “I didn't catch that”).
Real-Time Interaction: The system operates in real-time, responding to commands
almost instantly with a slight delay in some cases.
Provide Feedback: After a task completion or response, the user can provide
feedback (e.g., thumbs up/thumbs down or text comments).
Perform System Operations: The assistant can execute OS-level operations, such as
opening applications (Notepad, File Explorer) and checking system statistics (e.g.,
CPU usage).
Handle Errors Gracefully: When unable to process a command, the assistant provides
an error response and offers suggestions for rephrasing or retrying.
Continuous Learning (Future Scope): In future versions, the assistant can learn from
user feedback to improve its accuracy and responses.
Performance Requirements:
Voice Response Speed: The assistant should deliver voice responses without noticeable
delays, ensuring a natural flow of conversation.
Real-Time Response: The system should provide near-instant responses to user
queries, with minimal delays, ensuring a smooth interaction.
Accuracy of Voice Recognition: The system must recognize voice commands with
over 90% accuracy, even in environments with moderate background noise.
Responsiveness of the GUI: The user interface should remain responsive and
smoothly update the chat in real-time without lag, even when performing complex
tasks.
Resource Utilization: The system should consume minimal CPU and memory
resources to run efficiently on most desktop systems.
Error Handling Performance: The system should quickly process errors and provide
helpful fallback messages within a few seconds.
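One way to check these targets during development is to time the main entry points. The following is a minimal sketch; the helper name is illustrative and not part of the project code:

import time

def timed(fn, *args, **kwargs):
    """Run fn and print its wall-clock latency, e.g. timed(listen)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{fn.__name__} took {time.perf_counter() - start:.2f}s")
    return result

Wrapping listen() and process_command() this way gives concrete numbers to compare against the response-speed requirements above.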
Design Constraints:
System Attributes:
1. Performance:
The system performs efficiently under standard desktop environments, processing voice
input and generating AI responses with minimal delay depending on system load and
local model speed.
2. Reliability:
The assistant provides stable and consistent results when the microphone is functional
and the local model is running properly, ensuring dependable interaction during regular
use.
3. User-Friendly Interface:
Built with Tkinter, the GUI is minimal, animated, and intuitive, requiring no technical
expertise or login—users can immediately interact via voice or GUI.
4. Maintainability:
The modular Python code allows developers to easily update commands, add features,
or fix bugs without affecting core functionalities.
5. Portability:
The project runs seamlessly on any desktop operating system (Windows/Linux) with
Python and required dependencies installed, ensuring high portability across platforms.
6. Flexibility:
The assistant can be enhanced in the future with additional voice commands, smart
home integration, or backend database support without major rewrites.
7. Timeliness:
User voice commands are processed in near real-time, and responses are displayed and
spoken back immediately, enabling smooth conversational flow.
4.3 Non-Functional Requirements:
Safety Requirements:
The system must not execute any critical or destructive operations (e.g., deleting
system files or shutting down unexpectedly) without user confirmation.
The assistant must avoid misinterpreting voice commands that could lead to unintended
system-level actions.
The assistant should handle errors gracefully, providing fallback messages like “I
didn’t catch that” instead of crashing.
The microphone and speaker usage should not interfere with any other critical
desktop processes or applications.
Security Requirements:
The system runs locally and does not require a backend server or database,
minimizing exposure to external threats.
No personal data, user logs, or conversations are stored unless explicitly enabled by
the user.
Communication with external services (e.g., Wikipedia, web searches) occurs
securely over standard web protocols.
The assistant does not collect or share any sensitive user data, ensuring user privacy
and system-level safety.
CHAPTER 5
5. SYSTEM DESIGN
Introduction:
The design process aims to create a detailed blueprint for the Voice Assistant, a desktop-based
AI assistant system that uses the locally hosted Ollama model to process voice inputs
and deliver intelligent responses. System design defines the architecture, modules,
interfaces, and data flow to fulfill the assistant’s operational goals. The objective is to
modularize the assistant's features for easier development, scalability, and future
enhancements. The system will focus on the following:
Modules: Dividing the application into key modules such as the voice interface, GUI,
system command engine, AI integration, and TTS engine.
Specifications: Defining each module’s purpose, inputs, outputs, and how they behave
under different scenarios.
Interconnections: Mapping how these modules interact and share data, ensuring
seamless communication and real-time performance.
The design approach ensures modularity, reusability, and testability. It supports smooth
integration of new features, optimizes performance, and allows for easy updates in the
future.
The Context Flow Diagram illustrates the assistant system’s high-level interactions with
external entities. It views the system as a single processing unit that receives inputs and
produces outputs via interaction with voice and AI components.
External Entities:
User: Interacts with the assistant via microphone or GUI, providing voice commands
and receiving spoken/text responses.
Ollama LLM (Local AI): Processes user queries and generates intelligent responses
based on natural language understanding.
Inputs: Voice commands from the user (captured and converted to text).
Outputs: Spoken responses, on-screen text messages, and triggered system-level
operations (e.g., open apps, fetch info).
A Data Flow Diagram (DFD) is a graphical representation of how data moves through a
system. It helps visualize the flow of information, how it is processed, and the interaction
between different components, including users, subsystems, and data stores. DFDs make
complex processes easy to understand by using simple symbols to represent external
entities, processes, data flows, and data stores.
Processes: Voice input is converted to text. The AI engine (Ollama) processes the text
input. The system converts the AI response into speech and text output.
Data Flows: Flow of voice input, recognized text, AI prompts, and responses. User
interactions and audio/text feedback loops.
External Entities: The user and the locally hosted Ollama LLM, as described above.
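Mapped onto code, this flow corresponds roughly to the following cycle. This is an illustrative sketch only; it assumes the backend listing from Chapter 7 lives in main.py (as the frontend's import there suggests):

from main import listen, ollama_search, speak

def assistant_cycle():
    text = listen()                  # voice input -> recognized text
    if text:
        reply = ollama_search(text)  # recognized text -> AI prompt -> AI response
        speak(reply)                 # response -> speech and chat output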
5.3 DFD Symbols:
5.3.1 Data Flow Diagram
Fig no. 5.3.1.1: Context-level DFD, in which the user sends commands and queries to
JARVIS (the assistant system).
First-level DFD:
Fig no. 5.3.1.2: First-level DFD, in which the user's voice command enters command
processing, which drives system control (opening apps, fetching stats) and writes entries
to a logs database.
5.4 Entity Relationship Diagram:
A set of primary components is identified for the ER diagram: data objects, attributes,
relationships, and various type indicators. The primary purpose of the ER diagram is to
represent data objects and their relationships.
5.4.2 Attributes:
Attributes define the properties of a data object and take on one of three different
characteristics. They can be used to (1) name an instance of the data object, (2) describe the
instance, or (3) make reference to another instance in another table. In addition, one or more
of the attributes must be defined as an identifier; the identifier attribute becomes a
“key” when we want to find an instance of the data object.
5.4.3 Relationships:
Relationships indicate the manner in which data objects are “connected” to one another.
5.4.4 Cardinality:
The data model must be capable of representing the number of occurrences of objects in a
given relationship. Tillman defines the cardinality of an object/relationship pair in the
following manner: “Cardinality is the specification of the number of occurrences of one
[object] that can be related to the number of occurrences of another [object].” Cardinality is
usually expressed as simply one or many; taking into consideration all combinations of
one and many, two [objects] can be related as one-to-one, one-to-many, or many-to-many.
5.5 The symbols are shown in the table below:
Cardinality Ratio: This specifies the maximum number of relationship instances that an
entity can participate in. There are four cardinality ratios:
1) 1:1
2) 1:M
3) M:1
4) M:M
5.6 ER diagram:
5.7 System Perspective:
The Voice Assistant is a standalone desktop application that integrates with various system
operations to enhance user productivity and interactivity. It offers real-time responses to
voice commands and performs a variety of system tasks. The system uses the locally hosted
Ollama API for conversational AI and integrates with voice recognition and text-to-speech
technologies.
System Characteristics:
Interfaces:
A desktop-based GUI powered by Tkinter, which includes a speech input field, system
status display, and chat interface for text-based responses.
AI Interface:
Handles communication between the frontend and the locally hosted Ollama model for
generating AI-powered responses.
System Interface:
Manages interactions with the operating system for executing tasks such as launching
apps, browsing files, fetching system statistics (CPU, RAM), and more.
Voice Interface:
Uses speech recognition to capture commands and text-to-speech (TTS) to provide
spoken feedback to the user.
Dependencies: the Python runtime, the third-party libraries listed in Chapter 4, and the
locally hosted Ollama model.
Context Diagram:
A Context Diagram provides a high-level overview of how the Voice Assistant system
interacts with users and external services.
A text-based sketch of the context diagram, reconstructed from the description above:
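+------+  voice commands   +-----------------+   prompts    +------------------+
| User | ----------------> | Voice Assistant | -----------> | Ollama LLM       |
|      | <---------------- | (GUI + TTS)     | <----------- | (localhost:11434)|
+------+  speech / text    +-----------------+   responses  +------------------+
                                   |
                                   v
                  OS tasks, Google / YouTube / Wikipedia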
CHAPTER 6
6. DETAILED DESIGN
Introduction:
The purpose of this detailed design document is to present the complete design
specifications of the Voice Assistant. This project is a desktop-based application built
using Python, Tkinter for the GUI, and integrates with the Ollama API for generating
responses to user queries. The design emphasizes user-friendly interaction through speech,
AI-powered responses, and seamless system control functionalities. The system is
lightweight, interactive, and capable of managing voice commands for system operations
such as opening applications, checking system stats, searching the web, and more.
The design outlines the architecture, user interface, interaction with the Ollama model, and
system integration. The goal is to provide an intuitive and efficient interface while ensuring
the system is modular, maintainable, and extendable.
Applicable Documents:
The design draws on the Python and third-party library documentation listed in Appendix A.
The major components of the system are described below:
Component | Description
UI Component | Handles the graphical user interface (GUI) using Tkinter, including the chat window, animated background, and system status display.
Voice Command Handler | Captures voice input from the user using the speech_recognition library and triggers appropriate actions, such as opening applications, fetching system stats, etc.
System Control Interface | Interacts with the operating system to perform actions like opening applications, checking CPU/RAM usage, launching websites, etc.
Ollama API Integration | Sends user queries to the locally hosted Ollama model and returns the AI-generated responses for display and speech output.
A database is a structured collection of related data, crucial for storing and retrieving
information efficiently. While the current version of Voice Assistant does not depend on
a database for basic operation, future enhancements can incorporate one to support
session tracking, chat history, and logs.
Database Design:
Database design refers to the process of organizing data according to a schema. It ensures
efficient access, integrity, and support for core functionalities like session tracking, chat
history, and logs.
For the assistant system, a local or cloud database (e.g., SQLite, MongoDB, or Firebase)
can be used. Here's a proposed schema with 6 tables:
6.1.1 Database Structure
Structure of Table “users”:
timestamp | datetime | NOT NULL | Time when the message was sent/received
6.1.3 Structure of Table “command_sessions”:
6.1.5 Structure of Table “faq”:
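The column layouts for most of the six proposed tables were lost in this copy of the report. As an illustration, a minimal SQLite version of the three tables named above could look like this; all columns beyond the documented timestamp are assumptions:

import sqlite3

conn = sqlite3.connect("assistant.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS users (
    user_id   INTEGER PRIMARY KEY,
    timestamp DATETIME NOT NULL      -- time when the message was sent/received
);
CREATE TABLE IF NOT EXISTS command_sessions (
    session_id INTEGER PRIMARY KEY,
    user_id    INTEGER REFERENCES users(user_id),
    command    TEXT,
    timestamp  DATETIME NOT NULL
);
CREATE TABLE IF NOT EXISTS faq (
    faq_id   INTEGER PRIMARY KEY,
    question TEXT,
    answer   TEXT
);
""")
conn.commit()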
CHAPTER 7
7. IMPLEMENTATION
7.1 CODING
PYTHON (Backend) Code:
import pyttsx3
import speech_recognition as sr
import webbrowser
import datetime
import os
import psutil
import wikipedia
import random
import requests
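# NOTE: speak() is used throughout this listing, but its definition was
# truncated in the report; the following is a minimal reconstruction
# using pyttsx3, consistent with the imports above.
engine = pyttsx3.init()

def speak(text):
    """Speak the given text aloud and echo it to the console."""
    print("JARVIS:", text)
    engine.say(text)
    engine.runAndWait()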
def listen(timeout=5, phrase_time_limit=6):
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        recognizer.adjust_for_ambient_noise(source, duration=0.3)
        try:
            audio = recognizer.listen(source, timeout=timeout,
                                      phrase_time_limit=phrase_time_limit)
            print("Recognizing...")
            data = recognizer.recognize_google(audio)
            print("User:", data)
            return data.lower()
        except sr.WaitTimeoutError:
            return ""
        except sr.UnknownValueError:
            speak("Sorry, I didn't catch that.")
            return ""
        except sr.RequestError:
            speak("Connection error. Please check your internet.")
            return ""
# -----------------------------------
# OLLAMA AI CHAT FUNCTION
# -----------------------------------
def ollama_search(query):
    # The body after the requests.post(...) call was truncated in the report;
    # the payload keys and error handling below are a minimal reconstruction,
    # and the model name is an assumption.
    try:
        response = requests.post("http://localhost:11434/api/generate", json={
            "model": "llama3",   # assumed model name
            "prompt": query,
            "stream": False
        })
        return response.json().get("response", "Sorry, I couldn't get an answer.")
    except requests.exceptions.RequestException:
        return "The local AI model is not responding."
def process_command(command):
    # NOTE: parts of this function were truncated in the report; the branch
    # skeleton below is a minimal reconstruction around the surviving lines.
    if "time" in command:
        now = datetime.datetime.now().strftime("%I:%M %p")
        speak(f"Sir, the time is {now}.")
    elif "youtube" in command:
        speak("What should I search on YouTube, sir?")
        search_query = listen()
        if search_query:
            webbrowser.open(f"https://www.youtube.com/results?search_query={search_query}")
            speak(f"Searching YouTube for {search_query}")
        else:
            speak("I didn't catch that.")
    elif "notepad" in command:
        os.system("notepad")
    elif "joke" in command:
        speak(random.choice([
            "Why did the scarecrow win an award? Because he was outstanding in his field!",
            "Parallel lines have so much in common. It's a shame they'll never meet.",
            "I'm on a whiskey diet. I've lost three days already.",
        ]))
    elif "search" in command or "google" in command:
        speak("What do you want to search for, sir?")
        query = listen()
        if query:
            webbrowser.open(f"https://www.google.com/search?q={query}")
            speak(f"Here's what I found for {query}")
        else:
            speak("I didn't catch that.")
    else:
        result = ollama_search(command)
        speak(result)
# -----------------------------------
# WAKE WORD MODE
# -----------------------------------
def wait_for_wake_word():
    while True:
        print("Waiting for 'Hey Jarvis'...")
        query = listen()
        # Activation actually keys on the word "password" (see the startup
        # prompt in the main loop below).
        if "password" in query:
            speak("Password accepted. JARVIS at your service, sir.")
            return
# -----------------------------------
# MAIN LOOP
# -----------------------------------
if __name__ == "__main__":
    speak("Initializing systems. Hello, I am JARVIS, your personal AI assistant. "
          "Say password to activate.")
    while True:
        wait_for_wake_word()
        while True:
            command = listen()
            if command in ["exit", "stop", "bye"]:
                process_command(command)
                break
            elif command:
                process_command(command)
PYTHON (Frontend) Code:
import tkinter as tk
from tkinter import scrolledtext
from PIL import Image, ImageTk
from itertools import count
import threading
from main import speak, listen, process_command, wait_for_wake_word
# NOTE: the class header and the start of __init__ were truncated in the
# report; the first six lines below are a minimal reconstruction consistent
# with the usage AnimatedGIFLabel(root, "live_wallpaper.gif", size=(700, 500)).
class AnimatedGIFLabel(tk.Label):
    def __init__(self, master, path, size=(700, 500)):
        super().__init__(master)
        self.gif = Image.open(path)
        self.frames = []
        try:
            while True:
                frame = self.gif.copy().resize(size, Image.Resampling.LANCZOS)
                self.frames.append(ImageTk.PhotoImage(frame))
                self.gif.seek(len(self.frames))  # move to next frame
        except EOFError:
            pass  # end of frames
        self.frame_index = 0
        self.update_frame()

    def update_frame(self):
        if self.frames:
            self.config(image=self.frames[self.frame_index])
            self.frame_index = (self.frame_index + 1) % len(self.frames)
        self.after(100, self.update_frame)  # adjust for desired frame rate
# 🧠 GUI Class
class JarvisGUI:
    def __init__(self, root):
        self.root = root
        self.root.title("J.A.R.V.I.S.")
        self.root.geometry("700x500")
        self.root.resizable(False, False)

        # 🔁 Set animated GIF as background
        self.bg_label = AnimatedGIFLabel(root, "live_wallpaper.gif", size=(700, 500))
        self.bg_label.place(x=0, y=0, relwidth=1, relheight=1)

        # 🧠 Chat area
        self.text_area = scrolledtext.ScrolledText(
            root, wrap=tk.WORD,
            font=("Consolas", 10),
            bg="#000000", fg="#00ffff",
            bd=0, insertbackground="#00ffff"
        )
        self.text_area.place(x=400, y=300, width=300, height=200)
        self.text_area.insert(tk.END, "J.A.R.V.I.S. initialized. Listening...\n\n")
        self.text_area.config(state=tk.DISABLED)

        # Start listening
        self.start_listening()

    def start_listening(self):
        threading.Thread(target=self.listen_and_respond).start()

    def listen_and_respond(self):
        self.update_chat("Listening...")
        command = listen()
        if command:
            self.update_chat(f"You: {command}")
            process_command(command)
        else:
            self.update_chat("Didn't catch that.")
        self.start_listening()
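The listing above calls self.update_chat() and never instantiates the GUI; both pieces were truncated in the report. A minimal reconstruction, where the method body and the launcher are assumptions consistent with the code shown:

    def update_chat(self, message):
        # Append a line to the read-only chat area and keep it scrolled
        # to the newest message.
        self.text_area.config(state=tk.NORMAL)
        self.text_area.insert(tk.END, message + "\n")
        self.text_area.config(state=tk.DISABLED)
        self.text_area.see(tk.END)

if __name__ == "__main__":
    root = tk.Tk()
    app = JarvisGUI(root)
    root.mainloop()

One caveat: listen_and_respond() runs on a worker thread, and Tkinter widgets are not thread-safe, so a more robust version would route chat updates through root.after() instead of touching the widget directly.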
CHAPTER 8
8. SOFTWARE TESTING
Introduction:
Software testing was conducted to ensure that the Voice Assistant performs reliably and
accurately; several types of tests were carried out. These tests validate both the individual
modules (like speech recognition, system control, AI interaction) and the entire system as
a whole. Testing ensures that the assistant understands user commands, responds correctly,
performs actions like opening apps, and handles voice input/output smoothly.
Testing Objectives:
8.1.1 Unit Testing:
Each functional module, such as the voice recognition system (voice_recognition.py), the
system command controller (system_control.py), and the AI interface
(ollama_interface.py), was tested separately to ensure that each functioned correctly in
isolation before integration.
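As a concrete illustration, a unit test for the command handler can mock out the speech and OS calls. This is a sketch only: it assumes the backend listing from Chapter 7 lives in main.py (as the frontend's import suggests), and the test names are illustrative:

import unittest
from unittest.mock import patch

class TestCommandHandling(unittest.TestCase):
    @patch("main.speak")        # silence TTS during the test
    @patch("main.os.system")    # avoid actually launching Notepad
    def test_notepad_command(self, mock_system, mock_speak):
        from main import process_command
        process_command("open notepad")
        mock_system.assert_called_with("notepad")

if __name__ == "__main__":
    unittest.main()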
8.1.2 Integration Testing:
Integration testing evaluates how well different modules work together as a complete
system. The flow from capturing the user's voice input to generating and displaying the AI
response was tested thoroughly.
Smooth GUI updates and visual feedback when the AI response is returned.
That the chatbot didn't break when multiple components were working together (e.g.,
the listening thread, chat updates, and response handler).
Seamless data transfer between modules.
Proper synchronization of the voice recognition, text-to-speech (TTS), and GUI
systems.
Accurate and timely responses from the Ollama model, displayed in the chat interface
and verified to be:
grammatically correct,
relevant to the input query, and
delivered in a timely and user-friendly way.
8.1.3 User Acceptance Testing (UAT):
UAT is performed to confirm that the software is ready for real-world use by the end users.
In this phase, the system was tested by non-technical users to gather usability feedback.
Focus areas included:
Overall user-friendliness and interface intuitiveness.
Responsiveness and interaction speed.
Quality and naturalness of conversational flow.
8.2 Command Input and API Communication:
“What is the time?” | Basic AI query | Returns current time | Accurate result
8.3 Command Flow Integration:
4. Launch GUI and give commands via mic | Chat updates continuously without issues | Success
5. AI responds to non-system question | Answer generated by Ollama | Success
6. Invalid command like "Open spaceship" | Fallback response provided
8.4 System Testing Tables:
Sl. No | Test Condition | Test Report
CHAPTER 9
9. USER INTERFACE
Screenshots:
9.3 Command Interface:
CHAPTER 10
10. CONCLUSION
The project also revealed certain limitations. Performance could degrade slightly
under noisy conditions affecting speech recognition, and the system’s
conversational capabilities remain basic compared to more advanced cloud-driven
assistants. Additionally, while the user interface is responsive and visually appealing,
future versions could benefit from richer visualizations, theming options, and broader OS
compatibility.
In conclusion, this project demonstrates that a locally operated, AI-powered voice assistant
is not only feasible but also capable of delivering strong functionality while preserving user
privacy. The groundwork laid here positions the assistant to evolve into a highly sophisticated,
intelligent, and indispensable tool for daily desktop interactions, with endless potential for
growth and innovation.
CHAPTER 11
11. FUTURE ENHANCEMENTS
While the current Voice Assistant provides a robust set of voice-driven system controls
and AI-powered conversation, there are many avenues to make it even smarter, more
flexible, and more deeply integrated into your daily workflows:
Multimodal Interaction
Extend beyond voice by adding a lightweight web dashboard or mobile companion
app where you can type, review logs, tweak settings, or view system stats remotely.
Enable screen-overlay notifications on desktop when critical events occur (e.g., low
battery).
Smart Home & IoT Integration
Hook into popular smart-home platforms (e.g., Home Assistant, Philips Hue, Nest) so
you can say “Hey J.A.R.V.I.S., dim the living-room lights to 30 percent.”
Build a plugin framework to allow community-contributed modules for new devices
and services; a sketch of one possible mechanism follows.
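One lightweight way to support community-contributed modules is a keyword-based handler registry. The following is a minimal sketch; all names here are illustrative and not part of the current code:

PLUGINS = {}

def plugin(keyword):
    """Decorator that registers a handler for commands containing keyword."""
    def register(fn):
        PLUGINS[keyword] = fn
        return fn
    return register

@plugin("lights")
def dim_lights(command):
    # A real plugin would call a Home Assistant / Hue API client here.
    print("Dimming lights per:", command)

def dispatch(command):
    """Route a recognized command to the first matching plugin, if any."""
    for keyword, handler in PLUGINS.items():
        if keyword in command:
            handler(command)
            return True
    return False

With this in place, process_command could try dispatch() before falling back to the Ollama model.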
Appendix A: BIBLIOGRAPHY
Web References:
1. Python Software Foundation. Python Language Reference, version 3.10. https://docs.python.org/3/