AI Assistant
of
BS-MS (CS)
By
V. Shreeja
L. Abhilash
OSMANIA UNIVERSITY
HYDERABAD
2025
CERTIFICATION
This is to certify that the Project Report titled AI ASSISTANT submitted in partial
fulfillment for the award of the BS-MS Programme of VVISM, OU, Hyderabad, was
carried out by VPLN. Sri Harsha, V. Shreeja, L. Abhilash under my guidance. This
has not been submitted to any other University or Institution for the award of any degree
/ diploma / certificate.
Mrs. P. Shailaja,
Vishwa Vishwani Institute Of
Systems And Management,
Hyderabad.
DECLARATION
I hereby declare that this Project Report submitted by me to VVISM, OU, has not been submitted to any other University or Institution for the award of any degree/diploma/certificate.
BS-MS PROGRAMME
Date                                Date
SIGNATURE                           DIRECTOR
Abstract
2 Literature Survey
  2.1 Description of the Software Engineering Concepts Used
  2.2 Description of Analysis and Design Concepts Used
  2.3 Description of Tools Used
  2.4 Description of Methodology Used
3 Feasibility Study
  3.1 Feasibility Study Definition
  3.2 Economical Feasibility
  3.3 Technical Feasibility
  3.4 Operational Feasibility
7 Test Cases
  7.1 Introduction
  7.2 Test Case Structure
  7.3 Test Case for AI Chat Bot System
  7.4 Chat Bot Response Test Case
  7.5 Sentiment Analysis Test Case
  7.6 Intent Classification Test Case
  7.7 Error Handling Test Case
8 Conclusion
9 Bibliography
1. INTRODUCTION
processes user queries but also provides seamless control over select applications such as media players, YouTube videos, and document editors. By combining these features, the project showcases how AI can enhance both communication and user interaction, making technology more intuitive and efficient.
queries.
• The responses often feel robotic and disconnected, making them less natural in human interaction.
command correctly. This highlights the need for an alternative control method to ensure seamless interaction in different environments.
voice and gestures, making interactions more intuitive and adaptable.
By combining these interaction methods, our system ensures a more flexible and user-friendly experience while acknowledging that gestures are limited to certain application controls.
• Real-Time Data Retrieval: Unlike many AI models that rely on static responses from pre-defined datasets, our assistant fetches real-time data from the internet, ensuring that:
• Users receive up-to-date and accurate information at all times.
• The assistant can respond to dynamic queries, such as weather updates, stock market trends, and breaking news.
• The system remains constantly updated, rather than being restricted by offline datasets.
This capability makes the assistant more intelligent and reliable for real-world applications.
Context-Aware Conversations: A major advancement of our AI assistant is its ability to engage in more natural and intelligent conversations by understanding context and user intent.
• Maintains Conversation History: The assistant remembers previous interactions, allowing for continuous and coherent dialogues.
• Understands Meaning Beyond Words: Instead of merely responding to keywords, the assistant analyzes user intent to provide more relevant answers.
• Personalized Experience: Over time, the assistant adapts to user preferences, making responses more tailored and effective.
This feature enhances the user experience by making the assistant feel more interactive and intelligent, rather than just responding with isolated answers.
• Fetches information from online sources to ensure accuracy and relevancy.
• Enables better machine learning adaptation by continuously improving responses.
Unlike offline AI models, which often provide limited responses due to static datasets, our assistant is constantly learning and evolving, making it more intelligent and responsive.
Flexible Input Options for Users
Traditional AI assistants rely on a single input method, usually voice. Our system offers multiple input options, making it adaptable to different environments.
Supports Both Voice and Gestures: Users can interact in a way that best suits their surroundings.
Allows Control Without Typing: Beneficial for hands-free operation when working or multitasking.
Minimizes Frustration in Noisy Environments: Users can rely on gestures when voice input is difficult.
Intelligent Task Execution Based on Real-Time Data
Unlike AI models that rely only on pre-stored information, our assistant fetches real-time data from the internet to make better decisions.
Provides Live Updates: Delivers up-to-date information instead of relying on outdated knowledge.
Dynamically Adapts to User Needs: Can adjust responses based on new trends, events, or user habits.
Reduces Stale Responses: Ensures that information is always fresh and relevant.
Practical Gesture-Based Control for Specific Applications
While voice commands offer convenience, gestures enhance hands-free operation for key applications.
Fast & Effortless Control: Allows users to play, pause, or stop applications instantly.
Ideal for Media & Productivity Applications: Supports simple hand gestures for controlling videos, music, and documents.
Reduces Overreliance on Voice: Gives users alternative ways to interact with the system.
• Input: User's spoken words.
• Input:
• Output:
o If the battery is ≤ 30% and not charging, a low battery warning is issued.
o If the battery is < 25% and not charging, all applications close.
• Dependencies:
3. BertTextClassification Module
• Functionality: Classifies the transcribed text into three categories – Internet
Query, Automation, and Conversation.
• Dependencies: Connects with Internet Query, Automation, and Conversational
modules.
Classification Process
• The system first recognizes speech from the user.
• The system then responds using the appropriate module and delivers the output
via the speak() function.
5. Automation Module
• Functionality: Controls system applications based on classified user commands.
• Sub-Modules:
o Closing Applications: Closes specific system applications.
6. Conversational Module
• Functionality: Provides human-like conversational responses, emotional
support, and answers general queries.
2.LITERATURE SURVEY
2.1 Software Engineering Concepts Used
• Our project follows a modular software engineering approach, ensuring each
module is independent and scalable.
• Internet Query Module: Provides responses for real-time queries like weather updates, cryptocurrency prices, and news.
• conversational model.
c. If the input is positive or neutral, a natural response is generated
from the conversational model.
Automation:
• PyAutoGUI: Used for keyboard functions.
• Closing Applications: Runs on a separate thread but only when Microsoft Word
is open, allowing for controlled application closure as needed.
• YouTube Play/Pause: Runs on a separate thread when automation is triggered
to control media playback, including playing/pausing videos, adjusting volume,
and closing YouTube Music and YouTube videos.
o The system listens to the user and converts spoken words into text using
the Speech Recognition module.
o If classified as Automation, it triggers commands to control
applications.
5. The response is generated and spoken aloud using the Speak Module.
3.FEASIBILITY STUDY
Definition
A feasibility study assesses the practicality and viability of a proposed project by evaluating technical, economic, and operational aspects to determine its success potential.
3.1 Economic Feasibility
Economic feasibility assesses the cost-effectiveness of the system. Since this is a minor
project, no direct financial investments were required for development. The project
makes use of open-source technologies, eliminating licensing costs. The system runs
on a standard computing system, negating the need for additional hardware. Thus, the
implementation cost is minimal while providing high utility.
• Since this is a minor project, no financial investments were required for
development.
• All tools and technologies used in the project are open-source and free to use.
• The project runs on a standard computing system, eliminating the need for
additional hardware or costly infrastructure.
• The project is implemented using Python and various open-source libraries,
ensuring compatibility with different operating systems.
• The tools used, such as Speech Recognition, BERT, RoBERTa, zero-shot classification (facebook/bart-large-mnli), and PyAutoGUI, are readily available and well-supported.
• The classification and automation modules operate locally, but some features
(e.g., fetching real-time updates) require internet access.
• Speech-based interaction allows accessibility and ease of use for users of all
backgrounds.
4.SOFTWARE REQUIREMENTS SPECIFICATION
4.1 Introduction
IEEE 830-1993 Standard for Software Requirements Specification
This section defines the functional and non-functional requirements of the system, including technical constraints and expected system behavior. The SRS ensures that all aspects of the project are structured, documented, and implemented correctly.
4.2 Functional Requirements
The functional requirements describe the essential operations and capabilities of the system. Each module has a distinct role in processing user inputs and generating appropriate responses.
2. Battery Checking Module
The Battery Checking Module monitors the system's battery level and provides voice alerts based on its status. It ensures user awareness of low battery levels and prevents system crashes by initiating safe shutdown procedures.
Retrieves battery status using the psutil library.
Provides real-time alerts through Text-to-Speech (TTS) using the speak function.
Warns the user when the battery is low.
Closes applications when the battery reaches a critical level.
Ensures system stability by preventing unexpected shutdowns.
sentiment analysis techniques.
Adapts responses based on the emotional state detected, ensuring a more human-like interaction.
7. Sentiment Analysis
The assistant analyses user speech to determine sentiment (positive, neutral, or negative) and adjusts responses accordingly.
Functionality: detects emotional tone from the user's voice, adjusts conversation flow based on sentiment, and can offer supportive responses or suggestions in case of negative sentiment.
8. Speak Module
Converts text-based responses into speech using Pyttsx3, a text-to-speech conversion library.
Ensures a natural and smooth voice output to enhance user experience.
Facilitates clear and effective communication between the system and the user.
4.3 Non-Functional Requirements
We have ensured that the system meets the following non-functional requirements, focusing on performance, usability, and reliability:
Performance:
We have optimized the system to process and respond within 2-3 seconds of receiving user input.
Background services like battery monitoring run efficiently without interfering with primary functions.
Usability:
The system is designed to be intuitive, requiring minimal effort from the user.
Speech-based interaction is implemented to be natural and responsive.
Reliability:
We have handled error scenarios smoothly, ensuring meaningful fallback responses.
Internet-dependent modules provide clear notifications when connectivity is lost.
Security:
No sensitive user data is stored permanently, to maintain privacy.
We have minimized system permissions to prevent security vulnerabilities.
Scalability:
The architecture is designed to support future enhancements, such as additional automation features, ensuring long-term usability and adaptability.
4.4 Process Description
1. Main Program
• pyttsx3 → A text-to-speech library that enables the system to convert text into
speech.
• batteryStatus.check_battery → Monitors the battery level and provides status
updates.
Converts text to speech and waits until the speech output is complete.
Function: speech_recog()
Purpose: Captures user speech, processes it, classifies the input, and triggers the appropriate module.
a. Initiating Internet Query in a Separate Thread
• opening_internet = threading.Thread(target=internet, daemon=False) → Creates a new thread for the internet() function.
• opening_internet.start() → Starts the thread, ensuring real-time query handling without blocking the main process.
b. Initializing Speech Recognition
• r = sr.Recognizer() → Creates an instance of the speech recognition engine.
• with sr.Microphone() as source: → Uses the system microphone to capture voice input.
• r.pause_threshold = 1 → Allows a 1-second pause before stopping listening.
• audio_data = r.listen(source, 0, 15) → Listens for up to 15 seconds of user speech input.
c. Processing Speech Input
• q = r.recognize_google(audio_data, language="en") → Converts the captured audio into text using Google Speech Recognition.
• print(f"User spoke: {q}") → Displays the recognized text for debugging.
d. Classifying User Input
• x = classify_text(q) → Initiates BERT text classification to determine the input type.
e. Handling Different Query Types
• If classified as "Internet Query":
o Calls sendkeys(q, speak) to process general user queries.
o Prints: "calling internet query".
• If classified as "Conversation":
o Calls sentiment(q, speak) to analyze sentiment.
o Calls convo(q, speak) to generate a conversational response.
o Prints: "calling conversational module".
• If classified as "Automation Query":
o Calls application(q, speak) to execute automation tasks (e.g., opening/closing applications).
o Prints: "calling automation query".
f. Handling Errors
• except Exception as e: → Catches any speech recognition errors (e.g., unclear audio, no internet).
• print("Speech recognition error: ", str(e)) → Displays the error message for debugging.
Function: classify_text(q)
Purpose: Categorizes user input into Internet Query, Automation, or Conversation using a fine-tuned BERT model.
2. Training BERT for Text Classification
• We trained BERT on a custom dataset containing labeled queries for accurate classification.
• The model learns patterns in user input and assigns the correct category.
a. Processing User Input
• Speech Recognition captures user input and converts it to text.
• classify_text(q) sends the text to the BERT model, which returns a category.
b. Classification Output & Routing
• "Internet Query" → Passed to the internet module.
• "Conversation" → Passed to the sentiment analysis & conversation module.
• "Automation" → Routed to the system automation module to execute commands.
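The classify-and-route flow above can be sketched in plain Python. The rule-based classify_text below is only a stand-in for the fine-tuned BERT model, and the handler functions are placeholders for the real internet, conversation, and automation modules:

```python
# Hypothetical sketch of the classification-and-routing step.
# The keyword rules stand in for the BERT classifier; the real
# system returns the same three category labels.

def classify_text(q: str) -> str:
    """Toy stand-in for the fine-tuned BERT classifier."""
    q = q.lower()
    if any(w in q for w in ("weather", "news", "price", "search")):
        return "Internet Query"
    if any(w in q for w in ("open", "close", "play", "pause")):
        return "Automation Query"
    return "Conversation"

def route(q: str, handlers: dict) -> str:
    """Dispatches the recognized text to the matching module handler."""
    category = classify_text(q)
    handlers[category](q)
    return category

# Placeholder handlers that just record which module would run.
log = []
handlers = {
    "Internet Query": lambda q: log.append(("internet", q)),
    "Automation Query": lambda q: log.append(("automation", q)),
    "Conversation": lambda q: log.append(("conversation", q)),
}

route("what is the weather today", handlers)  # → "Internet Query"
route("open youtube", handlers)               # → "Automation Query"
route("i feel a bit tired", handlers)         # → "Conversation"
```

The real speech_recog() performs the same dispatch, with sendkeys, sentiment/convo, and application in place of the lambdas.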
Battery checking module:
The script monitors the battery status of a system and takes necessary actions based on the battery level. It uses the psutil module to retrieve battery information and implements logic to check battery percentage and charger status.
1. Retrieve Battery Status:
▪ The assistant speaks: "Power levels are critically low, [battery]%."
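Under the thresholds stated earlier (warning at 30% and below, closing applications below 25%, both only while unplugged), the decision logic might look like the sketch below. The battery state is passed in as plain values rather than read from psutil.sensors_battery(), and the low-battery warning wording (other than the quoted critical message) is invented:

```python
# Minimal sketch of the battery-check decision logic, assuming the
# thresholds stated above: warn at <= 30%, close applications at < 25%,
# both only when the charger is not connected. In the real module,
# psutil.sensors_battery() supplies `percent` and `power_plugged`.

def battery_action(percent: float, plugged: bool):
    """Returns (action, spoken_message) for a given battery state."""
    if plugged:
        return ("none", None)
    if percent < 25:
        # Critical: message quoted from the report; applications are closed.
        return ("close_apps", f"Power levels are critically low, {percent}%.")
    if percent <= 30:
        # Warning wording here is a placeholder, not the project's actual text.
        return ("warn", f"Battery is low, {percent}%. Please connect the charger.")
    return ("none", None)
```

For example, battery_action(20, False) returns the "close_apps" action with the critical message, while battery_action(20, True) takes no action because the charger is connected.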
• selenium.webdriver → Automates web interactions by controlling a browser.
• By, WebDriverWait, EC → Helps locate elements on the webpage and wait for
them to load.
internet() function → This function is responsible for opening the target website.
driver.get("URL") → Directs the browser to the given URL.
print("accessing website") → Logs the start of the process.
print("accessed website") → Confirms that the page has loaded successfully.
Function: sendkeys(x, speak)
Purpose: Sends the user’s query to the webpage, retrieves the response, and speaks it
aloud.
1. Initial Processing
• Notifies the user that the request is being processed.
• Locates the input box on the webpage where the query will be entered.
• Constructs an XPath expression to locate the response dynamically.
6. Error Handling
• If an unexpected issue occurs (e.g., missing elements, loading failure), asks the
user to repeat the query.
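The fallback behaviour described above can be sketched with the webpage lookup and the speak function injected as callables. The fetch callable and the prompt wording below are placeholders, not the project's actual Selenium code:

```python
# Hedged sketch of the error handling described above: if retrieving the
# answer from the webpage fails (missing element, loading failure), the
# assistant asks the user to repeat the query instead of crashing.
# `fetch` stands in for the Selenium lookup; `speak` for the TTS output.

def answer_query(query, fetch, speak):
    """Tries to fetch and speak a response; prompts for a retry on error."""
    try:
        response = fetch(query)
        speak(response)
        return response
    except Exception:
        # Fallback prompt wording is illustrative.
        speak("Sorry, something went wrong. Could you repeat the query?")
        return None
```

Wrapping the whole retrieval step in one try/except keeps the assistant responsive even when the page structure changes or an element never loads.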
Conversational Module
The Conversational Module is responsible for generating responses to general user queries. It also provides emotional support by analyzing the user's sentiment before responding. Below is a breakdown of how this module functions.
2. Execution Flow
1. Speech Recognition captures the user’s query and converts it into text.
o If the sentiment is neutral or positive, it directly generates a response.
4. The generated response is then passed to the Speak Module for output.
• transformers (Hugging Face) → Loads pre-trained NLP models for sentiment
classification and response generation.
• AutoTokenizer → Converts text into tokens for processing.
• AutoModelForSequenceClassification → Loads a pre-trained RoBERTa model
for sentiment analysis.
• softmax (scipy.special) → Converts raw model outputs into probabilities.
• warnings → Suppresses unwanted warning messages.
• random, json → May be used for generating responses dynamically.
Function: sentiment(x, speak)
• Suppresses warnings to prevent unnecessary console messages.
• Tokenization: Converts user input into a format that the RoBERTa model can
process.
• return_tensors="pt" → Returns tokens in PyTorch tensor format for model
compatibility.
• Feeds tokenized text into the model to generate sentiment scores.
• Extracts raw logits (numerical values) from the model output.
• Converts logits into a NumPy array (scores) for further processing.
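The logits-to-probabilities step described above can be reproduced without scipy. The logits below are made-up numbers, and the label order (negative, neutral, positive) is an assumption about the RoBERTa model's output, not taken from the report:

```python
import math

# Sketch of the sentiment scoring step: raw model logits are converted
# to probabilities with softmax, and the highest-probability label is
# taken as the detected sentiment. Logits here are invented; the real
# system obtains them from the RoBERTa model.

LABELS = ["negative", "neutral", "positive"]  # assumed label order

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sentiment_label(logits):
    """Returns (label, probabilities) for one set of logits."""
    scores = softmax(logits)
    return LABELS[scores.index(max(scores))], scores

label, scores = sentiment_label([-1.2, 0.3, 2.1])  # label == "positive"
```

Subtracting the maximum logit before exponentiating avoids overflow; the probabilities always sum to 1, which is what scipy.special.softmax produces in the actual module.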
• get_intents() function
• Uses the zero-shot classification model to match the user’s input with the most
relevant intent.
• Returns the best-matching emotional category from the list of predefined
intents.
• Calls get_intents() to analyze the user's input and determine their emotional
state.
• Stores the identified intent in detected_input.
• Randomly selects a response from the JSON dataset related to the identified
intent.
• temp stores:
o A motivational quote related to the user’s struggle.
o Practical advice on how to implement a solution in daily life.
• The speak() function delivers this guidance to the user.
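The intent-to-response selection above can be sketched as follows. The intent names and replies are invented stand-ins for the project's JSON dataset, which is not reproduced in the report:

```python
import random

# Toy sketch of the response-selection step: once zero-shot classification
# has identified the emotional intent, a reply is picked at random from
# the dataset for that intent. Intents and replies below are placeholders.

RESPONSES = {
    "stress": [
        "Take a deep breath; one small step at a time adds up.",
        "Try a short break and come back with fresh eyes.",
    ],
    "loneliness": [
        "Reaching out to one friend today can make a difference.",
    ],
}

def respond_to_intent(detected_intent, rng=random):
    """Randomly selects a stored response for the detected intent."""
    return rng.choice(RESPONSES[detected_intent])
```

In the actual module the chosen text (quote plus practical advice) is stored in temp and handed to speak(); random.choice keeps repeated conversations from sounding identical.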
Explanation :
• random → Used to introduce variability in system responses, ensuring
conversations feel natural and dynamic.
• time → Helps track the duration of response generation and manage
delays for an optimized experience.
• ollama → A library used to communicate with the AI model for
conversational response generation.
• convo() function → This function handles user interactions and generates AI-
based responses.
• x (input parameter) → Stores the user’s spoken query (converted into text).
• speak() function → Converts text into speech, delivering the AI-generated
response audibly.
• complete_response → Stores the final AI-generated response that will be
spoken to the user.
• client = ollama.Client() → Initializes the Ollama AI client, which is
responsible for querying the AI model and retrieving responses.
• model = "jarvis1" → Specifies the AI model used for generating
conversational replies.
• prompt = x → The user’s query (x) is passed as an input prompt to the AI
model.
• start_time = time.time() → Records the exact time when the response
generation begins.
• client.generate(...) → Calls the AI model (jarvis1) to generate a response based
on the user’s input.
• stream=True → Enables real-time streaming, meaning the response is
processed in chunks instead of waiting for full completion.
• shown_messages = set() → Keeps track of the system-generated updates during
response generation.
• for part in response: → The function iterates through the AI-generated
response as it streams in chunks.
• elapsed_time = time.time() - start_time → Calculates the time elapsed since
response generation began.
Automation Module
The Automation Module processes user requests for system-related tasks such as
playing music, opening applications, or setting breaks. It uses cosine similarity to match
user input with predefined commands and then executes the corresponding action.
Its capabilities include:
• Open applications such as YouTube, YouTube Music, and Microsoft Word.
• Control media playback using hand gestures (play, pause, close applications).
• Set timed breaks with an audible reminder.
• Match user input with predefined automation tasks using cosine similarity for
intent recognition.
This module ensures hands-free automation, making it easier to control the system
using voice commands. Below is a detailed explanation of how it functions.
automated_tasks dictionary → Stores predefined automation commands and
their possible variations.
Each task (key) is mapped to multiple ways a user might request it (list of
phrases).
Encodes each predefined task name into a vector representation using the all-MiniLM-L6-v2 model.
Stores these encoded values in the automated_tasks dictionary.
• If no command closely matches the user’s input (similarity score < 0.4), the
system:
o Clears the similarity scores.
• Asks the user to repeat the command or informs them that the application
• If the command is for music playback, the system starts playing a song.
• If the command is for Microsoft Word, the system opens a new Word document.
• If the command is for setting a break, the system activates the break timer and
informs the user.
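The cosine-similarity matching and the 0.4 threshold can be demonstrated with toy bag-of-words vectors in place of all-MiniLM-L6-v2 embeddings. The vocabulary and task phrases below are invented examples:

```python
import math

# Illustrative sketch of the command-matching step: the user's input and
# each predefined task phrase are embedded as vectors, compared with
# cosine similarity, and the best match is accepted only if its score
# clears the 0.4 threshold stated above. Bag-of-words counts stand in
# for the sentence-transformer embeddings used by the real module.

def bag_of_words(text, vocab):
    """Counts occurrences of each vocabulary word in the text."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def best_task(user_input, tasks, vocab, threshold=0.4):
    """Returns the closest task, or None if nothing clears the threshold."""
    v = bag_of_words(user_input, vocab)
    scores = {t: cosine(v, bag_of_words(p, vocab)) for t, p in tasks.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

vocab = ["play", "music", "open", "word", "set", "break", "song"]
tasks = {
    "play_music": "play music song",
    "open_word": "open word",
    "set_break": "set break",
}
best_task("play some music", tasks, vocab)     # → "play_music"
best_task("what is the capital", tasks, vocab) # → None (asks user to repeat)
```

With real embeddings the comparison captures meaning rather than shared words, but the thresholding logic is the same: anything below 0.4 is rejected and the user is asked to repeat the command.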
Hand Gesture Control for Media and Applications
The Hand Gesture Control Module allows users to control media playback,
volume, and application closing using hand gestures detected via a webcam.
It utilizes OpenCV, MediaPipe, and PyAutoGUI to interpret hand movements
and trigger the corresponding system actions.
The following image provides a clear depiction of the gestures, helping you recognize each move effortlessly.
Importing Required Libraries
• OpenCV (cv2) → Captures real-time video from the webcam.
• MediaPipe (mp) → Detects and tracks hand landmarks.
• PyAutoGUI (py) → Simulates keyboard actions for media control.
• Threading (threading) → Enables gesture recognition to run in a separate
thread.
• Math (math) → Used for distance calculations in volume control.
• Time (time) → Introduces delays to prevent multiple unintended inputs.
Initializing Global Variables
• stop_thread → A flag to stop the gesture detection thread when needed.
• scissors_event → A threading event used to detect and trigger the closing
application gesture.
Function: play_pause()
1. Setting Up the Webcam and MediaPipe Hands Model
• Opens the webcam for capturing live video.
• Initializes the MediaPipe Hands model to detect hand landmarks.
2. Processing Video Frames for Gesture Recognition
• Converts each video frame from BGR to RGB (needed for MediaPipe
processing).
• Detects hand landmarks and overlays the skeleton representation on the hand.
3. Extracting Finger Landmark Positions
• Retrieves the y-coordinates of key finger joints:
o Index, Middle, Ring, and Pinky fingers (PIP & TIP joints).
o Thumb tip for volume control calculations.
Gesture-Based Controls
1. Play/Pause Media Control
• If all fingers are folded (except the thumb) → Triggers Play (space key).
• If all fingers are extended → Triggers Pause (space key).
2. Volume Control Gesture
• If only the index finger is folded while others are extended → Activates Volume
Control Mode
• Calculates the distance between the thumb and index finger:
o If distance > 40 → Increases volume (volumeup key).
o If distance < 40 → Decreases volume (volumedown key).
Closing the Application (Scissor Gesture)
• If only the index and middle fingers are folded while others are extended →
o Displays "Scissors (Closing App)" message.
o Triggers Ctrl + W → Closes the active application.
o Triggers Enter → Confirms the action if needed.
o Signals the scissors_event → Ends the gesture detection loop.
Recognizing the "Scissors" Gesture for Closing and Saving
• The system analyzes finger positioning to determine if the user is making the
"Scissors" gesture:
o Index and Middle Fingers: Folded (Tip below PIP joint).
o Ring and Pinky Fingers: Extended (Tip above PIP joint).
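The gesture rules above reduce to a small decision function over which fingers are folded (tip below the PIP joint, i.e. a larger y in image coordinates). The coordinate values and action names below are illustrative stand-ins, not the module's actual code:

```python
# Sketch of the gesture rules described above. MediaPipe supplies
# normalized landmark coordinates in the real module; here plain numbers
# stand in so the classification logic itself can be shown.

def finger_folded(tip_y, pip_y):
    """A finger counts as folded when its tip lies below its PIP joint."""
    return tip_y > pip_y

def volume_action(thumb_index_distance):
    """Distance > 40 raises volume, otherwise lowers it, as stated above."""
    return "volumeup" if thumb_index_distance > 40 else "volumedown"

def classify_gesture(folded):
    """folded: dict of finger name -> bool for index, middle, ring, pinky."""
    index, middle, ring, pinky = (
        folded[f] for f in ("index", "middle", "ring", "pinky")
    )
    if index and middle and ring and pinky:
        return "play_pause"   # all fingers folded -> play (space key)
    if not (index or middle or ring or pinky):
        return "play_pause"   # all fingers extended -> pause (space key)
    if index and not (middle or ring or pinky):
        return "volume_mode"  # only index folded -> volume control mode
    if index and middle and not (ring or pinky):
        return "scissors"     # scissors gesture -> Ctrl+W closes the app
    return "none"
```

Keeping the rules in a pure function like this also makes the gesture logic testable without a webcam; the capture loop only has to translate landmarks into the folded flags.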
Ending the Gesture Detection Loop
• Once the closing and saving gestures are executed, the system stops detection:
o Flags application_closed = True to indicate that the process has
completed.
o The webcam feed is released, and OpenCV windows are closed.
5.STRUCTURED SYSTEMS ANALYSIS AND DESIGN
(SSAD)
1. Introduction
Structured Systems Analysis and Design (SSAD) is a systematic approach to analyzing and designing an information system by breaking it down into manageable components. This method focuses on data flow, processes, and data storage to ensure a structured and efficient design. SSAD employs graphical models such as Decomposition Diagrams, Data Flow Diagrams (DFD), Entity-Relationship (E-R) Diagrams, and Structure Charts to illustrate the system’s architecture.
The purpose of using SSAD in this project is to provide a clear visual representation of the data flow, process structure, and logical relationships within the chatbot system. The chatbot relies on AI-driven Natural Language Processing (NLP), Sentiment Analysis, and Response Generation, making SSAD essential for understanding system interactions.
5.1 Decomposition Diagram
A Decomposition Diagram (top-down approach) breaks the system into smaller sub-components to understand the hierarchy of processes. The chatbot system can be broken down into:
User Input Handling – Captures user text and preprocesses input.
Sentiment Analysis – Determines if the user’s message is Positive, Neutral, or Negative.
Intent Classification – Identifies the purpose of the message.
Response Generation – Produces an appropriate chatbot response.
User Interaction Storage – Maintains conversation history for learning.
5.2 Data Flow Diagram (DFD)
A DFD represents how data moves through the system. It is categorized into:
Level 0 - Context Diagram
The User interacts with the system by providing text input.
External systems (such as databases or sentiment models) may be integrated.
Level 1 - Detailed Data Flow
User enters input → Passed to NLP processing.
Tokenization and Sentiment Analysis → Determines the emotion behind the message.
Intent Recognition Module → Classifies the purpose (e.g., question, greeting, complaint).
Response Generation → AI generates an appropriate reply.
Chatbot sends response to user.
This DFD ensures logical flow and seamless interactions between components.
5.3 Entity-Relationship (E-R) Diagram
An E-R Diagram defines the relationships between different entities within the system.
Entities and Attributes
User (User_ID, Name, Message)
Chatbot (AI_Model, Response_ID, Timestamp)
Sentiment Analysis (Message_ID, Sentiment_Type)
Intent Recognition (Intent_ID, Intent_Category)
Relationships
A User sends multiple messages to the chatbot.
Each message undergoes Sentiment Analysis and Intent Classification.
The Chatbot generates a response based on sentiment and intent.
This structure ensures efficient data management and system scalability.
5.4 Structure Chart
A Structure Chart is a hierarchical representation of the modular components of the system.
Main System
├── User Input Processing
│   ├── Tokenization
│   ├── Text Preprocessing
├── Sentiment Analysis
│   ├── NLP-based sentiment detection
├── Intent Recognition
│   ├── AI model classification
├── Response Generation
│   ├── AI Model (Blenderbot-400M)
│   ├── Zero-shot classification
This modular design enhances maintainability and debugging.
5.5 SSAD Methodology Used in the Project
1. Requirement Analysis
Identify chatbot functionalities (sentiment analysis, intent classification, response generation).
Gather system requirements (software, hardware, AI models).
2. System Design Using SSAD
Decompose the system into structured components.
Design Data Flow and Entity Relationships to ensure smooth interaction.
3. System Implementation
Develop modular programs for text processing, AI-based sentiment analysis, and response generation.
4. Testing & Maintenance
Unit testing for each module (e.g., NLP, AI processing, response generation).
Integration testing to ensure proper system-wide functionality.
5. Advantages of Using SSAD
Structured Approach: Ensures a systematic breakdown of system functionalities.
Better Data Flow Understanding: Graphical models (DFD, E-R Diagram) simplify system interactions.
Improved Maintainability: Modular structure facilitates debugging and future upgrades.
Scalability: Supports integration of additional AI models or functionalities in the future.
OBJECT-ORIENTED ANALYSIS AND DESIGN (OOAD)

1. Introduction
Object-Oriented Analysis and Design (OOAD) is a software engineering approach that uses objects as the fundamental building blocks of a system. It focuses on identifying real-world entities, their attributes, behaviors, and relationships to create a modular and scalable system. The OOAD methodology is used to model the chatbot system, allowing for reusability, maintainability, and an efficient software structure.
The chatbot system, powered by Natural Language Processing (NLP), Sentiment Analysis, and AI-driven response generation, benefits significantly from OOAD principles. This approach ensures that each functional component, such as User Input Processing, Sentiment Analysis, Intent Classification, and Response Generation, is encapsulated into distinct objects that interact seamlessly.
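As a hedged illustration of this encapsulation, the components could be modelled as cooperating objects. The keyword rules below are deliberately simple stand-ins for the project's AI models, included only so the object structure is concrete and runnable:

```python
class SentimentAnalyzer:
    """Stand-in analyzer: keyword-based polarity, for illustration only."""
    POSITIVE = {"happy", "great", "good"}
    NEGATIVE = {"sad", "angry", "bad"}

    def analyze(self, text: str) -> str:
        words = set(text.lower().split())
        if words & self.POSITIVE:
            return "Positive"
        if words & self.NEGATIVE:
            return "Negative"
        return "Neutral"

class IntentClassifier:
    """Stand-in classifier using surface cues instead of a real model."""
    def classify(self, text: str) -> str:
        lowered = text.lower()
        if lowered.endswith("?") or lowered.startswith(("what", "how", "why")):
            return "Question"
        if lowered.startswith(("open", "play", "pause", "launch")):
            return "Command"
        return "Greeting"

class Chatbot:
    """Facade object: wires the encapsulated components together."""
    def __init__(self) -> None:
        self.sentiment = SentimentAnalyzer()
        self.intents = IntentClassifier()

    def respond(self, text: str) -> dict:
        return {
            "sentiment": self.sentiment.analyze(text),
            "intent": self.intents.classify(text),
        }
```

Each class owns its own data and behavior, so a component (say, the sentiment analyzer) can be replaced by a real AI model without touching the others.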
2. Phases of OOAD
OOAD consists of two key phases:
1. Object-Oriented Analysis (OOA) – Identifies system requirements using objects and their interactions.
2. Object-Oriented Design (OOD) – Structures these objects into a well-defined system architecture.
Actors:
• User: Sends messages to the chatbot.
Use Cases:
• User inputs text
3.3 Activity Diagram
An Activity Diagram represents the workflow of the chatbot system.
1. User enters input
3.5 Sequence and Collaboration Diagrams

Sequence Diagram
A Sequence Diagram shows the interaction order between components.
User → Chatbot → Sentiment Analysis → Intent Classification → Response Generation → User
This representation helps understand message flow across system modules.

Collaboration Diagram
A Collaboration Diagram depicts how objects work together to execute a function.
• The User object interacts with the Chatbot object.
• The Chatbot object calls the Sentiment Analysis and Intent Recognition objects.
This diagram helps visualize object dependencies and interactions.
3.6 Class Diagram
A Class Diagram defines objects, attributes, and their relationships.
Key Classes in the Chatbot System:
• User Class: Stores user ID, message history.
This structure enables a modular and reusable design.
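The User class from the class diagram could be sketched as a Python dataclass. The attribute and method names here are illustrative choices, not taken from the project's source:

```python
from dataclasses import dataclass, field

@dataclass
class User:
    """Mirrors the User class in the diagram: an ID plus message history."""
    user_id: str
    history: list[str] = field(default_factory=list)

    def add_message(self, text: str) -> None:
        """Append one message to this user's conversation history."""
        self.history.append(text)
```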
This ensures that all chatbot functionalities transition smoothly.
By modularizing components, the system remains scalable and adaptable.
This structure ensures smooth integration between components.
4. Advantages of OOAD
• Encapsulation: Data and functions are bundled into objects, improving security.
FORM DESIGNING
1. Introduction
Form design plays a crucial role in ensuring a seamless user experience by structuring interactions efficiently. In our AI chatbot system, the form is designed to be intuitive and voice-driven, eliminating the need for manual input. Unlike traditional chatbots that rely on buttons or text fields, our system prioritizes speech recognition, allowing users to interact naturally.
Our AI Assistant system relies on speech recognition (speech_recog) for user interaction, eliminating the need for manual input buttons. The design ensures a seamless and natural conversation experience.

User Input Handling
• Voice Input Only: Users interact through speech recognition.
• Automatic Detection: The system listens for user speech without requiring a manual trigger.
• Clarification Requests: If input is unclear or unrecognized, the chatbot asks for clarification.
Gesture-Based Controls:
o Recognizes hand gestures using a webcam.
o Executes predefined actions based on detected gestures (e.g., play, pause, close an application).
o Provides visual feedback to the user.
Automation Task for Output:
o Executes actions like:
▪ Playing Music: If the user says, "Play some music," the assistant will open the default media player or play a song from an online source.
▪ Opening Applications: Commands like "Open Word" or "Launch YouTube" will trigger the respective applications.
▪ Controlling Media: Users can say "Pause video", "Increase volume", or use gestures to perform these actions.
DATA DICTIONARY
1. Introduction
A Data Dictionary is a structured collection of data elements used within a system.
For the AI chatbot system, the data dictionary includes fields for user input, chatbot responses, sentiment analysis, conversation history, and system logs. This documentation is essential for developers, database administrators, and analysts to understand data relationships and dependencies.

2. Importance of a Data Dictionary
• Ensures Consistency: Defines standard formats for storing and retrieving data.
• Constraints: Specifies unique, null, or required values.
• Default Value: The initial value assigned if no input is provided.
4.1 User Input Data

| Field Name | Data Type | Description | Example Values | Usage | Constraints / Rules |
|---|---|---|---|---|---|
| Input_Text | String | Stores the raw user speech input. | "What's the weather today?" | Capturing user commands | None |
| Input_Type | String | Classifies the type of user input. | "Question", "Command", "Greeting" | Determines response behavior | Must match predefined categories |
| Confidence_Score | Float | Represents the confidence level of speech recognition. | 0.0 – 1.0 | Determines accuracy of recognition | Must be between 0 and 1 |
| Language | String | Identifies the language of the input. | "en", "es", "fr" | Language processing | Must match supported languages |
| Processed_Text | String | The cleaned-up version of user input. | "Play a song" | Used for processing | None |
| Emotion_Detected | String | Detected emotion from user input. | "Neutral", "Angry", "Happy" | Personalizes response generation | Must match predefined emotions |
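The 4.1 record can be expressed directly as a dataclass whose validation enforces the table's constraints. The class itself is illustrative, not drawn from the project's source:

```python
from dataclasses import dataclass

PREDEFINED_CATEGORIES = {"Question", "Command", "Greeting"}

@dataclass
class UserInput:
    """One user-input record, mirroring the data-dictionary fields."""
    input_text: str
    input_type: str
    confidence_score: float
    language: str = "en"
    processed_text: str = ""
    emotion_detected: str = "Neutral"

    def __post_init__(self) -> None:
        # Enforce the table's Constraints / Rules column.
        if self.input_type not in PREDEFINED_CATEGORIES:
            raise ValueError("Input_Type must match a predefined category")
        if not 0.0 <= self.confidence_score <= 1.0:
            raise ValueError("Confidence_Score must be between 0 and 1")
```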
4.2 Chatbot Response Data

| Field Name | Data Type | Description | Example Values | Usage | Constraints / Rules |
|---|---|---|---|---|---|
| Response_Text | String | The AI-generated response to user input. | "Sure, opening YouTube..." | Displays output to user | None |
| Response_Type | String | Categorizes the response. | "Answer", "Command Execution", "Clarification" | Helps in response handling | Must match predefined categories |
| Emotion_Tone | String | Emotion conveyed in the response. | "Neutral", "Supportive", "Encouraging" | Personalizes AI tone | Must match predefined tones |
| Execution_Status | Boolean | Indicates if a command was successfully executed. | True, False | Tracks success/failure of actions | None |
| Followup_Required | Boolean | Indicates if the AI needs to ask a follow-up. | True, False | Enables multi-turn conversations | None |
4.3 Sentiment Analysis Data

| Field Name | Data Type | Description | Example Values | Usage | Constraints / Rules |
|---|---|---|---|---|---|
4.4 Intent Classification Data

| Field Name | Data Type | Description | Example Values | Usage | Rules |
|---|---|---|---|---|---|
4.6 Error Logs and System Monitoring

| Field Name | Data Type | Description | Example |
|---|---|---|---|
percentage within the 0–100% range to trigger appropriate power-saving actions.
• Automation Command Verification: Executed commands must match predefined automation tasks to prevent unintended actions.
• Sentiment Analysis Accuracy: Emotion detection must correctly classify text into predefined categories: Positive, Neutral, or Negative.
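The battery rule above can be sketched with `psutil`, whose `sensors_battery()` exposes `percent` and `power_plugged`. The decision is factored into a pure function so it can run without real hardware; the 20% threshold is an assumed value, not taken from the report:

```python
def power_action(percent: float, plugged: bool, threshold: float = 20.0) -> str:
    """Decide the power-saving action for a given battery state."""
    if not 0 <= percent <= 100:
        raise ValueError("battery percentage must be within the 0-100% range")
    if percent <= threshold and not plugged:
        return "warn_and_close_apps"
    return "ok"

def check_battery():
    """Read the real battery via psutil; returns None if none is present."""
    import psutil
    battery = psutil.sensors_battery()
    if battery is None:
        return None
    return power_action(battery.percent, battery.power_plugged)
```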
TEST CASES
1. Introduction
Test cases are structured scenarios used to verify the correctness, functionality, and performance of a system. They help identify potential bugs, ensure software meets requirements, and improve reliability. Each test case consists of test inputs, execution steps, expected outputs, and actual results.
For the AI chatbot system, test cases will cover functionalities such as user input handling, chatbot responses, sentiment analysis, intent classification, error handling, and system performance.

2. Importance of Test Cases
• Ensures Functionality: Confirms that all features work as intended.
5. Expected Result: The anticipated output.
6. Actual Result: The observed outcome after execution.
7. Status: Pass/Fail based on comparison between expected and actual results.
4.2 Chatbot Response Test Cases
4.3 Sentiment Analysis Test Cases

| Test Case ID | Test Scenario | Preconditions | Test Steps | Expected Result | Actual Result | Status |
|---|---|---|---|---|---|---|
| TC-007 | Detect positive sentiment | System is running | 1. Enter "I am happy" 2. Submit | Sentiment detected correctly | System detects positive sentiment | Pass |
| TC-008 | Detect negative sentiment | System is running | 1. Enter "I am sad" 2. Submit | Sentiment detected correctly | System detects negative sentiment | Pass |
| TC-009 | Detect neutral sentiment | System is running | 1. Enter "It is a normal day" 2. Submit | Sentiment detected correctly | System detects neutral sentiment | Pass |
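The three sentiment cases above translate directly into executable tests with Python's `unittest`. The keyword analyzer here is a deliberately simple stand-in for the project's AI model, included only so the test structure runs end to end:

```python
import unittest

POSITIVE_WORDS = {"happy", "glad", "great"}
NEGATIVE_WORDS = {"sad", "angry", "bad"}

def detect_sentiment(text: str) -> str:
    """Toy sentiment detector used to demonstrate the test cases."""
    words = set(text.lower().split())
    if words & POSITIVE_WORDS:
        return "Positive"
    if words & NEGATIVE_WORDS:
        return "Negative"
    return "Neutral"

class SentimentTests(unittest.TestCase):
    def test_tc007_positive(self):
        self.assertEqual(detect_sentiment("I am happy"), "Positive")

    def test_tc008_negative(self):
        self.assertEqual(detect_sentiment("I am sad"), "Negative")

    def test_tc009_neutral(self):
        self.assertEqual(detect_sentiment("It is a normal day"), "Neutral")
```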
4.4 Intent Classification Test Cases

| Test Case ID | Scenario | Steps | Expected Result | Status |
|---|---|---|---|---|
| IC-04 | Small Talk Handling | Engage in casual conversation | Detects small talk and responds accordingly | Pass |
| IC-05 | Task-Based Intent | Request a specific task (e.g., "Set a reminder") | Classifies correctly and processes the request | Pass |
| IC-06 | Ambiguous Input | Provide an unclear phrase | Requests clarification or selects the closest intent | Pass |
4.5 Error Handling Test Cases

| Test Case ID | Scenario | Steps | Expected Result | Status |
|---|---|---|---|---|
CONCLUSION
The development of our AI assistant marks a significant step toward intelligent and
interactive virtual assistance. This project integrates various technologies, including
speech recognition, automation, and natural language understanding, to create a system
that efficiently processes user commands and executes appropriate actions. By
leveraging Python, SpeechRecognition, psutil, and threading, we have built a
framework that can listen, interpret, and respond in real time, enhancing user
experience and productivity.
One of the core aspects of this AI assistant is its ability to classify user inputs and
respond accordingly. The system can identify whether a query pertains to internet
searches, general conversation, or task automation, and based on this classification, it
directs the request to the appropriate module. This structured approach ensures that
commands are processed accurately and efficiently, minimizing unnecessary
computations and optimizing response time.
Additionally, task automation plays a crucial role in this project. The assistant can open
and control applications, execute system commands, and manage various workflows,
reducing the need for manual intervention. This feature is particularly beneficial for
users who seek a hands-free, voice-controlled experience for multitasking.
Furthermore, by integrating threading, the assistant can run automation tasks in the
background without interrupting the core speech recognition functionality.
A key enhancement in our AI assistant is its ability to provide real-time battery
monitoring. Using the psutil library, the system continuously checks the device’s
battery level and alerts the user when power is critically low. If the battery percentage
drops below a threshold without a charger plugged in, the assistant notifies the user and
takes necessary action, such as closing active applications. This proactive approach
helps in preventing system shutdowns due to low power, ensuring a smooth user
experience.
Beyond functionality, the AI assistant is designed to offer emotional support and
engaging interactions. Inspired by virtual assistants like J.A.R.V.I.S. and F.R.I.D.A.Y.,
it delivers responses with a professional yet friendly tone, making interactions feel
natural. The assistant can recognize when the user needs a motivational boost,
encouragement, or even a break reminder. Notifications for resuming work, such as
"Alright, boss, time to get back to work. Let’s get things moving," ensure that the
assistant acts as both a productivity enhancer and a supportive companion.
While the current system offers a robust framework for intelligent assistance, there are
several areas for future improvements. Enhancing contextual understanding, improving
multi-turn conversation handling, and integrating external APIs for expanded
capabilities are potential upgrades that can make the assistant even more efficient.
Further refinement in machine learning-based intent recognition and better
personalization based on user preferences can significantly improve the assistant’s
adaptability.
In conclusion, this AI assistant serves as a strong foundation for intelligent virtual
assistance. Its real-time speech recognition, automation capabilities, battery
management, and engaging personality make it a valuable tool for users seeking
efficiency and convenience. As technology advances, expanding its capabilities will
further solidify its role as a reliable and intelligent digital assistant.
Key Takeaways & Future Enhancements
• Real-time Speech Recognition: The assistant listens and responds instantly,
ensuring smooth interaction.
• Multi-Tasking Capabilities: Background automation runs independently while
allowing new commands.
• Battery Monitoring & Power Management: Ensures smooth operation by
preventing shutdowns.
• Emotional & Motivational Support: Engages users with encouraging and
personalized responses.
• Advanced Task Execution: From opening applications to internet searches, it
streamlines workflows.
Upcoming Upgrades:
• Expansion of automation support for third-party applications and IoT devices.
• Enhanced security and personalization to tailor the experience to individual
users.
• We are planning to add more system applications for automation, enabling control over system settings, file operations, and utilities.
BIBLIOGRAPHY
Book
• Reema Thareja, Python Programming, Oxford University Press.
Tools & Technologies
• Python: Programming Language. [https://www.python.org/]
• OpenCV: Open Source Computer Vision Library. [https://opencv.org/]
• SpeechRecognition: Python library for speech-to-text processing.
• Mediapipe: Framework for gesture recognition.
[https://developers.google.com/mediapipe]
• LLaMA: Open-Source Large Language Model
Datasets & APIs Used
• Google Speech-to-Text API (for speech recognition).
• Pre-trained Machine Learning models for sentiment analysis.
• Open-source datasets for training gesture-based commands.
Inspiration Behind the Project
• Our AI assistant was inspired by the futuristic technology depicted in Iron Man,
particularly J.A.R.V.I.S. While our model is not as advanced, it takes a small
step in that direction by integrating real-time data fetching, conversational AI,
and application control through gestures and voice commands.