
AI ASSISTANT

Dissertation submitted in partial fulfillment of the requirements


for the award of degree

of

BS-MS (CS)
By

VPLN. Sri Harsha

V. Shreeja

L. Abhilash

OSMANIA UNIVERSITY
HYDERABAD
2025
CERTIFICATION

This is to certify that the Project Report titled AI ASSISTANT, submitted in partial fulfillment for the award of the BS-MS Programme of VVISM, OU, Hyderabad, was carried out by VPLN. Sri Harsha, V. Shreeja, and L. Abhilash under my guidance. This has not been submitted to any other University or Institution for the award of any degree / diploma / certificate.

Name and address of the guide                    Signature of the Guide

Mrs. P. Shailaja,
Vishwa Vishwani Institute Of
Systems And Management,
Hyderabad.
DECLARATION

I hereby declare that this Project Report submitted by me to the VVISM, OU, Hyderabad, is a bona fide work undertaken by me and it is not submitted to any other University or Institution for the award of any degree / diploma / certificate or published any time before.

Name and Address of the student                  Signature of the student

V. Shreeja,
Mr. VPLN Sri Harsha,
Mr. L. Abhilash.
Vishwa Vishwani Institute of
Systems And Management,
Hyderabad.
OSMANIA UNIVERSITY

BS-MS PROGRAMME

Proforma for Approval of Mini-Project Proposal

Enrollment No: 217021026153                      College Name: VVISM

Name and Address of the student: VPLN Sri Harsha, V. Shreeja, L. Abhilash
Vishwa Vishwani Institute Of Systems And Management, Hyderabad

Title of the Project: AI Assistant

Software used in developing Project: Python, TensorFlow, BERT, BART, Ollama, RoBERTa

Name & Address of the Supervisor: Mrs. P. Shailaja,
Vishwa Vishwani Institute Of Systems And Management, Hyderabad

Signature of the Student                         Signature of the Supervisor

Date                                             Date

SIGNATURE OF DIRECTOR
Abstract

Our AI Assistant is designed to transform how people interact with technology on a daily basis. By using modern, state-of-the-art machine learning models, this assistant can understand commands and emotions, responding in ways that feel natural and intuitive. It is built to learn from each interaction, adapting to the user's preferences over time, and offers a more personalized experience.

Some of the models used include BERT (Bidirectional Encoder Representations from Transformers), RoBERTa (Robustly Optimized BERT Approach), and other advanced NLP models that allow the assistant to understand context better and provide accurate responses. We also employ machine learning techniques to ensure the assistant can improve itself with every user interaction.

Although the assistant can handle various tasks, it is constantly improving to better serve users. As it learns from each interaction, it adapts to the user and tries to offer more relevant content, ensuring a smoother and more personalized experience. As development progresses, you will notice a clear distinction between our assistant and others.
Contents

S.NO   TITLE                                                      PAGE NO

1      Introduction
       1.1 Introduction of the Application                        1
       1.2 Drawbacks of existing system                           2
       1.3 Proposed system                                        3
       1.4 Advantages of developed system                         6
       1.5 Description of modules                                 8

2      Literature Survey                                          12
       2.1 Description of the Software Engineering Concepts Used  12
       2.2 Description of Analysis and Design concepts used       12
       2.3 Description of Tools used                              13
       2.4 Description of Methodology used

3      Feasibility Study                                          15
       3.1 Short Feasibility Study Definition                     15
       3.2 Economical Feasibility                                 15
       3.3 Technical Feasibility                                  16
       3.4 Operational Feasibility

4      Software Requirements Specification                        19

5      Object Oriented Analysis and Design                        53
       5.1 Use Case Model                                         53
           Use Case Specification Format                          54
       5.2 Activity Diagram                                       55
       5.3 Identification of Scenarios and Sequence Diagram       56
       5.4 Collaboration Diagram                                  57
       5.5 Class Diagram                                          59
       5.6 Component Diagram                                      60
       5.7 Deployment Diagram

6      Form Designing                                             62
       6.1 Introduction                                           62
       6.2 User Input Handling                                    63
       6.3 Conversational Module                                  63
       6.4 Gesture-Based Controls                                 64
       6.5 Data Dictionary

7      Test Cases                                                 72
       7.1 Introduction                                           72
       7.2 Test Case Structure                                    73
       7.3 Test Case for AI Chat Bot System                       74
       7.4 Chat Bot Response Test Case                            75
       7.5 Sentiment Analysis Test Case                           76
       7.6 Intent Classification Test Case                        77
       7.7 Error Handling Test Case

8      Conclusion                                                 78
9      Bibliography                                               81
INTRODUCTION

1.1 Introduction of the Application

Artificial Intelligence (AI) is transforming the way humans interact with technology, enabling machines to perform tasks that traditionally required human intelligence. From voice assistants and smart home automation to self-driving cars and healthcare diagnostics, AI is rapidly evolving and integrating into various aspects of our daily lives. As machine learning models improve, AI systems are becoming more intuitive, adaptive, and efficient, making interactions with technology seamless and more accessible.

The evolution of AI is not just about automation but also about enhancing productivity, accessibility, and convenience. With advancements in computer vision, natural language processing, and deep learning, AI is now capable of understanding gestures, emotions, and voice commands, making human-computer interactions more natural and intuitive.

In this rapidly evolving landscape, intelligent systems are being developed to retrieve real-time information, engage in human-like conversations, and respond dynamically based on user inputs. These capabilities bridge the gap between humans and machines, making interactions more engaging and personalized.

Taking a step further in this direction, our project integrates multiple AI-driven functionalities. It fetches real-time data, enables interactive human-like conversations, and allows users to control applications through hand gestures. The system not only understands and processes user queries but also provides seamless control over select applications such as media players, YouTube videos, and document editors. By combining these features, the project showcases how AI can enhance both communication and user interaction, making technology more intuitive and efficient.

1.2 Drawbacks of existing system

Artificial Intelligence (AI) assistants such as Siri, Alexa, Google Assistant, and Cortana have revolutionized human-computer interactions. However, despite their advanced capabilities, they come with several limitations that hinder flexibility, real-time adaptability, and multimodal interaction. Below are the key drawbacks of existing systems:

Limited Real-Time Data Fetching

• Most AI assistants rely on predefined databases or API-based integrations to provide information.
• They do not fetch real-time data dynamically from external sources, making responses outdated in certain situations.
• For example, weather updates or stock prices may come from delayed third-party API sources, leading to inconsistencies in real-time applications.

Lack of Contextual Understanding in Conversations

• Many AI assistants struggle with maintaining context in long conversations.
• If a user asks, "What's the weather today?" followed by "And what about tomorrow?", most systems fail to link the two queries.
• The responses often feel robotic and disconnected, making them less natural in human interaction.

No Gesture-Based Controls for Applications

• While some AI systems offer gesture control for specific devices, they do not extend this functionality to controlling software applications.
• Users cannot pause, play, or close applications like YouTube, music players, or document editors using hand gestures.
• This limitation reduces efficiency in hands-free interactions, especially in scenarios like presentations or entertainment systems.

Dependency on Cloud-Based Processing

• Many AI assistants rely heavily on cloud-based processing, which increases latency and raises privacy concerns.
• Users must have an active internet connection to interact with these systems, making them unreliable in offline scenarios.
• Data privacy becomes a concern as conversations and user inputs are often stored on cloud servers, leading to potential security risks.

Over-Reliance on Voice Commands

Controlling the application with voice in a noisy background can be challenging. For example, when a user says "increase volume," background noise may interfere with accurate recognition, making it difficult for the assistant to interpret the command correctly. This highlights the need for an alternative control method to ensure seamless interaction in different environments.

1.3 Proposed system

Our system enhances the capabilities of AI assistants by introducing real-time data retrieval, human-like conversations, and limited gesture-based controls for specific applications. Unlike traditional AI models that primarily rely on voice commands and pre-existing databases, our assistant integrates multiple interaction methods to provide a more dynamic and efficient user experience.

Multi-Modal Interaction (Voice + Gesture)

Most AI assistants primarily depend on voice recognition, which can be challenging in noisy environments or situations where users prefer silent interactions. Our assistant enhances accessibility by offering a combination of voice and gesture-based controls.
• Voice Commands: Users can interact with the assistant naturally using spoken queries to fetch information, conduct searches, and manage applications like music players and video streaming services.
• Gesture-Based: While voice commands work for general interaction, gestures are supported only for specific applications, such as:
  o YouTube Music & Videos – play, pause, stop, adjust volume, and close the application.
  o Opening a New Document – can take down notes for you.
• Seamless Switching: Users can effortlessly switch between voice and gestures, making interactions more intuitive and adaptable.

By combining these interaction methods, our system ensures a more flexible and user-friendly experience while acknowledging that gestures are limited to certain application controls.

Real-Time Data Retrieval: Unlike many AI models that rely on static responses from pre-defined datasets, our assistant fetches real-time data from the internet, ensuring that:
• Users receive up-to-date and accurate information at all times.
• The assistant can respond to dynamic queries, such as weather updates, stock market trends, and breaking news.
• The system remains constantly updated, rather than being restricted by offline datasets.
This capability makes the assistant more intelligent and reliable for real-world applications.

Context-Aware Conversations: A major advancement of our AI assistant is its ability to engage in more natural and intelligent conversations by understanding context and user intent.
• Maintains Conversation History: The assistant remembers previous interactions, allowing for continuous and coherent dialogues.
• Understands Meaning Beyond Words: Instead of merely responding to keywords, the assistant analyzes user intent to provide more relevant answers.
• Personalized Experience: Over time, the assistant adapts to user preferences, making responses more tailored and effective.
This feature enhances the user experience by making the assistant feel more interactive and intelligent, rather than just responding with isolated answers.

Gesture-Based Controls: AI assistants offer voice-based controls but lack an alternative method for interacting with applications. Our system fills this gap by introducing gesture-based controls, though only for a limited set of applications.
• Why Gesture Controls?
  o Voice commands might not be ideal in noisy environments.
  o Some users may prefer silent control methods over speaking aloud.
  o Users with speech impairments may find gestures a more accessible alternative.
• Where Gesture Controls Work:
  o Music and Video Players – control playback (play, pause, stop, volume adjustments).
  o YouTube Videos – navigate through videos using simple hand gestures.
  o Document Scrolling – move between pages or scroll text hands-free.
While gesture-based commands enhance usability, they are not applicable to all functionalities; they primarily work for media and document-related applications.

Online-Only Model for Enhanced Functionality
Our assistant requires an internet connection to function effectively, as it:
• Processes real-time queries dynamically instead of relying on a pre-stored database.
• Fetches information from online sources to ensure accuracy and relevancy.
• Enables better machine learning adaptation by continuously improving responses.
Unlike offline AI models, which often provide limited responses due to static datasets, our assistant is constantly learning and evolving, making it more intelligent and responsive.

1.4 Advantages of the developed system

The proposed AI assistant introduces a smarter, more interactive, and efficient approach to virtual assistance by integrating real-time data retrieval, human-like conversations, and gesture-based application control. These innovations provide several advantages that enhance user experience and usability.

More Natural and Engaging Interaction
Our system moves beyond basic command-response AI by making interactions more fluid and human-like.
Understands Context: Instead of responding to isolated commands, the assistant remembers recent interactions, improving conversation flow.
Engages in Human-Like Conversations: Users feel like they're interacting with a real assistant rather than a bot.
Reduces the Need for Repetitive Commands: The AI can infer what the user means without requiring exact phrasing.

Flexible Input Options for Users
Traditional AI assistants rely on a single input method, usually voice. Our system offers multiple input options, making it adaptable to different environments.
Supports Both Voice and Gestures: Users can interact in a way that best suits their surroundings.
Allows Control Without Typing: Beneficial for hands-free operation when working or multitasking.
Minimizes Frustration in Noisy Environments: Users can rely on gestures when voice input is difficult.

Intelligent Task Execution Based on Real-Time Data
Unlike AI models that rely only on pre-stored information, our assistant fetches real-time data from the internet to make better decisions.
Provides Live Updates: Delivers up-to-date information instead of relying on outdated knowledge.
Dynamically Adapts to User Needs: Can adjust responses based on new trends, events, or user habits.
Reduces Stale Responses: Ensures that information is always fresh and relevant.

Practical Gesture-Based Control for Specific Applications
While voice commands offer convenience, gestures enhance hands-free operation for key applications.
Fast & Effortless Control: Allows users to play, pause, or stop applications instantly.
Ideal for Media & Productivity Applications: Supports simple hand gestures for controlling videos, music, and documents.
Reduces Overreliance on Voice: Gives users alternative ways to interact with the system.

Adaptable and Learning-Based System
Rather than following predefined rules, our assistant learns from interactions to improve over time.
Adapts to User Behaviour: The more it interacts, the better it understands user preferences.
Personalized Experience: Customizes responses based on past conversations and usage patterns.
Minimizes Redundant Answers: Learns to avoid repeating the same information unnecessarily.

Enhanced Control Over Application Management
The AI assistant doesn't just provide basic information; it also acts as a real-time application controller.
Manages Applications Intelligently: Users can issue commands to control media, open or close applications, and navigate between tasks.
Reduces Manual Effort: Automates repetitive tasks, saving time and effort.
Improves Multitasking: Allows users to control applications while focusing on other tasks.
1.5 Description of Modules
Our project consists of seven main modules:
1. Speech Recognition Module
• Functionality: Captures voice input, converts speech to text, and passes the
output to the classification module.

• Input: User's spoken words.

• Output: Transcribed text for further processing.

• Dependencies: Feeds into BertTextClassification for categorization.

2. Battery Checking Module


• Functionality:
Monitors the battery status and provides alerts when the battery is low.

• Input:

o Battery percentage retrieved from psutil.sensors_battery().

o Charging status (plugged in or not).

• Output:

o If the battery is ≤ 30% and not charging, a low battery warning is issued.

o If the battery is < 25% and not charging, all applications close.

o Otherwise, the battery percentage is displayed.

• Dependencies:

o psutil for retrieving battery information.

o sys for handling program termination.

o speak() function (external) for text-to-speech alerts.

3. BertTextClassification Module
• Functionality: Classifies the transcribed text into three categories – Internet
Query, Automation, and Conversation.

• Input: User's transcribed text.

• Output: Categorized text that determines the appropriate response module.

• Dependencies: Connects with Internet Query, Automation, and Conversational
modules.

Classification Process
• The system first recognizes speech from the user.

• The text is classified using BertTextClassification into one of the three


categories: Internet Query, Automation, or Conversational.

• The system then responds using the appropriate module and delivers the output
via the speak() function.

4. Internet Query Module


• Functionality: Processes classified text related to internet queries and provides
real-time updates without using APIs.

• Input: Text categorized as an Internet Query.

• Output: Returns relevant real-time information.

• Dependencies: BertTextClassification module.

5. Automation Module
• Functionality: Controls system applications based on classified user commands.

• Input: Text categorized as an Automation command.

• Output: Executes the respective system function.

• Dependencies: BertTextClassification module.

• Sub-Modules:

o YouTube Play/Pause: Controls video playback.

o Closing Applications: Closes specific system applications.

o Media Control: Adjusts volume and manages playback using MediaPipe


gestures.

6. Conversational Module
• Functionality: Provides human-like conversational responses, emotional
support, and answers general queries.

• Input: Text categorized as a conversation request.

• Output: Contextually relevant response spoken aloud.

• Dependencies: BertTextClassification module.

7. Speak Module (Text-to-Speech)


• Functionality: Converts system-generated text responses into speech output.

• Input: Text response generated by Internet Query, Automation, or


Conversational modules.

• Output: Audible response spoken to the user.

• Dependencies: All response-generating modules.

2.LITERATURE SURVEY
2.1 Software Engineering Concepts Used
• Our project follows a modular software engineering approach, ensuring each
module is independent and scalable.

• We applied Incremental Development Methodology, where each component


was developed and tested in stages.

• Agile Principles were followed to allow iterative improvements based on


testing and user feedback.

2.2 Tools and Technologies Applied


• Speech Recognition: Used for capturing user input.

• BERT Text Classification: Used for categorizing user commands.

• Transformers (RoBERTa): Used for sentiment analysis, providing emotional


support if the user's tone is dull.

• Speak Module: Uses Pyttsx3 for text-to-speech conversion, generating audible


responses for user interactions and running continuously with the main
program.

• Psutil: Continuously monitors the battery status in the background, running


with the main program in a separate thread.

• Internet Query Module: Provides responses for real-time queries like weather updates, cryptocurrency prices, and news.

• Conversational Model:

a. Uses sentiment analysis to determine the mood of the conversation.


b. If the input is negative, a supporting text is spoken before generating a
response from the large language model (LLM).

c. If the input is positive or neutral, a natural response is generated
from the conversational model.

Automation:
• PyAutoGUI: Used for keyboard functions.

• MediaPipe: Controls volume, play/pause, and closes applications like


YouTube, Music, and Word.

• Closing Applications: Runs on a separate thread but only when Microsoft Word
is open, allowing for controlled application closure as needed.
• YouTube Play/Pause: Runs on a separate thread when automation is triggered
to control media playback, including playing/pausing videos, adjusting volume,
and closing YouTube Music and YouTube videos.

2.3 Brief Methodology Overview


1. Speech recognition captures voice input.

o The system listens to the user and converts spoken words into text using
the Speech Recognition module.

2. Psutil continuously monitors battery status in the background.

o It runs in a separate thread and alerts the user if battery levels are critical.

3. Text is classified into one of three categories: Internet Query, Automation, or


Conversation.

o The transcribed text is passed to the BertTextClassification module,


which determines the intent.

o If classified as an Internet Query, the system fetches real-time


information.

o If classified as Automation, it triggers commands to control
applications.

o If classified as Conversation, the system prepares a response based on


sentiment analysis.

o Based on classification, the appropriate module processes the


request.

o The Internet Query module retrieves dynamic information.

o The Automation module executes system control commands.

o The Conversation module generates human-like responses

4. For conversational inputs, sentiment analysis is performed.

o If the user's tone is detected as negative using RoBERTa, the system


provides emotional support.

o Otherwise, normal conversation processing is followed.

5. The response is generated and spoken aloud using the Speak Module.

o Pyttsx3 converts the processed text into speech output.

o The system ensures a seamless conversation flow by speaking responses


naturally.

3.FEASIBILITY STUDY
Definition
A feasibility study assesses the practicality and viability of a proposed project by evaluating technical, economic, and operational aspects to determine its success potential.
3.1 Economic Feasibility
Economic feasibility assesses the cost-effectiveness of the system. Since this is a minor
project, no direct financial investments were required for development. The project
makes use of open-source technologies, eliminating licensing costs. The system runs
on a standard computing system, negating the need for additional hardware. Thus, the
implementation cost is minimal while providing high utility.
• Since this is a minor project, no financial investments were required for
development.

• All tools and technologies used in the project are open-source and free to use.

• The project runs on a standard computing system, eliminating the need for
additional hardware or costly infrastructure.

3.2 Technical Feasibility


Technical feasibility determines whether the available technology, tools, and
infrastructure are sufficient to develop and maintain the system. The project is
implemented using Python and various open-source libraries such as Speech
Recognition, BERT, RoBERTa, Pyttsx3, MediaPipe, and PyAutoGUI, ensuring broad
compatibility.
Additionally, the system requires an active internet connection for execution, as
functionalities like Speech Recognition and Internet Query depend on real-time data
processing. The classification and automation modules, however, function locally. No
specialized hardware is needed apart from a computer with a microphone and speakers,
making the project technically feasible within standard computing environments.

• The project is implemented using Python and various open-source libraries,
ensuring compatibility with different operating systems.
• The tools used, such as Speech Recognition, BERT, RoBERTa, zero-shot classification (facebook/bart-large-mnli), and PyAutoGUI, are readily available and well-supported.

• No specialized hardware is needed beyond a standard computer with a


microphone and speakers.

• The system requires an active internet connection for execution, as Speech


Recognition, Internet Query, and other functionalities rely on real-time data
processing.

• The classification and automation modules operate locally, but some features
(e.g., fetching real-time updates) require internet access.


3.3 Operational Feasibility


Operational feasibility examines whether the system is user-friendly, efficient, and
practical for real-world use. The system is designed to be intuitive, requiring minimal
user effort. Speech-based interaction ensures ease of use for individuals of all
backgrounds, enhancing accessibility.
The system runs efficiently without significantly impacting system performance,
making it feasible for daily use. Psutil ensures continuous battery monitoring in the
background, providing real-time status updates without interfering with the main
processes. The automation features allow seamless control of applications, improving
user convenience. Overall, the system's usability, efficiency, and performance make it
highly operationally feasible.
• The system is designed to be user-friendly with minimal setup required.

• Speech-based interaction allows accessibility and ease of use for users of all
backgrounds.

• The system runs efficiently without impacting performance, making it suitable


for real-world applications.

• Psutil ensures continuous battery monitoring in the background, running


independently to provide alerts when battery levels are low.

User Adoption & Experience:

• Natural Interaction: The chatbot understands context, sentiment, and user intent, making it engaging and user-friendly.

• Continuous Learning: It adapts to user preferences over time, improving accuracy and efficiency.

With its wide range of applications, adaptability, and user-friendliness, the chatbot is operationally feasible.

4.SOFTWARE REQUIREMENTS SPECIFICATION
4.1 Introduction
IEEE 830-1993 Standard for Software Requirements Specification.
This section defines the functional and non-functional requirements of the system, including technical constraints and expected system behavior. The SRS ensures that all aspects of the project are structured, documented, and implemented correctly.

4.2 Functional Requirements
The functional requirements describe the essential operations and capabilities of the system. Each module has a distinct role in processing user inputs and generating appropriate responses.

1. Speech Recognition Module
Captures user speech input through a microphone.
Converts spoken words into text using Automatic Speech Recognition (ASR) techniques.
Ensures accurate speech-to-text conversion for further processing.

2. Battery Checking Module
The Battery Checking Module monitors the system's battery level and provides voice alerts based on its status. It ensures user awareness of low battery levels and prevents system crashes by initiating safe shutdown procedures.
Retrieves battery status using the psutil library.
Provides real-time alerts through Text-to-Speech (TTS) using the speak function.
Warns the user when the battery is low.
Closes applications when the battery reaches a critical level.
Ensures system stability by preventing unexpected shutdowns.

3. Text Classification Module
Analyzes the transcribed text input and categorizes it based on predefined categories.
Utilizes BERT (Bidirectional Encoder Representations from Transformers) for natural language understanding.
Classifies input into one of the following three categories:
Internet Query: when the user seeks real-time information.
Automation: when the user intends to control specific applications.
Conversation: when the system engages in human-like interactions.

4. Internet Query Module
Processes user queries related to weather, news, and financial markets.
Ensures up-to-date and relevant information retrieval.

5. Automation Module
Controls and manages applications such as YouTube, Music, Microsoft Word, and other system functions.
Executes keyboard automation to perform various tasks efficiently.
Enables seamless integration with commonly used applications for user convenience.

6. Conversational Module
Engages in meaningful interactions with users by responding to general queries.
Provides emotional support by analyzing user sentiment through sentiment analysis techniques.
Adapts responses based on the emotional state detected, ensuring a more human-like interaction.

7. Sentiment Analysis
The assistant analyses user speech to determine sentiment (positive, neutral, or negative) and adjusts responses accordingly.
It detects the emotional tone of the user's input, adjusts the conversation flow based on sentiment, and can offer supportive responses or suggestions in case of negative sentiment.

8. Speak Module
Converts text-based responses into speech using Pyttsx3, a text-to-speech conversion library.
Ensures a natural and smooth voice output to enhance user experience.
Facilitates clear and effective communication between the system and the user.

4.3 Non-Functional Requirements
We have ensured that the system meets the following non-functional requirements, focusing on performance, usability, and reliability:

Performance:
We have optimized the system to process and respond within 2-3 seconds of receiving user input.
Background services like battery monitoring run efficiently without interfering with primary functions.

Usability:
The system is designed to be intuitive, requiring minimal effort from the user.
Speech-based interaction is implemented to be natural and responsive.

Reliability:
We have handled error scenarios smoothly, ensuring meaningful fallback responses.
Internet-dependent modules provide clear notifications when connectivity is lost.

Security:
No sensitive user data is stored permanently, to maintain privacy.
We have minimized system permissions to prevent security vulnerabilities.

Scalability:
The architecture is designed to support future enhancements, such as additional automation features, ensuring long-term usability and adaptability.
4.4 Process Description

1. Main Program
• pyttsx3 → A text-to-speech library that enables the system to convert text into
speech.

• speech_recognition as sr → Captures and processes voice input.

• batteryStatus.check_battery → Monitors the battery level and provides status
updates.

• BertTextClassification.classify_text → Uses BERT to categorize user input


into different types (e.g., Internet Query, Automation, or Conversation).

• conversation_module.convo → Manages conversational interactions (e.g.,


answering questions or responding to emotions).

• Internetquery → Handles real-time online searches like weather updates,


cryptocurrency prices, and news retrieval.

• system_applications → Manages system automation, such as controlling applications (YouTube, Music, Word).

• warnings.simplefilter("ignore") → Suppresses unnecessary warning messages


to keep the console output clean.

• sentiment_anal.sentiment → Analyzes the sentiment of user input to determine


emotional tone.

Function: Text-to-Speech (speak function)

• Initiates Pyttsx3 for speech synthesis.


• Sets a specific voice (ZIRA) for a consistent user experience.
• Adjusts speech rate to 178 for optimal speed.

Converts text to speech and waits until the speech output is complete.
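
A minimal sketch of this speak() helper is shown below. The ZIRA voice and the rate of 178 follow the description above; the loop used to locate the ZIRA voice is an assumption about how the project selects it.

import pyttsx3

engine = pyttsx3.init()

# Select the Microsoft ZIRA voice if installed; otherwise keep the default.
for voice in engine.getProperty("voices"):
    if "ZIRA" in voice.name.upper():
        engine.setProperty("voice", voice.id)
        break

engine.setProperty("rate", 178)  # speech rate used by the project

def speak(text):
    """Convert text to speech and block until playback completes."""
    engine.say(text)
    engine.runAndWait()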

Function: speech_recog()
Purpose: Captures user speech, processes it, classifies the input, and triggers the appropriate module.

a. Initiating Internet Query in a Separate Thread
• opening_internet = threading.Thread(target=internet, daemon=False) → Creates a new thread for the internet() function.
• opening_internet.start() → Starts the thread, ensuring real-time query handling without blocking the main process.

b. Initializing Speech Recognition
• r = sr.Recognizer() → Creates an instance of the speech recognition engine.
• with sr.Microphone() as source: → Uses the system microphone to capture voice input.
• r.pause_threshold = 1 → Allows a 1-second pause before stopping listening.
• audio_data = r.listen(source, 0, 15) → Listens for up to 15 seconds of user speech input.

c. Processing Speech Input
• q = r.recognize_google(audio_data, language="en") → Converts the captured audio into text using Google Speech Recognition.
• print(f"User spoke: {q}") → Displays the recognized text for debugging.

d. Classifying User Input
• x = classify_text(q) → Initiates BERT text classification to determine the input type.

e. Handling Different Query Types
• If classified as "Internet Query":
  o Calls sendkeys(q, speak) to process general user queries.
  o Prints: "calling internet query".
• If classified as "Conversation":
  o Calls sentiment(q, speak) to analyze sentiment.
  o Calls convo(q, speak) to generate a conversational response.
  o Prints: "calling conversational module".
• If classified as "Automation Query":
  o Calls application(q, speak) to execute automation tasks (e.g., opening/closing applications).
  o Prints: "calling automation query".

f. Handling Errors
• except Exception as e: → Catches any speech recognition errors (e.g., unclear audio, no internet).
• print("Speech recognition error:", str(e)) → Displays the error message for debugging.
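
Putting the steps above together, a condensed sketch of speech_recog() could look as follows; the helpers internet(), classify_text(), sendkeys(), sentiment(), convo(), application(), and speak() are assumed to be imported from the project's other modules.

import threading
import speech_recognition as sr

def speech_recog():
    # Run internet() in its own thread so browser setup does not block listening.
    opening_internet = threading.Thread(target=internet, daemon=False)
    opening_internet.start()

    r = sr.Recognizer()
    try:
        with sr.Microphone() as source:
            r.pause_threshold = 1                  # a 1-second pause ends listening
            audio_data = r.listen(source, 0, 15)   # up to 15 seconds of speech
        q = r.recognize_google(audio_data, language="en")
        print(f"User spoke: {q}")
        x = classify_text(q)                       # BERT-based intent routing
        if x == "Internet Query":
            print("calling internet query")
            sendkeys(q, speak)
        elif x == "Conversation":
            print("calling conversational module")
            sentiment(q, speak)
            convo(q, speak)
        else:                                      # "Automation"
            print("calling automation query")
            application(q, speak)
    except Exception as e:
        print("Speech recognition error:", str(e))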
Function: classify_text(q)
Purpose: Categorizes user input into Internet Query, Automation, or Conversation using a fine-tuned BERT model.

2. Training BERT for Text Classification
• We trained BERT on a custom dataset containing labeled queries for accurate classification.
• The model learns patterns in user input and assigns the correct category.

a. Processing User Input
• Speech Recognition captures user input and converts it to text.
• classify_text(q) sends the text to the BERT model, which returns a category.

b. Classification Output & Routing
• "Internet Query" → Passed to the internet query module.
• "Conversation" → Passed to sentiment analysis and the conversation module.
• "Automation" → Routed to the system automation module to execute commands.
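
For illustration, classify_text() could be implemented as below. The checkpoint directory ./bert-intent and the label order are hypothetical, since the report does not publish the fine-tuned model or its label mapping.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["Internet Query", "Automation", "Conversation"]  # assumed label order

tokenizer = AutoTokenizer.from_pretrained("./bert-intent")  # hypothetical path
model = AutoModelForSequenceClassification.from_pretrained("./bert-intent")

def classify_text(q):
    inputs = tokenizer(q, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]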
Battery checking module:
The script monitors the battery status of the system and takes necessary actions based on the battery level. It uses the psutil module to retrieve battery information and implements logic to check the battery percentage and charger status.
1. Retrieve Battery Status:

o The function check_battery(speak) is called, which prints "Checking


battery status...".

o The psutil.sensors_battery() function is used to get battery details.

a. Check for Battery Availability:

o If no battery information is available (battery_status is None), the


function prints "Battery information not available." and exits.

b. Extract Battery Details:

o The battery percentage is stored in battery = battery_status.percent.

o The charger status (whether plugged in or not) is stored in


charger_plugin = battery_status.power_plugged.

c. Battery Level Conditions and Actions:

o If battery ≤ 30% and charger is NOT plugged in:

▪ The assistant speaks: "Power levels are critically low,
[battery]%."

o If battery < 20% and charger is NOT plugged in:

▪ The assistant speaks: "Low power detected. Closing all active


applications..."

▪ The system exits using sys.exit(), stopping further execution.

o Else (battery > 30% or charger is plugged in):

▪ The function prints the current battery percentage: "Battery


capacity: [battery]%"
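
A compact sketch of check_battery() following the thresholds above (a warning at 30% or below, and an exit below 20% when the charger is unplugged):

import sys
import psutil

def check_battery(speak):
    print("Checking battery status...")
    battery_status = psutil.sensors_battery()
    if battery_status is None:
        print("Battery information not available.")
        return
    battery = battery_status.percent
    charger_plugin = battery_status.power_plugged
    if battery <= 30 and not charger_plugin:
        speak(f"Power levels are critically low, {battery}%.")
        if battery < 20:
            speak("Low power detected. Closing all active applications...")
            sys.exit()                    # stop further execution
    else:
        print(f"Battery capacity: {battery}%")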

2. Internet Query Module


In this section, we describe how the Internet Query Module works. This module
processes real-time user queries by automating web interactions and retrieving
responses dynamically. Below is a line-by-line explanation of the code used for this
module.
Importing Required Libraries

• selenium.webdriver → Automates web interactions by controlling a browser.

• webdriver_manager → Automatically installs and manages the required


browser driver.

• By, WebDriverWait, EC → Helps locate elements on the webpage and wait for
them to load.

• Options → Used to configure the browser, such as enabling headless mode.

• time → Adds necessary delays in processing.

• Options() → Creates a configuration object for the web driver.

• options.add_argument("--headless") → Runs the browser in headless mode


(without a visible window) for better efficiency.

• webdriver.Edge(…) → Initializes the Edge browser driver using


WebDriverManager, ensuring compatibility.

internet() function → This function is responsible for opening the target website.
driver.get("URL") → Directs the browser to the given URL.
print("accessing website") → Logs the start of the process.
print("accessed website") → Confirms that the page has loaded successfully.

Function: sendkeys(x, speak)
Purpose: Sends the user’s query to the webpage, retrieves the response, and speaks it
aloud.
1. Initial Processing
• Notifies the user that the request is being processed.

• Locates the input box on the webpage where the query will be entered.

2. Formatting the Query


• Adds "in Hyderabad" if the query is related to weather.

• Appends "in few words" to get a concise response.

3. Sending the Query


• Enters the formatted query into the input field.

• Locates and clicks the submit button to process the request.

4. Waiting for the Response


• Introduces a delay to allow the webpage to generate a response.

• Constructs an XPath expression to locate the response dynamically.

5. Extracting the Response


• Waits until the response is visible on the page.

• Retrieves the latest response from the webpage.

• If no response is found, informs the user.

6. Error Handling
• If an unexpected issue occurs (e.g., missing elements, loading failure), asks the
user to repeat the query.
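
A hedged sketch of this flow is given below. The target URL and the element locators are placeholders, since the report does not disclose the actual website or its XPaths, and Selenium 4's built-in driver management stands in for webdriver_manager.

from selenium import webdriver
from selenium.webdriver.edge.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless")          # run without a visible window
driver = webdriver.Edge(options=options)

def internet(url="https://example.com"):    # placeholder URL
    print("accessing website")
    driver.get(url)
    print("accessed website")

def sendkeys(x, speak):
    speak("Processing your request.")
    if "weather" in x.lower():
        x += " in Hyderabad"                # localize weather queries
    x += " in few words"                    # ask for a concise answer
    box = driver.find_element(By.XPATH, "//textarea")             # placeholder
    box.send_keys(x)
    driver.find_element(By.XPATH, "//button[@type='submit']").click()
    try:
        reply = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located(
                (By.XPATH, "//div[@class='response']")))          # placeholder
        speak(reply.text)
    except Exception:
        speak("Sorry, could you repeat the query?")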

Conversational Module
The Conversational Module is responsible for generating responses to general user queries. It also provides emotional support by analyzing the user's sentiment before responding. Below is a breakdown of how this module functions.

1. Purpose of the Conversational Module


• Handles user queries that do not require real-time information or automation.

• Uses sentiment analysis to determine the user’s emotional tone.

• Generates human-like responses based on query classification.

2. Execution Flow
1. Speech Recognition captures the user’s query and converts it into text.

2. The text is classified using BERT to determine if it belongs to the


Conversational Module.

3. If classified as a conversation, the system checks the user's sentiment:

o If the sentiment is negative, it provides emotional support before


responding.

o If the sentiment is neutral or positive, it directly generates a response.

4. The generated response is then passed to the Speak Module for output.

Handling Negative Sentiment in the Conversational Module


If the user's sentiment is detected as negative, the system takes additional steps to
provide emotional support before generating a response.
1. Identifying Negative Intent
• The system analyses the user’s input and matches it with predefined intents
related to negative emotions such as:

o Self-doubt, Fear of rejection, Overthinking, Demotivation, Losing hope,


Anger, Facing difficult times

2. Providing Guidance and Practical Solutions


• Once an intent is identified, the system provides insights and practical advice
inspired by philosophical and psychological approaches.

• It offers what Krishna said (spiritual or motivational perspective) and how to


implement it practically in daily life.

3. Generating a Conversational Response


• After providing emotional support, the Conversational Module generates a
response relevant to the user’s concern.

• The system ensures the response is encouraging, supportive, and actionable.

• transformers (Hugging Face) → Loads pre-trained NLP models for sentiment
classification and response generation.
• AutoTokenizer → Converts text into tokens for processing.
• AutoModelForSequenceClassification → Loads a pre-trained RoBERTa model
for sentiment analysis.
• softmax (scipy.special) → Converts raw model outputs into probabilities.
• warnings → Suppresses unwanted warning messages.

• random, json → May be used for generating responses dynamically.
Function: sentiment(x, speak)
• Suppresses warnings to prevent unnecessary console messages.

• Takes user input (x) and stores it in sentence.

• Prints the sentence for debugging purposes.

• Uses a pre-trained RoBERTa model (cardiffnlp/twitter-roberta-base-


sentiment) specifically trained on social media data for sentiment classification.
• AutoTokenizer → Converts user text into a format suitable for model
processing.
• AutoModelForSequenceClassification → Loads the actual sentiment
classification model

• Tokenization: Converts user input into a format that the RoBERTa model can
process.
• return_tensors="pt" → Returns tokens in PyTorch tensor format for model
compatibility.

• Feeds tokenized text into the model to generate sentiment scores.
• Extracts raw logits (numerical values) from the model output.
• Converts logits into a NumPy array (scores) for further processing.

• The sentiment list contains the three possible sentiment categories.


• scores.argmax() returns the index of the highest probability, mapping it to
"Negative", "Neutral", or "Positive".
• predictedscore stores the final sentiment classification.
• If the user's sentiment is detected as negative, the system takes additional steps
to offer emotional support.
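
A minimal sketch of sentiment() using the model named above; the label order follows the cardiffnlp model card, and the supportive message is illustrative.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from scipy.special import softmax

MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
LABELS = ["Negative", "Neutral", "Positive"]    # order per the model card

def sentiment(x, speak):
    tokens = tokenizer(x, return_tensors="pt")  # tokenize for the model
    scores = softmax(model(**tokens).logits[0].detach().numpy())
    predictedscore = LABELS[scores.argmax()]    # highest-probability label
    if predictedscore == "Negative":
        speak("I hear you. Let me share something that might help.")
        # ...hand off to the zero-shot intent step sketched further below
    return predictedscore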
• facebook/bart-large-mnli → A zero-shot classification model used to identify
the user's specific emotional concern.
• Loads a JSON file (krishna_arjuna_conversations.json) containing
motivational quotes and practical solutions.
• This list defines common emotional struggles that the system can recognize
and respond to.
• These categories allow the model to pinpoint the user’s emotional state.

• get_intents() function
• Uses the zero-shot classification model to match the user’s input with the most
relevant intent.
• Returns the best-matching emotional category from the list of predefined
intents.

• Calls get_intents() to analyze the user's input and determine their emotional
state.
• Stores the identified intent in detected_input.
• Randomly selects a response from the JSON dataset related to the identified
intent.
• temp stores:
• A motivational quote related to the user’s struggle.
• Practical advice on how to implement a solution in daily life.
• The speak() function delivers this guidance to the user.
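
The following sketch shows how get_intents() and the response lookup could fit together. The JSON schema assumed here (a list of quote/advice entries per intent) and the helper name emotional_support are assumptions; only the model name, the intent list, and the file name come from the report.

import json
import random
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

INTENTS = ["self-doubt", "fear of rejection", "overthinking",
           "demotivation", "losing hope", "anger", "facing difficult times"]

with open("krishna_arjuna_conversations.json") as f:
    responses = json.load(f)                   # assumed: intent -> list of entries

def get_intents(text):
    result = classifier(text, candidate_labels=INTENTS)
    return result["labels"][0]                 # best-matching emotional category

def emotional_support(text, speak):            # hypothetical helper name
    detected_input = get_intents(text)
    temp = random.choice(responses[detected_input])
    speak(temp["quote"])                       # what Krishna said
    speak(temp["advice"])                      # how to apply it practically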

Handling Positive and Neutral Sentiment in the Conversational Module


The Conversational Module is designed to generate human-like responses to user
queries that do not require real-time internet data or system automation. This
module continuously improves over time by adapting to user interactions and
learning from previous conversations, making responses more relevant to
individual preferences.
Below is a detailed, line-by-line breakdown of how this function works.

Explanation :
• random → Used to introduce variability in system responses, ensuring
conversations feel natural and dynamic.
• time → Helps track the duration of response generation and manage
delays for an optimized experience.
• ollama → A library used to communicate with the AI model for
conversational response generation.
• convo() function → This function handles user interactions and generates AI-
based responses.
• x (input parameter) → Stores the user’s spoken query (converted into text).
• speak() function → Converts text into speech, delivering the AI-generated
response audibly.
• complete_response → Stores the final AI-generated response that will be
spoken to the user.

• client = ollama.Client() → Initializes the Ollama AI client, which is
responsible for querying the AI model and retrieving responses.
• model = "jarvis1" → Specifies the AI model used for generating
conversational replies.
• prompt = x → The user’s query (x) is passed as an input prompt to the AI
model.
• start_time = time.time() → Records the exact time when the response
generation begins.
• client.generate(...) → Calls the AI model (jarvis1) to generate a response based
on the user’s input.
• stream=True → Enables real-time streaming, meaning the response is
processed in chunks instead of waiting for full completion.
• shown_messages = set() → Keeps track of the system-generated updates during
response generation.
• for part in response: → The function iterates through the AI-generated
response as it streams in chunks.
• elapsed_time = time.time() - start_time → Calculates the time elapsed since
response generation began.
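
A minimal sketch of convo() following this streaming flow; "jarvis1" is the custom Ollama model named in the report, and the timing printout is illustrative.

import time
import ollama

def convo(x, speak):
    client = ollama.Client()                   # connect to the local Ollama server
    start_time = time.time()
    complete_response = ""
    # stream=True yields the reply in chunks instead of one final string
    for part in client.generate(model="jarvis1", prompt=x, stream=True):
        complete_response += part["response"]
    elapsed_time = time.time() - start_time
    print(f"response generated in {elapsed_time:.1f}s")
    speak(complete_response)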

How the System Evolves with User Interaction


Adaptive Responses: The more a user interacts with the system, the more
personalized responses become.
Better Context Understanding: Frequent topics are remembered to create
more natural conversations.
Improved Efficiency: The system fine-tunes its ability to detect user
preferences and respond accordingly.

Automation Module
The Automation Module processes user requests for system-related tasks such as
playing music, opening applications, or setting breaks. It uses cosine similarity to match
user input with predefined commands and then executes the corresponding action.
Its capabilities include:
• Open applications such as YouTube, YouTube Music, and Microsoft Word.
• Control media playback using hand gestures (play, pause, close applications).
• Set timed breaks with an audible reminder.
• Match user input with predefined automation tasks using cosine similarity for
intent recognition.
This module ensures hands-free automation, making it easier to control the system
using voice commands. Below is a detailed explanation of how it functions.

• application() function → Handles automation-related tasks like playing music,


opening YouTube, or launching Microsoft Word.
• x (input parameter) → Represents the user’s spoken command (converted into
text).
• speak() function → Used to provide verbal feedback to the user.

• automated_tasks dictionary → Stores predefined automation commands and their possible variations. Each task (key) is mapped to multiple ways a user might request it (list of phrases).

Types of Supported Tasks:


"play a video": [ "open YouTube and play a video", "play a video on YouTube",
"start playing a clip", "launch a video", "show me a video on YouTube", "play
a YouTube video", "can you play a scene from YouTube?" ]
Handles requests to play videos on YouTube.
Recognizes multiple ways users might phrase the command.
Playing Music
"play music": [
"play some music", "start playing songs", "can you turn on some music?" "put
on a song" "start my playlist" "I want to listen to music","play a song for me"]
• Processes requests to play songs.
Opening Microsoft Word:
"open a word document": [ "create a new Word file", "open a blank Word
document", "can you take down notes for me?", "start a new document", "open
Microsoft Word", "I need to write something, open Word", "launch a text
editor" ]
Recognizes commands related to launching a new Word document
Setting a Break
"set a break": [ "set a break for me", "pause my work", "stop working for now",
"remind me to take a break", "wait a moment", "hold on for some time", "tell
me when to resume work" ]
Processes requests to set a break or pause work.

Processing and Executing Automation Commands

Encodes each predefined task name into a vector representation using the model all-MiniLM-L6-v2, and stores these encoded values in the automated_tasks dictionary.

Encoding Commands for Comparison


• Each automation task is converted into a numerical representation using a
language model.
• These encoded values are stored for efficient comparison with user input.
Encoding User Input and Finding the Best Match
• The user’s spoken command is also encoded into a vector.
• Cosine similarity is used to compare the user’s input with predefined tasks.
• A list of similarity scores is generated to find the closest matching command.
Handling Low Similarity Scores

• If no command closely matches the user's input (similarity score < 0.4), the system:
  o Clears the similarity scores.
  o Asks the user to repeat the command or informs them that the application does not exist.

• The task with the highest similarity score is selected.


• The system determines the index of the best-matching command and retrieves
its corresponding action.
Executing the Matched Task
• If the command is related to YouTube, the system opens YouTube and plays
the requested video.

• If the command is for music playback, the system starts playing a song.

• If the command is for Microsoft Word, the system opens a new Word document
• If the command is for setting a break, the system activates the break timer and
informs the user.
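
The matching step could be sketched as follows. The task names are taken from the lists above (abbreviated here), the 0.4 threshold follows the report, and the dispatch bodies are left as stubs.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

automated_tasks = {
    "play a video": ["open YouTube and play a video", "play a YouTube video"],
    "play music": ["play some music", "start playing songs"],
    "open a word document": ["open Microsoft Word", "start a new document"],
    "set a break": ["set a break for me", "remind me to take a break"],
}

task_names = list(automated_tasks)
task_embeddings = model.encode(task_names)      # pre-encode once for reuse

def application(x, speak):
    query_embedding = model.encode(x)
    scores = util.cos_sim(query_embedding, task_embeddings)[0]
    best = int(scores.argmax())
    if float(scores[best]) < 0.4:               # no close match found
        speak("Sorry, I could not find that application. Please repeat.")
        return
    speak(f"Running task: {task_names[best]}")
    # ...dispatch to the YouTube / music / Word / break handler here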

Hand Gesture Control for Media and Applications
The Hand Gesture Control Module allows users to control media playback,
volume, and application closing using hand gestures detected via a webcam.
It utilizes OpenCV, MediaPipe, and PyAutoGUI to interpret hand movements
and trigger the corresponding system actions.

The following figure depicts the gestures, helping you recognize each move effortlessly.

(Figure: hand gestures used to pause the YouTube video/music, play the YouTube video/music, and close the application.)

Importing Required Libraries
• OpenCV (cv2) → Captures real-time video from the webcam.
• MediaPipe (mp) → Detects and tracks hand landmarks.
• PyAutoGUI (py) → Simulates keyboard actions for media control.
• Threading (threading) → Enables gesture recognition to run in a separate
thread.
• Math (math) → Used for distance calculations in volume control.
• Time (time) → Introduces delays to prevent multiple unintended inputs.
Initializing Global Variables
• stop_thread → A flag to stop the gesture detection thread when needed.
• scissors_event → A threading event used to detect and trigger the closing
application gesture.

Function: play_pause()
1. Setting Up the Webcam and MediaPipe Hands Model
• Opens the webcam for capturing live video.
• Initializes the MediaPipe Hands model to detect hand landmarks.
2. Processing Video Frames for Gesture Recognition
• Converts each video frame from BGR to RGB (needed for MediaPipe
processing).
• Detects hand landmarks and overlays the skeleton representation on the hand.
3. Extracting Finger Landmark Positions
• Retrieves the y-coordinates of key finger joints:
o Index, Middle, Ring, and Pinky fingers (PIP & TIP joints).
o Thumb tip for volume control calculations.

Gesture-Based Controls
1. Play/Pause Media Control
• If all fingers are folded (except the thumb) → Triggers Play (space key).
• If all fingers are extended → Triggers Pause (space key).
2. Volume Control Gesture
• If only the index finger is folded while others are extended → Activates Volume Control Mode.
• Calculates the distance between the thumb and index finger:
o If distance > 40 → Increases volume (volumeup key).
o If distance < 40 → Decreases volume (volumedown key).
Closing the Application (Scissor Gesture)
• If only the index and middle fingers are folded while others are extended →
o Displays "Scissors (Closing App)" message.
o Triggers Ctrl + W → Closes the active application.
o Triggers Enter → Confirms the action if needed.

o Signals the scissors_event → Ends the gesture detection loop.
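These rules can be sketched as follows; this assumes the TIP/PIP coordinates from the previous snippet (a finger counts as folded when its tip sits below its PIP joint, i.e., has the larger y in image coordinates), pixel-scale tip positions for the distance test, and PyAutoGUI's standard volumeup/volumedown key names.

```python
import math
import pyautogui as py

VOLUME_THRESHOLD = 40  # thumb-to-index distance (pixels), as in the rules above

def handle_gestures(folded, thumb_tip, index_tip, scissors_event):
    """Map finger states to media actions.

    folded    -- dict like {"index": True, "middle": False, ...}
    thumb_tip -- (x, y) pixel position of the thumb tip
    index_tip -- (x, y) pixel position of the index fingertip
    """
    if all(folded.values()):
        py.press("space")          # all fingers folded -> play
    elif not any(folded.values()):
        py.press("space")          # all fingers extended -> pause
    elif folded["index"] and not (folded["middle"] or folded["ring"] or folded["pinky"]):
        # Volume mode: thumb-to-index distance decides the direction.
        dist = math.hypot(thumb_tip[0] - index_tip[0], thumb_tip[1] - index_tip[1])
        py.press("volumeup" if dist > VOLUME_THRESHOLD else "volumedown")
    elif folded["index"] and folded["middle"] and not (folded["ring"] or folded["pinky"]):
        py.hotkey("ctrl", "w")     # scissors -> close the active application
        py.press("enter")          # confirm the action if prompted
        scissors_event.set()       # end the gesture detection loop
```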

Breaking the Loop and Closing Resources

• The loop stops when:
o The "Scissors" gesture is detected.
o The user presses the 'q' key.
• Releases the webcam and closes all OpenCV windows.

Hand Gesture Control for Closing Microsoft Word


This module detects a specific hand gesture (Scissors gesture) to save and close a
Microsoft Word document using keyboard shortcuts (Alt + F4, Ctrl + F2). The
system tracks hand landmarks in real-time and interprets finger positions to
recognize the gesture accurately.

Recognizing the "Scissors" Gesture for Closing and Saving
• The system analyzes finger positioning to determine if the user is making the
"Scissors" gesture:
o Index and Middle Fingers: Folded (Tip below PIP joint).
o Ring and Pinky Fingers: Extended (Tip above PIP joint).

• When the "Scissors" gesture is detected:


o Displays "Scissors Gesture – Closing Current Application" on the
screen.
o Triggers the shortcut Alt + F4 to close Microsoft Word.
o Waits for a confirmation prompt (Enter to confirm).

Saving the File Before Closing


• If a save prompt appears, the system:
1. Generates a timestamped filename (e.g., Notes_12-07-2024_14-30-45.docx).
2. Writes the filename into the save dialog using py.write().
3. Presses Enter to confirm the save action.
4. Waits 8 seconds to ensure the document is saved.
5. Presses Ctrl + F2 to finalize the save operation.
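A hedged sketch of this save-and-close sequence with PyAutoGUI; the prompt delays are assumptions, and Word is expected to append the .docx extension to the typed name.

```python
import time
from datetime import datetime
import pyautogui as py

def close_and_save_word():
    py.hotkey("alt", "f4")     # ask Word to close
    time.sleep(2)              # wait for the save prompt (assumed delay)
    # Timestamped name in the format shown above; Word appends .docx.
    filename = datetime.now().strftime("Notes_%d-%m-%Y_%H-%M-%S")
    py.write(filename)         # type the filename into the save dialog
    py.press("enter")          # confirm the save action
    time.sleep(8)              # allow time for the document to be saved
    py.hotkey("ctrl", "f2")    # final step, mirroring the module above
```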

Ending the Gesture Detection Loop
• Once the closing and saving gestures are executed, the system stops detection:
o Flags application_closed = True to indicate that the process has
completed.
o The webcam feed is released, and OpenCV windows are closed.

5. STRUCTURED SYSTEMS ANALYSIS AND DESIGN (SSAD)

1. Introduction

Structured Systems Analysis and Design (SSAD) is a systematic approach to analyzing and designing an information system by breaking it down into manageable components. This method focuses on data flow, processes, and data storage to ensure a structured and efficient design. SSAD employs graphical models such as Decomposition Diagrams, Data Flow Diagrams (DFD), Entity-Relationship (E-R) Diagrams, and Structure Charts to illustrate the system's architecture.

The purpose of using SSAD in this project is to provide a clear visual representation of the data flow, process structure, and logical relationships within the chatbot system. The chatbot relies on AI-driven Natural Language Processing (NLP), Sentiment Analysis, and Response Generation, making SSAD essential for understanding system interactions.
5.1 Decomposition Diagram

A Decomposition Diagram (top-down approach) breaks the system into smaller sub-components to understand the hierarchy of processes. The chatbot system can be broken down into:

User Input Handling – Captures user text and preprocesses input.
Sentiment Analysis – Determines if the user's message is Positive, Neutral, or Negative.
Intent Classification – Identifies the purpose of the message.
Response Generation – Produces an appropriate chatbot response.
User Interaction Storage – Maintains conversation history for learning.
5.2 Data Flow Diagram (DFD)

A DFD represents how data moves through the system. It is categorized into:

Level 0 – Context Diagram
The User interacts with the system by providing text input. External systems (such as databases or sentiment models) may be integrated.

Level 1 – Detailed Data Flow
User enters input → Passed to NLP processing.
Tokenization and Sentiment Analysis → Determines the emotion behind the message.
Intent Recognition Module → Classifies the purpose (e.g., question, greeting, complaint).
Response Generation → AI generates an appropriate reply.
Chatbot sends response to user.

This DFD ensures logical flow and seamless interactions between components.
5.3 Entity-Relationship (E-R) Diagram

An E-R Diagram defines the relationship between different entities within the system.

Entities and Attributes
User (User_ID, Name, Message)
Chatbot (AI_Model, Response_ID, Timestamp)
Sentiment Analysis (Message_ID, Sentiment_Type)
Intent Recognition (Intent_ID, Intent_Category)

Relationships
A User sends multiple messages to the chatbot.
Each message undergoes Sentiment Analysis and Intent Classification.
The Chatbot generates a response based on sentiment and intent.

This structure ensures efficient data management and system scalability.
5.4 Structure Chart

A Structure Chart is a hierarchical representation of the modular components of the system.

Main System
├── User Input Processing
│   ├── Tokenization
│   ├── Text Preprocessing
├── Sentiment Analysis
│   ├── NLP-based sentiment detection
├── Intent Recognition
│   ├── AI model classification
├── Response Generation
│   ├── AI Model (Blenderbot-400M)
│   ├── Zero-shot classification

This modular design enhances maintainability and debugging.
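As a hedged illustration of the AI layer named in this chart, the sketch below wires a zero-shot intent classifier and the Blenderbot-400M model through Hugging Face pipelines; the bart-large-mnli checkpoint, the pipeline tasks, and the candidate labels are assumptions for this example, not the project's exact configuration.

```python
from transformers import pipeline

# Zero-shot classification layer (checkpoint is an assumed example).
intent_classifier = pipeline("zero-shot-classification",
                             model="facebook/bart-large-mnli")
# Response generation layer (distilled Blenderbot-400M variant).
responder = pipeline("text2text-generation",
                     model="facebook/blenderbot-400M-distill")

text = "Can you remind me to take a break?"
intent = intent_classifier(text, candidate_labels=["question", "command", "greeting"])
reply = responder(text, max_length=60)
print(intent["labels"][0], "->", reply[0]["generated_text"])
```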

5.5 SSAD Methodology Used in the Project

1. Requirement Analysis
Identify chatbot functionalities (sentiment analysis, intent classification, response generation).
Gather system requirements (software, hardware, AI models).

2. System Design Using SSAD
Decompose the system into structured components.
Design Data Flow and Entity Relationships to ensure smooth interaction.

3. System Implementation
Develop modular programs for text processing, AI-based sentiment analysis, and response generation.

4. Testing & Maintenance
Unit testing for each module (e.g., NLP, AI processing, response generation).
Integration testing to ensure proper system-wide functionality.

5. Advantages of Using SSAD
Structured Approach: Ensures a systematic breakdown of system functionalities.
Better Data Flow Understanding: Graphical models (DFD, E-R Diagram) simplify system interactions.
Improved Maintainability: Modular structure facilitates debugging and future upgrades.
Scalability: Supports integration of additional AI models or functionalities in the future.
OBJECT-ORIENTED ANALYSIS AND DESIGN (OOAD)

1. Introduction

Object-Oriented Analysis and Design (OOAD) is a software engineering approach that uses objects as the fundamental building blocks of a system. It focuses on identifying real-world entities, their attributes, behaviors, and relationships to create a modular and scalable system. The OOAD methodology is used to model the chatbot system, allowing for reusability, maintainability, and an efficient software structure.

The chatbot system, powered by Natural Language Processing (NLP), Sentiment Analysis, and AI-driven response generation, benefits significantly from OOAD principles. This approach ensures that each functional component, such as User Input Processing, Sentiment Analysis, Intent Classification, and Response Generation, is encapsulated into distinct objects that interact seamlessly.

2. Phases of OOAD

OOAD consists of two key phases:

1. Object-Oriented Analysis (OOA) – Identifies system requirements using objects and their interactions.

2. Object-Oriented Design (OOD) – Structures these objects into a well-defined system architecture.

3. OOAD Components Used in the Chatbot System

3.1 Use Case Model

A Use Case Model defines how different actors interact with the system.

Actors:
• User: Sends messages to the chatbot.
• Chatbot System: Processes user input and generates a response.
• Database: Stores conversation history and user preferences.

Use Cases:
• User inputs text
• System processes sentiment analysis
• Chatbot classifies user intent
• Chatbot generates and returns an appropriate response
• Conversation history is updated

This model ensures a clear understanding of system functionality from a user's perspective.
3.3 Activity Diagram

An Activity Diagram represents the workflow of the chatbot system.

1. User enters input
2. Text preprocessing and tokenization occur
3. Sentiment analysis identifies the tone
4. Intent recognition determines the purpose
5. Response generation creates a reply
6. Response is displayed to the user

This structured flow visualizes the step-by-step execution of the chatbot system.
3.5 Sequence and Collaboration Diagrams

Sequence Diagram
A Sequence Diagram shows the interaction order between components:
User → Chatbot → Sentiment Analysis → Intent Classification → Response Generation → User
This representation helps understand message flow across system modules.

Collaboration Diagram
A Collaboration Diagram depicts how objects work together to execute a function.
• The User object interacts with the Chatbot object.
• The Chatbot object calls the Sentiment Analysis and Intent Recognition objects.
• The Response Generation object formulates a reply.
This diagram helps visualize object dependencies and interactions.
3.6 Class Diagram

A Class Diagram defines objects, attributes, and their relationships.

Key Classes in the Chatbot System:
• User Class: Stores user ID and message history.
• Chatbot Class: Manages conversation flow.
• Sentiment Analysis Class: Analyzes user emotions.
• Intent Recognition Class: Identifies user intent.
• Response Generation Class: Produces appropriate responses.

This structure enables a modular and reusable design.
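To make the class responsibilities concrete, here is a minimal, hypothetical Python skeleton of these classes; the method bodies are placeholders rather than the project's actual implementation.

```python
class User:
    """Stores user ID and message history."""
    def __init__(self, user_id):
        self.user_id = user_id
        self.message_history = []

class SentimentAnalysis:
    """Analyzes user emotions."""
    def analyze(self, text):
        return "Neutral"   # placeholder

class IntentRecognition:
    """Identifies user intent."""
    def classify(self, text):
        return "question"  # placeholder

class ResponseGeneration:
    """Produces appropriate responses."""
    def generate(self, text, sentiment, intent):
        return f"({sentiment}/{intent}) reply to: {text}"  # placeholder

class Chatbot:
    """Manages conversation flow by delegating to the classes above."""
    def __init__(self):
        self.sentiment = SentimentAnalysis()
        self.intent = IntentRecognition()
        self.responder = ResponseGeneration()

    def handle(self, user, text):
        user.message_history.append(text)
        return self.responder.generate(text,
                                       self.sentiment.analyze(text),
                                       self.intent.classify(text))
```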

3.7 State Chart Diagram

A State Chart Diagram represents how the chatbot changes states based on interactions.

Possible States:
• Idle → Waiting for user input
• Processing → Performing sentiment and intent analysis
• Generating Response → Producing output
• Responding → Displaying message to user

This ensures that all chatbot functionalities transition smoothly.

3.8 Component Diagram

A Component Diagram shows system components and their relationships.

Major Components:
1. Frontend (User Interface – chat window)
2. Backend (Processing & AI model)
3. Database (Stores user interaction data)

By modularizing components, the system remains scalable and adaptable.

3.9 Deployment Diagram

A Deployment Diagram represents hardware and software deployment.

Deployment Model:
1. User Device (Mobile/Desktop) – Sends message
2. Web Server – Processes AI-based response
3. Database Server – Stores interaction history

This structure ensures smooth integration between components.
4. Advantages of OOAD
• Encapsulation: Data and functions are bundled into objects, improving security.
• Reusability: Components can be reused across multiple chatbot versions.
• Scalability: The system can easily integrate new AI models and functionalities.
• Maintainability: Modular objects make debugging and updating easier.
FORM DESIGNING
1. Introduction
Form design plays a crucial role in ensuring a seamless user experience by structuring interactions efficiently. In our AI chatbot system, the form is designed to be intuitive and voice-driven, eliminating the need for manual input. Unlike traditional chatbots that rely on buttons or text fields, our system prioritizes speech recognition, allowing users to interact naturally.

Our AI Assistant system relies on speech recognition (speech_recog) for user interaction, eliminating the need for manual input buttons. The design ensures a seamless and natural conversation experience.

User Input Handling
• Voice Input Only: Users interact through speech recognition.
• Automatic Detection: The system listens for user speech without requiring a manual trigger.
• Clarification Requests: If input is unclear or unrecognized, the chatbot asks for clarification.
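A minimal sketch of this voice-only input loop, using the SpeechRecognition library listed in the bibliography; the clarification behavior is left to the caller.

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

def speech_recog():
    """Listen once and return the recognized text ('' if unclear)."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # filter background noise
        audio = recognizer.listen(source)            # no manual trigger needed
    try:
        return recognizer.recognize_google(audio)    # Google Speech-to-Text
    except (sr.UnknownValueError, sr.RequestError):
        return ""  # caller should ask the user for clarification
```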

Gesture-Based Controls:
o Recognizes hand gestures using a webcam.
o Executes predefined actions based on detected gestures (e.g., play, pause, close an application).
o Provides visual feedback to the user.

Conversational Module:
• Enables real-time interaction using voice-based communication only (no text input).
• Understands user commands and responds in a natural conversational manner.
• Fetches real-time data (e.g., weather, news, reminders) and delivers responses verbally.
• Ensures smooth communication by handling clarifications when voice input is unclear.

Automation Task for Output:
o Executes actions like:
▪ Playing Music: If the user says, "Play some music," the assistant will open the default media player or play a song from an online source.
▪ Opening Applications: Commands like "Open Word" or "Launch YouTube" will trigger the respective applications.
▪ Controlling Media: Users can say "Pause video", "Increase volume", or use gestures to perform these actions.

DATA DICTIONARY
1. Introduction

A Data Dictionary is a structured collection of data elements used within a system. For the AI chatbot system, the data dictionary includes fields for user input, chatbot responses, sentiment analysis, conversation history, and system logs. This documentation is essential for developers, database administrators, and analysts to understand data relationships and dependencies.

2. Importance of a Data Dictionary
• Ensures Consistency: Defines standard formats for storing and retrieving data.
• Improves Data Integrity: Helps enforce constraints and validation rules.
• Enhances System Documentation: Serves as a guide for database management.

3. Data Dictionary Table Structure

A data dictionary typically includes the following attributes for each data field:
• Field Name: The name of the data element.
• Data Type: Specifies whether the field is a string, integer, date, etc.
• Size: Defines the maximum character length or number value range.
• Description: A brief explanation of the data field's purpose.
• Constraints: Specifies unique, null, or required values.
• Default Value: The initial value assigned if no input is provided.
• Example Values: Sample data to illustrate field usage.

4. Data Dictionary for AI Chatbot System

| Variable Name | Data Type | Description | Possible Values | Usage | Constraints |
|---|---|---|---|---|---|
| User_Input | String | Stores user speech input after recognition. | Any spoken phrase | Speech recognition | None |
| Classification | String | Classifies input into automation, internet, etc. | "Internet Query", "Automation", "Conversation" | Decides module execution | Must match predefined labels |
| Battery_Level | Integer | Holds the current battery percentage. | 0 to 100 | Battery check | Must be within 0-100 range |
| Power_Plugged | Boolean | Indicates if the charger is connected. | True/False | Battery check | None |
| Response_Text | String | The assistant's generated response. | Any text | Speech output | None |
| Automation_Task | String | The command for executing automation. | "Open YouTube", "Word" | Application control | Must be a valid command |
| Thread_Status | Boolean | Indicates if a background thread is running. | True/False | Automation tasks | None |

4.1 User Input Data

| Field Name | Data Type | Description | Example Values | Usage | Constraints / Rules |
|---|---|---|---|---|---|
| Input_Text | String | Stores the raw user speech input. | "What's the weather today?" | Capturing user commands | None |
| Input_Type | String | Classifies the type of user input. | "Question", "Command", "Greeting" | Determines response behavior | Must match predefined categories |
| Confidence_Score | Float | Represents the confidence level of speech recognition. | 0.0 - 1.0 | Determines accuracy of recognition | Must be between 0-1 |
| Language | String | Identifies the language of the input. | "en", "es", "fr" | Language processing | Must match supported languages |
| Processed_Text | String | The cleaned-up version of user input. | "Play a song" | Used for processing | None |
| Emotion_Detected | String | Detected emotion from user input. | "Neutral", "Angry", "Happy" | Personalizes response generation | Must match predefined emotions |

4.2 Chatbot Response Data

| Field Name | Data Type | Description | Example Values | Usage | Constraints / Rules |
|---|---|---|---|---|---|
| Response_Text | String | The AI-generated response to user input. | "Sure, opening YouTube..." | Displays output to user | None |
| Response_Type | String | Categorizes the response. | "Answer", "Command Execution", "Clarification" | Helps in response handling | Must match predefined categories |
| Emotion_Tone | String | Emotion conveyed in the response. | "Neutral", "Supportive", "Encouraging" | Personalizes AI tone | Must match predefined tones |
| Execution_Status | Boolean | Indicates if a command was successfully executed. | True, False | Tracks success/failure of actions | None |
| Followup_Required | Boolean | Indicates if the AI needs to ask a follow-up. | True, False | Enables multi-turn conversations | None |

4.3 Sentiment Analysis Data

| Field Name | Data Type | Description | Example Values | Usage | Constraints / Rules |
|---|---|---|---|---|---|
| User_Input | String | The text or speech input from the user. | "I'm feeling a bit down today." | Sentiment analysis input | None |
| Processed_Text | String | Cleaned and preprocessed version of input. | "feeling down today" | Improves model accuracy | Stop words, punctuation removed |
| Sentiment_Label | String | The AI-determined sentiment category. | "Positive", "Neutral", "Negative" | Helps classify emotions | Must match predefined categories |
| Sentiment_Score | Float | Numerical score representing sentiment. | -0.75, 0.0, 0.85 | Helps in intensity measurement | Range: -1 (negative) to +1 (positive) |
| Emotion_Detected | String | Specific emotion identified in text. | "Sadness", "Happiness", "Anger", "Confusion" | Provides deeper emotional insight | Must match predefined emotion categories |

4.4 Internet Classification Data

| Field Name | Data Type | Description | Example Values | Usage | Rules |
|---|---|---|---|---|---|
| User_Input | String | The user's search request. | "Who is the CEO of OpenAI?" | Direct input for processing | None |
| Processed_Query | String | Preprocessed query for better results. | "CEO OpenAI 2025" | Enhances retrieval efficiency | Stop words, punctuation removed |
| Source | String | The platform or website used for retrieval. | "Wikipedia", "Official Website", "News Article" | Helps in source selection | Must be a predefined option |

4.6 Error Logs and System Monitoring

| Field Name | Data Type | Description | Example |
|---|---|---|---|
| Error_Type | String | Type of error encountered | "Speech Recognition Error" |
| Error_Message | String | Detailed error description | "Could not understand audio" |
| Module_Affected | String | The system module where the error occurred | "Speech Recognition" |
| Error_Code | Integer | Unique identifier for error types | 101 |
| Resolution_Status | String | Whether the error was resolved | "Pending" |

5. Additional Considerations

To ensure the reliability and efficiency of our AI assistant, the following validation rules have been implemented:
• Speech Recognition Filtering: Background noise is filtered to improve accuracy before processing commands.
• User Input Constraints: Text inputs should not exceed 300 characters to optimize processing time.
• Battery Status Validation: The system checks the battery percentage within the 0-100% range to trigger appropriate power-saving actions.
• Automation Command Verification: Executed commands must match predefined automation tasks to prevent unintended actions.
• Sentiment Analysis Accuracy: Emotion detection must correctly classify text into predefined categories: Positive, Neutral, or Negative.
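The checks below are an illustrative sketch of these rules; the function names are hypothetical, and the thresholds mirror the constraints stated above and in the data dictionary.

```python
PREDEFINED_SENTIMENTS = {"Positive", "Neutral", "Negative"}

def validate_user_input(text):
    return 0 < len(text) <= 300          # 300-character limit from the rules

def validate_battery(level):
    return 0 <= level <= 100             # battery must stay in the 0-100% range

def validate_sentiment(label):
    return label in PREDEFINED_SENTIMENTS
```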

TEST CASES
1. Introduction

Test cases are structured scenarios used to verify the correctness, functionality, and performance of a system. They help identify potential bugs, ensure software meets requirements, and improve reliability. Each test case consists of test inputs, execution steps, expected outputs, and actual results.

For the AI chatbot system, test cases will cover functionalities such as user input handling, chatbot responses, sentiment analysis, intent classification, error handling, and system performance.

2. Importance of Test Cases
• Ensures Functionality: Confirms that all features work as intended.
• Improves Quality: Detects errors and inconsistencies before deployment.
• Enhances Security: Identifies vulnerabilities and prevents system breaches.
• Facilitates Regression Testing: Helps maintain stability when updating the system.

3. Test Case Structure

Each test case includes:
1. Test Case ID: A unique identifier for the test case.
2. Test Scenario: A brief description of what is being tested.
3. Preconditions: Requirements that must be met before execution.
4. Test Steps: Actions to be performed.
5. Expected Result: The anticipated output.
6. Actual Result: The observed outcome after execution.
7. Status: Pass/Fail based on comparison between expected and actual results.

4. Test Cases for AI Chatbot System

4.1 User Input Handling Test Cases

| Test Case ID | Scenario | Steps | Expected Result | Status |
|---|---|---|---|---|
| UIH-01 | Valid Input | Enter a proper query | Generates a relevant response | Pass |
| UIH-02 | Invalid Input | Enter an empty/random string | Prompts user for valid input | Pass |
| UIH-03 | Unknown Phrase | Enter an unrecognized query | Asks for clarification or gives a default response | Pass |

4.2 Chatbot Response Test Cases

| Test Case ID | Scenario | Steps | Expected Result | Status |
|---|---|---|---|---|
| CBR-01 | General Query Response | Ask a common question | Provides a relevant and accurate response | Pass |
| CBR-02 | Follow-up Question | Ask a related follow-up | Maintains context and gives an appropriate answer | Pass |
| CBR-03 | Greeting Response | Say "Hello" or "Hi" | Responds with a friendly greeting | Pass |
| CBR-04 | Sentiment Adaptation | Express an emotion (e.g., sad) | Provides an empathetic or supportive response | Pass |
| CBR-05 | Unknown Query Handling | Enter an unfamiliar phrase | Asks for clarification or provides a fallback response | Pass |
| CBR-06 | Long Input Handling | Enter a very long message | Processes it correctly or asks for a concise input | Pass |
| CBR-07 | Internet-Based Query | Ask for online information | Retrieves and presents accurate data | Pass |
| CBR-08 | Delayed Response Handling | Ask multiple questions quickly | Maintains response accuracy without lagging | Pass |

4.3 Sentiment Analysis Test Cases

| Test Case ID | Test Scenario | Preconditions | Test Steps | Expected Result | Actual Result | Status |
|---|---|---|---|---|---|---|
| TC-007 | Detect positive sentiment | System is running | 1. Enter "I am happy" 2. Submit | Sentiment detected correctly | System detects positive sentiment | Pass |
| TC-008 | Detect negative sentiment | System is running | 1. Enter "I am sad" 2. Submit | Sentiment detected correctly | System detects negative sentiment | Pass |
| TC-009 | Detect neutral sentiment | System is running | 1. Enter "It is a normal day" 2. Submit | Sentiment detected correctly | System detects neutral sentiment | Pass |

4.4 Intent Classification Test Cases

| Test Case ID | Scenario | Steps | Expected Result | Status |
|---|---|---|---|---|
| IC-01 | General Inquiry | Ask a basic question | Correctly identifies intent (e.g., general info) | Pass |
| IC-02 | Command Recognition | Give a command (e.g., "Open calculator") | Executes the correct action | Pass |
| IC-03 | Sentiment-Based Intent | Express emotion (e.g., "I'm feeling down") | Identifies sentiment and responds appropriately | Pass |
| IC-04 | Small Talk Handling | Engage in casual conversation | Detects small talk and responds accordingly | Pass |
| IC-05 | Task-Based Intent | Request a specific task (e.g., "Set a reminder") | Classifies correctly and processes the request | Pass |
| IC-06 | Ambiguous Input | Provide an unclear phrase | Requests clarification or selects the closest intent | Pass |
| IC-07 | Multi-Intent Handling | Enter multiple requests in one message | Recognizes multiple intents and processes correctly | Pass |
| IC-08 | Internet Query Detection | Ask an online-related question | Identifies and fetches data from the internet | Pass |

4.5 Error Handling Test Cases

| Test Case ID | Scenario | Steps | Expected Result | Status |
|---|---|---|---|---|
| EH-01 | Invalid Input | Enter gibberish or unknown text | Responds: "Can you say that again?" | Pass |
| EH-02 | Empty Input | No input detected | Repeats the speech_recog() function | Pass |
| EH-03 | Unexpected Command | Provide an unsupported command | Responds with an appropriate fallback message | Pass |
| EH-04 | System Error | Simulate internal processing failure | Displays a friendly error message without crashing | Pass |

CONCLUSION
The development of our AI assistant marks a significant step toward intelligent and
interactive virtual assistance. This project integrates various technologies, including
speech recognition, automation, and natural language understanding, to create a system
that efficiently processes user commands and executes appropriate actions. By
leveraging Python, SpeechRecognition, psutil, and threading, we have built a
framework that can listen, interpret, and respond in real time, enhancing user
experience and productivity.
One of the core aspects of this AI assistant is its ability to classify user inputs and
respond accordingly. The system can identify whether a query pertains to internet
searches, general conversation, or task automation, and based on this classification, it
directs the request to the appropriate module. This structured approach ensures that
commands are processed accurately and efficiently, minimizing unnecessary
computations and optimizing response time.
Additionally, task automation plays a crucial role in this project. The assistant can open
and control applications, execute system commands, and manage various workflows,
reducing the need for manual intervention. This feature is particularly beneficial for
users who seek a hands-free, voice-controlled experience for multitasking.
Furthermore, by integrating threading, the assistant can run automation tasks in the
background without interrupting the core speech recognition functionality.
A key enhancement in our AI assistant is its ability to provide real-time battery
monitoring. Using the psutil library, the system continuously checks the device’s
battery level and alerts the user when power is critically low. If the battery percentage
drops below a threshold without a charger plugged in, the assistant notifies the user and
takes necessary action, such as closing active applications. This proactive approach
helps in preventing system shutdowns due to low power, ensuring a smooth user
experience.
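A minimal sketch of this battery check using psutil; the 20% threshold and the alert text are assumptions for illustration, not the project's exact values.

```python
import psutil

LOW_BATTERY_THRESHOLD = 20  # percent; an assumed value for illustration

def check_battery():
    battery = psutil.sensors_battery()
    if battery is None:
        return  # no battery sensor (e.g., a desktop machine)
    if battery.percent < LOW_BATTERY_THRESHOLD and not battery.power_plugged:
        # The assistant would speak this alert and may close active apps.
        print(f"Battery at {battery.percent}% - please plug in the charger.")
```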
Beyond functionality, the AI assistant is designed to offer emotional support and engaging interactions. Inspired by virtual assistants like J.A.R.V.I.S. and F.R.I.D.A.Y.,
it delivers responses with a professional yet friendly tone, making interactions feel
natural. The assistant can recognize when the user needs a motivational boost,
encouragement, or even a break reminder. Notifications for resuming work, such as
"Alright, boss, time to get back to work. Let’s get things moving," ensure that the
assistant acts as both a productivity enhancer and a supportive companion.
While the current system offers a robust framework for intelligent assistance, there are
several areas for future improvements. Enhancing contextual understanding, improving
multi-turn conversation handling, and integrating external APIs for expanded
capabilities are potential upgrades that can make the assistant even more efficient.
Further refinement in machine learning-based intent recognition and better
personalization based on user preferences can significantly improve the assistant’s
adaptability.
In conclusion, this AI assistant serves as a strong foundation for intelligent virtual
assistance. Its real-time speech recognition, automation capabilities, battery
management, and engaging personality make it a valuable tool for users seeking
efficiency and convenience. As technology advances, expanding its capabilities will
further solidify its role as a reliable and intelligent digital assistant.
Key Takeaways & Future Enhancements
• Real-time Speech Recognition: The assistant listens and responds instantly,
ensuring smooth interaction.
• Multi-Tasking Capabilities: Background automation runs independently while
allowing new commands.
• Battery Monitoring & Power Management: Ensures smooth operation by
preventing shutdowns.
• Emotional & Motivational Support: Engages users with encouraging and
personalized responses.
• Advanced Task Execution: From opening applications to internet searches, it
streamlines workflows.

Upcoming Upgrades:
• Expansion of automation support for third-party applications and IoT devices.
• Enhanced security and personalization to tailor the experience to individual
users.
• We are planning to add more system applications for automation, enabling control over system settings, file operations, and utilities.

BIBLIOGRAPHY

Book
• Reema Thareja, Python Programming, Oxford University Press.
Tools & Technologies
• Python: Programming Language. [https://www.python.org/]
• OpenCV: Open Source Computer Vision Library. [https://opencv.org/]
• SpeechRecognition: Python library for speech-to-text processing.
• Mediapipe: Framework for gesture recognition.
[https://developers.google.com/mediapipe]
• LLaMA: Open-Source Large Language Model
Datasets & APIs Used
• Google Speech-to-Text API (for speech recognition).
• Pre-trained Machine Learning models for sentiment analysis.
• Open-source datasets for training gesture-based commands.
Inspiration Behind the Project
• Our AI assistant was inspired by the futuristic technology depicted in Iron Man,
particularly J.A.R.V.I.S. While our model is not as advanced, it takes a small
step in that direction by integrating real-time data fetching, conversational AI,
and application control through gestures and voice commands.

