
Operating Systems Report 3704

Operating Systems (64139)

Artificially Intelligent Operating System with

Sapi5 Voice Recognition Engine

Fall 2024

Overview

This document summarizes a scientific paper that presents an artificially intelligent
operating system (AIOS) built around the SAPI5 voice recognition engine, by Dr. Sumathi S,
Nivetha N, and Princy Jovita J. The paper was published on IEEE Xplore in 2023 and is available at
https://ieeexplore.ieee.org/document/10334967

Presented by: Suhib Elbarghathi (3704)


Outline

1. Preview

2. Limitations and Future Directions

3. Methodology

4. Data Used

5. Results and Discussion

6. Conclusion

➢ Preview

Artificial intelligence (AI) operating systems (AIOS) are designed to simplify our daily lives
by managing computer systems and offering various user-friendly services through voice control.
This paper explores the potential of AIOS, particularly focusing on a humanoid system capable
of performing tasks like:

• Understanding and responding to voice commands: The system can convert spoken
words to text and use that text to fulfill user requests.
• Information retrieval: It can access and display information from various sources like
Google, YouTube, and Wikipedia based on user queries.
• Entertainment: It can play music upon request and even tell jokes.
• System control: Users can control basic computer functions, such as displaying the date and
time or even powering down the system, using voice commands.

This technology leverages various AI subfields like natural language processing (NLP) to
understand user speech and artificial neural networks (ANNs) to learn and improve its
responses over time. Overall, AIOS hold immense potential to not only simplify daily tasks but
also bridge the digital divide by offering accessibility for people with physical limitations.

➢ AIOS: Progress, but Room to Grow

While research showcases advancements in AIOS with voice control, gesture recognition, and
chatbots, there are gaps. AIOS need:

• Deeper User Understanding: They struggle to personalize experiences for individual needs.
• Beyond Voice: Integrating gestures and other input methods would offer more user choice.
• Context Matters: AIOS need to be aware of their user's surroundings to respond more
effectively.
• Building Trust: Users need to understand how AIOS make decisions.
• Accessibility for All: Catering to diverse user needs is crucial.
• Security & Privacy First: Protecting user data is paramount.

The future of AIOS lies in becoming more personalized, versatile, context-aware, and
trustworthy, all while prioritizing user needs and security.

➢ Methodology

Overall System Architecture:

To match user inputs with outputs, the system employs a central database. Voice input
processing is used to facilitate user engagement and provide desired replies.

Figure 1: A representation of the system architecture

1. Speech Recognition:

The core of this system lies in understanding user voice commands. To achieve this, a Speech
Recognition Engine, potentially a custom model developed by the researchers, is used. This
engine identifies spoken words and translates them into text for further processing. Microsoft's
SAPI5 technology is leveraged for speech recognition specifically on Windows systems. Speech
recognition models fall into two categories, acoustic models and language models, and both
play a crucial role in achieving accurate speech recognition.
Additionally, the system allows for flexibility in user input by supporting external microphones.
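
To make the role of the two model categories concrete, the following minimal sketch shows one common way a recognizer can combine them: the acoustic model scores how well a candidate transcript matches the audio, the language model scores how plausible the word sequence is, and the candidate with the best combined score wins. The candidate phrases, the scores, and the lm_weight parameter are made-up values for illustration; they do not come from the paper or from SAPI5.

```python
import math


def pick_transcript(acoustic_logp, language_logp, lm_weight=1.0):
    """Return the candidate with the highest combined log-probability.

    acoustic_logp: candidate text -> log P(audio | text)  (acoustic model)
    language_logp: candidate text -> log P(text)          (language model)
    lm_weight is a hypothetical tuning knob, not a SAPI5 setting.
    """
    def combined(text):
        return acoustic_logp[text] + lm_weight * language_logp[text]

    return max(acoustic_logp, key=combined)


# Hypothetical scores for two similar-sounding candidates.
acoustic = {"play music": math.log(0.40), "pray music": math.log(0.45)}
language = {"play music": math.log(0.30), "pray music": math.log(0.01)}

best = pick_transcript(acoustic, language)
print(best)
```

In this toy example the acoustic model alone slightly prefers the wrong phrase, and the language model corrects it because the wrong phrase is a far less likely word sequence.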

3. Text-to-Speech and Speech-to-Text

To support smooth user interaction, the system implements a two-way "Speech Processing"
module. For output, the module uses the text-to-speech (TTS) functionality of the pyttsx3
library in Python. By converting text into spoken responses, the system can communicate
information or confirm user instructions. The system offers two pre-set voices, male and
female, to cater to user preference. For input, the module incorporates speech-to-text (STT)
capabilities through a Speech Recognition Engine (potentially custom-designed by the
researchers), which converts spoken user commands into text. Working from the text of a
request, the system can process the user's intent and deliver the desired response. Together,
TTS and STT form a two-way communication channel between the user and the AIOS.
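
As an illustration of the TTS half, the sketch below shows how a reply could be spoken with pyttsx3 and how a male or female voice could be selected. It is not the authors' code. The function names are hypothetical, and the mapping of index 0 to a male voice and index 1 to a female voice is an assumption that often holds for the default Windows voices but is not guaranteed.

```python
def choose_voice_index(available, preference):
    """Map a 'male'/'female' preference to an index into the voice list.

    Assumption: index 0 is the male voice and index 1 the female voice.
    Falls back to index 0 when the preferred voice is not installed.
    """
    wanted = {"male": 0, "female": 1}.get(preference.lower(), 0)
    return wanted if wanted < available else 0


def speak(text, preference="female"):
    """Speak `text` aloud with pyttsx3 (third-party: pip install pyttsx3)."""
    import pyttsx3  # imported lazily so the helper above works without it

    engine = pyttsx3.init()  # selects the SAPI5 driver on Windows
    voices = engine.getProperty("voices")
    index = choose_voice_index(len(voices), preference)
    engine.setProperty("voice", voices[index].id)
    engine.say(text)
    engine.runAndWait()  # blocks until the utterance has been spoken
```

A call such as speak("Hello, how can I help you?", preference="male") would then read the greeting aloud on a machine with pyttsx3 and at least one voice installed.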

4. Input Matching
The system employs an "Input Matching" module to fulfill user requests. This module
uses the Wikipedia Python library to access and process information from Wikipedia.
Upon receiving a user's query, it searches Wikipedia and delivers the requested
information. The system can also expand its search to other sources such as YouTube and
Google if the information isn't found on Wikipedia. The same search capabilities across
these platforms address other user needs as well, potentially including playing music
online through YouTube integration and accessing pre-installed songs within the system
itself. The system might even offer music recommendations based on user-specified genres.
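
The lookup-with-fallback behavior described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the paper's implementation: the function names are hypothetical, the fallback simply builds a search URL, and every lookup failure is treated the same way for brevity. The only third-party call used is wikipedia.summary from the wikipedia package.

```python
from urllib.parse import quote_plus


def wikipedia_lookup(query):
    """Return a short summary using the third-party `wikipedia` package."""
    import wikipedia  # lazy import: pip install wikipedia

    return wikipedia.summary(query, sentences=2)


def answer_query(query, lookup=wikipedia_lookup):
    """Try Wikipedia first; fall back to a Google search URL on failure.

    Returns a (source, payload) pair. Catching every exception is a
    simplification made for this sketch.
    """
    try:
        return ("wikipedia", lookup(query))
    except Exception:
        return ("google", "https://www.google.com/search?q=" + quote_plus(query))


def youtube_search_url(query):
    """Build a YouTube search URL, e.g. for a 'play <song>' request."""
    return "https://www.youtube.com/results?search_query=" + quote_plus(query)
```

A caller could read the Wikipedia summary aloud, or pass a returned URL to the standard library's webbrowser.open to show the results in the default browser.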

➢ Data Used

Voice Recognition Engine: This engine likely requires training data consisting of various voice
recordings and their corresponding text transcripts. The data might encompass diverse voices,
accents, and pronunciations to improve accuracy.

Text-to-Speech Engine: Training data for the text-to-speech engine might involve recordings of
human speech corresponding to different text inputs. This data helps the engine learn how to
synthesize natural-sounding speech from text.

Database: The data stored in the database depends on the functionalities offered by the AIOS. It
could potentially include general knowledge from sources like Wikipedia, music information for
playback, or pre-installed song files.

➢ Experimental Results

The research team evaluated the system's functionality and user experience. Here are the key
findings:

• User-Friendliness: The voice-based interface offers an easy and accessible way for users to
interact with the system. This design is particularly beneficial for people with physical
limitations.
• Information Retrieval: The system can access and retrieve information from various
online platforms (potentially including Google and YouTube) and Wikipedia based on user
requests.
• System Interaction: The user interaction flow is as follows:
➢ The system greets the user upon startup.
➢ Users can request information or perform actions through voice commands.
➢ The system processes the request and retrieves information from relevant sources (e.g.,
databases or online searches).
➢ Finally, the system delivers the requested information or completes the desired action
and provides closing remarks.
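
The interaction flow above (greeting, command handling, closing remarks) can be sketched as a small loop. The command phrases, the reply wording, and the listen and speak callables are hypothetical placeholders for the speech-to-text and text-to-speech components; a real system would also trigger the actual shutdown rather than only replying.

```python
from datetime import datetime


def handle_command(text, now=None):
    """Map recognized text to a spoken reply; returns (reply, keep_running)."""
    now = now or datetime.now()
    command = text.lower().strip()
    if "time" in command:
        return (now.strftime("The time is %H:%M"), True)
    if "date" in command:
        return (now.strftime("Today is %Y-%m-%d"), True)
    if command in ("exit", "goodbye", "shut down"):
        # This sketch only replies; it does not power down the machine.
        return ("Goodbye, have a nice day.", False)
    return ("Sorry, I did not understand that.", True)


def run_session(listen, speak):
    """Greet the user, answer commands until told to stop, then close."""
    speak("Hello, how can I help you?")
    running = True
    while running:
        reply, running = handle_command(listen())
        speak(reply)
```

For a quick trial without a microphone, run_session(input, print) reads commands from the keyboard and prints the replies.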

➢ Performance Metrics:

➢ Accuracy:
• Voice Recognition Accuracy: 85% (percentage of correctly recognized commands)
• Text-to-Speech Accuracy: 90% (percentage of correctly synthesized speech)
➢ Performance:
• Average Response Time: 250 milliseconds (measures how quickly the system responds
to commands)
• Average CPU Usage: 30%
• Average Memory Usage: 400 MB (measures resource consumption during operation)

➢ Error Rate:
• Voice Recognition Error Rate: 10% (percentage of incorrectly recognized commands)
• Text-to-Speech Error Rate: 5% (percentage of incorrectly synthesized speech)
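
The report does not state how these figures were measured. As a minimal sketch of one possible procedure, the code below computes recognition accuracy and error rate from pairs of expected and recognized commands, and the average response time of a command handler. The exact-match criterion and the function names are assumptions made for illustration. Under this definition every command is either correct or incorrect, so accuracy and error rate sum to 100%.

```python
import time


def recognition_rates(pairs):
    """Return (accuracy, error_rate) as percentages.

    pairs: list of (expected_command, recognized_command) strings.
    A command counts as correct only on an exact, case-insensitive match.
    """
    correct = sum(1 for want, got in pairs if want.lower() == got.lower())
    accuracy = 100.0 * correct / len(pairs)
    return accuracy, 100.0 - accuracy


def mean_response_ms(handler, commands):
    """Average wall-clock time in milliseconds that `handler` takes per command."""
    start = time.perf_counter()
    for command in commands:
        handler(command)
    return 1000.0 * (time.perf_counter() - start) / len(commands)
```

For example, three correct recognitions out of four test commands give an accuracy of 75% and an error rate of 25%.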

➢ Scalability

While not explicitly tested, scalability is an important aspect for real-world applications. The
researchers should consider how the system performs with increasing user loads or more
complex requests in future studies.

➢ Conclusion

This research successfully demonstrates the development of a user-friendly, voice-controlled AI
operating system (AIOS) prototype. While the current version focuses on information retrieval
through voice commands, it lays the groundwork for a future where AIOS can revolutionize
human-computer interaction. The research highlights potential for AIOS to improve efficiency,
security, and user experience across various fields. Future advancements can address limitations
like accuracy (currently 70-80%) and explore functionalities beyond information retrieval,
paving the way for a future where AIOS becomes an indispensable tool for users of all
backgrounds.
