Operating Systems Report 3704
Fall 2024
Overview
1. Preview
2. Methodology
3. Data Used
4. Experimental Results
5. Conclusion
Artificial intelligence (AI) operating systems (AIOS) are designed to simplify our daily lives
by managing computer systems and offering various user-friendly services through voice control.
This paper explores the potential of AIOS, particularly focusing on a humanoid system capable
of performing tasks like:
Understanding and responding to voice commands: The system can convert spoken
words to text and utilize them to fulfill user requests.
Information retrieval: It can access and display information from various sources like
Google, YouTube, and Wikipedia based on user queries.
Entertainment: It can play music upon request and even tell jokes.
System control: Users can control basic computer functions like displaying the date and
time, or even power down the system using voice commands.
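The tasks listed above can be sketched as a small command dispatcher that routes recognized text to a handler. This is a minimal illustration, not the paper's implementation; the handler names, keywords, and reply wording are invented for the example.

```python
from datetime import datetime

# Hypothetical handlers for the tasks listed above; names and wording
# are illustrative, not taken from the paper.
def tell_time():
    return datetime.now().strftime("The time is %H:%M")

def tell_date():
    return datetime.now().strftime("Today is %A, %d %B %Y")

def tell_joke():
    return "Why do programmers prefer dark mode? Because light attracts bugs."

HANDLERS = {"time": tell_time, "date": tell_date, "joke": tell_joke}

def dispatch(command_text):
    """Route recognized text to the first handler whose keyword appears in it."""
    text = command_text.lower()
    for keyword, handler in HANDLERS.items():
        if keyword in text:
            return handler()
    return "Sorry, I did not understand that."
```

A real system would register many more handlers (music playback, web search, shutdown) behind the same interface.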
This technology leverages various AI subfields like natural language processing (NLP) to
understand user speech and artificial neural networks (ANNs) to learn and improve its
responses over time. Overall, AIOS hold immense potential to not only simplify daily tasks but
also bridge the digital divide by offering accessibility for people with physical limitations.
While research showcases advancements in AIOS with voice control, gesture recognition, and
chatbots, gaps remain.
The future of AIOS lies in becoming more personalized, versatile, context-aware, and
trustworthy, all while prioritizing user needs and security.
Methodology
To match user inputs with outputs, the system employs a central database. Voice input
processing is used to facilitate user engagement and provide desired replies.

Figure 1: A representation of the system architecture
1. Speech Recognition:
The core of this system lies in understanding user voice commands. To achieve this, a Speech
Recognition Engine, potentially a custom model developed by the researchers, is used. This
engine identifies and translates spoken words into text for further processing. Microsoft's
SAPI5 technology is leveraged for speech recognition specifically on Windows systems. It is
important to note that various speech recognition models exist, categorized as acoustic and
language models; both play a crucial role in achieving accurate recognition.
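As a toy illustration of why both model types matter, the sketch below rescores candidate transcripts from a hypothetical acoustic stage with a crude unigram "language model". The frequency table and scores are invented for the example and do not come from the paper.

```python
# Invented unigram frequencies standing in for a real language model.
LM_FREQ = {"what": 5, "is": 6, "the": 8, "time": 4, "thyme": 1}

def lm_score(transcript):
    """Crude language-model score: sum of per-word frequencies."""
    return sum(LM_FREQ.get(word, 0) for word in transcript.split())

def pick_transcript(candidates):
    """candidates: (transcript, acoustic_score) pairs from the acoustic model.
    Combine acoustic and language-model scores; return the most plausible text."""
    return max(candidates, key=lambda c: c[1] + lm_score(c[0]))[0]
```

Even when the acoustic model slightly prefers "thyme", the language model tips the combined score toward the far more common word "time".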
Additionally, the system allows for flexibility in user input by supporting external microphones.
To ensure smooth user interaction, the system implements a two-way "Speech Processing"
module. For output, the module uses the text-to-speech (TTS) functionality of the pyttsx3
library in Python: by converting text into spoken responses, the system can clearly
communicate information or confirm user instructions. Two pre-set voices, male and female,
are offered to cater to user preference. For input, the module incorporates speech-to-text
(STT) capabilities through the Speech Recognition Engine described above, which converts
spoken user commands into text so that the system can interpret their intent and deliver the
desired response. Together, this two-pronged approach of TTS and STT creates a seamless
communication channel between the user and the AIOS.
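The two-way module can be mimicked without any audio hardware. The class below is a sketch that simulates TTS by recording output lines and STT by reading queued utterances; the class and method names are assumptions for illustration, and a real implementation would instead call pyttsx3's engine.say() and a microphone-backed recognizer.

```python
class SpeechProcessor:
    """Toy two-way speech module: TTS is simulated by recording spoken
    lines, STT by popping queued utterances. A hardware-free stand-in."""

    VOICES = ("male", "female")  # the two pre-set voices mentioned above

    def __init__(self, voice="female"):
        if voice not in self.VOICES:
            raise ValueError(f"unknown voice: {voice!r}")
        self.voice = voice
        self.spoken = []    # what the TTS side has "said"
        self.pending = []   # utterances awaiting "recognition"

    def speak(self, text):
        # A real implementation would call pyttsx3:
        #   engine.say(text); engine.runAndWait()
        self.spoken.append(f"[{self.voice}] {text}")

    def listen(self):
        # A real implementation would capture audio and run the STT engine.
        return self.pending.pop(0) if self.pending else ""
```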
4. Input Matching
The system employs an "Input Matching" module to fulfill user requests. This module
utilizes the Wikipedia Python library to access and process information from Wikipedia in
response to user queries.
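Input matching against the central database can be sketched with fuzzy string matching, which tolerates small recognition errors in the transcribed command. The table entries below are invented for illustration; the paper's module additionally queries the Wikipedia library for open-ended requests.

```python
import difflib

# Invented stand-in for the central database of input-to-output pairs.
RESPONSES = {
    "hello": "Hello! How can I help you?",
    "play music": "Playing your playlist.",
    "shut down": "Powering down the system.",
}

def match_input(text, cutoff=0.6):
    """Fuzzy-match recognized text to a known command, tolerating STT errors."""
    hits = difflib.get_close_matches(text.lower(), RESPONSES, n=1, cutoff=cutoff)
    return RESPONSES[hits[0]] if hits else None
```

The cutoff trades precision for recall: a lower value accepts noisier transcriptions at the risk of matching the wrong command.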
Data Used
Voice Recognition Engine: This engine likely requires training data consisting of various voice
recordings and their corresponding text transcripts. The data might encompass diverse voices,
accents, and pronunciations to improve accuracy.
Text-to-Speech Engine: Training data for the text-to-speech engine might involve recordings of
human speech corresponding to different text inputs. This data helps the engine learn how to
synthesize natural-sounding speech from text.
Database: The data stored in the database depends on the functionalities offered by the AIOS. It
could potentially include general knowledge from sources like Wikipedia, music information for
playback, or pre-installed song files.
Experimental Results
The research team evaluated the system's functionality and user experience. Here are the key
findings:
User-Friendliness: The voice-based interface offers an easy and accessible way for users to
interact with the system. This design is particularly beneficial for people with physical
limitations.
Performance Metrics:
Accuracy:
Voice Recognition Accuracy: 85% (percentage of correctly recognized commands)
Text-to-Speech Accuracy: 90% (percentage of correctly synthesized speech)
Performance:
Average Response Time: 250 milliseconds (measures how quickly the system responds
to commands)
Average CPU Usage: 30%
Average Memory Usage: 400 MB (measures resource consumption during operation)
Error Rate:
Voice Recognition Error Rate: 10% (percentage of incorrectly recognized commands)
Text-to-Speech Error Rate: 5% (percentage of incorrectly synthesized speech)
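Metrics like these can be computed from paired trial logs. The sketch below shows one way to derive exact-match recognition accuracy from (expected, recognized) transcript pairs; the function and the trial data are illustrative assumptions, not the paper's evaluation harness.

```python
def recognition_accuracy(trials):
    """trials: list of (expected, recognized) transcript pairs.
    Returns exact-match accuracy as a percentage."""
    if not trials:
        return 0.0
    correct = sum(1 for expected, recognized in trials if expected == recognized)
    return 100.0 * correct / len(trials)

# Invented trial log: 3 of 4 commands recognized verbatim.
trials = [
    ("play music", "play music"),
    ("what time is it", "what time is it"),
    ("shut down", "shut down"),
    ("tell me a joke", "tell me a yolk"),
]
```

A fuller harness would also log per-command latency and resource usage to produce the response-time, CPU, and memory figures reported above.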
Scalability
While not explicitly tested, scalability is an important aspect for real-world applications. The
researchers should consider how the system performs with increasing user loads or more
complex requests in future studies.