Navigation System Using YOLOv8
Abstract
The incorporation of sophisticated technologies in assistive systems has demonstrated great potential for improving the quality of life of visually impaired people. This paper presents an improved Obstacle Detection and Warning System for the visually impaired using an optimized YOLOv8 and generative AI. Real-time video from a camera incorporated into a wearable is received and sent to a mobile application that is connected to a cloud model for analysis. The key innovations include advancements to the YOLOv8 architecture, with enhanced data augmentation and fine-tuning for better detection in dynamic scenarios. Optical flow and LSTM networks improve motion tracking in the temporal analysis, while generative AI offers natural-sounding audio feedback to the user. Experiments with a diverse dataset of mobile videos show that the proposed system achieves a 95% detection rate, demonstrating its efficiency in real-life use under different environmental conditions. The findings reveal enhanced detection precision and reaction time, which can be a valuable resource for increasing the accessibility and security of the environment for the visually impaired. This work demonstrates the possibility of using deep learning with generative AI to create new forms of assistive technologies.
Introduction
The development of technology has always changed society's interaction with the environment, providing new approaches to different issues [1], [2], [3], [4], [5]. Among these challenges, navigation and spatial awareness for the blind have remained a major concern. The white cane has been a basic aid for blind people, as it gives haptic information about the surroundings. It is useful in most cases but less helpful in sensing obstacles that are out of reach of the cane or sensing sudden changes in the environment [6]. Assistive technology has been very helpful in improving the lives of the visually impaired by offering devices that help with mobility, activities of daily living and communication. In the past, these technologies have also included guide dogs [7]. However, these traditional methods have their drawbacks, especially when the environment is constantly changing, the user is not familiar with it, and feedback and guidance are needed immediately. The application of advanced computational technologies is expected to improve these solutions by offering better, more adaptive and interactive systems.
The assistance provided by guide dogs is of a very high degree, as these dogs help visually impaired individuals navigate around barriers and through various terrains. However, while guide dogs are effective, they require extensive training and maintenance, their availability is limited, and they are not suitable for everyone [8]. Tactile maps and braille signs are informative about the environment but are not dynamic and cannot give real-time directions.
The problem of orientation in unknown territories presents significant challenges to the visually impaired, which leads to the loss of some of their autonomy and increases the likelihood of accidents. Modern assistive technologies may not provide adequate real-time feedback or context-aware warnings, which leaves users exposed to obstacles that can potentially harm their safety and mobility. It is evident that there is a need for systems that can identify obstacles, understand the context in which they are located and give feedback to the users.
The advancement of modern computational technologies has been very fast, and this has greatly influenced the nature of assistive systems, moving them from static aids to dynamic feedback systems. Leading this change is the development of machine learning and deep learning to design better solutions for the visually impaired.
Computer vision technology has evolved to the level where it can analyze and comprehend visual information in a way that is similar to how a human being would. Using video feeds from cameras, computer vision systems can identify objects and their types, recognize patterns and even predict their actions in real time [11], [12]. These capabilities are especially useful in assistive technologies, where accurate and timely detection of obstacles is paramount. The application of models such as YOLO (You Only Look Once) has made it possible to detect objects in real time by processing images quickly and accurately, making such systems usable in dynamic environments such as the navigation of the visually impaired in urban areas. YOLOv8 is the latest version of this model; it is faster and has better detection than previous versions, making it ideal for real-time use.
The advancement of generative language models like GPT-2 has made it possible to generate grammatically correct and contextually appropriate text from a given input. These models can produce natural language descriptions that give more information about identified obstacles. Combining these developments with Text-to-Speech (TTS) technology enables the translation of text into auditory feedback, making for a complete and inclusive solution for users.
This paper presents a new obstacle detection and warning system that uses YOLOv8 for real-time object detection, a generative AI model for text generation and TTS for audio output. The system is intended to improve navigation for visually impaired people by providing them with accurate and context-based information about the objects in their way. The workflow of the proposed system includes video input acquisition, frame processing with YOLOv8 to identify objects and their classes, text description with GPT-2, and conversion of this text into an audio signal using TTS. This approach is intended to give users quick, easy and useful information about their environment to enhance spatial orientation and safety in navigation.
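As a rough illustration of this workflow, the sketch below chains the three stages with off-the-shelf components; the Ultralytics YOLOv8 package, the Hugging Face GPT-2 pipeline, the pyttsx3 TTS engine and the prompt wording are our assumptions, not the paper's exact implementation.

```python
# Minimal sketch of the described workflow: camera frames -> YOLOv8 ->
# GPT-2 description -> spoken warning. Library and model choices are
# assumptions, not the paper's exact implementation.
import cv2
import pyttsx3
from ultralytics import YOLO
from transformers import pipeline

detector = YOLO("yolov8n.pt")                       # pretrained COCO weights (80 classes)
describer = pipeline("text-generation", model="gpt2")
tts = pyttsx3.init()                                # offline text-to-speech engine

cap = cv2.VideoCapture(0)                           # wearable camera stream (device 0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = detector(frame, verbose=False)[0]      # detect objects in the frame
    labels = [result.names[int(c)] for c in result.boxes.cls]
    if labels:
        # GPT-2 continues the prompt; the wording here is illustrative.
        prompt = f"Warning for a pedestrian: ahead there is {', '.join(labels)}."
        text = describer(prompt, max_new_tokens=20)[0]["generated_text"]
        tts.say(text)                               # audible feedback to the user
        tts.runAndWait()
cap.release()
```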
The main goal of this study is to design and test a real-time obstacle detection and warning system that incorporates generative AI to improve the mobility of the visually impaired.
• Utilizing the strengths of the optimized YOLOv8 to detect and categorize obstacles with high efficiency in different scenarios.
• Integrating generative text description to provide contextually appropriate descriptions of the detected obstacles, thus improving the richness of the feedback given to users.
• Using TTS to translate the descriptive text into audio warnings so that the feedback is immediately audible and comprehensible.
• Evaluating the feasibility of the system in real-life scenarios to show how it improves navigation and safety for the target group.
The proposed system is a major improvement over current assistive technologies, as it not only identifies objects in real time but also provides feedback that adapts to the user's current context. It can be very useful in increasing the independence of the visually impaired by increasing their spatial awareness and giving them timely warnings. The incorporation of generative AI not only enhances the quality and pertinence of the feedback but also creates new opportunities for the development of assistive technologies.
This research is part of the field of intelligent mobility solutions and can serve as a basis for further advancements that enhance and extend the functionalities of assistive systems. The findings may be useful in the development of new technologies that are better suited to visually impaired users, enhancing their integration into society.
Literature Review
The existing literature on obstacle detection and assistive technologies for the blind includes a wide variety of studies based on different methodologies, technologies and application areas. This review provides an overview of the most important advancements in this area, summarizing the advantages and limitations of several approaches. For many years, traditional assistive technology has been used to help the visually impaired move around; for example, white canes and guide dogs are fundamental tools that provide necessary aid. Nevertheless, these types of tools have limited range and adaptability in complex, dynamic environments. Although they are effective for immediate obstacle detection, they cannot provide contextual information or anticipate environmental changes.
The first attempts to apply computer vision to assistive technologies were aimed at using simple image processing algorithms to identify obstacles. These systems used sensors and cameras to capture data about the environment, followed by analysis to determine possible risks. However, they were often slow in processing, less accurate, and could not work in real time or in complex situations. The use of machine learning and deep learning has since enhanced the capabilities of computer vision systems in assistive technologies. CNNs are among the most successful deep learning models used in object detection tasks [13]; they learn features from large datasets in a hierarchical manner and can classify objects with high accuracy.
R-CNN, introduced by Girshick et al., is a two-stage method that first produces region proposals and then classifies them. This was advanced by Faster R-CNN, which incorporated a region proposal network to increase the speed and accuracy of the model. However, these models are complex and computationally expensive, which limits their use in real-time applications, especially on mobile platforms [14], [15]. Single-stage object detection models such as YOLO and SSD have been widely used since they detect objects in a single pass and hence are faster. The real-time YOLO models by Redmon et al. are fast and have become the reference for many applications. The SSD model proposed by Liu et al. also detects objects quickly but can struggle with small objects [16].
The development of generative AI and NLP has created new opportunities for improving assistive technologies. GPT-2 and GPT-3 are examples of language models that have shown the capability of generating syntactically and semantically correct text that can be used to describe objects and environments. This capability is especially useful in assistive technologies, where giving precise, natural language descriptions of obstacles can significantly improve the user experience.
TTS systems have improved over the years, and it is now possible to obtain high-quality, natural-sounding speech synthesis. These systems are important for converting the text descriptions produced by NLP models into auditory feedback, which allows users to quickly and easily understand the environment around them. The incorporation of TTS systems into assistive technologies improves their functionality and applicability for visually impaired persons [17].
YOLO (You Only Look Once) models have revolutionized obstacle detection in deep learning, especially for real-time applications, yet detecting moving objects in a video stream remains a hard task. One recent study introduces a modified YOLOv8 model that specializes in motion-specific detection across diverse visual contexts, increasing sensitivity to movement through personalized preprocessing and architectural changes. Testing on benchmark datasets such as KITTI, LASIESTA, PESMOD and MOCS shows that the altered YOLOv8 outperforms existing models, specifically in environments with high motion: it achieves 90% accuracy and an mAP of 90%, processes at 30 FPS and scores an Intersection over Union (IoU) value of 80%. It can help researchers in security analysis, traffic flow management and film studies, where awareness of movement is important for understanding object trajectories. With AI and computer vision increasingly emphasizing dynamic scene interpretation, this advanced version of YOLOv8 highlights the potential of specialized object detection and the importance of such results to the advancement of object detection technologies [18].
Unmanned Aerial Vehicles (UAVs) are increasingly used in various applications such as surveillance, delivery, disaster management, and precision agriculture. Real-time and accurate object recognition is essential for UAVs to independently perceive and interact with their environments. The YOLO (You Only Look Once) algorithm family has become a promising solution for efficient object identification due to its ability to achieve short inference times while maintaining high detection accuracy. One survey investigates the use of YOLO variations in UAV object recognition, analyzing the architectural innovations and algorithmic optimizations in different YOLO versions and their impact on UAV tasks. Its authors also assess the challenges and opportunities of deploying YOLO models on UAV platforms, considering factors like computational efficiency, model size, and environmental robustness, aiming to provide insights into current methodologies, highlight emerging trends, and suggest areas for future research at the intersection of UAVs and YOLO-based object recognition [19].
Outdoor mobility of the visually impaired is limited because they are likely to collide with objects, which affects their physical and mental well-being. Different technology-based mobility aids have been designed, and most of them incorporate machine intelligence and deep learning (DL) for object detection. However, existing approaches have reliability problems because of real-time dynamics and the absence of information about the potential dangers described by VIPs. One study presents an object detection model (ObDtM) based on deep transfer learning for a set of obstacles that VIPs deemed dangerous. The dataset was collected from the public domain and was manually preprocessed and annotated for training the ObDtM. The experiments showed that ObDtM was superior to current models with a 97% mAP, indicating that the proposed DL approach is reliable and could be applied to various fields. The dataset and ObDtM have several uses, especially in IoT and smart-city scenarios [20].
Intelligent robotics is becoming increasingly important in Maintenance, Repair and Overhaul (MRO) hangar operations, where mobile robots must navigate complex environments for aircraft visual inspection. Aircraft hangars are busy and dynamic, with various obstacles that can pose collision and safety hazards, making obstacle detection and avoidance crucial for safe and efficient robot navigation. Traditional methods face computational challenges, while learning-based approaches often lack detection accuracy. One study introduces a vision-based navigation model that integrates a pre-trained YOLOv5 object detection model into a Robot Operating System (ROS) navigation stack to optimize obstacle detection and avoidance in complex environments. The model was tested using the ROS-Gazebo simulation and the TurtleBot3 Waffle Pi platform, and the results demonstrate that the robot effectively detected and avoided obstacles while navigating through checkpoints to the target location [21].
Dynamic obstacle detection is essential for obstacle avoidance and path planning in autonomous driving. One study introduces a method combining U-V disparity and residual optical flow to detect dynamic obstacles. The process begins by identifying the drivable area using U-V disparity images; obstacles within this area are then detected based on the geometric relationship between their size and disparity, and the motion likelihood of each obstacle is estimated by compensating for the camera's ego-motion. The key innovation is narrowing the search range to obstacles in the drivable area, enhancing both detection efficiency and accuracy. The method was tested on KITTI benchmark datasets and self-acquired campus scene data, demonstrating high detection precision, low missed-detection rates and reduced processing time [22].
Obstacle detection is a key advancement in computer vision and machine learning, enabling the identification and localization of objects in images and videos. One paper presents a low-cost assistive system for obstacle detection and environmental description to aid visually impaired individuals, utilizing deep learning techniques. The object detection model employs the TensorFlow Object Detection API and SSDLite MobileNetV2, pre-trained on the COCO dataset with approximately 328,000 images of 90 object categories. The system also integrates Google Text-to-Speech, PyAudio and playsound, together with speech recognition, to provide audio feedback on detected objects. The device is mounted on a head cap, offering a more efficient alternative to the traditional white cane. This affordable system aims to enhance the daily lives of visually impaired individuals [23].
Furthermore, the use of these technologies on mobile platforms raises issues concerning computational overhead and power consumption. Researchers are working on methods to adapt the models to mobile platforms so that they provide high performance and fast processing without losing accuracy.
The latest studies have been directed towards enhancing existing models such as the YOLO family for better performance on detection tasks; to improve detection, attempts are being made to modify the model architectures, introduce attention mechanisms and use transfer learning.
The research questions for this comparative study on obstacle detection for the visually impaired are:
RQ1: How does the optimized YOLOv8 model perform in terms of accuracy and speed on mobile devices in different environmental conditions?
RQ2: How effectively does the generative AI create descriptions that are useful and appropriate for the navigation of the visually impaired?
RQ3: How does the audio feedback produced by the system aid visually impaired users in achieving better situational awareness?
RQ4: How does the model perform when fast-moving objects and objects that occupy the same area are considered?
The literature on obstacle detection and assistive technologies for visually impaired people reveals improvements in the computer vision, machine learning and NLP fields. These technologies can be used to improve assistive systems by providing real-time, dynamic feedback that increases the safety and independence of users. Research is still being conducted to enhance these systems for practical use and to provide better solutions in the future.
Proposed Technique
The proposed methodology outlines a detailed obstacle detection and warning system that helps the visually impaired by offering real-time information about their surroundings. The system combines the best of current technologies: YOLOv8 for object detection, generative AI for description and TTS for audio guidance.
The model is based on a multi-component structure that allows fast and accurate processing of video inputs and the provision of audio responses. Figure 1 illustrates the working of the model.
• Frame Extraction: Frames are separated from the video stream so that each can be processed individually. The video is generally recorded at a fixed frame rate (30 frames per second), and frames are extracted one at a time for real-time processing.
• Resolution Adjustment: Frames are given a resolution that is compatible with the YOLOv8 model while preserving quality. Frames are scaled to a standard dimension (640×348 pixels; see the sketch after this list) to optimize the trade-off between detection performance and processing time.
• Normalization: Pixel values are standardized to a common range, such as 0 to 1, to ease training and inference. When the input is in the range 0 to 255, pixel values are normalized by dividing by 255.
• Data Augmentation: The robustness of the model is improved by adding variation to the training dataset. Random rotations, flips and color changes are used to mimic various environmental conditions.
• Noise Reduction: Unnecessary features in the video frames that may cause false alarms are removed. A Gaussian blur is used to smooth the image.
• Conversion to Grayscale: When color information is not important, the input data is simplified to decrease the amount of computation. Frames are converted from RGB to grayscale, which is useful for the optical flow calculation.
• Optical Flow: This technique is used to estimate the motion of objects between consecutive video frames; this information is crucial for understanding dynamic scenes and enhancing the detection capabilities of the model. It captures the apparent motion of objects in the scene by analyzing changes between frames, helps predict the future positions of moving objects, and aids the proactive navigation and warning system. A dense optical flow algorithm is used to capture flow patterns for each pixel: the flow is computed between consecutive frames, resulting in a vector field representing motion.
The YOLOv8 model is modified with a temporal analysis technique to enhance its efficiency and performance on moving objects in video, as illustrated in Figure 2. The optimized YOLOv8 architecture incorporates several advanced features designed to improve detection accuracy and processing speed. A key innovation is the integration of a sophisticated ResNet backbone network, which enhances feature extraction by capturing intricate details across various scales [28]. This is achieved through a series of convolutional layers that efficiently extract hierarchical features from the input videos. The architecture allows the model to process multi-scale features, significantly improving its ability to detect both small and large objects within complex scenes [29]. The Feature Pyramid Network (FPN) ensures that high-level semantic information is retained while aggregating features from different layers, facilitating better detection performance across scales. Motion vectors are computed to help capture movement and dynamics between frames. The processed features are used to generate predictions for object classes, bounding-box locations and objectness scores. Non-Maximum Suppression (NMS) is applied to filter overlapping boxes and retain the most confident detections, and the confidence threshold was set at 0.6. Temporal information is used to track detected objects across frames, improving consistency and reducing false positives, and bounding boxes, class labels and confidence scores are overlaid on the video frames. All of this is implemented in Python (Jupyter) after installing the required libraries.
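A hedged sketch of this detection step follows, assuming the Ultralytics API; the 0.6 confidence threshold matches the paper, while the NMS IoU cutoff, tracker choice and display loop are illustrative.

```python
# Sketch of the detection step with the paper's 0.6 confidence threshold
# (Ultralytics API assumed); the NMS IoU cutoff, tracker and display loop
# are illustrative.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # 80 COCO classes, as in the experiments

cap = cv2.VideoCapture("walk.mp4")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # conf=0.6 discards low-confidence boxes; iou sets the NMS overlap cutoff;
    # persist=True keeps tracker state so objects keep their IDs across frames.
    result = model.track(frame, conf=0.6, iou=0.5, persist=True, verbose=False)[0]
    annotated = result.plot()         # overlay boxes, class labels, confidences
    cv2.imshow("detections", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```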
Data were collected from different environments, including outdoor scenes and indoor malls during crowded periods. The findings show that the proposed approach enhances detection accuracy and system efficiency, which supports the effectiveness of the proposed model. The performance of the system is assessed based on accuracy, precision, recall and inference time. Moreover, we present the results of field tests carried out in different conditions, which highlight the practical applicability and drawbacks of the proposed approach. This section is organized into two parts, in which we discuss the results, compare them with existing solutions, and outline directions for further research.
The model is trained on 80 classes and was tested in different environments. Figure 3 illustrates the results extracted from a road scene; of the 80 possible classes, the model detected those present in the frames.
The user tested the model in different places: inside a mall in a crowded area and on outdoor roads with traffic. Figure 4 illustrates the results extracted from the video recorded during the user's experiments. The camera attached to the glasses captured real-time video and continuously streamed it to the processing unit on the server. The video data is transmitted over a Wi-Fi network to the server for processing, ensuring reliable and fast data transmission. There the data is preprocessed by enhancing the video quality and by resizing and normalizing frames for optimal model input. The trained YOLOv8 model is then used to detect objects in the video frames, generating bounding boxes and confidence scores for the detected objects.
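A sketch of the client side of this streaming setup is given below; the server address, endpoint and JPEG-over-HTTP transport are assumptions, since the paper does not specify the protocol.

```python
# Illustrative client side of the streaming setup: JPEG-encode each frame
# and POST it over Wi-Fi to a processing server. The address, endpoint and
# transport are assumptions; the paper does not specify the protocol.
import cv2
import requests

SERVER_URL = "http://192.168.1.10:8000/detect"   # placeholder server address

cap = cv2.VideoCapture(0)                        # glasses-mounted camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    encoded, jpg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
    if encoded:
        resp = requests.post(SERVER_URL, data=jpg.tobytes(),
                             headers={"Content-Type": "image/jpeg"})
        # resp.json() would carry boxes, class labels and confidence scores
cap.release()
```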
The bounding-box coordinates are decoded from the raw network outputs $t_x$, $t_y$, $t_w$, $t_h$ as

$b_x = \sigma(t_x) + grid_x$, $\; b_y = \sigma(t_y) + grid_y$, $\; b_w = anchor_w \, e^{t_w}$, $\; b_h = anchor_h \, e^{t_h}$  (1-4)

where $grid_x$ and $grid_y$ represent the coordinates of the grid cell and $anchor_w$ and $anchor_h$ are the anchor-box dimensions. The objectness score $p$ predicts the likelihood of an object being present in the bounding box:

$p = \sigma(raw_p)$  (5)

where $\sigma$ denotes the sigmoid activation function applied to the raw output. For the class probability of each bounding box, if $C$ is the number of classes, the class probabilities $p_i$ are computed as

$p_i = \dfrac{e^{raw_i}}{\sum_{j=1}^{C} e^{raw_j}}$  (6)

where $raw_i$ is the raw score for class $i$ and $C$ is the number of classes. The bounding-box regression loss is

$Loss_{box} = \lambda_{box}\,(1 - IoU)$  (7)

where $\lambda_{box}$ is a scaling factor and $IoU$ is the intersection over union. The objectness loss is the binary cross-entropy

$Loss_{obj} = -\big(y \log(p) + (1 - y)\log(1 - p)\big)$  (8)

where $y$ is the ground-truth objectness label and $p$ is the predicted objectness score. The class prediction loss is the cross-entropy

$Loss_{cls} = -\sum_{i=1}^{C} y_i \log(p_i)$  (9)

where $y_i$ is the ground-truth label for class $i$ and $p_i$ is the predicted probability for class $i$. The total loss function is a weighted sum of the bounding-box regression loss, the objectness loss and the class prediction loss:

$Loss_{total} = Loss_{box} + \lambda_{obj}\,Loss_{obj} + \lambda_{cls}\,Loss_{cls}$  (10)
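A small numeric walk-through of these equations in Python is given below; the raw scores are made-up values, and the exact forms of Eqs. (7), (9) and (10) follow the reconstruction above.

```python
# Worked numeric example of Eqs. (5)-(10) with made-up raw network outputs.
import numpy as np

raw_p = 1.2                                   # raw objectness output
p = 1.0 / (1.0 + np.exp(-raw_p))              # Eq. (5): sigmoid -> ~0.769

raw = np.array([2.0, 0.5, -1.0])              # raw class scores, C = 3
p_i = np.exp(raw) / np.exp(raw).sum()         # Eq. (6): softmax -> ~[0.786, 0.175, 0.039]

iou, lam_box = 0.8, 1.0
loss_box = lam_box * (1.0 - iou)              # Eq. (7): box regression loss = 0.2

y = 1.0                                       # ground-truth objectness label
loss_obj = -(y * np.log(p) + (1 - y) * np.log(1 - p))   # Eq. (8): ~0.263

y_cls = np.array([1.0, 0.0, 0.0])             # one-hot ground-truth class
loss_cls = -(y_cls * np.log(p_i)).sum()       # Eq. (9): cross-entropy, ~0.241

loss_total = loss_box + loss_obj + loss_cls   # Eq. (10), unit weights assumed
```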
The graph in Figure 5 illustrates the performance metrics of the proposed model, focusing on three key stages: preprocessing, inference and postprocessing. In this real-time obstacle detection system, the inference time is 240 ms; these results were achieved by integrating temporal analysis and optical flow into the YOLOv8 model. The system needs to respond within a specific time frame, and the graph shows the performance of our model and hardware: the distribution shows the maximum number of frames processed within the threshold value, and the chosen threshold provides a good balance for this use case. Each stage is plotted against the corresponding processing times across different instances, with the thresholds indicated. A processing time below the threshold is desirable, while a value close to or equal to the threshold may point to areas that need improvement. In general, the graph allows evaluating the model's real-time performance and identifying possible ways of enhancing efficiency in real-life scenarios.
Specifying the preprocessing time limit is similar to specifying the inference time limit but is unique to the preprocessing stage of our pipeline, as illustrated in Figure 6. Several factors help determine an ideal threshold for preprocessing time: in real-time systems, the time taken for preprocessing must not be large, in order to avoid delay. In the proposed model, the threshold for preprocessing time is set at 7 ms; all values below 7 ms are acceptable, meaning the stage completes faster than the threshold, which benefits the overall processing speed. The median preprocessing time in our data is around 6 ms, which is why we set the threshold slightly above this, to accommodate variation and capture most cases.
Specifying the time limit for postprocessing is a matter of guaranteeing that this phase does not negatively impact the system and is suitable for our application. In real-time applications, postprocessing must be as fast as possible to avoid any disruption. In the proposed model, the threshold for postprocessing time is set at 2.0 ms; if postprocessing causes noticeable delays, especially in interactive applications, the threshold should be set low so that the user experience is not compromised. The postprocessing results in Figure 7 show that the model works efficiently.
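A simple way to check each stage against these thresholds is to wrap it in a timer; the sketch below is illustrative, with placeholder stage functions.

```python
# Wrapping each stage in a timer to check it against the stated thresholds.
# The stage functions referenced in the usage comment are placeholders.
import time

THRESH_MS = {"preprocess": 7.0, "inference": 240.0, "postprocess": 2.0}

def timed(stage, fn, *args):
    """Run one pipeline stage and report its latency against the threshold."""
    t0 = time.perf_counter()
    out = fn(*args)
    ms = (time.perf_counter() - t0) * 1000.0
    status = "ok" if ms <= THRESH_MS[stage] else "over threshold"
    print(f"{stage}: {ms:.1f} ms ({status})")
    return out

# Usage inside the main loop (placeholder stage functions):
# frame  = timed("preprocess",  preprocess, raw_frame)
# result = timed("inference",   model, frame)
# boxes  = timed("postprocess", postprocess, result)
```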
The incorporation of generative AI into the Obstacle Detection and Warning System improves its capability of offering relevant and useful information to users, especially the visually impaired. Once objects are identified by the optimized YOLOv8, the generative AI produces a textual description with background information about them, translating the raw detections into information that can be easily interpreted, for instance the kind of obstacle and where it is located. It enhances the user experience by providing more specific and contextually appropriate information.
The description is adjusted according to the user's surroundings and physical activity, ensuring that the feedback is relevant and helpful in different circumstances. The frames captured from the camera are passed through the optimized YOLOv8 to detect and label objects; the detection results are bounding boxes, class labels and confidence scores obtained in Python. These results are converted into a structured input that the generative AI model can understand, which involves extracting the object type, location and other details from the detection output. The generated output is then delivered as audible feedback using Text-to-Speech (TTS) technology.
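The sketch below illustrates this conversion from detections to structured input to speech; the left/ahead/right position heuristic, prompt wording and library choices (transformers, pyttsx3) are our assumptions, not the paper's exact implementation.

```python
# Sketch: converting YOLOv8 detections into structured input for GPT-2 and
# speaking the result. The left/ahead/right wording, prompt format and
# library choices are our assumptions.
import pyttsx3
from transformers import pipeline

describer = pipeline("text-generation", model="gpt2")
tts = pyttsx3.init()

def detections_to_prompt(result, frame_width):
    """Build a structured prompt from boxes, class labels and confidences."""
    parts = []
    for box, cls, conf in zip(result.boxes.xyxy, result.boxes.cls, result.boxes.conf):
        x_center = float(box[0] + box[2]) / 2.0   # horizontal box center
        if x_center < frame_width / 3:
            side = "on your left"
        elif x_center > 2 * frame_width / 3:
            side = "on your right"
        else:
            side = "ahead"
        parts.append(f"a {result.names[int(cls)]} {side} "
                     f"(confidence {float(conf):.2f})")
    return "Obstacle warning: " + "; ".join(parts) + "."

def speak_warning(result, frame_width):
    """Generate a natural-language warning and deliver it via TTS."""
    prompt = detections_to_prompt(result, frame_width)
    text = describer(prompt, max_new_tokens=25)[0]["generated_text"]
    tts.say(text)
    tts.runAndWait()
```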
The loss equation used by the generative AI component is

$Loss_A = -\log\big(B(C(x))\big)$  (11)
Evaluation Metrics
The intersection over union (IoU) is used to measure the overlap between predicted and ground-truth bounding boxes:

$IoU = \dfrac{\text{Area of Overlap}}{\text{Area of Union}}$

Figure 9 illustrates the per-frame detection results during evaluation; results for a subset of the frames are given in the graphs. These graphs show how the accuracy of the predictions changes as the confidence threshold is varied, allowing us to identify the scenarios in which the model performed well. Accuracies for data collected from different environments and trained using the optimized YOLOv8 are also given in the graphs, which compare the ground-truth labels with the predicted labels for a validation dataset; accuracies were computed by dividing the number of correct predictions by the total number of predictions.
Recall measures the ability of the model to find all the relevant cases:

$Recall = \dfrac{TP}{TP + FN}$  (17)

where $TP$ are true positives and $FN$ are false negatives. The precision-recall curve is plotted with recall on the x-axis and precision on the y-axis. Accuracy is computed as

$Accuracy = \dfrac{TP + TN}{\text{Total Number of Instances}}$  (18)

where $TN$ are true negatives.
The threshold is the value above which a prediction is considered positive. By varying the threshold, we generated different counts of true positives, false positives, true negatives and false negatives, which in turn affect the precision, recall and accuracy. The detection model achieved an accuracy of 95%, with the corresponding precision and recall values reported in the graphs.
Results from 6 different experiments
For each experiment, the graphs contain the confidence scores, bounding-box areas, IoU values and class IDs, with each subplot corresponding to one experiment. These 3D graphs visualize the achieved results: by examining them, we can observe how the distribution of confidence scores and IoU values relates to the bounding-box areas across different classes and experiments.
The object detection model has an accuracy of 95%, and the system had an average inference time of 50 ms per frame, which allowed real-time operation at 20 FPS. User testing showed an increase in navigation confidence among visually impaired users, with 85% of them saying that it was easier to avoid obstacles. However, performance was slightly lower in low-light environments, suggesting that night-time detection should be improved.
This paper presented an improved Obstacle Detection and Warning system intended to improve the mobility and safety of the visually impaired. By using an optimized YOLOv8 architecture, the model was able to increase detection accuracy in complex and dynamic scenes. The incorporation of generative AI also improved the user experience by offering context-aware auditory feedback in real time, turning the detection outcomes into valuable information. The system was tested in both indoor and outdoor environments and was found to be superior to other models, with a detection accuracy of 95%. These results show that integrating advanced deep learning algorithms with generative AI yields a powerful and easy-to-use assistive technology. The results of the study support the effectiveness of the proposed approach in enhancing the quality of life of visually impaired people by providing them with a useful means for orientation in space without the risk of getting lost.
Although the proposed system has high potential, there are several directions for future research and development. Further studies will be devoted to improving the model's performance in different conditions by using more varied datasets with different weather and lighting conditions; this would further enhance the generalizability of the model and make it more accurate in various contexts. Furthermore, more advanced generative AI approaches could be considered to offer even more specific and individualized feedback to users. Improving the natural language generation could make the auditory feedback more relevant to the user's needs and the context of the interaction.
Further research could address deploying the model on edge devices to minimize time delay and enhance real-time response. This would involve fine-tuning the model to be lightweight and able to run on mobile hardware, so that the system remains responsive even on low-end devices. Extending the system to other assistive features, for instance object recognition for navigation tools or integration with wearable haptic feedback devices, could offer a broader assistive solution. With further development, the system can become an essential tool for blind people, helping them orient themselves in the world and become more independent.
The proposed object detection and warning system is a major improvement in assistive technology for the visually impaired. The proposed model, based on an optimized YOLOv8 and generative AI, not only provides highly accurate object detection but also provides real-time contextual feedback to improve user navigation. The use of advanced data augmentation, temporal analysis with optical flow and fine-tuning makes the model robust and accurate in varied and challenging scenarios, both indoors and outdoors.
A main advantage of this system is its ability to analyze a real-time video stream and give auditory feedback in real time. This is especially important for blind users, who need real-time information to avoid barriers and move around safely. The generative AI component enhances the user experience by translating the detection results into natural language descriptions, making the system more user-friendly.
Comparisons with previous models show that this system achieves better detection accuracy and speed. The model's 95% accuracy rate across all the scenarios discussed shows that it is very reliable, and its ability to perform under various environmental conditions shows that it can be used in many practical applications.
Limitations
However, several limitations should be considered in future work. One of the main issues is that the system relies on a stable and fast internet connection for real-time video streaming to the cloud-based model. In areas with poor connectivity, the system may slow down, affecting processing and feedback provision. This limitation may in some cases pose a threat to the safety and functionality of the system for visually impaired users.
Another limitation of the proposed model is the computational cost of the optimized YOLOv8 architecture. Although the model has been optimized for efficiency, it still needs substantial computational resources, which may not be available on low-end mobile devices. This constraint hinders the usability of the system, particularly in areas where sophisticated hardware is hard to come by.
In the accuracy comparison with existing work, the optimized YOLOv8 integrated with generative AI gives better performance than other existing approaches. However, the system's ability to operate in adverse environmental conditions such as rain, fog or low light has not been tested. These conditions could challenge the model, especially when identifying small or partially occluded obstacles. More work should be done to improve the model's reliability under such circumstances, perhaps by incorporating additional sensors or using better data augmentation methods.
Finally, although the generative AI component offers useful auditory feedback, the current system might not meet all users' requirements. For instance, users with hearing impairments or cognitive disabilities may need feedback in the form of touch or vision. Extending the system's functionality to provide feedback in multiple modalities may significantly improve its usability and efficacy.