Blind Assistive System Based On Real Time Object Recognition Using Machine Learning
HIGHLIGHTS
• A system that helps blind people discover the objects around them using one of the deep learning algorithms, YOLO.
• The system consists of two parts: the software, represented by the YOLO algorithm, and the hardware, represented by a Raspberry Pi.
• The proposed system is characterized by high accuracy and good speed.

ABSTRACT
Healthy people carry out their daily lives normally, but the visually impaired and the blind face difficulties in practicing their daily activities safely because they are unaware of the objects surrounding them. Smart systems offer solutions that enable this segment of people to practice their daily activities as safely as possible. This work presents a blind assistive system that uses the deep-learning-based You Only Look Once (YOLO) algorithm and the OpenCV library to detect and recognize objects in images and video streams quickly. The work was implemented in Python. The results gave a satisfactory performance in detecting and recognizing objects in the environment: the objects that the YOLO algorithm was trained on were identified, including persons, chairs, ovens, pizza, mugs, bags, seats, etc.

ARTICLE INFO
Handling editor: Muhsin J. Jweeg
Keywords:
Object detection
Open CV
Python
YOLO
Deep neural network
Received 21 November 2020; Accepted 13 February 2021; Available online 25 January 2022
http://doi.org/10.30684/etj.v40i1.1933
2412-0758/University of Technology-Iraq, Baghdad, Iraq
This is an open access article under the CC BY 4.0 license http://creativecommons.org/licenses/by/4.0
1. Introduction
Visually impaired people have a difficult time moving around safely and independently, which prevents them from participating in routine professional and public activities both indoors and out. They also have a difficult time identifying the features of the surrounding environment. According to a WHO (World Health Organization) statistical analysis, roughly 285 million people are blind or have amblyopia around the world, with 246 million having major vision impairment [1]. One of the most serious issues blind people face is difficulty understanding the environment around them, so they depend in their daily lives on other people, guide dogs, or electronic devices. Object detection is one of the primary tasks in the field of computer vision. The ideal solution to this problem is to train an object detector that works on specific parts of the image and then apply these detections very quickly; this approach has achieved great success due to its speed and accuracy. Recently, several techniques have been proposed to help blind and visually impaired people discover objects around them [2], [3]. For example, some studies have relied on artificial intelligence techniques, others on ultrasound signals, and others on deep learning, and a great deal of research deals with helping people with visual impairment discover the objects surrounding them and avoid obstacles. Several notable works have been advanced in the form of blind navigation systems. In the voice system of [4], with the help of a GPS system, a portable computer, and laser inputs, traditional eyeglasses are connected to a camera by which objects in video images are recognized and converted to sound. The Tiflis prototype [5] comprises two cameras, a sensor attached to black glasses, and a vibration array that informs the user [6], [7]. Another system operates with the help of a GPS gadget, a laptop computer, and RFID (Radio Frequency Identification) technology [8]. CASBliP (Cognitive Aid System for Blind People) [9] is another system that includes a pair of glasses and a helmet, as well as a tiny laptop and FPGA hardware. In order to tackle object detection difficulties, several navigation systems rely heavily on machine learning [10]. Researchers frequently employ classification techniques that are close to global test model optimization, such as
SVMs (Support Vector Machines) and AdaBoost (Adaptive Boosting), which have been widely employed in vision applications [11]. To detect objects such as vehicles and persons, an SVM has been used to classify Haar wavelet coefficients describing Gabor and edge properties [12]. Based on a symmetry characteristic, AdaBoost was utilized to detect cars and individuals in [13]. The combination of Haar-like feature extraction and AdaBoost has been utilized to detect a car's rear end using edge features [14]. Knowledge Distillation [15] is a strategy that is referred to as a "teacher-student network."
The "teacher network" is a more complicated network with exceptional performance and generalization ability. It is used to teach the "student network," which is less computationally intensive and easier to deploy. The knowledge of the "teacher network" is distilled into a smaller model by studying the class distribution of the "teacher network," after which the "student network" performs similarly to the "teacher network." This strategy drastically decreases the number of operations and parameters required. However, there are a few flaws as well. The approach can only be used for classification tasks with a softmax loss function, which limits its utility for other tasks (e.g., object detection). Another issue is that the model's structure is far too rigid, resulting in poor performance.
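As an illustration of this teacher-student idea, the following minimal Python sketch computes the standard distillation loss of [15]; the logits, the temperature T, and the mixing weight alpha are hypothetical values chosen for illustration, not parameters used in this work.

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; larger T softens the class distribution."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """Mix the hard-label cross-entropy with the cross-entropy against the
    teacher's softened class distribution, scaled by T^2 as in [15]."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student_T = np.log(softmax(student_logits, T))
    hard = -np.log(softmax(student_logits)[label])
    soft = -np.sum(p_teacher * log_p_student_T)
    return alpha * hard + (1.0 - alpha) * (T ** 2) * soft

# Hypothetical logits: the student is pulled toward the teacher's distribution.
teacher = np.array([6.0, 2.0, 0.5])
student = np.array([2.5, 1.0, 0.2])
print(distillation_loss(student, teacher, label=0))
```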
In NoCs [17], the authors presented an object detection method built on top of a region proposal network. As feature extractors, NoCs adopted GoogLeNet and ResNets. The precision of object detection systems also improves as neural networks become deeper. However, researchers have paid little attention to feature fusion after the ROI pooling layer. The feature fusion module's job is to classify and localize object proposals; in Fast/Faster R-CNN, it is simply a multi-layer perceptron. NoCs therefore investigates the effects of numerous feature fusion modules and discovers that they are just as significant as generating the object proposals, thereby highlighting the importance of classifying and localizing object proposals. The proposed feature fusion modules, however, are extremely complex convolutional neural networks with far more operations and parameters than Faster R-CNN. AdaBoost is faster in the testing phase, while SVMs are much faster in the learning and training phases.
In this research, a system was designed to detect objects using the YOLO algorithm, one of the deep learning algorithms, with the OpenCV library and the Python language, which is fast and widely used in deep learning. The system is used to help the blind and visually impaired discover the objects around them. The algorithm was trained on a set of images taken from the COCO dataset and, based on these images and the objects they contain, the algorithm detects the objects it was trained on; it has proven its high accuracy in detecting objects at very high speed. The possible applications of this research are numerous: as a system that helps the blind discover the objects around them; in large meeting rooms, where the people attending a meeting could be identified by training the algorithm on their appearances and names; in residential buildings, to give a warning in the event that a stranger enters; and in other applications, especially those that need high speed in real time.
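As a sketch of how such a detection pipeline can be assembled, the following Python fragment uses the OpenCV DNN module to run a Darknet YOLO model on a single image; the file names (yolov3.cfg, yolov3.weights, coco.names, room.jpg) and the confidence and NMS thresholds are assumptions for illustration rather than the exact configuration of this work.

```python
import cv2
import numpy as np

# The model and image files below are placeholders: this work uses a YOLO
# model trained on the COCO dataset, but the exact file names are assumptions.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
classes = open("coco.names").read().strip().split("\n")

img = cv2.imread("room.jpg")
h, w = img.shape[:2]

# YOLO expects a square, normalized input blob (416x416 is a common setting).
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

boxes, confidences, class_ids = [], [], []
for output in outputs:
    for det in output:               # det = [cx, cy, bw, bh, objectness, scores...]
        scores = det[5:]
        cid = int(np.argmax(scores))
        conf = float(scores[cid])
        if conf > 0.5:               # assumed confidence threshold
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(conf)
            class_ids.append(cid)

# Non-maximum suppression removes duplicate detections of the same object.
for i in np.array(cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)).flatten():
    print(classes[class_ids[i]], round(confidences[i], 2), boxes[i])
```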
This paper is arranged as follows: Section 2 presents the problem statement, Section 3 presents the design and operation of the system, Section 4 presents the results and discussion, and Section 5 presents the conclusion.
2. Problem statement
Visually impaired people have difficulty moving safely and independently, which prevents them from participating in routine professional and social activities both inside and outside the home. In addition, as demonstrated in Figure 1, such individuals have difficulty identifying the features of the surrounding environment.
The network's layers use a leaky rectified linear activation:

$$\phi(x) = \begin{cases} x, & \text{if } x > 0 \\ 0.1x, & \text{otherwise} \end{cases} \qquad (1)$$
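A direct NumPy transcription of Eq. (1), as a minimal sketch:

```python
import numpy as np

def leaky_relu(x):
    """Leaky rectified linear activation of Eq. (1): identity for x > 0,
    slope 0.1 otherwise."""
    return np.where(x > 0, x, 0.1 * x)

print(leaky_relu(np.array([-2.0, 0.5])))   # -> [-0.2  0.5]
```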
Figure 3: Bounding boxes with dimension priors and location prediction [18]
The network outputs $t_x$, $t_y$, $t_w$, and $t_h$; $c_x$ and $c_y$ are the top-left coordinates of the grid cell, and $p_w$ and $p_h$ are the anchor dimensions of the box. The operation of the YOLO algorithm is illustrated in Figure 4. The box height, for instance, is recovered from the corresponding output and anchor as

$$b_h = p_h e^{t_h} \qquad (6)$$
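The companion formulas for the box center and width in [18] follow the same pattern: the center offsets are squashed with a sigmoid so they stay inside the grid cell, and the anchor dimensions are scaled exponentially as in Eq. (6). A minimal Python sketch of the full decoding, with hypothetical output and anchor values, is:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_box(t, cell, anchor):
    """Decode raw outputs (t_x, t_y, t_w, t_h) into a box (b_x, b_y, b_w, b_h).
    The sigmoid keeps the predicted center inside the grid cell at (c_x, c_y);
    the exponential scales the anchor dimensions (p_w, p_h), as in Eq. (6)."""
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    bx = sigmoid(tx) + cx            # box center x, in grid units
    by = sigmoid(ty) + cy            # box center y, in grid units
    bw = pw * np.exp(tw)             # box width
    bh = ph * np.exp(th)             # box height, Eq. (6)
    return bx, by, bw, bh

# Hypothetical raw outputs for the cell at (3, 4) with anchor (1.5, 2.0):
print(decode_box((0.2, -0.1, 0.3, 0.5), (3, 4), (1.5, 2.0)))
```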
Because each grid cell can only predict two boxes and have one class, YOLO imposes strong spatial constraints on bounding box predictions. This spatial constraint limits the model's ability to predict nearby objects, so the model has trouble with small objects that appear in groups, such as flocks of birds. Since the approach learns to predict bounding boxes from data, it also fails to generalize to objects with new or uncommon aspect ratios or configurations. Because the design incorporates several downsampling layers from the input picture, the model likewise uses fairly coarse features for predicting bounding boxes. Finally, the training procedure is based on a loss function that approximates detection performance but treats errors the same in small bounding boxes as in large bounding boxes. A minor error in a large box is usually unnoticeable, whereas a minor error in a small box has a significantly bigger impact on IOU. Incorrect localizations are the most common source of error.
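To make the effect on IOU concrete, the following sketch (with hypothetical box coordinates) shows that the same five-pixel localization error costs a small box far more IOU than a large one:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

# The same 5-pixel horizontal shift is far more costly for the small box:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))       # ~0.33 for a 10x10 box
print(iou((0, 0, 100, 100), (5, 0, 105, 100)))   # ~0.90 for a 100x100 box
```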
5. Conclusion
In this research, a system using one of the deep learning algorithms specialized in object detection was presented, and we hope that it helps people who have vision problems discover the objects surrounding them. The model is based on training convolutional neural networks, which consist of several layers, each specialized in a specific task; the COCO dataset was used, and the YOLO algorithm was trained on it. We hope that in the future this work will be extended by other developers to achieve better results, so that it can be applied in various fields, address many problems, and facilitate things in many domains.
Author contribution
All authors contributed equally to this work.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Conflicts of interest
The authors declare that there is no conflict of interest.
References
[1] Global Data on Visual Impairments 2010. Available online (accessed 2017).
[2] A. Nada, M. Fakhr, A. Seddik, Assistive Infrared Sensor Based Smart Stick for Blind People, Proc. IEEE Tech. Spon. Sci. Infor. Conf., (2015).
[3] A. Nada, S. Mashaly, M. Fakhr, A. Seddik, Effective Fast Response Smart Stick for Blind People, Sec. Int. Conf. Adv. Bio-informatics, Envir. Eng., (2015).
[4] P. B. L. Meijer, A Modular Synthetic Vision and Navigation System for the Totally Blind, World Congress Proposals, (2005).
[5] N.G. Bourbakis, D. Ravraki, Intelligent assistants for handicapped people's independence: case study, Proc. IEEE Int. Joint
Symp. Intell. Syst., (1996) 337-344. https://doi.org/10.1109/IJSIS.1996.565087
[6] N. Bourbakis, P. Kakumanu, Skin-based Face Detection-Extraction and Recognition of Facial Expressions, Appl. Pattern Recognit., 91 (2008) 3–27. https://doi.org/10.1007/978-3-540-76831-9_1
[7] D. Dakopoulos, N. Bourbakis, Preserving visual information in low resolution images during navigation of visually impaired, Proc. Int. Conf. PErvasive Technol. Relat. Assist. Envir., (2008) 1-6. https://doi.org/10.1145/1389586.1389619
[8] N. Bourbakis, Sensing surrounding 3-D space for navigation of the blind - A prototype system featuring vibration arrays
and data fusion provides a near real-time feedback, IEEE Eng. Med. Biol. Mag., 27 (2008) 49-55.
https://doi.org/10.1109/MEMB.2007.901780
[9] V. S. Praderas, N. Ortigosa, L. Dunai, G. Peris Fajarnes, Cognitive Aid System for Blind People (CASBliP), Proc. of XXI Ingehraf-XVII ADM Congress, 31 (2009).
[10] S. Sivaraman, M. M. Trivedi, Looking at vehicles on the road: a survey of vision-based vehicle detection, tracking, and behavior analysis, IEEE Trans. Intell. Transp. Syst., 14 (2013) 1773–1795. https://doi.org/10.1109/TITS.2013.2266661
[11] L. Shao, X. Zhen, D. Tao , X. Li, Spatio-temporal laplacian pyramid coding for action recognition, IEEE Trans. Cybern.,
44 (2014) 817–827. https://doi.org/10.1109/tcyb.2013.2273174
[12] W. Liu, X. Z. Wen, B. B. Duan, Rear vehicle detection and tracking for lane change assist, IEEE Intell. Vehicles Symp., (2007) 252–257. https://doi.org/10.1109/IVS.2007.4290123
[13] T. Liu, N. Zheng, L. Zhao, H. Cheng, Learning based symmetric features selection for vehicle detection, IEEE Intell. Veh. Symp., (2005) 124–129. https://doi.org/10.1109/IVS.2005.1505089
[14] J. Cui, F. Liu, Z. Li, Z. Jia, Vehicle localization using a single camera, IEEE Intell. Veh. Symp., (2010) 871– 876.
https://doi.org/10.1109/IVS.2010.5548101
[15] G. Hinton, O. Vinyals, J. Dean, Distilling the Knowledge in a Neural Network, arXiv:1503.02531, (2015). https://doi.org/10.48550/arXiv.1503.02531
[16] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You Only Look Once: Unified, Real-Time Object Detection, 2016 IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), (2016).
[17] S. Ren, K. He, R. Girshick, X. Zhang, J. Sun, Object Detection Networks on Convolutional Feature Maps, IEEE Trans. Pattern Anal. Mach. Intell., (2017) 1476–1481.
[18] J. Redmon, A. Farhadi, YOLOv3: An Incremental Improvement, arXiv, (2018).
[19] V. Ordonez, G. Kulkarni, T. Berg, Im2Text: Describing Images Using 1 Million Captioned Photographs, NIPS, (2011).
[20] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C. L. Zitnick, Microsoft COCO: Common Objects in Context, ECCV, (2014).
[21] [Online]. https://Pjreddie.com/darknet/YOLO/ (accessed 14 July 2018).
[22] C. Arteta, V. Lempitsky, A. Zisserman, Counting in the Wild, ECCV, (2016).