
2022 International Mobile and Embedded Technology Conference (MECON)

Smart Glasses: A Visual Assistant for the Blind

Simra Nazim
Department of Electronics and Telecommunication Engineering
Amity University Dubai
Dubai, UAE
[email protected]

Saba Firdous
Department of Computer Science and Engineering
Amity University Dubai
Dubai, UAE
[email protected]

Swaroop Ramaswamy Pillai
Department of Electronics and Telecommunication Engineering
Amity University Dubai
Dubai, UAE
[email protected]

Vinod Kumar Shukla
Department of Engineering and Architecture
Amity University Dubai
Dubai, UAE
[email protected]

DOI: 10.1109/MECON53876.2022.9751975

Abstract— Computer vision has helped systems gain a high-level understanding of images and video. The Smart Glasses allow blind and partially sighted individuals to identify and understand the everyday objects that surround them, viewed through a mini camera. Our research uses computer vision to detect objects with a CNN (Convolutional Neural Network) model trained on the MS COCO dataset. The system recognizes faces using a deep learning approach, recognizes text using the EAST (Efficient and Accurate Scene Text) detector and the EasyOCR model, and produces audio output through Festival speech synthesis. The glasses are fitted with an ultrasonic sensor that measures the distance between the user and nearby objects so that obstacles can be avoided. The Smart Glasses begin detection on the wake word "up", which is trained using a CNN in TensorFlow, and use the Vosk speech recognition module for simple commands. The system is a complete visual assistant for the blind.

Keywords— computer vision, machine learning, object detection, face recognition, text-to-speech, OCR, sensor, speech recognition.

I. INTRODUCTION
The human eye is the organ that gives humans sight, allowing us to observe and comprehend more of our surroundings than any other sense. People use their eyes in almost every activity, including studying, commuting, watching TV, writing letters, and driving a vehicle. The ability to see is one of an individual's most valuable gifts, and communication is the only way for a person to convey a message or share a thought or idea. However, some people are denied this opportunity. There are roughly 285 million visually impaired persons in the world today, 39 million of whom are blind and 246 million of whom have low vision. Approximately 80% of visually impaired people need to work to make a living, with the remainder being elderly or retired. According to world estimates, the number of people affected by eye-related problems has declined over the previous 20 years, yet India is still known for having the world's largest blind population [1]. The main objective of the Smart Glasses assistant is to address a major challenge in computer vision: the day-to-day identification of the objects that surround a blind person. A camera worn by the blind person performs large-scale object detection and segmentation, with MS COCO providing the necessary information about the external environment. To apply the necessary recognition, a dataset of objects collected from everyday scenes is generated. Object recognition is used to locate items that are common in a blind user's surroundings, such as motorcycles, couches, doorways, or desks, from an image of the environment. The camera detects objects based on their positions, recognizes faces, and reads text aloud to the blind user. The system also uses speech recognition to carry out simple tasks and help the user in everyday life.

The proposed method aims to improve people's chances of participating fully in daily life after losing their eyesight. The main aim of our research is to propose and build glasses for the visually impaired that perform real object detection. The system also includes face recognition: it recognizes the faces of nearby people on which it has been trained, announces the name once a known face is recognized, and otherwise reports an "unknown face". The system recognizes text from printed documents using OCR (optical character recognition) technology and uses the Festival text-to-speech engine to convert the recognized text to speech and read it out to the user. It also reads out the distance to objects so that the user can avoid obstacles, and it responds to the spoken commands on which it has been trained.

II. LITERATURE REVIEW

People with visual impairments have the same right to opportunities as everyone else. Daily activities would be difficult to carry out without communication, which is essential for getting a job, expressing emotions, teaching, and building relationships, among other things. People with certain disabilities are often treated differently by society, whether unintentionally or on purpose. It is therefore equally important that they be provided with devices that allow them to perform all of these functions normally on a daily basis.


Vocal vision for visually disabled people: Vocal Vision is a technology developed for people who are visually impaired, who have long faced obstacles in getting around. The work is a sensory-substitution device intended to help blind people, and its working concept is based on the conversion of picture to sound. The image in front of the blind individual is captured by a vision sensor and then passed to MATLAB (MATrix LABoratory) for processing, which examines the captured picture and enhances the key visual data. The produced image is then compared against the microcontroller's database, converted into a structured auditory signal, and sent via earphones to the blind user. The colour of an object is determined using colour data from the objects under investigation, and the colour output is likewise delivered through the headphones [2].

Approach to Real-Time Objects Identification to Help Blind People: In this paper, machine vision tasks such as navigation and direction finding add complexity to blind assistance. Multiple cameras, a GPS receiver, and an ultrasonic sensor are mounted on the blind person's spectacles, since knowledge of the local environment is necessary. A collection of items gathered from daily scenes is built in order to apply the necessary recognition. Object recognition is used to locate items such as people, motorcycles, tables, doors, or desks from an image of the real-world environment. The two cameras establish depth by building scene-difference charts, GPS is used to create clusters of objects from place to place, and the sensor identifies impediments over medium to long distances. The Speeded-Up Robust Features descriptor is used for recognition [3].

Vision Based Assistive System for Label Detection with Voice Output: In this paper, a camera-based word recognition strategy is developed to help blind people read labels and product packaging on handheld objects in their daily lives. To isolate the object from cluttered backgrounds or other objects in the camera view, the user is asked to shake the object, and an efficient motion-based method defines a region of interest (ROI) in the video. Text localization and recognition are then conducted within the extracted ROI. To automatically locate text regions in the ROI, the authors propose a text localization algorithm that learns gradient characteristics of stroke orientations and edge pixel distributions in an AdaBoost model. Off-the-shelf optical character recognition software then binarizes and recognizes the text characters in the localized regions, and the recognized text is output as speech for blind users [4].

Image-based Face Detection and Recognition: "State of the Art": Face detection and recognition in video is a difficult task. The paper evaluates different detection and recognition methods and shows that image-based methods offer high accuracy and a better response rate. The system developed provides face detection and recognition in video for surveillance applications. The face detection algorithm uses an AdaBoost classifier with Haar and Local Binary Pattern (LBP) features, and an SVM classifier with Histogram of Oriented Gradients (HOG) features. The Haar features use integral-image representations, from which AdaBoost selects among a large set of generated features. The LBP operator thresholds the 3x3 neighborhood of each pixel to label it, detecting the micro-patterns of each face image. The SVM classifier applied to HOG features outperforms wavelet-based approaches across degrees of smoothing. Five different datasets were used for the experiments, with mean face detection results of 96.70% for Haar, 89.3% for LBP, and 90.88% for SVM. For face recognition, four different methods were compared. LDA is used to reduce the number of features before recognition; LBP is an ordered set of binary comparisons of pixel intensities between the center pixel and its eight surrounding pixels; and Gabor filters provide spatial frequency characteristics and localization. The face recognition results were 71.15% for PCA, 77.9% for LDA, 82.94% for LBP, and 92.35% for Gabor [5].

An Automatic Number Plate Recognition System using OpenCV and Tesseract OCR Engine: The paper proposes a method that uses OpenCV with feature and edge detection to locate number plates, and the Tesseract OCR engine to identify the detected characters. The system has three stages; plate detection is the most critical, since if it fails the whole system fails. Detection uses features such as shape, colour, height, and width, and depends on lighting and visibility. The experiments were done on Ghanaian number plates, trained for two types: long plates and square plates. Image preprocessing involves grayscaling and noise removal to locate the plate. Candidate regions are found using edge detection and template matching. Edge detection enhances and identifies the image edges; Sobel-Feldman kernels are applied to blurred images, producing vertical and horizontal edge images, and objects are then located using connected-component analysis. A size filter removes regions too large or too small to be a number plate, and an aspect-ratio filter selects rectangles that correspond to the specified plates. Template matching uses trained classifiers to detect features, with 303 rectangular-plate and 359 square-plate images used for training. The system achieved recognition with 60 percent accuracy at a rate of 0.2 s and requires further training for better results [6].

Deep Learning Approaches for Understanding Simple Speech Commands: The paper describes methods for understanding simple speech commands and recognizing sounds, applied in the TensorFlow Speech Recognition Challenge organized by the Brain team. The training dataset includes 60k labelled audio clips. The labels are yes, no, down, up, left, right, on, off, stop, and go; everything else is considered unknown. The file names encode the speaker's ID as the first element and the command repetition as the second. The short-time Fourier transform is used for joint time-frequency analysis. ResNet34/ResNet50 architectures are used for the 1D convolutional neural networks, and log spectrograms and mel power spectrograms for the 2D case. Four-fold cross-validation is used for training, with the data separated between folds by voice ID. At a resolution of 1x16384, VGG-16 (1D) gives a validation accuracy of 93.4%, ResNet34 (1D) 96.4%, and ResNet50 (1D) 96.6% [7].

III. THE PROPOSED SYSTEM

This research supports people with visual impairments; the idea of the glasses is to help them perform various everyday tasks. A Raspberry Pi is used for the implementation, combining image processing with machine learning. The main focus is on object detection, face recognition, and text recognition to assist the blind user. Detecting items in an image is difficult due to the variety of items in the environment, including non-rigid objects and severe variations in shape.

Our research aims to:


• Design and develop a smart glasses system that visually impaired individuals can comfortably wear, with an emphasis on cost-effectiveness.
• Show the feasibility of image-description audio processing techniques as a tool to give visually impaired people greater independence, using state-of-the-art machine learning technology.

The main objectives of the Smart Glasses:
Object Recognition – Object recognition (Fig 1) is a computer vision technology used to recognize items in photos or videos. Deep learning and machine learning algorithms perform a significant share of object recognition [8].

Fig 1. Object Detection

We have used the MS COCO names dataset to train the object recognition model. The COCO names file principally lists the names of the objects that can be detected in the surroundings. We utilize an SSD MobileNet model file and its frozen inference graph, where MobileNet-SSD is a Single-Shot Multibox Detection (SSD) network used to detect objects; the Caffe framework was used to implement this model. Freezing is the process of identifying and saving all essential items (graphs, weights, and so on) in a single, immediately accessible file.
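As an illustration, the following is a minimal sketch of running a frozen MobileNet-SSD Caffe model through OpenCV's DNN module. The file names and the 0.5 confidence threshold are our assumptions, and the class list shown is the 20-class VOC list the common Caffe MobileNet-SSD ships with; the paper's COCO names file would be substituted in practice.

```python
# Minimal MobileNet-SSD object detection sketch (OpenCV DNN).
# File names, class list, and threshold are illustrative assumptions.
import cv2

CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant",
           "sheep", "sofa", "train", "tvmonitor"]

net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

frame = cv2.imread("scene.jpg")
# MobileNet-SSD expects 300x300 inputs, mean-subtracted and scaled.
blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                             0.007843, (300, 300), 127.5)
net.setInput(blob)
detections = net.forward()

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:                      # assumed threshold
        class_id = int(detections[0, 0, i, 1])
        print(f"I see a {CLASSES[class_id]} ({confidence:.2f})")
```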
Face detection and recognition: Face detection and recognition is the process of identifying human faces using the dataset the system was trained with. Our research uses the Pi camera on the Raspberry Pi to detect faces and recognize them against the database, in both photos and live video. It uses OpenCV and Python as its base, with a deep learning model for robust detection and recognition. Face detection (Fig 2) uses deep learning to locate the face within the view the camera captures [9]. The model then quantifies the face in the image using the OpenFace package, an implementation in Python and Torch.

Fig 2. Face detection
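A condensed sketch of this pipeline follows, assuming the pre-trained OpenFace Torch embedder (nn4.small2.v1.t7) loaded through OpenCV and a scikit-learn SVM trained on 128-d embeddings, as Section VI describes; the file paths and image names are illustrative.

```python
# Face recognition sketch: 128-d OpenFace embeddings + SVM classifier.
# Model path, image files, and names are illustrative assumptions.
import cv2
import numpy as np
from sklearn.svm import SVC

embedder = cv2.dnn.readNetFromTorch("nn4.small2.v1.t7")

def embed(face_bgr):
    """Return a 128-d embedding for a cropped face image."""
    blob = cv2.dnn.blobFromImage(face_bgr, 1.0 / 255, (96, 96),
                                 (0, 0, 0), swapRB=True, crop=False)
    embedder.setInput(blob)
    return embedder.forward().flatten()

# Training: embeddings of known (pre-cropped) faces, labelled by name.
paths = ["alice1.jpg", "alice2.jpg", "bob1.jpg", "bob2.jpg"]
X = np.array([embed(cv2.imread(p)) for p in paths])
y = ["Alice", "Alice", "Bob", "Bob"]
clf = SVC(kernel="linear").fit(X, y)

# Recognition on a new camera crop; a decision-score threshold would
# map low-confidence matches to "unknown face".
probe = embed(cv2.imread("camera_crop.jpg"))
print(f"I see {clf.predict([probe])[0]}")
```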

Text-to-speech – Read text aloud automatically: Text-to-speech (TTS) technology (Fig 3) reads computer text aloud. It can transform text on computers, smartphones, and tablets into audio, and any type of text file, including Word and Pages documents and online web pages, can be read aloud. There are numerous tools and programs available to convert text to speech.

Fig 3. Text to Speech

We have used the Festival speech synthesis system [10], which offers a full text-to-speech pipeline with various APIs and supports multiple languages.
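One minimal way to drive Festival from Python on the Raspberry Pi is through its command-line `--tts` mode; this is a sketch under that assumption, not the only way to invoke the Festival APIs.

```python
# Text-to-speech sketch: pipe a phrase to the Festival CLI.
# Assumes the `festival` package is installed on the Raspberry Pi.
import subprocess

def say(text: str) -> None:
    """Read `text` aloud through the default audio device."""
    subprocess.run(["festival", "--tts"], input=text.encode(), check=True)

say("I see a book")
```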


Optical Character Recognition: OCR (Fig 4) is the electronic conversion of text in images, whether printed or handwritten, into machine-readable text. We have made use of two models for detection and recognition: EAST and EasyOCR. For text detection in the live camera feed we used the EAST text detection model [11], a deep-learning detector with a novel architecture and training pattern that runs through OpenCV.

Fig 4. OCR (Optical Character Reader)

Once text is detected, the system says "text detected"; if the user answers "yes", the camera captures an image of the text and the EasyOCR model recognizes it. The system works in real time and provides one of the best accuracies at a good pace.
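A sketch of the recognition step with the EasyOCR Python package follows; the language choice, confidence cutoff, and file name are assumptions.

```python
# Text recognition sketch with EasyOCR (PyTorch-based).
# Language list, threshold, and image path are illustrative assumptions.
import easyocr

reader = easyocr.Reader(["en"])           # loads the English model once
results = reader.readtext("captured_text.jpg")

# Each result is (bounding_box, text, confidence).
recognized = " ".join(text for _, text, conf in results if conf > 0.4)
print(recognized)   # could then be passed to a TTS helper such as say()
```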

Measure the distance of objects using an ultrasonic sensor automatically: The ultrasonic sensor (Fig 5) uses sound pulses travelling through air to calculate the distance to, or the presence of, a target object or material without contacting it [12]. The device measures the distance between two points, can identify objects within a 2 cm – 450 cm range, and alerts the user to obstacles.

Fig 5. Ultrasonic Sensor
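The paper does not name the sensor model, but the quoted range matches the common HC-SR04; the sketch below, with assumed GPIO pin numbers and alert threshold, shows the standard echo-timing calculation.

```python
# Ultrasonic distance sketch (HC-SR04-style sensor on Raspberry Pi GPIO).
# Pin numbers and the alert threshold are illustrative assumptions.
import time
import RPi.GPIO as GPIO

TRIG, ECHO = 23, 24
GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def distance_cm() -> float:
    GPIO.output(TRIG, True)          # 10 microsecond trigger pulse
    time.sleep(0.00001)
    GPIO.output(TRIG, False)
    start = end = time.time()
    while GPIO.input(ECHO) == 0:     # wait for the echo to go high
        start = time.time()
    while GPIO.input(ECHO) == 1:     # time how long the echo stays high
        end = time.time()
    # Echo time * speed of sound (34300 cm/s), halved for the round trip.
    return (end - start) * 34300 / 2

if distance_cm() < 100:              # assumed alert threshold
    print("Obstacle ahead")          # e.g. say("Obstacle ahead")
```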
Speech recognition: The system uses speech recognition-based commands (Fig 6) to carry out simple tasks, recognizing words such as "yes" and "no" [13]. We have trained a CNN model with TensorFlow on the word "up" to create a wake word that starts detection by turning on the camera.

Fig 6. Speech Recognition

Once the smart glasses are switched on, the system keeps listening; when the word "up" is heard, it turns on the camera for detection. This is based on machine learning, which has made speech recognition training a much simpler task. Speech recognition is also used through the Vosk module to take simple inputs from the user, such as "yes", "no", "what is the time", "current weather", and "today's date", to help with simple tasks.
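A condensed sketch of the Vosk listening loop follows, assuming the small English model and a 16 kHz microphone stream via PyAudio, with "up" treated as the wake word as the paper describes; the model path and audio parameters are assumptions.

```python
# Wake-word / command listening sketch with Vosk and PyAudio.
# Model path and audio parameters are illustrative assumptions.
import json
import pyaudio
from vosk import Model, KaldiRecognizer

model = Model("vosk-model-small-en-us-0.15")
rec = KaldiRecognizer(model, 16000)

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=4000)

while True:
    data = stream.read(4000, exception_on_overflow=False)
    if rec.AcceptWaveform(data):
        text = json.loads(rec.Result()).get("text", "")
        if "up" in text.split():      # wake word: start the camera
            print("wake word heard, starting detection")
        elif text:
            print("command:", text)   # e.g. "what is the time"
```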
IV. SYSTEM DESIGN

Block Diagram of the proposed system (Fig 7):

Fig 7. Block Diagram

Raspberry Pi 3 – The Raspberry Pi is a credit-card-sized computer to which a keyboard, mouse, monitor, power supply, and SD card must be connected. It is an inexpensive embedded device capable of carrying out many vital tasks.

Raspberry Pi Camera – The Raspberry Pi Camera v2 is a custom-built Raspberry Pi add-on board featuring an 8-megapixel Sony IMX219 image sensor and a fixed-focus lens.

Glasses – The glasses used are ordinary sunglasses with no built-in electronics.

Mic – The mic is used to give speech commands to the Raspberry Pi, for example to start the OCR program when the user has to read long texts or books. This is done by speech processing: keywords spoken to the system start the text recognition process and convert the result to speech, which is then heard through the earphones.
Object/Face/Text Processing – This is the Python code that uses the dataset to process the images captured by the mini camera. It takes an image as input, analyses it to recognize objects or faces, and outputs the name of the recognized object or face.

Obstacle Detection – Ultrasonic sensors measure distances using ultrasonic waves: they emit ultrasonic pulses and detect the waves that are reflected back, from which the sensor module estimates the item's distance. It detects objects within a range of 2-400 cm.

Earphones – Ordinary wired earphones are attached to the user's ears and let them hear the output given by the Raspberry Pi.

Activity Diagram of the proposed system (Fig 8):

Fig 8. Activity Diagram

• When the power is on, the camera is switched on automatically.
• The mic is used to give the speech command to start detecting with the camera.
• If the camera captures an image, the system starts extracting information from the captured image.
• After extracting the information, if it contains a face, the system checks the data to recognize the person's name. If it contains an object, it checks the dataset to assign the object's name.
• If it contains text, a picture is captured and the text is recognized automatically.
• The final output is given as sound through the earphones.
• If the camera does not capture any image, it continues searching for images.
• If it finds an image, it follows step 2; if it does not find any image, it sends the voice output "Object Not Detected".
• The ultrasonic sensor continuously checks for obstacles.
• If objects or people are detected within range, the user is alerted by voice output.

A schematic control-loop sketch of this flow is given below.
ultrasonic pulses and detect the waves that are reflected back.


V. RESULT AND DISCUSSIONS

Object Detection: Real-time object detection uses a pre-trained model. This module detects objects present in the pre-trained model's classes using the Pi camera and gives speech output through the speaker. For instance, if a book is detected, the system gives the output "I see a book". The figures below (Fig 9) show a few demonstrations using different objects, among them a book, a water bottle, and a mobile phone.

Fig 9. Object Detection

Face Recognition: Real-time face recognition uses a pre-trained dataset. This module detects faces present in the pre-trained dataset using the Pi camera and gives speech output through the speaker. For instance, if a known face is detected, the system gives the output "I see (person name)". The figures below (Fig 10) show demonstrations using two different faces with their names. The recognition model shows highly accurate results as long as good lighting is provided.

Fig 10. Face Recognition
OCR: Real-time text detection uses the text recognition model. This module detects text in the Pi camera view and gives speech output through the speaker. When text is detected the system announces it; when the user says "yes", the system reads out the recognized text. The figures below (Fig 11) demonstrate the image captured when text is detected and then recognized. The recognized text is printed in the console and simultaneously read out to the user on the spot using Festival speech synthesis. The recognition is highly accurate when the text is held clearly in the right position.

Fig 11. OCR View

Obstacle detection: Ultrasonic sensors use ultrasonic waves to measure distance. The following figure (Fig 12) shows the distance at which a laptop was detected and read out to the user as an obstacle.

Fig 12. Ultrasonic Sensor Detection

Speech recognition: The mic takes in speech commands whenever required and recognizes them to perform tasks. The word "up" is recognized by the system, which turns on the camera to start detection; the system is always listening for "up" to start its process. The final prototype is shown in the following image (Fig 13).

Fig 13. Final Prototype

VI. IMPORTANT IMPROVEMENTS MADE IN THIS RESEARCH

Our research utilizes advanced computer vision and machine learning technology to achieve highly accurate results in image and video processing. We used the MS COCO dataset, a broad set of classes such as motorcycles, couches, doorways, and desks that are common in a blind user's surroundings, and trained a CNN model on it for object identification. Our system uses a deep learning approach for face recognition: OpenCV together with the scikit-learn library detects faces and creates 128-d embeddings to quantify each face, and an SVM (support vector machine) is then trained on these embeddings to recognize the faces in real time and in images. The Festival Speech Synthesis System is a multilingual speech synthesis system used for text-to-speech; using this module lets the user customize the device, as the TTS model is available in many languages. Our system uses EasyOCR, a simple Python package implemented with PyTorch, for text recognition. It supports about 80 languages; the required languages can be specified by default or in the arguments, and once the model is loaded it recognizes text within a few seconds. Speech recognition is based on TensorFlow and a CNN. The ultrasonic sensor helps avoid obstacles along the way, which is very important for the blind. Our research combines all of these technologies to give overall visual assistance to the blind, which can be customized according to the user's needs.
VII. FUTURE WORK

Our system uses a Raspberry Pi and Python to achieve its goals. Much of the intended scope was achieved, including object detection, face recognition, text-to-speech, speech-to-text, text detection and recognition, and obstacle detection. But every system comes with drawbacks that can be addressed with better technology to increase the scope of the device.

A better OCR model for different languages could give the user a multi-language advantage. For object detection, further deep learning methods can improve recognition, and a larger object dataset can increase the scope by including more objects to recognize. For face recognition, a dataset of known faces can be created and used in a deep learning model to improve accuracy. Another possible addition is obstacle avoidance using auto-adaptive thresholding. A mobile application to keep track of locations, emergency dial-ups using speech commands, saving texts that were read, a GPS location tracker, and colour identification could all be added to the system and would be helpful for the blind. For the current scenario, face recognition with a mask was left out of scope, since with current technology masks reduce recognition accuracy by about 50 percent; once it becomes accurate, it can be added to the system.

Acknowledging the issues the visually impaired face, developing visual assistance for the blind has a lot of scope in the upcoming years, allowing the blind to feel the same as everyone else [14].

VIII. CONCLUSION

People who are visually impaired are either completely blind or have vision so low that it is legally defined as blindness [15]. The number of visually impaired people has risen in recent decades, and the challenges they face in daily life are becoming increasingly severe as a result of new technologies, buildings, and population growth, among other factors. The proposed system is cost-effective, portable, and easy for visually impaired individuals to use. The system interacts with the user through speech input and audio output, demonstrating the concept's feasibility. The image processing techniques show promising preliminary results that can be improved by implementing more optimized procedures. The proposed system leaves room for further development using state-of-the-art technology, can be customized to the user's needs, and helps the blind carry out everyday tasks like everyone else.

REFERENCES

[1] V. K. Shukla and A. Verma, "Enhancing User Navigation Experience, Object Identification and Surface Depth Detection for 'Low Vision' with Proposed Electronic Cane," 2019 Advances in Science and Engineering Technology International Conferences (ASET), 2019, pp. 1-5, doi: 10.1109/ICASET.2019.8714213.
[2] Y. J. Liu, Z. Q. Wang, L. P. Song and G. G. Mu, "An anatomically accurate eye model with a shell-structure lens," Optik, vol. 116, no. 6, pp. 241-246, 2005.
[3] L. Tepelea, I. Buciu, C. Grava, I. Gavrilut and A. Gacsádi, "A Vision Module for Visually Impaired People by Using Raspberry PI Platform," 2019 15th International Conference on Engineering of Modern Electric Systems (EMES), 2019, pp. 209-212.
[4] M. Suchetha, "Vision Based Assistive System for Label and Object Detection with Voice Output."
[5] F. Ahmad, A. Najam and Z. Ahmed, "Image-based face detection and recognition: 'state of the art'," arXiv preprint arXiv:1302.6379, 2013.
[6] A. S. Agbemenu, J. Yankey and E. O. Addo, "An automatic number plate recognition system using OpenCV and Tesseract OCR engine," International Journal of Computer Applications, vol. 180, no. 43, pp. 1-5, 2018.
[7] R. A. Solovyev, M. Vakhrushev, A. Radionov, I. I. Romanova, A. A. Amerikanov, V. Aliev and A. A. Shvets, "Deep learning approaches for understanding simple speech commands," 2020 IEEE 40th International Conference on Electronics and Nanotechnology (ELNANO), 2020, pp. 688-693.
[8] M. R. Miah and M. S. Hussain, "A Unique Smart Eye Glass for Visually Impaired People," 2018 International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE), 2018, pp. 1-4.
[9] M. S. Kalas, "Real time face detection and tracking using OpenCV," International Journal of Soft Computing and Artificial Intelligence, vol. 2, no. 1, pp. 41-44, 2014.
[10] P. Taylor, A. W. Black and R. Caley, "The architecture of the Festival speech synthesis system," The Third ESCA/COCOSDA Workshop (ETRW) on Speech Synthesis, 1998.
[11] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He and J. Liang, "EAST: an efficient and accurate scene text detector," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5551-5560.
[12] J. Bai, S. Lian, Z. Liu, K. Wang and D. Liu, "Smart guiding glasses for visually impaired people in indoor environment," IEEE Transactions on Consumer Electronics, vol. 63, no. 3, pp. 258-266, 2017.
[13] P. Warden, "Speech commands: A dataset for limited-vocabulary speech recognition," arXiv preprint arXiv:1804.03209, 2018.
[14] E. A. Hassan and T. B. Tang, "Smart glasses for the visually impaired people," International Conference on Computers Helping People with Special Needs, Springer, Cham, 2016, pp. 579-582.
[15] J. H. Kim, S. K. Kim, T. M. Lee, Y. J. Lim and J. Lim, "Smart Glasses using Deep Learning and Stereo Camera," 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE), 2019, pp. 294-295.