
2023 IEEE 13th International Conference on System Engineering and Technology (ICSET), 2 October 2023, Shah Alam, Malaysia

Eye-to-Text Communication Based on Human-Computer Interface Method
DOI: 10.1109/ICSET59111.2023.10295157

Mohammed Riyadh Abbas, Ammar Hussein Mutlag, Sadik Kamel Gharghan


Middle Technical University, Electrical Engineering Technical College-Baghdad, Iraq
[email protected] (M. R. A); [email protected] (A. H. M.); [email protected] (S.K.G.)

Abstract—Eye-to-text communication is a technology that has gained significant importance in recent years in the field of human-computer interaction (HCI), becoming increasingly necessary for people with speech impairments or movement disabilities. Therefore, a webcam, Raspberry Pi 3, and display screen have been used to gather data from the eye's movement in the left, right, top, bottom, and blink directions. This research aims to design a low-cost, non-invasive system that converts eye movements to text and previews the typing on the display screen in real-time. The system utilizes the OpenCV2, Dlib, Numpy, and Pandas libraries for data collection from a webcam and enables the extraction of values from user eye movements into an Excel sheet. The suggested system uses two stages to calculate the ratio. In the first stage, the Dlib library detects eye blinking by locating facial landmarks in the eye region and then calculating the eye aspect ratio (EAR) between these landmarks. In the second stage, the OpenCV2 library converts the image into grayscale format, creates a black mask to isolate the eye region, counts the white pixels on the left side and the right side, and divides the left pixel count by the right pixel count to obtain the ratio that indicates where the eyes point. In our experiments, the algorithms produced satisfying results, with an overall real-time accuracy of 93.7%, comprising the Dlib algorithm with an accuracy of 94.5% and the OpenCV2 algorithm with an accuracy of 92.9%.

Keywords—Eye-to-text communication, landmarks, OpenCV2 library, Dlib library, ratio, pixels

I. INTRODUCTION

The development of eye-to-text communication devices has become a helpful resource for people with many impairments. These systems use computer vision (CV) to monitor eye movement and translate it into text, allowing users a more natural and efficient means of communication [1]. While OpenCV2 provides various tools for analyzing photos and videos, Dlib offers powerful algorithms for identifying facial landmarks, enabling precise eye tracking. Integrating these libraries can create an effective eye-tracking system that mediates data transfer from gaze to words [2]. Improvements to the system's algorithms allow for more accurate monitoring of the user's eyes and less lag time [3-5].

Furthermore, the system utilizes a webcam as its primary data source, making it easily accessible and inexpensive. Observing eye movements is made possible by a webcam's constant feed of live video of the user's face. The data is then sent into eye-tracking algorithms, which precisely identify the user's gaze direction and focus. Once the data has been retrieved, it may be converted into text, giving people a new means of expressing their thoughts and communicating efficiently [6].

Srinivas et al. [7] suggested a novel approach for overcoming the communication challenges faced by paralyzed people, who retain their mental faculties but cannot use their bodies in any way. They created an interactive system that responds in real-time to eye blinks, allowing the paralyzed to communicate again. The device creates an alert signal in response to patient needs, like asking for water or food; when the blink count reaches the maximum threshold, the system sounds an alarm and transmits an audio message. Applying a Haar Cascade Classifier for face and eye detection enables real-time blink detection. Then, the Euclidean distance between the eye's landmarks determines the eye aspect ratio (EAR). With reliable eye detection and face tracking, the system can determine how often a patient blinks in each frame. However, the system must be positioned in a well-lit area to cope with various lighting settings.

Bharath et al. [8] presented a system that allows people who are paralyzed or have disabilities to use a computer without their hands, such as by utilizing a virtual keyboard or mouse. The system captures facial expressions using a webcam as its primary input, especially eye and mouth movements, to control the virtual keyboard and mouse. The technique used a Haar classifier to detect and extract the eye and facial regions. The system enables the user to scroll in various directions with mouse movement and to type on a virtual keyboard by selecting the desired keys through mouth movement, without any additional assistance from another person. The results demonstrated that it integrates all types of user input and improves cursor utilization across various contexts.

Băiașu et al. [9] presented an eye-tracking system for driver fatigue and drowsiness detection. Because fatigue and distraction are significant causes of traffic accidents, the suggested device detects closed eyes, indicates tiredness, and alarms the driver to prevent accidents. The system regularly captures and analyzes eye pictures, applies the Haar algorithm to recognize the driver's eyes, and determines the best threshold for accurate identification. The technology recognizes the eyes, nose, lips, and eyebrows. The eye area is then measured vertically and horizontally using Euclidean distance, and these measures are used to calculate the Eye Aspect Ratio (EAR), which indicates the driver's attention. Accuracy could be further increased by, for example, adding an infrared video camera for nighttime or low-light conditions and accounting for drivers wearing sunglasses.

II. MATERIALS AND METHODS

The effectiveness of the proposed system is evaluated by describing the hardware used and the acquired dataset in the following sections.




1. Proposed System

The suggested system consists of the components depicted in Figure 1, which include a person, a webcam, a Raspberry Pi 3, and a monitor. The camera is positioned facing the person's head to avoid any head motion and artifacts [10]. To identify facial landmarks during a recording, the camera should be between 20 and 60 centimetres away from the individual's face before recording begins [11]. The optimal distance, as determined by testing, was 30 centimetres. The 5-megapixel camera supports video at 1080p 30 fps, 720p 60 fps, and 640x480 90 fps [12]. The steps involved in displaying the results on the screen are converting the video into images (frames), processing the acquired image data, and sending the frames to the Raspberry Pi 3, on which the OpenCV2, Dlib, Numpy, and Pandas libraries are installed [13].
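As a minimal sketch of how this acquisition step might be set up in Python with OpenCV2, the snippet below opens the webcam and requests one of the camera modes listed above; the device index and the exact property values are assumptions for illustration, not values taken from the paper.

import cv2

# Open the first attached webcam (device index 0 is an assumption).
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError("Webcam could not be opened")

# Request one of the supported modes, e.g. 640x480 for a higher frame rate.
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
cap.set(cv2.CAP_PROP_FPS, 30)

ok, frame = cap.read()  # grab a single test frame
print("Frame captured:", ok, "shape:", frame.shape if ok else None)
cap.release()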

Figure 1. Suggested system of eye-to-text communication.

2. Proposed Algorithm

The method is divided into several basic stages. At the start, the software initializes the capture of video frames from the webcam and employs a while loop to load each frame in a valid state, reloading it in case of an error. The system recognizes the user's face in the video and converts the frame to grayscale to ease facial landmark detection. The user's eye areas are then pinpointed by face detection techniques using the OpenCV2 and Dlib libraries. The system then uses gaze estimation methods to follow the user's eyes in real-time. The blink and gaze ratios are calculated by the eye-tracking algorithms. If the eye aspect ratio (EAR) is less than 0.2, a blink is identified at that time as the output of the dataset. The eye motion ratio is added to the dataset if it falls within the range [less than 0.2] to [greater than 5]. In the end, stopping the system or reaching the end of the process causes it to cease tracking eye movements and producing textual output, as shown in Figure 2.

Figure 2. Flowchart of the system.

2.1 Load and capture video

Video captured using a webcam consists of 30 frames per second, each frame contributing to the appearance of movement in the subject's eyes, as shown below in Figure 3.

Figure 3. Capture video frames.
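As an illustration of this frame-by-frame flow, the following sketch shows the shape of the main processing loop described above. The helpers compute_ear_ratio and compute_gaze_ratio are hypothetical placeholders for the two stages detailed in Sections 2.3 and 2.4, and writing the collected values to an Excel sheet with Pandas follows the data-collection step mentioned in the abstract.

import cv2
import pandas as pd

cap = cv2.VideoCapture(0)
records = []                        # one row per processed frame

while True:
    ok, frame = cap.read()          # reload on error, as in the flowchart
    if not ok:
        continue
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # grayscale eases landmark detection

    # Hypothetical helpers standing in for the Dlib and OpenCV2 stages.
    ear = compute_ear_ratio(gray)           # Section 2.3: eye aspect ratio
    gaze = compute_gaze_ratio(gray)         # Section 2.4: white-pixel ratio
    records.append({"EAR": ear, "gaze_ratio": gaze})

    if cv2.waitKey(1) & 0xFF == ord("q"):   # stop condition ends the tracking
        break

cap.release()
pd.DataFrame(records).to_excel("eye_movements.xlsx", index=False)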


2.2 Face detection

Dlib and OpenCV2 are used in a for loop to load each frame from the video and detect facial landmarks within the image. Once a face and its landmarks are discovered, they are used to generate the updated landmarks inside the loop. The next step is to convert the color frame to grayscale, making it simpler to spot facial features. Figure 4 shows facial landmarks in a grayscale frame [14].

Figure 4. Face detection (a) and grayscale (b).
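A minimal sketch of this detection loop is given below, assuming Dlib's standard frontal face detector and the publicly available 68-point shape predictor model; the model file path is an assumption, and the frame is taken from the capture loop shown earlier.

import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # 'frame' comes from the capture loop
for face in detector(gray):                      # one rectangle per detected face
    landmarks = predictor(gray, face)            # 68 facial landmark points
    for i in range(68):
        p = landmarks.part(i)
        cv2.circle(frame, (p.x, p.y), 1, (0, 255, 0), -1)   # draw each landmark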

A digital image on a screen is presented through a matrix of pixels, and the image's pixels are represented as integer values. A grayscale image uses numerical values ranging from 0, which represents black pixels, to 255, which represents white pixels. The RGB color model is widely recognized as the most prevalent color model. It should be noted that OpenCV2 tends to load color images in reverse channel order, specifically BGR [15], as shown below in Figure 5.

Figure 5. RGB (a) and BGR (b).

There are 68 fixed locations on the face that serve as markers. If the indexes of these points are known, the same technique can be used to focus on a particular part of the face (such as the eyes, lips, eyebrows, nose, or ears). This article focuses on the eyes for eye-gaze monitoring [16, 17], as shown in Figure 6.

Figure 6. Landmarks (a) and detected facial landmarks (b).

2.3 Detect eye blinking based on webcam using Dlib

To identify eye blinking, it is necessary first to determine the facial landmarks of the eyes and then compute the eye aspect ratio (EAR) between the landmarks of the eyelids. The eye can be described as a collection of six points, each with a particular set of coordinates [18]. The width of an eye is the distance along the horizontal line, which extends from point (P1) to point (P4); the height of an eye is the distance along the vertical lines, which extend from point (P2) to point (P6) and from point (P3) to point (P5). As the eye moves from an open to a closed position, the length of the vertical lines will vary, while the horizontal line remains constant. Blinking is detected by comparing the lengths of these two lines and identifying the difference. This ratio remains relatively stable when the eye is open but swiftly decreases after closure, as shown below in Figure 7.

Figure 7. The eye opening (a) and the eye closing (b).


The distances between pairs of points are measured horizontally and vertically. The left eye's horizontal line length is the distance between points 36 and 39, and the left eye's vertical line runs between the midpoint of points 37 and 38 and the midpoint of points 40 and 41. As shown in Figure 8, the ratio between the two eyes is obtained using equations (1) and (2) below.

$EAR = \frac{\lVert P_2 - P_6 \rVert + \lVert P_3 - P_5 \rVert}{2\,\lVert P_1 - P_4 \rVert}$        (1)

$Ratio = \frac{EAR_{\text{left eye}} + EAR_{\text{right eye}}}{2}$        (2)

The landmarks selected for the left eye region are [36, 37, 38, 39, 40, 41], and for the right eye region [42, 43, 44, 45, 46, 47].

Figure 8. Eye region left and right.
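A small sketch of how equations (1) and (2) might be computed from the Dlib landmarks is shown below; it operates on the landmark object produced by the shape predictor and is an illustrative implementation rather than the authors' exact code.

from math import dist  # Euclidean distance between two 2-D points

def eye_aspect_ratio(landmarks, idx):
    """EAR for one eye; idx is [36..41] for the left eye or [42..47] for the right."""
    p = [(landmarks.part(i).x, landmarks.part(i).y) for i in idx]
    # p[0]..p[5] correspond to P1..P6 in equation (1).
    return (dist(p[1], p[5]) + dist(p[2], p[4])) / (2.0 * dist(p[0], p[3]))

# 'landmarks' comes from the Dlib predictor shown in Section 2.2's sketch.
ear_left = eye_aspect_ratio(landmarks, range(36, 42))
ear_right = eye_aspect_ratio(landmarks, range(42, 48))
ratio = (ear_left + ear_right) / 2.0   # equation (2)
blink = ratio < 0.2                    # blink threshold used by the system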

2.4 Detect eye-gaze tracking in video using OpenCV2

Identifying and analyzing the eye image involves three primary parts. The first part of the eye is the pupil, which appears as a black circle in the centre of the eye. The second part is the iris, the largest circle of the eye, which can be a variety of colors depending on the individual. The final part is the sclera, which, because it is always white, can be used to track the eyes by measuring the number of white pixels, as shown in Figure 9.

Figure 9. Eye primary parts.

When looking left, the sclera covers the right side of the eye, and the pupil and iris are located on the other side. When the eye is in a straight line of sight, the sclera is evenly distributed between the two sides. Right-eye gazing is indicated when the sclera covers the eye's left side while the pupil and iris point in the opposite direction [19]. When looking up, the sclera covers the lower side while the pupil and iris are on the top side, as shown in Figure 10.

Figure 10. Eye-gaze movements.

To calculate the ratio of pixels used to pick up the eye gaze, the white pixels on the left side and the white pixels on the right side of the eye region are counted. This can be accomplished by converting the image to grayscale and generating a black mask [20]: zero-valued pixels are represented by black, and all non-zero pixels by white (one). The non-zero pixels on the left side are counted, then the same is done for the right side, and the left white-pixel count is divided by the right white-pixel count to obtain the ratio used for eye-gaze tracking. Figures 11, 12, and 13 illustrate equations (3) and (4).

Figure 11. Black mask AND grayscale.

Figure 12. Threshold eye.

Threshold eye = height, width
Left side white or Top side white = number of pixels
Right side white or Bottom side white = number of pixels
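The following sketch illustrates this thresholding and pixel-counting step for the horizontal ratio, assuming the eye region has already been cropped from the grayscale frame; the threshold value of 70 is an assumption, not a value reported in the paper.

import cv2

def gaze_ratio(eye_gray):
    """Ratio of white (sclera) pixels on the left side to the right side of a cropped eye image."""
    # Binarize: sclera becomes white (255), pupil and iris become black (0).
    _, threshold_eye = cv2.threshold(eye_gray, 70, 255, cv2.THRESH_BINARY)
    height, width = threshold_eye.shape
    left_white = cv2.countNonZero(threshold_eye[:, : width // 2])
    right_white = cv2.countNonZero(threshold_eye[:, width // 2 :])
    # Splitting by rows instead of columns gives the top/bottom ratio of equation (4).
    return left_white / right_white if right_white else float("inf")   # equation (3)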


$Ratio = \frac{\text{Left side white}}{\text{Right side white}}$        (3)

$Ratio = \frac{\text{Top side white}}{\text{Bottom side white}}$        (4)

Figure 13. Number of pixels (height, width).

3. Display screen-based graphical user interface (GUI)

Eye movements are indicated as left gaze (water), right gaze (food), top gaze (bathroom), bottom gaze (sleep), blink (pain), and centre gaze (none). Moreover, a preview on a display screen employing a GUI assists the healthcare provider with the requirements and needs of the patient, as demonstrated in Figure 14.

Figure 14. Display screen graphical user interface (GUI).

III. RESULTS

1. Eye blinking using the Dlib library

The graph showing eye blinking signals for the dataset is displayed in Figure 15. The reading for the eye blink varied from 0.05 to 0.4 for the ratio (EAR) computation taken from the webcam's 14 video frames. This variation depends on the threshold that was chosen. When the EAR is greater than the threshold value of 0.20, neither an eye blink nor a downward eye direction has occurred. However, if the EAR is less than the threshold of 0.20, an eye blink is registered in the event of continual eyelid closing when the EAR is less than 0.17, and if the EAR is between 0.18 and 0.19, the eye direction is classified as down. The results showed that the system achieved an accuracy of 94.5%.

Figure 15. Graph for eye blinking signals.

2. Eye gaze tracking using the OpenCV2 library

The graph showing eye-gaze tracking signals for the dataset is displayed in Figure 16. The reading for the eye gaze ranged from 0.4 to 5 for the eye-tracking ratio computation taken from the webcam's 14 video frames. This calculation depends on the number of white pixels in the sclera. When the eye-gaze ratio is between 0.4 and 0.7, the eye direction is to the right. However, if the eye-gaze ratio is more than 1 and less than 3, no direction is selected for the eye gaze (none), and if the eye-gaze ratio is between 3 and 5, the eye direction is to the left. Finally, if the eye-gaze ratio is less than 0.4, the eye direction is to the top. The results showed that the system achieved an accuracy of 92.9%.

Figure 16. Graph for eye gaze tracking signals.

Data gathered by the camera produced a graph showing the relationship between the different eye movement directions tested in the experiment. The frames were taken from 1400 samples with an accuracy of 93.7%, as shown in Figure 17.

Figure 17. Eye movement directions.
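Taken together, the thresholds reported above amount to a simple decision rule. The sketch below combines them into the direction-to-word mapping shown by the GUI; it is an illustrative reconstruction rather than the authors' exact logic (for example, how the blink and downward cases are prioritized is assumed).

WORDS = {"left": "water", "right": "food", "top": "bathroom",
         "bottom": "sleep", "blink": "pain", "centre": None}

def classify(ear, gaze):
    """Map the EAR and gaze ratios of one frame to a GUI word using the reported thresholds."""
    if ear < 0.17:                 # continual eyelid closing -> blink
        return WORDS["blink"]
    if 0.18 <= ear <= 0.19:        # EAR band reported for looking down
        return WORDS["bottom"]
    if gaze < 0.4:                 # gaze-ratio bands from Section III.2
        return WORDS["top"]
    if 0.4 <= gaze <= 0.7:
        return WORDS["right"]
    if 3 <= gaze <= 5:
        return WORDS["left"]
    return WORDS["centre"]         # 1 < gaze < 3: no direction selected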


IV. CONCLUSION

People with speech or movement impairments can significantly benefit from eye-gaze tracking technology. In this paper, we present the results of our investigation into a proposed eye-tracking system with efficient eye-to-text communication. The system consists of a circuit successfully built between a webcam, a Raspberry Pi 3, and a display screen, and it captures the user's face using face and eye detection algorithms from the OpenCV2, Dlib, Numpy, and Pandas libraries. The OpenCV2 library's sclera pixel count was used to determine the gazing ratio, while the Dlib library's facial feature points were used to determine the blinking ratio. The ratios were then imported into the dataset with the Pandas library, and information on the right, left, top, bottom, and blinking movements was inserted into the Excel sheet. The Python programming language was used to set up the experiment. This approach shows promise for improving patient care, as the algorithm showed encouraging outcomes in real-time. Some challenges, such as illumination, artifacts, and the distance between the subject's eye and the camera, have been overcome.

REFERENCES

[1] M. P. Paing, A. Juhong, and C. Pintavirooj, "Design and Development of an Assistive System Based on Eye Tracking," Electronics, vol. 11, no. 4, p. 535, 2022.
[2] K. R. M. K. Nizar and M. H. Jabbar, "Driver Drowsiness Detection with an Alarm System Using a Webcam," Evolution in Electrical and Electronic Engineering, vol. 4, no. 1, pp. 87-96, 2023.
[3] T. Wang, J. Wang, O. Cossairt, and F. Willomitzer, "Optimization-Based Eye Tracking Using Deflectometric Information," arXiv preprint arXiv:2303.04997, 2023.
[4] S. L. Matthews, A. Uribe-Quevedo, and A. Theodorou, "Rendering Optimizations for Virtual Reality Using Eye-Tracking," in 2020 22nd Symposium on Virtual and Augmented Reality (SVR), IEEE, 2020, pp. 398-405.
[5] J. Kim et al., "NVGaze: An Anatomically-Informed Dataset for Low-Latency, Near-Eye Gaze Estimation," in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1-12.
[6] M. Mangla, A. Sayyad, N. Shama, S. N. Mohanty, and D. Singh, "An Artificial Intelligence and Computer Vision Based EyeWriter," in Ambient Intelligence in Health Care: Proceedings of ICAIHC 2022, Springer, 2022, pp. 451-458.
[7] K. S. Srinivas, Y. Y. K. Reddy, and R. MaruthaMuthu, "Multimedia Assistance for Paralyzed Patients Using Eye Blink Detection," International Journal of Research in Engineering and Science, vol. 10, no. 5, pp. 80-84, 2022.
[8] M. R. R. Bharath, "Controlling Mouse and Virtual Keyboard Using Eye-Tracking by Computer Vision," Journal of Algebraic Statistics, vol. 13, no. 3, pp. 3354-3368, 2022.
[9] A.-M. Băiașu and C. Dumitrescu, "Contributions to Driver Fatigue Detection Based on Eye-Tracking," International Journal of Circuits, Systems and Signal Processing, vol. 15, pp. 1-7, 2021.
[10] A. J. Larrazabal, C. G. Cena, and C. E. Martínez, "Video-Oculography Eye Tracking Towards Clinical Applications: A Review," Computers in Biology and Medicine, vol. 108, pp. 57-66, 2019.
[11] Y. Cheng, H. Wang, Y. Bao, and F. Lu, "Appearance-Based Gaze Estimation with Deep Learning: A Review and Benchmark," arXiv preprint arXiv:2104.12668, 2021.
[12] K. Dergachov, L. Krasnov, O. Cheliadin, and R. Kazatinskij, "Video Data Quality Improvement Methods and Tools Development for Mobile Vision Systems," Advanced Information Systems, vol. 4, no. 2, 2020.
[13] D. Phayde, P. Shanbhag, and S. G. Bhagwath, "Real-Time Drowsiness Diagnostic System Using OpenCV Algorithm," International Journal of Trendy Research in Engineering and Technology, vol. 6, no. 2, 2022.
[14] S. Wang, H. Yin, and X. Wang, "Research on the Improvement of LSB-Based Image Steganography Algorithm," Academic Journal of Science and Technology, vol. 5, no. 3, pp. 222-224, 2023.
[15] U. H. Ghori and M. Kulkarni, "Face Recognition and Face Comparison with OpenCV and Python," International Research Journal of Modernization in Engineering Technology and Science, vol. 5, no. 5, 2023.
[16] M. Bodini, "A Review of Facial Landmark Extraction in 2D Images and Videos Using Deep Learning," Big Data and Cognitive Computing, vol. 3, no. 1, p. 14, 2019.
[17] D. Aspandi, O. Martinez, F. Sukno, and X. Binefa, "Composite Recurrent Network with Internal Denoising for Facial Alignment in Still and Video Images in the Wild," Image and Vision Computing, vol. 111, p. 104189, 2021.
[18] C. Dewi, R.-C. Chen, X. Jiang, and H. Yu, "Adjusting Eye Aspect Ratio for Strong Eye Blink Detection Based on Facial Landmarks," PeerJ Computer Science, vol. 8, p. e943, 2022.
[19] S. G. Amer, R. A. Ramadan, and M. A. Elshahed, "Wheelchair Control System Based Eye Gaze," International Journal of Advanced Computer Science and Applications, vol. 12, no. 6, 2021.
[20] P. Chakraborty, D. Roy, M. Z. Rahman, and S. Rahman, "Eye Gaze Controlled Virtual Keyboard," International Journal of Recent Technology and Engineering, vol. 8, no. 4, pp. 3264-3269, 2019.

