
Hands-Free Mouse Control Using Facial Feature

Pradeep Kumar S, Saket Agarwal, Daniyal Mustafa, Divyansh Kumar, Gautam Ranjan
Department of Electronics and Communication, Nitte Meenakshi Institute of Technology, Bengaluru, India
[email protected]

2023 International Conference on Smart Systems for applications in Electrical Sciences (ICSSES) | 979-8-3503-4729-6/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICSSES58299.2023.10200228

Abstract— Significant emphasis has been placed on gesture recognition as a natural form of communication, particularly for the elderly and the cognitively impaired. Due to its numerous applications in virtual reality, sign language understanding, and computer games, hand gesture recognition is a key research subject in human-computer interaction (HCI). Lighting, hand motion fluctuations, and textures are a few of the drawbacks of image-based approaches that may reduce recognition accuracy. This research investigates the use of camera input data to train models for facial key point identification and Touchless Head-Control (THC) as a touchless method of cursor control to get around these constraints.

Keywords - Gesture Recognition, Impaired Individuals, Hand Gesture Recognition, Human Computer Interaction, Facial Key Points, Cursor Control.

I. INTRODUCTION

Our study proposes a new method for interacting with computers that makes use of human eye movements, notably sequences of blinks. Physically challenged people, especially those without hands, can benefit from using this method instead of the usual mouse. Using OpenCV, our system locates the position of the human eye's pupil and uses that position as a reference point to move the pointer.

Our research expands on prior work on eye-blink input systems, which were created to help people with severe motor limitations, such as those with ALS, communicate. We broaden this strategy so that it is appropriate for both disabled and able-bodied individuals. The goal of classifying eye blinks into voluntary, reflex, and spontaneous forms is to develop a useful and practical input technique.

Our study examines the process of categorising eye-blink input and adding it to the proposed system. Reflex blinks are brought on by external stimuli, spontaneous blinks happen unconsciously, and voluntary blinks are made intentionally. By incorporating these types of blinks into our system, we offer a simple and effective input method for everyone.

This study proposes a new technique for automatically classifying various eye-blink patterns and identifying deliberate blinks. The technique uses an eye-blink integration value to calculate a threshold, which improves robustness to the measuring environment and lessens the influence of individual variance. The technique may also be used for dry eye detection, anti-spoofing in facial recognition systems, and drowsiness detection.

The study also explores the use of the proposed approach in Human-Computer Interaction (HCI) to help persons with physical limitations, such as locked-in syndrome, quadriplegia, or loss of the arms, operate their laptops. Overall, this research presents a technique that holds promise for enhancing the reliability and accuracy of eye-blink classification and detection, with possible applications in a number of domains.

II. LITERATURE SURVEY

According to earlier research, assistive technologies for persons with disabilities can be created using alternative input modalities such as brow motions. The development of wearable devices for human motion monitoring and energy harvesting in healthcare applications has also been researched using TENG-based sensors [1]. An innovative method for creating a modular and adaptable keyboard and mouse assisting device for people with physical limitations is suggested in the study by Hameed et al. (2022). The tool is designed to offer physical help to people with motor limitations who struggle with conventional keyboard and mouse interfaces, and it has the potential to advance the area by providing a cutting-edge design technique for creating adaptable and modular assistive technology, extending ongoing research on powerful assistive technology for persons with disabilities [2]. The work in [3] suggests a facial recognition system that finds human faces in images using the Python OpenCV tools. To recognise facial features and categorise photos, the authors used a variety of image processing techniques, including image thresholding, edge detection, and contour detection. Different methods for feature extraction and classification have been proposed for facial identification, including Eigenfaces, Local Binary Patterns, and Haar cascades.

Frazer K. Noble evaluates and compares the performance of various feature detection and matching techniques available in the OpenCV library. He provides an in-depth analysis of the performance of various detectors and matchers, including SIFT, SURF, ORB, BRISK, AKAZE, and FLANN, and offers insights into their suitability for different computer vision applications [4]. The work in [5] proposes a gesture-based control system utilizing a smart glove embedded with sensors to capture hand gestures and convert them into computer commands. The proposed system addresses the mobility and dexterity limitations faced by special-needs individuals in accessing computers, providing a low-cost, efficient, and user-friendly alternative to traditional input devices. Experimental evaluation demonstrates high accuracy and speed in recognizing gestures, potentially contributing to more accessible and inclusive human-computer interaction systems [11].

Savina Colaco and Dong Seog Han propose a method for detecting facial key points using a convolutional neural network (CNN) architecture. The authors tested their model on the 300W dataset and achieved competitive results compared to other state-of-the-art methods, demonstrating robustness to variations in pose, illumination, and occlusion [6], [7]. Suci Dwijayanti and colleagues proposed a method for simultaneous facial expression recognition and face recognition using a CNN. The authors tested their model on the JAFFE and CK+ datasets and achieved competitive results compared to state-of-the-art methods, with robustness to variations in pose, illumination, and facial expression [8].

Facial component detection is a crucial task in computer vision and has various applications such as face recognition, facial expression recognition, and gaze tracking. Ankur Kumar, K. M. Baalamurugan, and B. Balamurugan propose a real-time system for facial component detection using Haar classifiers. The proposed method involves detecting facial features such as the eyes, nose, and mouth using Haar cascades and then segmenting these components using distance and angle measurements [9], [10]. A real-time eye tracking algorithm developed by Siti Nuradlin Syahirah Sheikh Anwar, Azrina Abd Aziz, and Syed Hasan Adil utilizes image processing techniques to detect and track the eye in real time, followed by calibration to estimate the gaze point. It can be used in applications such as human-computer interaction, medical diagnosis, and surveillance systems [12].

III. METHODOLOGY

A. Convolutional neural networks (CNNs)

Convolutional neural networks (CNNs) have deep structures that enable them to extract features from raw image pixels and generate multiple levels of abstract feature representations, which are useful for improving detection methods. In our project, we leveraged CNN models, which have undergone substantial advancements in recent years, for real-time detection of facial key points.

To detect faces in the webcam input, we utilized the Cascade Classifier object, and each detected face was normalized before being fed into the model. During the training phase, the model was trained on normalized images resized to 96x96 pixels to match the model input size. The model then predicted the facial key points for the faces detected in the webcam input data.

B. Mean squared error (MSE)

The CNN model we used for predicting facial key points employs the mean squared error (MSE) as its loss function. The MSE measures the average squared difference between predicted values and ground-truth values. All models were evaluated using the rectified linear unit (ReLU) activation function and the accuracy metric, with the learning rate varied between 0.1 and 0.001. The simple CNN model had 218,300 parameters in total. We monitored the model loss across epochs during training.

When the user was facing forward, the model was able to generate an estimate of the facial key points on the face. However, the accuracy of the key points detected on the webcam input was found to be higher than that of the simple CNN model for head orientation. The model predicts key points that are then mapped onto the webcam input data, and the various models were compared on parameters such as learning rate, optimizer, loss function, and accuracy.

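As a concrete illustration of the training setup described in subsections A and B, the following is a minimal sketch of a key-point-regression CNN in Keras. The layer sizes, the number of key points, and the data-loading step are illustrative assumptions, not the exact architecture used in this work.

```python
# Minimal sketch (assumed architecture): a small CNN that regresses facial key
# points from normalized 96x96 grayscale face crops, trained with the MSE loss.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_KEYPOINTS = 15  # assumption: 15 (x, y) key points, i.e. 30 regression outputs

def build_keypoint_model():
    model = models.Sequential([
        layers.Input(shape=(96, 96, 1)),           # normalized face crop
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_KEYPOINTS * 2),           # (x, y) for each key point
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="mse", metrics=["accuracy"])
    return model

# Usage sketch: X has shape (N, 96, 96, 1) with values in [0, 1];
# y has shape (N, 30) holding the key-point coordinates.
# model = build_keypoint_model()
# model.fit(X, y, epochs=50, validation_split=0.2)
```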
C. Viola-Jones algorithm

The project incorporates the Viola-Jones algorithm for face detection, which uses composite features to maintain a certain degree of observability and ensure that the face recognition rate is not affected. The algorithm efficiently detects the inner face rectangle in an image. Discriminant analysis is used to analyze the complex facial features of the rectangular region, which are then fed into the classifier for face recognition. The Viola-Jones algorithm is known for its high detection speed, making it suitable for real-time face detection.

The characteristics of this algorithm are:

 The feature values of the face image are extracted using the integral map, which ensures the speed at which the features are extracted, while the correct face detection rate is ensured by using the AdaBoost strong classifier.

 The traditional AdaBoost classifier is modified. In the traditional version of AdaBoost, weak classifiers are constructed using small decision trees and then combined to form a strong classifier. This process is repeated multiple times to improve the overall detection rate of the classifier. By using a series of strong classifiers, the AdaBoost algorithm can achieve high accuracy in its predictions.

The main steps of the Viola-Jones algorithm are the following:

1. Detection of face images by Haar features.
2. The calculation of Haar features is accelerated through the integral graph.
3. Training on the face data set through the cascaded AdaBoost classifier.
4. The trained detection classifier is used for the final face image detection.

D. Haar Characteristics

The Viola-Jones algorithm utilizes four types of Haar features: edge features, linear features, center features, and diagonal features. Each Haar feature template contains two regions, a black region and a white region, and the difference between the pixel intensities of these two regions is calculated as the feature value. For the center, edge, and diagonal features, the feature value is calculated as:

V = sum_white - sum_black

However, for the linear features, the black and white regions are of different sizes and shapes, making it impossible to calculate the difference directly. To overcome this, the number of pixels counted in the black and white regions is balanced by adjusting the dimensions of the rectangular regions, and the feature value is calculated as:

V = sum_white - 2 * sum_black

E. Integral graph

To efficiently calculate Haar features for an image, the Viola-Jones algorithm uses an integral image. An integral image is a grayscale image in which each pixel value corresponds to the sum of all pixels above and to the left of that pixel in the original image. Using the integral image, the sum of all pixels in any rectangular region of the original image can be obtained quickly from the integral-image values at the four corners of the rectangle, by subtracting the sums of the regions above and to the left of the rectangle and adding back the doubly subtracted top-left region.

By precomputing the integral image, the pixel sums for any rectangular region can be quickly obtained. This allows for efficient calculation of Haar features, even for large images with many possible feature locations.

The main process of this method is as follows:

1. Input an image and determine a rectangular frame around the human face with the Viola-Jones algorithm.
2. After the faces inside the rectangular frame are calibrated, they are processed into four types of sub-images.
3. Use NLDA (null-space linear discriminant analysis) to extract features from the obtained full-face image and the four sub-images.
4. Evaluate the validity of all extracted features (global and local) by discriminant distance.
5. Select the feature regions with the largest discriminant distances to form new composite feature vectors, and then input them to the classifier for face recognition.

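To make the integral-image idea from subsection E concrete, here is a small sketch that computes rectangle sums in constant time with OpenCV and evaluates a two-rectangle (edge-type) Haar feature. The image path and the feature location are placeholders, not values from the paper.

```python
# Sketch: constant-time rectangle sums via an integral image (subsection E).
import cv2

gray = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder input image

# cv2.integral returns an (H+1) x (W+1) array with a zero first row and column.
ii = cv2.integral(gray)

def rect_sum(integral, x, y, w, h):
    """Sum of pixels in the w x h rectangle whose top-left corner is (x, y)."""
    # Four-corner rule: bottom-right - bottom-left - top-right + top-left.
    return (integral[y + h, x + w] - integral[y + h, x]
            - integral[y, x + w] + integral[y, x])

# A two-rectangle (edge-type) Haar feature: white half minus black half.
x, y, w, h = 40, 60, 24, 12                 # placeholder feature location
white = rect_sum(ii, x, y, w, h // 2)
black = rect_sum(ii, x, y + h // 2, w, h // 2)
feature_value = white - black               # V = sum_white - sum_black
```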
F. Real-Time Eye Tracking Algorithm

Fig 1. Block diagram of the eye tracking algorithm

Our proposed eye tracking algorithm was developed in several stages, as shown in Fig. 1. To evaluate the accuracy of the algorithm, performance measurements were conducted after the development process was completed.

G. Face Detection

To locate the subject's eyes in real-time processing, our eye tracking algorithm relies on a face detection method based on the Viola-Jones algorithm and Haar cascade classifiers. These classifiers use Haar wavelets to detect and track lines and edges in the human face. The face detection step is essential for accurately recognizing human faces and for the overall effectiveness of the algorithm. However, the algorithm may fail to detect a face if the input video does not contain human facial features. The accuracy and effectiveness of the face detection step are therefore crucial to the successful implementation of the eye tracking algorithm.

H. Facial features detection

In the eye tracking technique, we employ the facial landmark prediction method shown in Fig. 2, which is based on a shape predictor model. To detect 81 facial feature landmarks in any input image, we specifically employ the 81-facial-landmarks shape predictor. This shape predictor model must be downloaded separately because it is not part of the prebuilt DLIB libraries. The eye blinking detection and eye movement categorization modules depend heavily on the facial landmark prediction step, because it enables the algorithm to precisely identify the face points for the Region of Interest (ROI), which includes the outer parts of the eyes. By precisely locating these landmarks, the algorithm can track eye movements and recognize blinks.

Fig 2. Detection of facial features

I. Eye Detection, Grayscale Conversion, and Image Masking

After detecting the subject's face using the Viola-Jones algorithm and Haar cascade classifiers, the eye tracking algorithm identifies the eyes by creating a region of interest (ROI) from the landmark point arrays around the surface of the eye. To improve the accuracy of eye tracking, a dark black mask is created around the circumference of the eye area to enhance the image of the eyes and capture the ROI. Before creating the mask, the input video is converted into grayscale frames using the RGB2GRAY conversion in the OpenCV library. This conversion simplifies the detection of lines and edges in the input video. The black mask is then used to capture the ROI of the eye tracking algorithm, enabling more precise tracking of the subject's eye movements.

J. Gaussian Blur Filtering

To remove unwanted noise in the previously generated grayscale image, such as shadows, eyelashes, and excessive lighting, the eye tracking algorithm applies a Gaussian blur filter. This filter smooths the image and reduces high-frequency noise, enhancing the accuracy of the tracking process.

K. Eye Aspect Ratio (EAR)

To detect eye blinks, the landmark shape detector is used to extract the vertical lines that connect the upper and lower eyelids, as well as the horizontal line that connects the outermost left and outermost right points of the eye (the points p), as illustrated in Fig. 3. These lines are used to detect when the eye is closed, and thus when a blink has occurred.

Fig 3. Calculation of Eye Aspect Ratio

The eye tracking algorithm uses the points and lines around the eye area to detect eye blinking. Specifically, a blink is registered when the length of the vertical lines decreases below a certain threshold. Blinking is quantified using the Eye Aspect Ratio (EAR), defined as

EAR = V / H

where V is the average vertical length and H is the average horizontal length of both eyes. In terms of the six eye landmarks p1 to p6, the EAR is calculated as

EAR = (||p2 - p6|| + ||p3 - p5||) / (2 ||p1 - p4||)

The EAR is a useful metric for detecting eye blinks, as it remains relatively constant while the eye is open and decreases gradually as the eye closes. The EAR is not affected by variations in eye size or head pose, but it may differ across individuals with different eye shapes.

In our algorithm, we classify an eye blink as occurring when the EAR-based blinking ratio exceeds a threshold value of 3.48. However, individuals with narrower eye openings may have smaller horizontal line values, which could affect the accuracy of the blinking ratio. To address this issue, our algorithm allows the blinking-ratio threshold to be adjusted to better suit the individual's eye shape and improve accuracy.

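A minimal sketch of the EAR computation with dlib landmarks follows. It assumes the widely used 68-point predictor file and its standard eye indices (36-41 and 42-47); the system described above instead uses an 81-landmark predictor, so the model name and indices here are illustrative only.

```python
# Sketch: computing the Eye Aspect Ratio (EAR) from dlib facial landmarks.
# Assumptions: the standard 68-point predictor and its usual eye indices.
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def eye_aspect_ratio(pts):
    """EAR = (||p2-p6|| + ||p3-p5||) / (2 * ||p1-p4||) for one eye (6 points)."""
    v1 = np.linalg.norm(pts[1] - pts[5])
    v2 = np.linalg.norm(pts[2] - pts[4])
    h = np.linalg.norm(pts[0] - pts[3])
    return (v1 + v2) / (2.0 * h)

def average_ear(gray_frame):
    """Average EAR over both eyes for the first detected face, or None."""
    faces = detector(gray_frame)
    if not faces:
        return None
    shape = predictor(gray_frame, faces[0])
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)],
                   dtype=float)
    left = eye_aspect_ratio(pts[36:42])
    right = eye_aspect_ratio(pts[42:48])
    return (left + right) / 2.0

# Usage sketch: convert each webcam frame to grayscale, call average_ear(),
# and flag a blink when the EAR drops (equivalently, when the inverse ratio
# rises above a per-user tuned threshold).
```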
IV. EXPERIMENTAL RESULT

We have developed an interface that allows users to control cursor movements using their facial expressions and head movements. The interface uses two main facial features, the Eye Aspect Ratio (EAR) and the Mouth Aspect Ratio (MAR), to track the movements of the user's eyes, mouth, and head.

When the user runs the Python application through the terminal, the camera opens automatically and the application begins detecting the user's eyes and mouth. Once the user opens their mouth, the MAR algorithm is activated and the application starts taking inputs from the user's facial movements. The interface is shown in Fig. 4.

Fig 4. Final developed simulation

To track the movement of the user's head, as shown in Fig. 5, the application uses the user's nose as a reference point. When the user's nose moves away from its starting position, the application detects the direction of the movement, whether left, right, up, or down, and the cursor on the screen moves in the same direction as the nose movement. However, the input is only considered when the user's nose moves outside the green box, which is designed to reject accidental or unintentional movements.

Fig 5. Cursor moving towards the right

The interface also includes left-click and right-click features. A left click is performed by blinking the left eye, while a right click is performed by blinking the right eye. Furthermore, we have implemented scrolling based on the Eye Aspect Ratio (EAR): to activate scroll mode, the user squints both eyes to decrease the EAR. From that point on, up-and-down head movement is interpreted as scrolling rather than cursor movement. Fig. 6 shows the head tilted upwards to move the cursor upwards on the screen.

Fig 6. Cursor used to scroll up

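To make the control loop above concrete, here is a hedged sketch of how nose-based cursor movement, blink clicks, and the dead-zone "green box" can be wired together with PyAutoGUI. The box size, step size, thresholds, and helper inputs are illustrative assumptions, not the exact values used in the application.

```python
# Sketch of the cursor-control loop: the nose position drives the cursor, a
# dead zone ("green box") suppresses accidental motion, and per-eye blinks
# trigger clicks. All constants below are assumptions for illustration.
import numpy as np
import pyautogui

BOX_HALF = 30                # half-size of the dead-zone box around the anchor (px)
MOVE_STEP = 15               # cursor step per frame when outside the box (px)
EAR_BLINK_THRESHOLD = 0.20   # assumed per-eye blink threshold on the EAR

def control_step(nose, anchor, left_ear, right_ear):
    """One frame of control. nose/anchor are (x, y); EARs are per-eye ratios."""
    dx, dy = nose[0] - anchor[0], nose[1] - anchor[1]

    # Move the cursor only when the nose leaves the dead zone.
    if abs(dx) > BOX_HALF or abs(dy) > BOX_HALF:
        pyautogui.moveRel(int(np.sign(dx)) * MOVE_STEP,
                          int(np.sign(dy)) * MOVE_STEP)

    # Left-eye blink -> left click, right-eye blink -> right click.
    if left_ear < EAR_BLINK_THRESHOLD and right_ear >= EAR_BLINK_THRESHOLD:
        pyautogui.click(button="left")
    elif right_ear < EAR_BLINK_THRESHOLD and left_ear >= EAR_BLINK_THRESHOLD:
        pyautogui.click(button="right")

# Usage sketch: inside the webcam loop, obtain the nose landmark and per-eye
# EARs from the shape predictor (see the EAR sketch above), then call
# control_step(nose_xy, anchor_xy, left_ear, right_ear) once per frame.
```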
V. CONCLUSION

In summary, our proposed eye tracking algorithm can be further improved by using better-trained models to enhance the speed of the system. The system can also be made more dynamic by incorporating the user's head rotation to determine the rate at which the cursor position changes. Future research can focus on improving the accuracy of the Eye Aspect Ratio (EAR) calculations used in the algorithm, which can be achieved by modifying the aspect-ratio formulae. Furthermore, to simplify face detection, image processing techniques can be used to preprocess the input video before the model detects and analyzes the facial features.

Eye blink and facial feature estimation is an intuitive and convenient input method with many applications in everyday life and academic research. The robust eye gaze estimation system implemented in this work has several benefits over existing solutions. The proposed system achieves 99% validation accuracy using a dataset created with a low-cost webcam, whereas many existing solutions require a high-cost wearable device to achieve such accuracy. Moreover, the gaze class is predicted in real time, which provides smooth gaze output. The multiclass prediction approach reduced the computational complexity of the system and enabled it to work in real time.

REFERENCES

[1] D. F. Vera Anaya and M. R. Yuce, "A Hands-free Human-Computer-Interface Platform for Paralyzed Patients Using a TENG-based Eyelash Motion Sensor," IEEE Sensors Journal, vol. 21, no. 7, pp. 8945-8953, April 1, 2021, doi: 10.1109/JSEN.2021.3059374.
[2] K. Hameed, S. M. Ali and U. Ali, "An Approach to Design Keyboard and Mouse Assisting Device for Handicap Users," 2022 IEEE International IoT, Electronics and Mechatronics Conference (IoTEMC), Lahore, Pakistan, 2022, pp. 1-6, doi: 10.1109/IOTEMC54415.2022.9687113.
[3] J. Vadlapati, S. Velan S and E. Varghese, "Facial Recognition using the OpenCV Libraries of Python for the Pictures of Human Faces," 2021 12th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Bengaluru, India, 2021, pp. 1-6, doi: 10.1109/ICCCNT53297.2021.9513342.
[4] F. K. Noble, "Comparison of OpenCV's Feature Detectors and Feature Matchers," 2019 IEEE International Conference on Imaging Systems and Techniques (IST), Manchester, United Kingdom, 2019, pp. 1-5, doi: 10.1109/IST.2019.8868516.
[5] M. A. Rady, S. M. Youssef and S. F. Fayed, "Smart Gesture-based Control in Human Computer Interaction Applications for Special-need People," 2019 International Conference on Natural Language Processing and Information Retrieval (NLP-IR), Cairo, Egypt, 2019, pp. 1-5, doi: 10.1109/NLP-IR48044.2019.9042138.
[6] S. Colaco and D. S. Han, "Facial Keypoint Detection with Convolutional Neural Networks," IEEE Access, vol. 8, pp. 26552-26566, 2020, doi: 10.1109/ACCESS.2020.2976948.
[7] S. Dwijayanti, R. R. Abdillah, H. Hikmarika, Hermawati, Z. Husin and B. Y. Suprapto, "Facial Expression Recognition and Face Recognition Using a Convolutional Neural Network," 2020 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), 2020, pp. 437-442.
[8] M. Agarwal, M. Mittal and M. Gautam, "Real-Time Facial Emotion Classification using Deep Convolution Neural Network," 2022 International Conference on Computing and Network Technologies (CONIT), 2022.
[9] I. Lee, H. Jung, C. H. Ahn, J. Seo, J. Kim and O. Kwon, "Real-time Personalized Facial Expression Recognition System Based on Deep Learning," ICCE 2020.
[10] S. Ramesh and R. Rajasree, "A Survey on Face Recognition Technique," 2019 2nd International Conference on Computational Intelligence and Informatics (ICCII 2019).
[11] H. Nurlatifa, S. Wibirama and R. Hartanto, "A Study of Event Detection Methods in Eye-Tracking Based on Its Properties," ICST 2020.
[12] S. N. S. Sheikh Anwar, A. A. Aziz and S. H. Adil, "Development of Real-Time Eye Tracking Algorithm," ICCIS 2021.

