Project Report 1 (15)
VIRTUAL MOUSE USING HAND GESTURES RECOGNITION
A Project Report
Submitted By
M.RAJESH
200303124373
in partial fulfillment of the requirements for the Degree of
BACHELOR OF TECHNOLOGY
Prof. MUSKAN KUMARI
Assistant Professor
PARUL UNIVERSITY
VADODARA
October - 2023
PARUL UNIVERSITY
CERTIFICATE
This is to certify that Project - 2 (Subject Code 203105400) of 7th Semester, entitled “VIRTUAL MOUSE USING HAND GESTURES RECOGNITION”, of Group No. PUCSE 135 has been successfully completed by
• M.RAJESH - 200303124373
External Examiner
Acknowledgements
Any significant project that a person undertakes has the support of the people who assisted
him in overcoming obstacles and achieving his objective. It brings me great pleasure to offer my
profound gratitude to my esteemed guide, Prof. Muskan Kumari, for her steadfast, exceptional, and
extremely helpful collaboration and mentoring. Working under her guidance has been a privilege. She provides
regular motivation and encouragement, which makes any complexity simple. She gave me
a great deal of insightful advice and timely suggestions during the course of the project. I will
always be grateful to her, and I am proud to have worked under her.
I also want to convey my gratitude and respect to Dr. Amit Barve, Professor and Head of the Department
of Computer Science and Engineering. I consider myself extremely fortunate to have received
his invaluable counsel, direction, and leadership. Last but not least, a sincere thank you to the
All-Powerful God.
Place : Vadodara
Date :
M.RAJESH- 200303124373
Abstract
This project presents an approach to developing a real-time, vision-based hand gesture recognition
system that uses only a built-in camera and computer vision techniques, such as image
processing, to recognize many gestures for human-computer interaction.
The real-world applications of real-time hand gesture recognition are countless, since it can be
used almost anywhere we interact with computers. The principal application of this project is to
imitate the mouse as a visual input device with all of its tasks, such as left-clicking, selecting,
cursor moving, and scrolling.
Table of Contents
Acknowledgements
Abstract
Table of Contents
1 Introduction
1.5.1 OpenCV
1.5.2 NumPy
1.5.4 Autopy
1.5.5 PyAutoGUI
1.10 ALGORITHM
1.11.2 Technology
2 Literature Survey
2.5 Controlling mouse pointer using webcam (G. Sahu et al.; D. Gupta et al.)
2.7 Interaction between human and machine using hand gesture recognition (Cheng, Zhang, Wang, 2004)
2.8 Recent improvements in hand gesture recognition for human-computer interaction technology (Czuszynski, Murad, 2009)
2.12 Some gestures are unimanual, some gestures are bimanual symmetric (Bourdot et al., 2010)
2.13 Gestures are categorized in time as either static or dynamic (D. Gupta, G. Sahu, 2006)
2.15 Use a neural network method to identify American Sign Language's static postures (Kulkarni)
3 Conclusion
4 Future Work
Chapter 1
Introduction
Human-computer interaction has been transformed by the development of hand gesture detection
technology, which gives consumers a more natural and engaging experience when using digital
gadgets. The development of a virtual mouse that can be operated with hand gestures is an
intriguing application of this technology. With this advancement, people can traverse digital worlds
naturally and without using their hands. It goes beyond conventional input techniques. We will
discuss the idea of a hand gesture-driven virtual mouse in this introduction, as well as its possible
advantages and effects on the field of interactive computing.
The physical mouse is replaced by a digital one that reacts in real-time to the user’s hand
movements thanks to hand gesture recognition technology. Users can point, click, and browse
through computer interfaces utilizing gestures made in the air rather than interacting with a physical
device. This idea not only adds a layer of futuristic interactivity but also takes accessibility and
hygienic issues into account.
For the creation of engaging VR and AR experiences, hand gesture recognition is essential. Users
can use their natural hand movements to engage with virtual environments, manipulate objects, and
operate programs within these realities.
A virtual mouse controlled by hand gestures offers an easy method of interacting with smart TVs
and home automation systems. Without the need for remote controls, users can change the volume,
manage smart home devices, and switch between channels. As an example of this, we integrated it
with an online interactive two-player game so that two players can compete in the game, each with
his own hands.
A virtual mouse operated with hand gestures can improve hands-on learning opportunities in
educational settings. Students can manipulate 3D models, engage with educational applications,
and take part in virtual experiments.
• The representation of the system environment, software system, and system restrictions
employed in this project will be presented.
• The detection step, which covers the acquisition and transformation of images throughout the
image processing life cycle until the images required for recognition are obtained, will be described.
• Using the result of the detection procedure, i.e. recognition, to identify the gesture will be
discussed.
• Each of the events that will be produced in response to gestures in this project will be defined
independently.
1.5.1 OpenCV
In our experiment, we used the OpenCV 2.0 library to process video frames. OpenCV stands for
Intel's Open Source Computer Vision Library. It is a collection of C functions and a few C++
classes that implement a number of popular image processing and computer vision algorithms.
OpenCV is a middle-to-high-level, cross-platform API made up of several hundred C components.
It does not depend on external numerical libraries, although it can make use of some of them at
compile time. To use this library, we import it as import cv2.
1.5.2 NumPy
The NumPy library, which stands for Numerical Python, consists of multidimensional array objects
and a collection of routines for processing those arrays. It provides Python with a vast library of
high-level mathematical functions that operate on these arrays, as well as sophisticated data
structures that allow well-organized computations using arrays and matrices. It also performs
the basic mathematical operations concisely and simply. NumPy is becoming increasingly well
known and is used in several production systems. To use this library, we import numpy.
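As a small illustration of the kind of array arithmetic NumPy contributes to this project, the sketch below computes the Euclidean distance between two fingertip positions; the coordinates are made-up example values, and the pinch-detection use is only one possibility:

```python
import numpy as np

# Two hypothetical fingertip positions in pixel coordinates (x, y).
index_tip = np.array([320.0, 240.0])
thumb_tip = np.array([300.0, 210.0])

# Euclidean distance between the tips, e.g. to decide whether
# two fingers are close enough together to count as a "pinch".
distance = np.linalg.norm(index_tip - thumb_tip)
print(round(float(distance), 2))  # -> 36.06
```

The same one-line norm replaces an explicit sqrt((x1-x2)^2 + (y1-y2)^2), which is why NumPy keeps the landmark math short.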
1.5.3 MediaPipe
The MediaPipe framework is utilized for hand tracking and gesture detection, and the OpenCV
library is used for computer vision. For the purpose of tracking and identifying hand movements
and hand tips, the set of instructions uses machine learning ideas. MediaPipe is an open-source
framework from Google that is applied in machine learning pipelines. Given that the framework
was created for time-series data, the MediaPipe structure is advantageous for cross-platform
development. The MediaPipe structure supports multiple modalities, allowing this architecture
to be used with a variety of audio and video files. Developers build and analyze systems as graphs
using the MediaPipe framework, and also use it to develop systems for application purposes. The
actions in a system that uses MediaPipe are carried out in the pipeline configuration. To use this
library, we import it as import mediapipe.
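MediaPipe's hand model returns 21 landmarks per hand, with the fingertips at indices 4, 8, 12, 16, and 20. The sketch below shows one common way to decide which fingers are up from those landmarks; it operates on plain (x, y) tuples so it runs without MediaPipe installed, and the mock landmark values are invented for illustration. In a real run the points would come from mediapipe's hand-tracking output, and the thumb rule shown assumes a right hand facing the camera:

```python
# Fingertip landmark indices in MediaPipe's 21-point hand model.
TIP_IDS = [4, 8, 12, 16, 20]

def fingers_up(lm):
    """lm: list of 21 (x, y) points in image coordinates (y grows downward).
    Returns [thumb, index, middle, ring, pinky] as 1 (up) / 0 (down)."""
    fingers = []
    # Thumb: tip to the left of the joint below it (right hand assumption).
    fingers.append(1 if lm[4][0] < lm[3][0] else 0)
    # Other fingers: tip higher (smaller y) than the joint two points below it.
    for tip in TIP_IDS[1:]:
        fingers.append(1 if lm[tip][1] < lm[tip - 2][1] else 0)
    return fingers

# Mock landmarks: only the index fingertip is raised above its lower joint.
mock = [(0.5, 0.5)] * 21
mock[3] = (0.40, 0.5)        # thumb joint; tip is not left of it -> thumb down
mock[8] = (0.5, 0.2)         # index fingertip, raised
mock[6] = (0.5, 0.4)         # index joint below the tip
print(fingers_up(mock))      # -> [0, 1, 0, 0, 0]
```

This "tip above the joint below it" comparison is the heuristic most virtual-mouse tutorials use; it is cheap enough to run on every frame.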
1.5.4 Autopy
Autopy is an easy-to-use GUI automation toolkit for Python. It includes cross-platform, efficient,
and straightforward controls for the keyboard and mouse, for locating colors and bitmaps on the
screen, and for displaying alerts. Several mouse control functions are included in Autopy; the
autopy module has many functions for simulating mouse movements and button clicks. In our
project it mainly works as the functional controller of the mouse. To use this library we import it
as: import autopy.
1.5.5 PyAutoGUI
PyAutoGUI is a Python automation library. It includes functions for controlling the mouse in a
simple manner. The package works on Windows, Linux, and macOS, and provides the ability to
simulate mouse cursor moves and button clicks. In our project it mainly provides functional
control of the mouse, such as scrolling. To use this library we import pyautogui.
Using the camera, the video is captured and each frame is prepared for processing. To access the
system camera and capture video, we used cv2.VideoCapture().
Step 1: From the video, we distinguish the hand from other objects using the OpenCV library.
Step 2: The detected hand is assigned landmarks using the MediaPipe package.
Step 3: The landmarked hand tips are given a mathematical representation using the NumPy library.
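One concrete piece of Step 3 is converting landmarks from MediaPipe's normalized [0, 1] coordinates into pixel positions in the frame. A minimal sketch, assuming a hypothetical 640x480 frame and an example fingertip at the frame centre:

```python
import numpy as np

def to_pixels(norm_landmarks, frame_w, frame_h):
    """Convert normalized (x, y) landmarks in [0, 1] to integer pixel coords."""
    pts = np.asarray(norm_landmarks, dtype=float)
    return (pts * np.array([frame_w, frame_h])).astype(int)

# A hypothetical index fingertip at the centre of a 640x480 frame.
print(to_pixels([(0.5, 0.5)], 640, 480))   # -> [[320 240]]
```

Working in pixel coordinates makes the later cursor-mapping and distance calculations straightforward.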
1. First, the user has to raise all fingers; this gesture denotes the start.
2. If only the index finger is up, i.e. finger(1) is 1 and the rest are 0, the cursor moves from its
current position to the next position.
3. If only the middle finger is up, the cursor is held at its current position.
4. If only the thumb is up, the left-click operation takes place.
5. If both the index and middle fingers are up, the scroll-up function is used.
6. If both the index and pinky fingers are up, the scroll-down function is used.
7. If all fingers are down, no function happens. The gesture is now determined.
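The rules above can be sketched as a single dispatch function that maps a finger-state vector to an action label. The action names are illustrative, not the project's actual identifiers:

```python
def classify_gesture(fingers):
    """Map [thumb, index, middle, ring, pinky] (1 = up, 0 = down)
    to an action label, following rules 1-7 above."""
    if fingers == [1, 1, 1, 1, 1]:
        return "start"          # rule 1: all fingers raised
    if fingers == [0, 1, 0, 0, 0]:
        return "move cursor"    # rule 2: only index up
    if fingers == [0, 0, 1, 0, 0]:
        return "hold cursor"    # rule 3: only middle up
    if fingers == [1, 0, 0, 0, 0]:
        return "left click"     # rule 4: only thumb up
    if fingers == [0, 1, 1, 0, 0]:
        return "scroll up"      # rule 5: index + middle up
    if fingers == [0, 1, 0, 0, 1]:
        return "scroll down"    # rule 6: index + pinky up
    if fingers == [0, 0, 0, 0, 0]:
        return "no action"      # rule 7: all fingers down
    return "unknown"            # any other combination is ignored

print(classify_gesture([0, 1, 0, 0, 0]))   # -> move cursor
```

Keeping the mapping in one pure function makes each rule easy to test in isolation before wiring it to the actual mouse-control calls.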
• Measure the finger position on the screen and map it to the current cursor position.
• Verify that the index finger is up before updating the cursor's location to the next one and
leaving the previous one.
• Update the current cursor position frequently, since the cursor keeps moving as long as the
index finger is up.
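The cursor-moving step above amounts to scaling a fingertip position from camera-frame coordinates to screen coordinates, then damping the motion so the cursor does not jitter. A minimal sketch, assuming a 640x480 frame, a 1920x1080 screen, and a smoothing factor of 5 (in practice the screen size would come from something like autopy.screen.size(), and the final position would be fed to the mouse-move call):

```python
import numpy as np

FRAME_W, FRAME_H = 640, 480        # camera frame size (assumed)
SCREEN_W, SCREEN_H = 1920, 1080    # screen size (assumed)
SMOOTHING = 5                      # larger = steadier but slower cursor

prev_x, prev_y = 0.0, 0.0          # last smoothed cursor position

def next_cursor_pos(finger_x, finger_y):
    """Map a fingertip in frame coords to a smoothed screen position."""
    global prev_x, prev_y
    # Linear rescale from frame range to screen range.
    target_x = np.interp(finger_x, (0, FRAME_W), (0, SCREEN_W))
    target_y = np.interp(finger_y, (0, FRAME_H), (0, SCREEN_H))
    # Move only a fraction of the remaining distance each frame.
    prev_x += (target_x - prev_x) / SMOOTHING
    prev_y += (target_y - prev_y) / SMOOTHING
    return prev_x, prev_y

x, y = next_cursor_pos(320, 240)   # fingertip at the frame centre
print(round(x), round(y))          # first smoothed step toward (960, 540)
```

Calling the function once per frame makes the cursor glide toward the fingertip's mapped position instead of snapping to it.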
Left click
• Get the finger location, scale it to the screen position, and identify the motion currently being
used.
• Make sure the left-click function is executed at the current cursor position if only the thumb is
raised.
• Keep updating the cursor position while the left-click function is operated.
Scroll up
• When we see that the index and middle fingers are up, we update the cursor position and scroll up.
Scroll down
• When the index and pinky fingers are up, we update the current cursor position and scroll down.
1.10 ALGORITHM
1. Taking an image or video as input through the webcam.
2. Detecting the hand in each captured frame.
3. Assigning the fingertip IDs and a skeleton, or mathematical representation, of the hand.
4. Recognizing the hand gesture from the finger positions (up and down).
5. Linking the detected finger positions to the cursor position.
6. Performing the left-click, moving, and scrolling operations in the system.
1.11.1 Tools
1. OpenCV
2. NumPy
3. MediaPipe
4. Autopy
5. PyAutoGUI
6. Python
7. PyCharm / Jupyter
1.11.2 Technology
• Machine Learning
Chapter 2
Literature Survey
Hasan et al., 2020
To eliminate the rotation effect, the image (after applying a Gaussian function) is divided into
circular areas, or domains, formed in a terrace shape. The shape is divided into 11 terraces, each
with a width of 0.1. The 0.1-width division results in 9 terraces: 1 to 0.9, 0.9 to 0.8, 0.8 to 0.7,
0.7 to 0.6, 0.6 to 0.5, 0.5 to 0.4, 0.4 to 0.3, 0.3 to 0.2, and 0.2 to 0.1, plus one terrace for values
less than 0.1 and a last terrace for the outside region that stretches beyond the outer terrace.
Every terrace is divided into 8 sections, known as the feature regions; empirical research reveals
that 8 is the best number.
2.5 Controlling mouse pointer using webcam (G. Sahu et al.; D. Gupta et al.)
A system for manipulating the mouse pointer with a webcam was created by G. Sahu et al. and
D. Gupta et al. It can make or end phone calls and control the volume of a media player and
PowerPoint slides. The user's fingers were recognized using RGB color tapes.
2.7 Interaction between human and machine using hand gesture recognition
(Cheng, Zhang, Wang, 2004)
Hand gesture recognition techniques for human-machine interaction use sensors such as depth
cameras, or even gloves equipped with sensors, to capture hand movements. Depth information is
particularly useful for capturing 3D hand gestures, and some techniques are used for feature
extraction, such as hand contour and hand shape. Neural networks, such as convolutional neural
networks and recurrent networks, are also used for identifying hand motions.
2.8 Recent improvements in hand gesture recognition for human-computer interaction
technology (Czuszynski, Murad, 2009)
This work surveys a wide range of hand posture and gesture identification methods and tools,
shedding light on both their uses and limitations. Hand gesture detection relies heavily on
image-based methods. Traditional computer vision algorithms were used in early techniques, but
more recent improvements have made use of deep learning models, particularly Convolutional
Neural Networks. With these models, diverse postures and movements can be accurately recognized
by automatically extracting hierarchical features from hand photos. A smooth interface between
humans and computers requires real-time gesture recognition. A lot of work is still being done to
optimize algorithms and models for low-latency processing, which will enable useful applications
in interactive systems like gaming and virtual reality.
(Acharya, 2012)
In order to improve engagement in virtual reality (VR) environments, hand gesture detection is
essential. This technology gives users a more immersive and natural manner of interacting with
digital material. Effective hand gesture recognition in VR is made possible by a variety of methods
and technologies. For realistic interactions in virtual reality, it is essential to estimate the hand's
three-dimensional pose. The spatial arrangement of the hand joints is predicted using methods
like direct regression or model-based pose estimation. A flawless VR experience requires real-time
processing; to accomplish low-latency hand gesture recognition and response, optimization
approaches including model simplification and parallel processing are used. Visual feedback in VR,
such as showing virtual hands that replicate the user's actual hand movements, enhances the
impression of presence and interaction. Integrating haptic feedback devices lets users sense tactile
sensations in reaction to their gestures, which further improves the immersive experience.
(Agrawal, 2007)
In vision-based hand gesture identification, computer vision techniques are used to interpret and
comprehend hand movements and configurations from visual input, commonly gathered by cameras.
This method is frequently utilized in a variety of applications, including virtual reality and
human-computer interaction. Recognizing dynamic motions requires continuous hand tracking
between frames. The position and motion of the hand are estimated by tracking algorithms; for
reliable tracking, methods like Kalman filtering or particle filtering are frequently utilized. Once
trained, the model may be used to recognize hand movements in real time, with the system dividing
the input motions into predetermined categories based on the recognized patterns. 3D hand pose
estimation is used in applications that need three-dimensional data. This entails figuring out how
the hand joints are arranged spatially in three dimensions; accurate pose estimation is made
possible by depth data from cameras or specialized sensors. For responsive interactions, real-time
processing, accomplished through parallelism and optimization, is essential. The system associates
recognized motions with certain application activities and provides feedback via visual, aural, or
haptic means. A seamless and immersive user experience depends on robustness to external
elements, adaptability to various settings, and intuitive user mappings. Current research examines
developments in sensor fusion, deep learning, and practical applications in several fields.
The system's primary function, gesture recognition, categorizes input motions into predetermined
groups. The seamless integration of the detected gestures into applications is ensured by mapping
them to certain actions. The user experience is improved when feedback is given, such as visual
representation or haptic responses. For real-world applications, resilience to environmental
changes, a variety of hand shapes, and various illumination conditions are essential. The limits of
vision-based hand gesture identification are being pushed by ongoing research into cutting-edge
methodologies like deep learning and multi-modal sensor fusion, in industries ranging from gaming
to healthcare.
2.12 Some gestures are unimanual, some gestures are bimanual symmetric (Bourdot et al., 2010)
2.13 Gestures are categorized in time as either static or dynamic (D. Gupta, G. Sahu, 2006)
A crucial component of gesture recognition is temporal classification, which distinguishes between
static and dynamic gestures based on the speed and style of the hand movements. Understanding
the temporal properties of gestures is crucial for the development of more precise and
context-sensitive recognition systems.
Static gestures involve a consistent hand arrangement that is maintained over time with little to
no movement. The hand conveys information by remaining in a specific posture or shape, without
the need for dynamic changes. The thumbs-up sign, pointing, and making specific hand shapes
that correspond to letters or symbols in sign language are examples of common static gestures.
In human-computer interaction scenarios, commands are frequently conveyed through static
gestures; for example, a hand held in a particular position in front of a camera can be connected
to particular commands or activities.
In contrast, dynamic gestures feature constant, changing hand movements. The ability to recognize
these gestures depends on being able to record the temporal evolution of hand configurations.
Dynamic gestures include waving, swiping, and making circular motions with the hand; gestures
that shift from one position to another are referred to as dynamic gestures in sign language. In
order to develop more flexible and expressive gesture detection systems, researchers frequently
investigate hybrid techniques that blend static and dynamic aspects. To fully comprehend a
gesture, one must take into account both the initial hand configuration and the subsequent
dynamic movements.
2.15 Use a neural network method to identify American Sign Language's static postures (Kulkarni)
With the background being black, the input image is segmented using a thresholding technique.
In order to adjust the coordinates so that the centroid of the hand object lies at the origin of the
X and Y axes, each segmented picture is normalized and its center of mass is computed. Because
this method, which is based on the central mass of the object, generates images with varying
dimensions, a scaled normalization process is used to address this issue and maintain image
dimensions. Two methods are utilized to extract the features: first, edge matrices; and second,
normalized features, which measure only the brightness rates of pixels and exclude other black
pixels in order to shorten the feature vector. There are six different gestures in the database, each
with ten examples (5 for training and 5 for evaluation).
et al.
After the input image is captured by the camera and a skin color detection filter is applied, a
clustering procedure is utilized to locate the boundaries of each group in the clustered image
using a standard contour-tracking technique. Multi-layer perceptron (MLP) neural networks and
dynamic programming (DP) matching were employed in the grouping process. A color segmentation
method based on the skin color filter was used to identify the hand region, and the hand shape
morphology was found using the SGONG network. The number of raised fingers and other aspects
of the hand's shape were determined by the finger identification procedure, which was utilized to
extract the features.
Chapter 3
Conclusion
Processing rates have grown significantly in the modern, digital world, and current computers are
now capable of assisting people in challenging jobs. However, existing input technologies still
significantly impede a number of activities, using the resources at hand inefficiently and limiting
the expressiveness of program usage. Here, gesture recognition can be helpful. To attain
interactivity and usability, computer vision techniques for human gesture recognition must
outperform present performance in terms of robustness and speed.
This project's goal was to develop a system that could identify a variety of hand gestures in real
time and use that knowledge to execute the movements in the right situations for our application.
Chapter 4
Future Work
Two-handed 3D: Using several cameras, it would be feasible to track the gestures made by both
hands while both are in the frame. A process would need to be created to recognize a gesture
indicated by a partially obscured hand. Carrying out this process would be significantly more
difficult, because in order to recognize the gestures we must process multiple frames at once from
various cameras. These gestures could also be applied to sign language.
References
3. D.-H. Liou, D. Lee, and C.-C. Hsieh, "A real time gesture control virtual mouse using motion
history image," in Proceedings of the 2010 2nd International Conference on Signal Processing
Systems, July 2010.
4. S. U. Dudhane, "Cursor control system using hand gesture recognition," IJARCCE, vol. 2, no.
5, 2013.
5. K. P. Vinay, "Cursor control using gesture control virtual mouse," International Journal of
Critical Accounting, vol. 0975-8887, 2016.
7. P. Nandhini, J. Jaya, and J. George, "Computer vision system for food quality evaluation - a
review," in Proceedings of the 2013 International Conference on Current Trends in Engineering
and Technology (ICCTET), pp. 85–87.
8. J. Jaya and K. Thanushkodi, "Implementation of certain system for medical image diagnosis,"
European Journal of Scientific Research, vol. 53, no. 4, pp. 561–567, 2011.
9. P. Nandhini and J. Jaya, "Image segmentation for food quality evaluation using computer
vision system," International Journal of Engineering Research and Applications, vol. 4, no. 2,
pp. 1–3, 2014.
10. J. Jaya and K. Thanushkodi, "Implementation of classification system for medical images,"
European Journal of Scientific Research, vol. 53, no. 4, pp. 561–569, 2011.
11. C. Lugaresi et al., "MediaPipe: A Framework for Building Perception Pipelines," 2019,
https://arxiv.org/abs/1906.08172.
12. Google, MediaPipe, https://ai.googleblog.com/2019/08/on-device-real-time-hand-tracking-with.html.
13. V. Bazarevsky and F. Zhang, On-Device, Real-Time Hand Tracking with MediaPipe.
14. K. Pulli, A. Baksheev, K. Kornyakov, and V. Eruhimov, "Realtime computer vision with
OpenCV," Queue, vol. 10, no. 4, pp. 40–56, 2012.
16. D.-S. Tran, N.-H. Ho, H.-J. Yang, S.-H. Kim, and G. S. Lee, "Real-time virtual mouse
system using RGB-D images and fingertip detection," Multimedia Tools and Applications,
vol. 80, no. 7, pp. 10473–10490, 2021.
18. J. S. Nayak, "Hand gesture recognition for human computer interaction," Procedia Computer
Science, vol. 115, pp. 367–374, 2017.
19. K. H. Shibly, S. Kumar Dey, M. A. Islam, and S. Iftekhar Showrav, "Design and development
of hand gesture based virtual mouse," in Proceedings of the 2019 1st International Conference
on Advances in Science, Engineering and Robotics Technology (ICASERT), pp. 1–5, Dhaka,
Bangladesh, May 2019.