Major Report


ABSTRACT

In the evolving world of Human-Computer Interaction (HCI), the use of touchless and intuitive input methods is gaining popularity for enhancing accessibility and user convenience. This project, titled “Virtual Mouse Using Hand Gesture Recognition,” aims to eliminate the need for physical input devices like a traditional mouse by enabling users to control cursor movement and perform mouse operations through simple hand gestures captured via a webcam.
The system uses computer vision and machine learning techniques, specifically
the MediaPipe framework by Google, integrated with OpenCV for image
processing and Python for implementation. The webcam captures real-time
video frames, which are processed to detect and track the user's hand
landmarks using a deep learning-based hand tracking model. Each gesture
corresponds to a specific mouse operation such as cursor movement, left click,
right click, double click, scroll up, scroll down, and even taking a screenshot.
A core component of the project involves calculating the angle between fingers
and the distance between specific landmarks to differentiate gestures
accurately. These gestures are interpreted through custom algorithms built on
top of the hand landmark data provided by MediaPipe, ensuring high precision
and real-time responsiveness.
This project not only demonstrates a practical and cost-effective alternative to
conventional input devices but also provides a foundation for further research
into gesture-based interfaces, which can be especially beneficial for physically
challenged individuals or in touchless environments such as cleanrooms or
public terminals.
The successful implementation of this system showcases the potential of
gesture recognition in real-time applications and reflects the growing
significance of AI-driven natural user interfaces in shaping the future of
human-computer interaction.

INTRODUCTION

The traditional methods of interacting with computers—through hardware devices such as a mouse and keyboard—have remained largely unchanged for decades. However, the advancement of computer vision and artificial intelligence (AI) has opened up new possibilities for developing more natural, intuitive, and contactless user interfaces. One such emerging area is gesture-based control, where users can interact with digital systems using simple hand gestures. This project, "Virtual Mouse Using Hand Gesture Recognition," explores this innovative domain by replacing the physical mouse with a virtual one controlled entirely by hand gestures.
The virtual mouse system utilizes a webcam to capture real-time video input,
detects the user’s hand, and identifies specific landmarks on the hand using
MediaPipe, a state-of-the-art machine learning solution developed by Google.
By analyzing the spatial relationships, angles, and distances between various
hand landmarks, the system interprets gestures and maps them to standard
mouse functions such as cursor movement, left click, right click, double click,
scroll operations, and screenshot capture.
The use of Python, OpenCV, and MediaPipe allows for efficient image
processing and hand tracking without the need for expensive hardware. The
system is designed to be platform-independent, low-cost, and easy to use,
making it accessible for a wide range of users including those with physical
disabilities or in environments where touch is not desirable (e.g., clean rooms,
medical settings, or public information kiosks).
This project not only serves as a demonstration of modern gesture recognition
capabilities but also contributes toward building more immersive and
accessible technologies for the future.

Objective of the Project:


The primary objective of this project is to design and implement a virtual
mouse system controlled entirely by hand gestures, using real-time hand
tracking and gesture recognition. The specific goals include:
1. To eliminate the need for a physical mouse by allowing users to perform
mouse operations with hand gestures using a standard webcam.
2. To accurately detect and track hand landmarks in real time using the
MediaPipe framework.

3. To develop gesture recognition logic that translates finger positions and
movements into mouse actions such as:
   - Cursor movement
   - Left click
   - Right click
   - Double click
   - Scroll up and down
   - Screenshot capture
4. To create a user-friendly and responsive interface that ensures smooth
cursor control and precise gesture detection with minimal latency.
5. To enhance human-computer interaction by implementing a touchless,
intuitive, and accessible input method suitable for users with limited
mobility or for use in sterile or public environments.
6. To lay the foundation for further development in gesture-based
interaction technologies that could be applied in gaming, augmented
reality, robotics, and assistive devices.

Motivation Behind the Project:


In today’s rapidly evolving digital world, human-computer interaction (HCI) is
shifting toward more natural, intuitive, and contactless interfaces. While
traditional devices like the mouse and keyboard have been the backbone of
interaction for decades, they come with limitations—especially in scenarios
that demand hygiene, accessibility, or hands-free operation.
The COVID-19 pandemic highlighted the need for contactless technologies in
public and medical spaces. At the same time, individuals with physical
disabilities often find it challenging to use traditional input devices. These
challenges motivated the development of a gesture-based virtual mouse—a
system that allows users to control their computer using only hand
movements, making the interaction seamless and more inclusive.
Additionally, the rapid advancement of computer vision and machine learning
technologies like MediaPipe and OpenCV has made it possible to implement
such systems using only a webcam and a standard computer. This makes the
solution cost-effective, easy to deploy, and highly scalable across different
environments.

Real-World Applications:
The Virtual Mouse Using Hand Gesture Recognition system can be applied in
various real-world scenarios, including:
- Accessibility Solutions: For individuals with physical impairments who cannot use a traditional mouse or keyboard, this system provides an alternative, hands-free method of interacting with computers.
- Public Information Kiosks: In airports, hospitals, or banks, users can operate information systems without touching screens, reducing the spread of germs and maintaining hygiene.
- Medical and Cleanroom Environments: Doctors, lab technicians, or researchers working in sterile conditions can interact with systems without direct contact, ensuring cleanliness and safety.
- Gaming and Virtual Reality (VR): The system can be adapted for gesture-based controls in immersive gaming environments or AR/VR applications.
- Smart Home Control: Gestures can be used to control home appliances, lights, or entertainment systems without physical remotes.
- Robotics and IoT Devices: Gesture recognition can be integrated into robotics to provide command inputs or control robotic arms and drones.

TECHNOLOGY USED

A. SOFTWARE:
The development of this project required a combination of integrated
development tools, programming languages, computer vision libraries, and
automation modules. Below is a comprehensive explanation of each software
component used:
1. Python Programming Language
Version: Python 3.x (3.8 or higher is recommended)
Purpose and Role:
- Python is the core programming language used for developing the entire project due to its simplicity and extensive support for libraries and frameworks related to computer vision, machine learning, and automation.
- Python allows for rapid development and prototyping, which is ideal for projects involving real-time processing like gesture recognition.
- It provides easy-to-use syntax and extensive libraries for interacting with hardware and performing advanced calculations, making it the perfect choice for this project.
Python also supports libraries like PyAutoGUI, OpenCV, and MediaPipe,
which are essential for capturing video, processing hand gestures, and
controlling the virtual mouse.

2. PyCharm (IDE)
Version: Community or Professional Edition
Purpose and Role:
PyCharm is an integrated development environment (IDE) used for
writing, editing, testing, and debugging Python code. It supports
features such as:
- Intelligent code completion
- Project navigation
- Integrated debugging and testing tools
- Virtual environment support

In this project, PyCharm helped streamline the development
process by providing a feature-rich environment that enhances
productivity and efficiency.
It also supports version control (like Git) and integrates well with
various Python libraries, making it ideal for the smooth execution
of Python-based projects.

3. OpenCV (Open Source Computer Vision Library)


Version: OpenCV 4.x (or compatible)
Purpose and Role:
OpenCV is an open-source computer vision library that enables
real-time image and video processing. In this project, OpenCV is
used for:
- Capturing video frames from the webcam in real time. The frames are then passed through the gesture recognition pipeline.
- Converting image frames from BGR (the default format from the webcam) to RGB for better compatibility with MediaPipe.
- Flipping frames horizontally to provide a mirror-like view (for more natural interaction).
- Drawing feedback on the frames (e.g., mouse actions like clicks and scrolls) to show the status of the gesture being detected.
OpenCV's efficient video and image processing capabilities make it an
ideal choice for real-time applications like hand gesture recognition.
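The capture-and-display cycle described above can be written in a few lines. The following is a minimal sketch; the window title, quit key, and the example feedback text are illustrative and not taken from the project code:

import cv2

cap = cv2.VideoCapture(0)                 # open the default webcam
while cap.isOpened():
    ok, frame = cap.read()                # grab one BGR frame
    if not ok:
        break
    frame = cv2.flip(frame, 1)            # mirror the image for natural interaction
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB input
    # ... hand detection and gesture logic would run on `rgb` here ...
    cv2.putText(frame, "Left Click", (30, 40),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)  # on-screen feedback
    cv2.imshow("Virtual Mouse", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'): # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()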

4. MediaPipe by Google
Version: MediaPipe 0.8.x (or compatible)
Purpose and Role:
MediaPipe is a framework developed by Google that allows real-
time, cross-platform processing for computer vision and machine
learning tasks. In this project, the MediaPipe Hands solution is
used for:
- Real-time hand tracking: It detects the 21 hand landmarks (keypoints) from the webcam video feed, which represent key positions on the hand (e.g., wrist, knuckles, fingers).
- Landmark detection: MediaPipe processes each frame and provides the x, y, z coordinates for each of the 21 landmarks. These coordinates are essential for detecting gestures like pointing, a fist, or specific finger movements.
- Gesture recognition: By analyzing the relative positions of the hand landmarks (angles, distances), gestures are identified to perform actions like moving the cursor, clicking, scrolling, etc.
MediaPipe is lightweight and performs very efficiently, making it suitable
for real-time processing even with basic hardware.
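A minimal sketch of how the MediaPipe Hands solution is typically initialised and queried for landmarks; the confidence thresholds and the single-hand limit below are assumptions rather than values quoted in this report:

import mediapipe as mp

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=1,
                       min_detection_confidence=0.7,
                       min_tracking_confidence=0.7)

# `rgb` is an RGB frame produced by the capture loop shown earlier
result = hands.process(rgb)
if result.multi_hand_landmarks:
    hand = result.multi_hand_landmarks[0]
    # each of the 21 landmarks carries normalised x, y (and a relative z)
    index_tip = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
    print(index_tip.x, index_tip.y, index_tip.z)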

5. PyAutoGUI
Version: PyAutoGUI 0.9.x (or compatible)
Purpose and Role:
PyAutoGUI is a Python automation library used for controlling the
mouse and keyboard programmatically. In this project, it is used
to:
- Move the mouse: The x, y coordinates obtained from the hand landmarks are mapped to screen coordinates, and PyAutoGUI is used to move the mouse pointer to the calculated position on the screen.
- Click actions: Based on hand gestures (e.g., index finger pointing, fist, etc.), PyAutoGUI performs mouse actions like left-click, right-click, and double-click.
- Take screenshots: The project uses PyAutoGUI's screenshot feature to capture the screen when a certain gesture (like making a fist) is detected.
- Scrolling: PyAutoGUI's scroll function is used to simulate mouse wheel scroll actions, based on gestures such as pinching with the thumb and another finger.
PyAutoGUI is crucial for creating the virtual mouse experience by
providing a way to interact with the system as though the user were
using a physical mouse.
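The PyAutoGUI calls corresponding to the actions listed above look roughly as follows; index_tip stands for the normalised landmark from the MediaPipe sketch earlier, and the exact arguments used in the project may differ:

import pyautogui

screen_w, screen_h = pyautogui.size()          # screen resolution in pixels

# map a normalised landmark position (0..1) to screen pixels and move the cursor
pyautogui.moveTo(int(index_tip.x * screen_w), int(index_tip.y * screen_h))

pyautogui.click(button='left')                 # left click
pyautogui.click(button='right')                # right click
pyautogui.doubleClick()                        # double click
pyautogui.scroll(120)                          # positive = scroll up, negative = scroll down
pyautogui.screenshot('screenshot_1.png')       # save a screenshot to disk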

6. Pynput
Version: Pynput 1.x (or compatible)
Purpose and Role:
Pynput is a library used to control and monitor input devices,
including the mouse and keyboard.
In this project, Pynput is used for:
- Mouse control: Specifically for simulating mouse button presses (left click, right click, double-click). While PyAutoGUI also handles this task, Pynput can be used for low-level control, offering more responsiveness in some cases.
- Button press simulation: Pynput can trigger mouse button presses and releases, making the interaction smoother and more reliable during gesture recognition.
- It also ensures that the mouse actions (clicks) are recognized correctly when hand gestures such as bending the index finger or making a fist are detected.
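A short sketch of the Pynput calls for the button actions described above (purely illustrative usage of the library):

from pynput.mouse import Button, Controller

mouse = Controller()

mouse.press(Button.left)       # press and ...
mouse.release(Button.left)     # ... release = one left click
mouse.click(Button.right, 1)   # right click
mouse.click(Button.left, 2)    # double click (two rapid presses)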

7. NumPy
Version: NumPy 1.x (or compatible)
Purpose and Role:
NumPy is a popular library used for numerical computing in
Python. It is essential for tasks involving arrays and complex
mathematical operations.
In this project, NumPy might be used indirectly in functions that
involve:
- Calculating distances: The Euclidean distance between two hand landmarks (e.g., thumb and index finger) can be computed using NumPy's vectorized operations, improving efficiency.
- Angle calculations: The relative angles between landmarks (e.g., the angle between the thumb, index, and middle fingers) can also be computed using NumPy to classify specific gestures.
If included, NumPy is helpful in optimizing mathematical operations,
making the code faster and more efficient.
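The two computations mentioned above can be written compactly with NumPy. The helper names below are illustrative, not necessarily the ones used in the project:

import numpy as np

def get_distance(p1, p2):
    # Euclidean distance between two (x, y) landmark positions
    return float(np.linalg.norm(np.array(p1) - np.array(p2)))

def get_angle(a, b, c):
    # angle (in degrees) at point b, formed by the points a-b-c
    ba = np.array(a) - np.array(b)
    bc = np.array(c) - np.array(b)
    cos_t = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))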

8. Custom Utility File (util.py)


Purpose and Role:
This is a user-defined Python module, specifically designed to
handle the following utility functions:
- Distance calculations: Calculating the Euclidean distance between two landmarks to determine if certain gestures (such as clicking or zooming) are performed.
- Angle calculations: Computing angles between different landmarks to detect gestures like bending fingers (for clicks) or spreading fingers (for scrolling).
- Gesture classification logic: The utility module helps to modularize the logic for detecting different hand gestures, reducing the complexity in the main program and improving maintainability.
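The report does not reproduce util.py itself, so the following is only a plausible sketch of how such helpers might combine the distance and angle functions from the NumPy example above into gesture checks. The function names and thresholds are illustrative assumptions; only the landmark indices (4 = thumb tip, 5 = index MCP, 6 = index PIP, 8 = index tip) follow MediaPipe's standard numbering:

# util.py -- illustrative sketch; the actual helpers in the project may differ

def is_index_bent(lm):
    # index finger counts as bent when the angle at its PIP joint is small
    angle = get_angle((lm[5].x, lm[5].y), (lm[6].x, lm[6].y), (lm[8].x, lm[8].y))
    return angle < 60              # threshold chosen empirically (assumption)

def is_thumb_away(lm):
    # thumb tip far from the index base means the thumb is stretched out
    return get_distance((lm[4].x, lm[4].y), (lm[5].x, lm[5].y)) > 0.1

def detect_gesture(lm):
    if is_index_bent(lm) and is_thumb_away(lm):
        return "LEFT_CLICK"
    return "MOVE"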

9. MediaPipe Drawing Utilities (mediapipe.solutions.drawing_utils)

Purpose and Role:
MediaPipe's drawing utilities help in visualizing the hand landmarks and gestures on the video feed. These utilities are used for:
- Drawing hand landmarks: The detected hand landmarks are drawn on the video frame in real time, making it easy to see where the system is detecting the hand.
- Feedback visualization: When a specific gesture is detected (e.g., a click), a text label (e.g., “Left Click”, “Scroll Up”) is displayed on the screen to confirm the action.
This feature enhances the user experience by providing immediate visual feedback.
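Landmark drawing is usually a single call to MediaPipe's drawing module. A minimal sketch, assuming frame and result come from the OpenCV and MediaPipe sketches above:

import mediapipe as mp

mp_draw = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

# draw the 21 landmarks and their connections onto the BGR display frame
if result.multi_hand_landmarks:
    for hand in result.multi_hand_landmarks:
        mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)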

B. HARDWARE:
Detailed specifications of the device used for creating this project:

DEVICE SPECIFICATIONS:
Device name: LAPTOP-TAC90UTT
Processor: Intel(R) Core(TM) i5-1035G1 CPU @ 1.00 GHz (1.19 GHz)
Installed RAM: 8.00 GB (7.74 GB usable)
Device ID: 0459F067-605E-43E1-BED2-59A6EF2B354F
Product ID: 00356-24564-25200-AAOEM
System type: 64-bit operating system, x64-based processor
Pen and touch: No pen or touch input is available for this display

WINDOWS SPECIFICATIONS:
Edition: Windows 11 Home Single Language
Version: 24H2
Installed on: 07-12-2024
OS build: 26100.3915
PROJECT DETAILS

1. Project Overview
The project titled "Virtual Mouse Using Hand Gesture Recognition" is an
innovative solution aimed at replacing traditional hardware input devices with
a vision-based system. The system leverages real-time video input from a
webcam, along with hand gesture detection and tracking, to control mouse
operations such as movement, clicking, scrolling, and even taking
screenshots—all without physical contact.
This system is particularly relevant in a world that is moving toward touchless
technology, where interaction through gestures ensures sanitation,
accessibility, and convenience, especially for people with mobility impairments
or in environments where physical touch is restricted.

2. Working Principle
The project is built using computer vision and hand landmark detection. The
core principle involves:
- Capturing video frames through a webcam.
- Using MediaPipe to detect 21 hand landmarks on the user's hand.
- Tracking the relative positions of key landmarks (like the thumb, index, and middle fingers).
Based on the position and gesture, the system performs:
- Mouse movement by mapping the index fingertip's position to the screen.
- Left and right clicks through specific gestures like finger pinching or folding.
- Scrolling and zooming using two-finger gestures.
- Screenshots through special gestures like making a fist.
Each gesture is classified in real-time and converted into a corresponding
mouse event using PyAutoGUI and Pynput.

3. System Architecture
The overall system works in the following flow:
Input Device:
- Webcam captures a live video feed of hand gestures.
Processing Units:
- OpenCV handles frame capture, pre-processing, and visual feedback.
- MediaPipe detects hand landmarks and tracks gestures.
- Custom logic (via util.py) analyzes gestures using landmark positions.
- PyAutoGUI/Pynput simulates mouse control events based on gesture analysis.
Output Device:
- Screen reflects mouse movements and interactions.

4. Features Implemented
1. Mouse Cursor Movement
   - The cursor follows the movement of the index finger in real time.
   - Movement is smooth and mapped from camera space to screen space (see the mapping sketch after this list).
2. Left Click
   - Triggered when the index finger is bent and the thumb is away.
   - The gesture is identified by measuring finger angles and distances.
3. Right Click
   - Triggered when the middle finger is bent and the thumb is away.
   - A different angle and distance pattern is used than for the left click.
4. Double Click
   - Triggered when both the index and middle fingers are bent.
   - Uses a time threshold to avoid accidental multiple clicks.
5. Screenshot Capture
   - Triggered when a fist is made (all fingers closed).
   - Takes a screenshot and saves it with a unique filename.
6. Scroll Up
   - Triggered when the thumb and ring finger tips are brought close together.
   - Performs upward scrolling using pyautogui.scroll().
7. Scroll Down
   - Triggered when the thumb and pinky finger tips are brought close together.
   - Scrolls down by simulating mouse wheel motion.
8. Real-Time Visual Feedback
   - On-screen gesture names are displayed (e.g., “Left Click”, “Scroll Up”).
   - Helpful for debugging and user interaction awareness.
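As referenced in feature 1, mapping from camera space to screen space can be done with a simple linear interpolation. The margin-based mapping below is one common approach, not necessarily the exact scheme used in the project:

import numpy as np
import pyautogui

screen_w, screen_h = pyautogui.size()

def to_screen(x_norm, y_norm, margin=0.1):
    # use only the central region of the camera frame so the user does not
    # have to reach the very edge of the field of view to hit the screen edge
    x = np.interp(x_norm, [margin, 1 - margin], [0, screen_w])
    y = np.interp(y_norm, [margin, 1 - margin], [0, screen_h])
    return int(x), int(y)

# example: move the cursor to where the index fingertip points
sx, sy = to_screen(index_tip.x, index_tip.y)
pyautogui.moveTo(sx, sy)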

5. Algorithm and Logic
- Step 1: Capture video from the webcam using OpenCV.
- Step 2: Detect the hand using MediaPipe and extract landmarks.
- Step 3: Calculate relative distances or angles between specific landmarks.
- Step 4: Based on position logic (e.g., if distance < threshold), classify gestures.
- Step 5: Use PyAutoGUI/Pynput to trigger the corresponding mouse action.
- Step 6: Provide visual feedback on the screen for detected gestures.
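Putting the six steps together, a compact main loop could look like the sketch below. It reuses the illustrative detect_gesture and to_screen helpers from the earlier sketches and is not the project's actual source code:

import cv2
import mediapipe as mp
import pyautogui

cap = cv2.VideoCapture(0)
hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.7)

while cap.isOpened():
    ok, frame = cap.read()                                           # Step 1: capture
    if not ok:
        break
    frame = cv2.flip(frame, 1)
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))   # Step 2: detect
    if result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        gesture = detect_gesture(lm)                                 # Steps 3-4: classify
        if gesture == "MOVE":                                        # Step 5: act
            sx, sy = to_screen(lm[8].x, lm[8].y)
            pyautogui.moveTo(sx, sy)
        elif gesture == "LEFT_CLICK":
            pyautogui.click()
        cv2.putText(frame, gesture, (30, 40),                        # Step 6: feedback
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("Virtual Mouse", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()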

6. Hand Gesture Classification


Gesture recognition is achieved by analyzing:
- The distance between landmarks (e.g., index tip and thumb tip).
- Finger orientation (e.g., whether a finger is raised or folded).
- Relative motion, if any (e.g., finger drag, zoom gesture with the left hand).
Thresholds are defined for detecting pinches, folds, and movement direction,
ensuring accurate and noise-free detection.
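One common heuristic for the "raised or folded" test is to compare a fingertip's y-coordinate with that of its PIP joint; the sketch below shows this idea, with the fist check tied to the screenshot gesture as an example. The thresholds and exact rules used in the project may differ:

# with the hand upright, a finger counts as "raised" when its tip landmark
# is above (smaller y in image coordinates) its PIP joint
FINGER_TIPS = [8, 12, 16, 20]        # index, middle, ring, pinky tips
FINGER_PIPS = [6, 10, 14, 18]        # corresponding PIP joints

def fingers_up(lm):
    return [lm[tip].y < lm[pip].y for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)]

def is_fist(lm):
    # all four fingers folded -> used here as the screenshot trigger
    return not any(fingers_up(lm))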

7. Advantages of the System

- Touchless interaction: Prevents physical wear and tear or contamination.
- Accessibility: Helpful for physically challenged users.
- User-friendly: Intuitive gestures mimic real-life motions.
- Cost-effective: Requires only a webcam and free software tools.
- Portable: Can work on any laptop with a webcam and Python environment.

8. Limitations
- Lighting conditions: Poor lighting can affect hand detection accuracy.
- Camera quality: Low-resolution webcams may impact precision.
- Background noise: Cluttered backgrounds may cause false detections.
- Single-user limitation: Multi-user hand gesture detection is not supported.

CONCLUSION

The project “Virtual Mouse Using Hand Gesture Recognition” successfully demonstrates how computer vision and hand gesture tracking can be used to control mouse functionalities in real time, without any physical hardware. By integrating technologies like MediaPipe, OpenCV, and PyAutoGUI, this system replaces the traditional mouse with intuitive hand gestures, enabling touchless control over a computer interface.
Throughout the development process, we implemented key features such as
cursor movement, left and right clicks, double click, scrolling, and screenshot
capture, all using distinct hand gestures. These features were thoroughly
tested under various lighting conditions and showed promising performance
with minimal latency and high gesture accuracy.
The system not only enhances the way users interact with machines but also
opens up possibilities for creating accessible computing solutions for people
with physical disabilities, as well as contactless interfaces for use in sterile or
public environments.
While the project has limitations—such as sensitivity to lighting, background
interference, and the use of only a single hand in most operations—these
challenges can be overcome in future enhancements by introducing machine
learning-based gesture classifiers, background segmentation, and multi-hand
support.
In conclusion, this project is a step toward futuristic human-computer
interaction, contributing to the broader field of gesture-based computing, and
showcases how everyday hardware like a webcam can be transformed into a
powerful input device with the right software tools and creativity.

REFERENCES

MediaPipe Documentation
https://google.github.io/mediapipe
(Used for real-time hand tracking and landmark detection)
OpenCV Documentation
https://docs.opencv.org
(Used for image processing, video capture, and display handling)
PyAutoGUI Documentation
https://pyautogui.readthedocs.io
(Used for automating mouse and keyboard actions)
Pynput Library
https://pynput.readthedocs.io
(Used to control mouse button clicks programmatically)
Python Official Website
https://www.python.org
(General reference for Python programming and standard libraries)
Google Search Engine
https://www.google.com
(Used for researching gesture logic, documentation, and problem-solving)
YouTube Tutorials
https://www.youtube.com
(Used for understanding MediaPipe, gesture recognition logic, and implementation tutorials)
GitHub
https://www.github.com
(Used for exploring open-source gesture control projects and code structure inspiration)
GeeksforGeeks
https://www.geeksforgeeks.org
(Used for coding references and explanations of OpenCV and Python functions)
Stack Overflow
https://stackoverflow.com
(Used to resolve programming errors and understand third-party library behavior)

