Major Report
INTRODUCTION
3. To develop gesture recognition logic that translates finger positions and
movements into mouse actions such as:
Cursor movement
Left click
Right click
Double click
Scroll up and down
Screenshot capture
4. To create a user-friendly and responsive interface that ensures smooth
cursor control and precise gesture detection with minimal latency.
5. To enhance human-computer interaction by implementing a touchless,
intuitive, and accessible input method suitable for users with limited
mobility or for use in sterile or public environments.
6. To lay the foundation for further development in gesture-based
interaction technologies that could be applied in gaming, augmented
reality, robotics, and assistive devices.
Real-World Applications:
The Virtual Mouse Using Hand Gesture Recognition system can be applied in
various real-world scenarios, including:
Accessibility Solutions
For individuals with physical impairments who cannot use a
traditional mouse or keyboard, this system provides an
alternative, hands-free method of interacting with computers.
Public Information Kiosks
In airports, hospitals, or banks, users can operate information
systems without touching screens, reducing the spread of germs
and maintaining hygiene.
Medical and Cleanroom Environments
Doctors, lab technicians, or researchers working in sterile
conditions can interact with systems without direct contact,
ensuring cleanliness and safety.
Gaming and Virtual Reality (VR)
The system can be adapted for gesture-based controls in
immersive gaming environments or AR/VR applications.
Smart Home Control
Gestures can be used to control home appliances, lights, or
entertainment systems without physical remotes.
Robotics and IoT Devices
Gesture recognition can be integrated into robotics to provide
command inputs or control robotic arms and drones.
TECHNOLOGY USED
A. SOFTWARE:
The development of this project required a combination of integrated
development tools, programming languages, computer vision libraries, and
automation modules. Below is a comprehensive explanation of each software
component used:
1. Python Programming Language
Version: Python 3.x (3.8 or higher is recommended)
Purpose and Role:
Python is the core programming language used for developing the
entire project due to its simplicity and extensive support for
libraries and frameworks related to computer vision, machine
learning, and automation.
Python allows for rapid development and prototyping, which is
ideal for projects involving real-time processing like gesture
recognition.
Its readable syntax and extensive libraries for interacting with hardware and performing advanced calculations make it a strong fit for this project.
Python also supports libraries like PyAutoGUI, OpenCV, and MediaPipe,
which are essential for capturing video, processing hand gestures, and
controlling the virtual mouse.
2. PyCharm (IDE)
Version: Community or Professional Edition
Purpose and Role:
PyCharm is an integrated development environment (IDE) used for
writing, editing, testing, and debugging Python code. It supports
features such as:
Intelligent code completion
Project navigation
Integrated debugging and testing tools
Virtual environment support
In this project, PyCharm helped streamline the development
process by providing a feature-rich environment that enhances
productivity and efficiency.
It also supports version control (like Git) and integrates well with
various Python libraries, making it ideal for the smooth execution
of Python-based projects.
4. MediaPipe by Google
Version: MediaPipe 0.8.x (or compatible)
Purpose and Role:
MediaPipe is a framework developed by Google that allows real-
time, cross-platform processing for computer vision and machine
learning tasks. In this project, the MediaPipe Hands solution is
used for:
Real-time hand tracking: It detects the 21 hand landmarks
(keypoints) from the webcam video feed, which represent
key positions on the hand (e.g., wrist, knuckles, fingers).
Landmark Detection: MediaPipe processes each video frame and provides the x, y, z coordinates for each of the 21 landmarks. These coordinates are essential for detecting gestures such as pointing, making a fist, or specific finger movements.
Gesture Recognition: By analyzing the relative positions of
the hand landmarks (angles, distances), gestures are
identified to perform actions like moving the cursor,
clicking, scrolling, etc.
MediaPipe is lightweight and performs very efficiently, making it suitable
for real-time processing even with basic hardware.
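To make the landmark pipeline concrete, the following is a minimal sketch (not the project's actual source) of how MediaPipe Hands is typically combined with OpenCV to read the webcam and extract the 21 landmark coordinates per frame; names such as landmark_list are illustrative.

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

# Configure the Hands solution for a single hand with moderate confidence thresholds.
hands = mp_hands.Hands(max_num_hands=1,
                       min_detection_confidence=0.7,
                       min_tracking_confidence=0.7)

cap = cv2.VideoCapture(0)  # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)                    # mirror view feels more natural
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB input
    result = hands.process(rgb)

    if result.multi_hand_landmarks:
        hand = result.multi_hand_landmarks[0]
        # Each of the 21 landmarks carries normalized x, y, z values (x and y in the range 0..1).
        landmark_list = [(lm.x, lm.y, lm.z) for lm in hand.landmark]
        mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)

    cv2.imshow("Hand Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()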
5. PyAutoGUI
Version: PyAutoGUI 0.9.x (or compatible)
Purpose and Role:
PyAutoGUI is a Python automation library used for controlling the
mouse and keyboard programmatically. In this project, it is used
to:
Move the mouse: The x, y coordinates obtained from the
hand landmarks are mapped to screen coordinates, and
PyAutoGUI is used to move the mouse pointer to the
calculated position on the screen.
Click actions: Based on hand gestures (e.g., index finger pointing or a closed fist), PyAutoGUI performs mouse actions like left-click, right-click, and double-click.
Take screenshots: The project uses PyAutoGUI's screenshot
feature to capture the screen when a certain gesture (like
making a fist) is detected.
Scrolling: PyAutoGUI's scroll function is used to simulate
mouse wheel scroll actions, based on gestures such as
pinching with the thumb and another finger.
PyAutoGUI is crucial for creating the virtual mouse experience by
providing a way to interact with the system as though the user were
using a physical mouse.
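The calls below illustrate, with placeholder coordinates and a placeholder filename, how each of the actions described above maps onto PyAutoGUI functions; this is a sketch rather than the project's exact code.

import pyautogui

screen_w, screen_h = pyautogui.size()  # screen resolution, used when mapping landmarks to pixels

# Cursor movement: move the pointer to an absolute position (placeholder coordinates).
pyautogui.moveTo(screen_w // 2, screen_h // 2, duration=0.1)

# Click actions triggered by recognized gestures.
pyautogui.click(button='left')    # e.g. when a left-click gesture is detected
pyautogui.click(button='right')   # e.g. when a right-click gesture is detected
pyautogui.doubleClick()           # e.g. when a double-click gesture is detected

# Scrolling: positive values scroll up, negative values scroll down.
pyautogui.scroll(120)
pyautogui.scroll(-120)

# Screenshot saved under a placeholder filename.
pyautogui.screenshot('screenshot_example.png')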
6. Pynput
Version: Pynput 1.x (or compatible)
Purpose and Role:
Pynput is a library used to control and monitor input devices,
including the mouse and keyboard.
In this project, Pynput is used for:
Mouse control: Specifically for simulating mouse button
presses (left click, right click, double-click). While PyAutoGUI
also handles this task, Pynput can be used for low-level
control, offering more responsiveness in some cases.
Button press simulation: Pynput can trigger mouse button
presses and releases, making the interaction smoother and
more reliable during gesture recognition.
It also ensures that the mouse actions (clicks) are
recognized correctly when hand gestures such as bending
the index finger or making a fist are detected.
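A minimal sketch, assuming the standard pynput mouse API, of how button presses can be simulated at a lower level than PyAutoGUI:

from pynput.mouse import Button, Controller

mouse = Controller()

# Single left click and single right click (press and release in one call).
mouse.click(Button.left, 1)
mouse.click(Button.right, 1)

# Double click: the second argument is the click count.
mouse.click(Button.left, 2)

# Explicit press/release gives finer control, e.g. for drag-style gestures.
mouse.press(Button.left)
mouse.release(Button.left)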
7. NumPy
Version: NumPy 1.x (or compatible)
Purpose and Role:
NumPy is a popular library used for numerical computing in
Python. It is essential for tasks involving arrays and complex
mathematical operations.
In this project, NumPy might be used indirectly in functions that
involve:
Calculating distances: The Euclidean distance between two
hand landmarks (e.g., thumb and index finger) can be
computed using NumPy’s vectorized operations, improving
efficiency.
Angle calculations: The relative angles between landmarks (e.g.,
the angle between the thumb, index, and middle fingers) can
also be computed using NumPy to classify specific gestures.
If included, NumPy is helpful in optimizing mathematical operations,
making the code faster and more efficient.
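The two helper functions below sketch how such distance and angle calculations could look with NumPy; the names get_distance and get_angle are illustrative and may not match the project's utility functions.

import numpy as np

def get_distance(p1, p2):
    """Euclidean distance between two landmarks given as (x, y) pairs."""
    return float(np.linalg.norm(np.array(p1) - np.array(p2)))

def get_angle(a, b, c):
    """Angle in degrees at point b, formed by the points a-b-c (e.g. a finger joint)."""
    a, b, c = np.array(a), np.array(b), np.array(c)
    radians = np.arctan2(c[1] - b[1], c[0] - b[0]) - np.arctan2(a[1] - b[1], a[0] - b[0])
    angle = abs(np.degrees(radians))
    return 360.0 - angle if angle > 180.0 else angle

# Example: a small thumb-to-index distance can be treated as a pinch gesture.
is_pinch = get_distance((0.42, 0.55), (0.44, 0.57)) < 0.05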
8. Utility Module (util.py)
Purpose and Role:
Helper functions measure landmark distances and angles so that gestures can be recognized, such as bending fingers (for clicks) or spreading fingers (for scrolling).
Gesture classification logic: The utility module modularizes the logic for detecting different hand gestures, reducing the complexity of the main program and improving maintainability.
B. HARDWARE:
Detailed specifications of the device used for creating this project:
DEVICE SPECIFICATIONS:
Device name: LAPTOP-TAC90UTT
Processor: Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz (1.19 GHz)
Installed RAM: 8.00 GB (7.74 GB usable)
Device ID: 0459F067-605E-43E1-BED2-59A6EF2B354F
Product ID: 00356-24564-25200-AAOEM
System type: 64-bit operating system, x64-based processor
Pen and touch: No pen or touch input is available for this display
WINDOWS SPECIFICATIONS:
Edition: Windows 11 Home Single Language
Version: 24H2
Installed on: 07-12-2024
OS build: 26100.3915
PROJECT DETAILS
1. Project Overview
The project titled "Virtual Mouse Using Hand Gesture Recognition" is an
innovative solution aimed at replacing traditional hardware input devices with
a vision-based system. The system leverages real-time video input from a
webcam, along with hand gesture detection and tracking, to control mouse
operations such as movement, clicking, scrolling, and even taking
screenshots—all without physical contact.
This system is particularly relevant in a world that is moving toward touchless
technology, where interaction through gestures ensures sanitation,
accessibility, and convenience, especially for people with mobility impairments
or in environments where physical touch is restricted.
2. Working Principle
The project is built using computer vision and hand landmark detection. The
core principle involves:
Capturing video frames through a webcam.
Using MediaPipe to detect 21 hand landmarks on the user's
hand.
Tracking the relative positions of key landmarks (like the thumb,
index, and middle fingers).
Based on the position and gesture, the system performs:
Mouse movement by mapping the index
fingertip's position to the screen.
Left and right clicks through specific gestures like
finger pinching or folding.
Scrolling and zooming using two-finger gestures.
Screenshots through special gestures like making
a fist.
Each gesture is classified in real time and converted into a corresponding mouse event using PyAutoGUI and Pynput.
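As an illustration of that final classification-to-event step, a simple dispatch from a gesture label to a mouse action might look like the sketch below; the gesture names are assumed labels, not ones taken from the project code.

import pyautogui
from pynput.mouse import Button, Controller

mouse = Controller()

def perform_action(gesture):
    """Map an assumed gesture label to the corresponding mouse event."""
    if gesture == "LEFT_CLICK":
        mouse.click(Button.left, 1)
    elif gesture == "RIGHT_CLICK":
        mouse.click(Button.right, 1)
    elif gesture == "DOUBLE_CLICK":
        pyautogui.doubleClick()
    elif gesture == "SCROLL_UP":
        pyautogui.scroll(120)
    elif gesture == "SCROLL_DOWN":
        pyautogui.scroll(-120)
    elif gesture == "SCREENSHOT":
        pyautogui.screenshot("screenshot_example.png")
    # Unknown labels (e.g. plain cursor movement) fall through; movement is handled separately.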
3. System Architecture
The overall system works in the following flow:
Input Device:
Webcam captures live video feed of hand gestures.
Processing Units:
OpenCV handles frame capture, pre-processing, and visual feedback.
MediaPipe detects hand landmarks and tracks gestures.
Custom logic (via util.py) analyzes gestures using landmark positions.
PyAutoGUI/Pynput simulates mouse control events based on gesture
analysis.
Output Device:
Screen reflects mouse movements and interactions.
4. Features Implemented
1. Mouse Cursor Movement
o The cursor follows the movement of the index finger in real time.
o Movement is smooth and mapped from camera space to screen space (a coordinate-mapping sketch follows this feature list).
2. Left Click
o Triggered when the index finger is bent and the thumb is held away.
o The gesture is identified by measuring finger angles and distances.
3. Right Click
o Triggered when the middle finger is bent and the thumb is held away.
o A different angle and distance pattern is used than for the left click.
4. Double Click
o Triggered when both the index and middle fingers are bent.
o Uses a time threshold to avoid accidental repeated clicks.
5. Screenshot Capture
o Triggered when a fist is made (all fingers closed).
o Takes a screenshot and saves it with a unique filename.
6. Scroll Up
o Triggered when thumb and ring finger tips are brought close
together.
o Performs upward scrolling using pyautogui.scroll().
7. Scroll Down
o Triggered when thumb and pinky finger tips are brought close
together.
o Scrolls down by simulating mouse wheel motion.
8. Real-Time Visual Feedback
o On-screen gesture names are displayed (e.g., “Left Click”, “Scroll
Up”).
o Helpful for debugging and user interaction awareness.
5. Algorithm and Logic
Step 1: Capture video from webcam using OpenCV.
Step 2: Detect hand using MediaPipe and extract landmarks.
Step 3: Calculate relative distances or angles between specific
landmarks.
Step 4: Based on position logic (e.g., if distance < threshold), classify
gestures.
Step 5: Use PyAutoGUI/Pynput to trigger corresponding mouse action.
Step 6: Provide visual feedback on the screen for detected gestures.
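Putting these six steps together, a skeleton of the main loop might look like the following; classify_gesture and perform_action are placeholder stand-ins for the project's own gesture logic and event handling, shown here only to make the control flow explicit.

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def classify_gesture(landmarks):
    """Placeholder for Steps 3-4: compare landmark distances/angles and return a gesture label."""
    return "MOVE"

def perform_action(gesture):
    """Placeholder for Step 5: trigger the matching PyAutoGUI/Pynput mouse event."""
    pass

def main():
    hands = mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7)
    cap = cv2.VideoCapture(0)                        # Step 1: capture video from the webcam

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.flip(frame, 1)
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        result = hands.process(rgb)                  # Step 2: detect the hand and its landmarks

        if result.multi_hand_landmarks:
            landmarks = result.multi_hand_landmarks[0].landmark
            gesture = classify_gesture(landmarks)    # Steps 3-4: classify the gesture
            perform_action(gesture)                  # Step 5: trigger the mouse action
            cv2.putText(frame, gesture, (20, 40),    # Step 6: on-screen visual feedback
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

        cv2.imshow("Virtual Mouse", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()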
8. Limitations
Lighting conditions: Poor lighting can affect hand detection accuracy.
Camera quality: Low-resolution webcams may impact precision.
Background noise: Cluttered backgrounds may cause false detections.
Single-user limitation: Multi-user hand gesture detection is not
supported.
CONCLUSION
REFERENCES
MediaPipe Documentation
https://google.github.io/mediapipe
(Used for real-time hand tracking and landmark detection)
OpenCV Documentation
https://docs.opencv.org
(Used for image processing, video capture, and display handling)
PyAutoGUI Documentation
https://pyautogui.readthedocs.io
(Used for automating mouse and keyboard actions)
Pynput Library
https://pynput.readthedocs.io
(Used to control mouse button clicks programmatically)
Python Official Website
https://www.python.org
(General reference for Python programming and standard libraries)
Google Search Engine
https://www.google.com
(Used for researching gesture logic, documentation, and problem-
solving)
YouTube Tutorials
https://www.youtube.com
(Used for understanding MediaPipe, gesture recognition logic, and
implementation tutorials)
GitHub
https://www.github.com
(Used for exploring open-source gesture control projects and code
structure inspiration)
GeeksforGeeks
https://www.geeksforgeeks.org
(Used for coding references and explanations of OpenCV and Python
functions)
Stack Overflow
https://stackoverflow.com
(Used to resolve programming errors and understand third-party library
behavior)