
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits

ABSTRACT In this work, a vision-based gesture-controlled robotic arm is presented for remote and
intuitive control of physical systems. The system uses a camera and OpenCV-based gesture recognition
algorithms to identify predefined hand gestures in real time. These identified gestures are translated into
specific commands, which are wirelessly sent to a microcontroller. The microcontroller decodes these
signals to drive a robotic arm accordingly. This method obviates the requirement for physical contact, thus
making it particularly well-suited for applications involving hygiene, safety, or distance—like industrial
automation, healthcare, and dangerous environments. The paper discusses the overall system architecture,
implementation approach, gesture classification logic, and experimental results that illustrate the efficiency,
accuracy, and responsiveness of the proposed IoT-integrated control system.

INDEX TERMS Gesture Recognition, Robotic Arm Control, Computer Vision, OpenCV, Human-Machine
Interaction (HMI), Hand Tracking, Contactless Control, Microcontroller-based Automation, Wireless
Communication, Vision-Based Control System, Real-Time Gesture Detection, Python-based Interface, IoT
Applications, Servo Motor Control, Low-Cost Automation.

I. INTRODUCTION

In recent years, gesture-controlled systems have gained significant traction as intuitive, non-contact human-machine interfaces (HMIs), particularly in applications where hygiene, safety, and accessibility are critical. Conventional robotic arm control methods often involve physical interfaces such as joysticks, remote controllers, or gloves, which can be cumbersome and unsuitable in sterile or hazardous environments. To overcome these limitations, vision-based control systems leveraging computer vision and artificial intelligence have emerged as a promising alternative.

This paper presents the design and implementation of a real-time gesture-controlled robotic arm system using a closed palm gesture as the primary control input. The system is based on OpenCV, a widely used open-source computer vision library, and utilizes the MediaPipe framework for accurate and lightweight hand landmark detection. A standard webcam captures hand gestures, which are processed in real time to detect a closed palm. Once recognized, the gesture is mapped to a set of control commands that are wirelessly transmitted to a microcontroller. The microcontroller, in turn, controls a 4-degree-of-freedom (4-DOF) robotic arm using servo motors, enabling dynamic movement without the need for physical contact.

This approach eliminates mechanical complexity, reduces hardware dependencies, and introduces a scalable solution adaptable to various real-world applications, including assistive robotics, remote surgical tools, and industrial automation. Moreover, the integration of Internet of Things (IoT) communication enables wireless interaction between the vision system and the robotic unit, supporting modular and distributed system designs.

The contributions of this work are as follows:
• A robust closed palm gesture recognition system using OpenCV and MediaPipe.
• An end-to-end IoT-integrated robotic control system based on a 4-DOF servo-powered manipulator.
• Real-time gesture-command mapping with minimal latency and high reliability.
• An evaluation of system performance in terms of detection accuracy, responsiveness, and gesture-to-action delay.

II. RELATED WORK

2.1 Gesture-Controlled Robotic Arms
Gesture control for robotic systems has been an active area of research for years, with applications spanning from human-robot interaction to assistive robotics. Early systems typically relied on wearable devices such as gloves equipped with sensors to track hand movements and send commands to a robotic arm. However, these solutions have several limitations, including bulkiness, discomfort, and a requirement for close-range interaction (Gokce et al., 2016).

More recent studies have explored vision-based gesture recognition, which enables hands-free control. In these systems, cameras or depth sensors are used to detect hand gestures, allowing for more natural and intuitive control. For instance, Park et al. (2019) developed a system in which a depth camera was used to detect hand gestures and control a robotic arm; this system, however, was limited by the high cost of the required hardware.

2.2 Computer Vision for Gesture Recognition
The field of computer vision has witnessed significant advancements with the introduction of deep learning and real-time hand tracking technologies. One notable framework is MediaPipe, developed by Google, which provides lightweight solutions for real-time hand landmark detection. MediaPipe's Hands module can track up to 21 hand landmarks, enabling accurate gesture recognition (Liu et al., 2020). The use of OpenCV alongside MediaPipe has been shown to deliver highly efficient, real-time performance for gesture recognition tasks, making it an ideal choice for robotics and automation applications.

Several studies have successfully implemented hand gesture recognition using OpenCV and MediaPipe. For example, Kumar and Jadhav (2020) presented a gesture-controlled robotic arm system that used OpenCV's Haar Cascade classifiers for detecting hand gestures. However, this method often struggled with complex gestures and real-time performance. In contrast, MediaPipe's real-time tracking capabilities have proven to be more accurate and efficient, especially for dynamic and continuous gestures (Liu et al., 2020).

2.3 IoT Integration in Robotics
The integration of the Internet of Things (IoT) with robotic systems has made it possible to remotely control robots and monitor their status via the internet. This trend has been accelerated by the proliferation of Wi-Fi and Bluetooth technologies. A significant body of work in IoT-based robotic systems has focused on creating networked systems that can operate in distributed environments. In these systems, a microcontroller such as the ESP32 can serve as the bridge between the sensor inputs (in this case, the hand gesture recognition system) and the robotic hardware.

In their work, Singh et al. (2021) proposed an IoT-enabled robotic system in which an ESP32 was used to control a robotic arm based on gestures detected by a smartphone application. While this system relied on mobile devices for gesture input, it highlighted the potential of IoT-based robotics to enable flexible and scalable robotic control. Another example is the work of Huang et al. (2022), who developed a cloud-based robotic arm system that uses IoT protocols such as MQTT to send control commands to a robotic arm over Wi-Fi. This system allowed users to control a robot from anywhere in the world, emphasizing the power of IoT for long-range robotic operations.

III. SYSTEM DESIGN & ARCHITECTURE

The proposed system consists of three main components: gesture recognition, microcontroller control, and the robotic arm. The system operates by capturing hand gestures with a webcam or camera module; the video is processed in real time using OpenCV and MediaPipe to identify predefined gestures. These gestures are then transmitted to a microcontroller through Wi-Fi or Bluetooth, which controls the 4-DOF robotic arm based on the detected gestures.

3.1 System Overview
The system architecture is divided into the following key modules:
1. Gesture Recognition (PC/Computer): The first stage of the system is the detection of hand gestures through a web camera. OpenCV processes the video feed in real time to extract key features such as the position of the hand and its movements. The MediaPipe Hands module is used to track the 21 landmarks of the hand, which are then mapped to specific gestures (see the code sketch after this list).
2. Wireless Communication: Once the gesture is identified, a command is sent wirelessly from the computer to the microcontroller via Wi-Fi or Bluetooth. This allows the control signal to be transmitted without the need for physical wires, enabling a truly wireless control system.
3. Microcontroller: The microcontroller is responsible for receiving the control command and processing it. It interfaces with the servo motors that control the 4-DOF robotic arm. The microcontroller can communicate with the PC either via Wi-Fi (MQTT or HTTP) or via Bluetooth (serial communication).
4. Robotic Arm: The robotic arm consists of servo motors controlled by the microcontroller. The number of degrees of freedom (DOF) of the arm depends on the number of servo motors. In this system, a 4-DOF robotic arm is used, providing motion along the X, Y, and Z axes as well as rotation.
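For illustration, a minimal Python sketch of the gesture-recognition module is given below. It captures webcam frames with OpenCV and extracts the 21 hand landmarks with MediaPipe; the camera index, confidence thresholds, and the simple closed-palm test are illustrative assumptions rather than values taken from the implementation described in this paper.

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def fingers_folded(lm):
    # Crude closed-palm test (illustrative assumption): each fingertip lies
    # below its PIP joint in image coordinates (y grows downward).
    tips, pips = [8, 12, 16, 20], [6, 10, 14, 18]
    return all(lm[t].y > lm[p].y for t, p in zip(tips, pips))

cap = cv2.VideoCapture(0)  # assumed default webcam index
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7,
                    min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.flip(frame, 1)                    # mirror for natural interaction
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB input
        result = hands.process(rgb)
        if result.multi_hand_landmarks:
            lm = result.multi_hand_landmarks[0].landmark  # 21 normalized (x, y, z) points
            gesture = "CLOSED_PALM" if fingers_folded(lm) else "OPEN_PALM"
            cv2.putText(frame, gesture, (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("Gesture Recognition", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()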
3.2 Gesture Recognition
The gesture recognition process is performed in real time using OpenCV and MediaPipe. The steps involved in the gesture recognition pipeline are as follows:
1. Input: The system uses a standard web camera or USB camera to capture video frames. The frames are processed in real time.
2. Hand Detection: The MediaPipe Hands module is used to detect and track hand landmarks. The hand landmarks are key points on the palm and fingers, totaling 21 points that represent the hand's position and motion.
3. Gesture Classification: Based on the hand landmarks, specific gestures (such as closed palm, open hand, or fist) are recognized. Each gesture corresponds to a specific action (e.g., increase speed, stop motor). The system translates the detected gesture into a corresponding command for the robotic arm.

Mathematical Consideration: The hand's position and orientation are calculated using the landmark coordinates (x, y, z) provided by MediaPipe. Gesture recognition can be based on the relative positions of these landmarks, for example:
• Closed Palm: defined by specific constraints on the distances between fingertips.
• Open Palm: identified by the distance between the thumb and pinky.

The gesture logic is interpreted from the landmark coordinates as follows.

1. Landmark Coordinate System
MediaPipe provides 3D coordinates (x_i, y_i, z_i) for each of the 21 landmarks on the hand. These coordinates are used to calculate distances, angles, and relative positions.

2. Distance Between Landmarks
To determine whether a hand is open or closed, the Euclidean distance between each fingertip and its corresponding base joint is calculated:

D = sqrt( (x_2 − x_1)^2 + (y_2 − y_1)^2 + (z_2 − z_1)^2 )

3. Finger Curl Detection (Angle Between Vectors)
To detect whether a finger is curled (bent), the angle between the vectors formed by three landmarks (base → mid → tip) is computed using the cosine rule for the angle between two vectors:

θ = arccos( (v_1 · v_2) / (|v_1| |v_2|) )

where v_1 and v_2 are the vectors from the middle joint toward the base and toward the tip, respectively.

4. Palm Orientation (Normal Vector Calculation)
To determine whether the palm is facing the camera (important for gesture context), the normal vector of the palm plane is calculated from three landmark points:
• P_0 = wrist
• P_1 = index
• P_2 = pinky
The palm normal is obtained as the cross product n = (P_1 − P_0) × (P_2 − P_0).
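The following Python sketch shows how these three quantities can be computed from the MediaPipe landmark coordinates using NumPy. It is a minimal example under the formulas above; the landmark indices and the threshold in the usage comment are illustrative, not values reported in this paper.

import numpy as np

def to_xyz(landmark):
    # Convert a MediaPipe landmark to a NumPy vector (x, y, z).
    return np.array([landmark.x, landmark.y, landmark.z])

def distance(p1, p2):
    # Euclidean distance between two landmarks (formula 2 above).
    return np.linalg.norm(to_xyz(p1) - to_xyz(p2))

def curl_angle(base, mid, tip):
    # Angle at the middle joint between the mid->base and mid->tip vectors
    # (formula 3 above); a small angle indicates a curled finger.
    v1 = to_xyz(base) - to_xyz(mid)
    v2 = to_xyz(tip) - to_xyz(mid)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def palm_normal(wrist, index_mcp, pinky_mcp):
    # Normal vector of the palm plane (formula 4 above).
    p0, p1, p2 = to_xyz(wrist), to_xyz(index_mcp), to_xyz(pinky_mcp)
    n = np.cross(p1 - p0, p2 - p0)
    return n / (np.linalg.norm(n) + 1e-9)

# Example usage with a MediaPipe result (lm = result.multi_hand_landmarks[0].landmark):
# closed = all(distance(lm[tip], lm[0]) < 0.25 for tip in (8, 12, 16, 20))  # threshold is illustrative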
3.3 Wireless Communication
The communication between the PC and the microcontroller is handled via Wi-Fi or Bluetooth. The system is designed to allow the microcontroller to receive control commands wirelessly, offering flexibility in the setup.
1. Wi-Fi Communication: Using Wi-Fi, the PC can send commands to the microcontroller through an MQTT broker or an HTTP server. In this setup, the microcontroller subscribes to topics on an MQTT broker or listens to an HTTP endpoint, where it receives messages representing control signals (e.g., "Increase Speed" or "Stop"). A PC-side sketch of this path follows the list below.
2. Bluetooth Communication: Alternatively, the system can use Bluetooth. Using the HC-05 Bluetooth module, the PC sends commands over serial communication to the microcontroller, which processes the data and controls the robotic arm accordingly.
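A minimal sketch of the Wi-Fi path on the PC side is shown below, using the paho-mqtt Python client to publish a gesture command. The broker address, topic name, and command string are placeholders, and the constructor call follows the paho-mqtt 1.x style (version 2.x additionally expects a callback API version argument).

import paho.mqtt.client as mqtt

BROKER = "192.168.1.50"          # placeholder: IP address of the MQTT broker
TOPIC = "robot/arm/command"      # placeholder: topic the microcontroller subscribes to

client = mqtt.Client()           # paho-mqtt 1.x style constructor
client.connect(BROKER, 1883, keepalive=60)
client.loop_start()

# Publish the command derived from the recognized gesture.
client.publish(TOPIC, "CLOSED_PALM")

client.loop_stop()
client.disconnect()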
3.4 Microcontroller
The microcontroller acts as the central controller for the robotic arm. It is responsible for:
• Receiving control signals via Wi-Fi or Bluetooth.
• Translating the received commands into signals that control the servo motors.
• Handling the PWM signals that control the speed and direction of the motors.

The microcontroller is programmed using the Arduino IDE or the ESP-IDF framework. It communicates with the servo motors using Pulse Width Modulation (PWM), where the width of the pulse sent to the motor determines the angle or position of the servo. A standard servo motor expects a PWM signal with:
• Frequency: ~50 Hz (period = 20 ms)
• Pulse width: 1 ms → 0°, 1.5 ms → 90°, 2 ms → 180°

The pulse width for a desired servo angle θ (0° to 180°) can therefore be calculated as

PulseWidth (ms) = 1 + θ / 180,

and the corresponding PWM duty cycle is

Duty (%) = PulseWidth / 20 ms × 100 = (1 + θ / 180) / 20 × 100.
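As a worked example of this mapping, the small Python helper below converts a desired servo angle into the corresponding pulse width and duty cycle, assuming the 50 Hz signal and 1 ms to 2 ms pulse range stated above.

def servo_pwm(angle_deg, period_ms=20.0, min_pulse_ms=1.0, max_pulse_ms=2.0):
    # Map a servo angle (0-180 deg) to pulse width and duty cycle,
    # assuming a 50 Hz signal and a 1 ms - 2 ms pulse range.
    angle_deg = max(0.0, min(180.0, angle_deg))
    pulse_ms = min_pulse_ms + (max_pulse_ms - min_pulse_ms) * angle_deg / 180.0
    duty_percent = pulse_ms / period_ms * 100.0
    return pulse_ms, duty_percent

# Example: a 90 deg command gives a 1.5 ms pulse, i.e. a 7.5 % duty cycle.
print(servo_pwm(90))   # (1.5, 7.5)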


3.5 Robotic Arm (4-DOF)
The 4-DOF robotic arm is composed of four servo motors, each controlling one degree of freedom. The arm can perform the following movements:
1. Shoulder movement: the first servo controls rotation about the X-axis.
2. Elbow movement: the second servo controls the angle about the Y-axis.
3. Wrist rotation: the third servo provides rotation about the Z-axis.
4. Gripper control: the fourth servo operates the arm's gripper for grasping objects.
The servo motors are controlled via PWM signals sent from the microcontroller. The arm's position is adjusted based on the commands received from the gesture recognition system.

3.6 System Flowchart
The workflow of the system is as follows:
1. Camera Input → capture hand gestures using a webcam.
2. Gesture Recognition → use OpenCV and MediaPipe to detect and classify hand gestures.
3. Command Transmission → send the detected gesture's corresponding command wirelessly to the microcontroller via Wi-Fi or Bluetooth (see the sketch after this list).
4. Microcontroller Processing → the microcontroller receives the command and sends the appropriate PWM signals to the servo motors of the robotic arm.
5. Robotic Arm Response → the arm moves in accordance with the gesture.
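To make step 3 of this workflow concrete, a simple lookup from recognized gestures to command strings can be used on the PC side before transmission. The gesture names and command strings below are illustrative placeholders, not the exact vocabulary used in this work.

# Illustrative mapping from recognized gestures to command strings
# understood by the microcontroller firmware (placeholder vocabulary).
GESTURE_COMMANDS = {
    "CLOSED_PALM": "STOP",
    "OPEN_PALM":   "MOVE_HOME",
    "FIST":        "GRIP_CLOSE",
}

def command_for(gesture):
    # Fall back to a no-op command for unrecognized gestures.
    return GESTURE_COMMANDS.get(gesture, "NOOP")

# Example: publish or write command_for("CLOSED_PALM") over MQTT or serial.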
IV. EXPERIMENTAL SETUP

This section describes the hardware, software, and environment used to implement and test the proposed system.

4.1 Hardware Setup
• Microcontroller - microcontroller board (e.g., ESP32) for servo control
• Servo Motors (x4) - SG90 or MG996R, used for robotic arm motion
• Robotic Arm Frame - 3D-printed or metallic 4-DOF structure
• Camera - USB webcam or laptop built-in camera (30+ FPS)
• Power Supply - external 5 V–6 V source for servo motor operation
• Jumper Wires - for connections between components

4.2 Software Setup
• Python
• OpenCV
• MediaPipe
• Arduino IDE
• MQTT/serial library

Python serves as the core programming language for the entire gesture recognition and communication system. It is used to develop the application that captures live video from the camera, detects hand gestures using computer vision, and sends the appropriate control signals to the microcontroller. Python's simplicity, extensive support for computer vision libraries, and real-time performance capabilities make it an ideal choice for rapid development and testing.

OpenCV (Open Source Computer Vision Library) is used to capture video frames from the webcam and preprocess them for gesture detection. It handles tasks such as color space conversion (e.g., BGR to RGB), image flipping, resizing, and drawing the landmarks on the hand to visualize detection. OpenCV acts as the visual interface layer, enabling real-time feedback and interaction with the user.

MediaPipe is used for accurate and efficient hand gesture recognition. It provides a pretrained hand-tracking model that detects 21 hand landmarks in each frame. These landmarks are then analyzed using geometric relationships and distance formulas to classify gestures. MediaPipe's integration with Python allows seamless gesture detection
with low latency, making it ideal for real-time control systems.

Arduino IDE is used to program the microcontroller that drives the 4-DOF robotic arm. It allows developers to write, compile, and upload C/C++ code to the microcontroller. The code on the microcontroller receives gesture commands from the PC and generates the appropriate PWM (Pulse Width Modulation) signals to control the angles of the servo motors in the robotic arm.

The MQTT library is used when Wi-Fi-based wireless communication is selected. It enables lightweight publish-subscribe messaging between the PC (gesture sender) and the microcontroller (gesture receiver). The PC publishes gesture commands to specific MQTT topics, and the microcontroller subscribes to these topics to receive and execute the instructions.

The serial communication library (such as pyserial) is used for Bluetooth-based communication between the PC and the microcontroller. When using a module such as the HC-05, gesture commands are sent from the Python script as serial data over Bluetooth. This method is reliable for short-range communication and does not require internet access.
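Below is a minimal pyserial sketch of this Bluetooth path, sending a command string to an HC-05 module paired with the PC. The serial port name and baud rate are placeholders that depend on the operating system and module configuration (HC-05 modules commonly default to 9600 baud).

import serial

# Placeholder port name: e.g. "COM5" on Windows or "/dev/rfcomm0" on Linux.
PORT = "COM5"
BAUD = 9600  # common HC-05 default; an assumption, not a value from this paper

with serial.Serial(PORT, BAUD, timeout=1) as ser:
    # Commands are sent as newline-terminated ASCII strings so the
    # microcontroller firmware can read them line by line.
    ser.write(b"CLOSED_PALM\n")
    reply = ser.readline()          # optional acknowledgement from the firmware
    print(reply.decode(errors="ignore"))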

V. METHODOLOGY

The methodology adopted in this research involves the real-time interpretation of hand gestures using a vision-based system and the wireless actuation of a 4-DOF robotic arm through a microcontroller. The complete workflow is designed to ensure accuracy, low latency, and portability for practical deployment in healthcare and industrial settings.

The system begins with continuous video capture using a standard USB webcam connected to a PC or edge device. Each frame is processed using the OpenCV library to standardize its size, reduce noise through Gaussian blurring, and maintain color balance. OpenCV serves as the primary computer vision backbone due to its flexibility and performance in real-time environments (Bradski, 2000) [1].
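A minimal OpenCV preprocessing sketch corresponding to this step is shown below; the target resolution and Gaussian kernel size are illustrative choices, not values reported in this paper.

import cv2

def preprocess(frame, width=640, height=480):
    # Standardize frame size, mirror the image, and reduce noise with a
    # Gaussian blur before handing the frame to the gesture recognizer.
    frame = cv2.resize(frame, (width, height))
    frame = cv2.flip(frame, 1)
    frame = cv2.GaussianBlur(frame, (5, 5), 0)   # illustrative 5x5 kernel
    return frame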
Next, gesture recognition is achieved using Google's MediaPipe Hand Tracking framework. This involves a two-stage process:
• Palm detection using a single-shot detection (SSD) model.
• Hand landmark localization, which identifies the key landmarks of the hand.
Each landmark has 3D coordinates (x, y, z), which are used to extract gesture patterns. This method is based on the work of Zhang et al. (2020) [2], which is known for its high speed and precision in landmark detection.

Once the landmarks are acquired, gesture classification is performed using geometric relationships between the landmarks. Mathematically, a gesture is classified by checking that each measured landmark distance falls within a predefined range:

D_min,k ≤ D_k ≤ D_max,k,  k = 1, …, n

where D_k is the distance between two landmarks. A gesture is identified if the set of distances D_1, D_2, …, D_n all fall within their predefined ranges.
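This range-based rule can be expressed directly in Python; the gesture templates and range values below are illustrative placeholders rather than calibrated values from the experiments.

# Each gesture is described by allowed (min, max) ranges for a set of
# named landmark distances; values here are illustrative placeholders.
GESTURE_TEMPLATES = {
    "CLOSED_PALM": {"thumb_index": (0.00, 0.10), "index_wrist": (0.10, 0.25)},
    "OPEN_PALM":   {"thumb_pinky": (0.20, 0.60), "index_wrist": (0.30, 0.60)},
}

def classify(distances):
    # distances: dict mapping feature names to measured values (e.g. from
    # the distance() helper shown earlier). Returns the first matching gesture.
    for gesture, ranges in GESTURE_TEMPLATES.items():
        ok = all(lo <= distances.get(name, float("inf")) <= hi
                 for name, (lo, hi) in ranges.items())
        if ok:
            return gesture
    return None

# Example: classify({"thumb_index": 0.05, "index_wrist": 0.18}) -> "CLOSED_PALM"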
Once a gesture is recognized, the system maps it to a control command. This command is sent wirelessly to the microcontroller via:
• Bluetooth, using serial communication (via pyserial); or
• Wi-Fi, using the MQTT protocol (for internet-connected use cases).
The microcontroller (e.g., an ESP32) receives the signal and interprets the command to control the servo motors via PWM (Pulse Width Modulation). PWM signal generation is handled using either the Arduino IDE or the ESP-IDF environment. The robotic arm is composed of four servo motors, each representing one degree of freedom, and the angle of each motor is set according to the received gesture command.

The overall methodology forms a continuous loop: (1) capture gesture → (2) classify gesture → (3) transmit command → (4) move robotic arm. This process is synchronized to operate with minimal latency, ensuring smooth control and natural interaction. The use of wireless communication eliminates physical tethering, making the system ideal for touchless applications in sensitive environments.

VIII. REFERENCES

[1] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.
[2] F. Zhang, V. Bazarevsky, A. Vakunov, G. Sung, S. Chang,
M. Grundmann, “MediaPipe Hands: On-device Real-time
Hand Tracking with 3D Landmarks,” Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern
Recognition Workshops (CVPRW), pp. 143–150, 2020.
[3] T. T. Vu, N. D. Nguyen, and B. M. Nguyen, “Design and
implementation of a real-time hand gesture controlled
robotic arm,” International Journal of Advanced Computer
Science and Applications (IJACSA), vol. 11, no. 4, pp. 583–
589, 2020.
[4] M. Sathish Kumar and K. Rajasekaran, “IoT-based
Robotic Arm Control Using Hand Gestures,” International
Journal of Engineering Research & Technology (IJERT), vol.
9, no. 5, pp. 110–114, May 2020.
[5] P. Rani and S. Arunkumar, “Real-Time Vision-Based
Gesture Controlled Robotic Arm,” International Journal of
Scientific & Engineering Research (IJSER), vol. 12, no. 3, pp.
109–115, Mar. 2021.
[6] K. N. Kaipa and R. Pandey, “Wireless Gesture Control of
a Robotic Arm Using MediaPipe,” Journal of Intelligent
Systems and Robotics, vol. 4, no. 1, pp. 42–48, 2022.
[7] A. D. Singh and H. Sharma, “Design of 4-DOF Robotic
Arm and Control via Human Hand Gesture,” International
Journal of Scientific Research in Engineering and
Management (IJSREM), vol. 5, no. 10, pp. 25–31, Oct.
2021.
