
HandSense: AI-Powered Gesture Recognition

PROJECT WORK 1 (REVIEW 1)

Submitted by

Kirtik Roshan G R 212221043002


Ragul E 212221043005
Rahul K 212221043006

in partial fulfillment of the requirements for the award of

the degree of

BACHELOR OF ENGINEERING
in

COMPUTER SCIENCE AND ENGINEERING

SAVEETHA ENGINEERING COLLEGE, THANDALAM


An Autonomous Institution Affiliated to
ANNA UNIVERSITY - CHENNAI 600 025

NOVEMBER 2024
LIST OF ABBREVIATIONS

CV - OpenCV (Open Source Computer Vision Library)

TF - TensorFlow (Machine Learning Framework)

Keras - A high-level neural networks API

Python - A high-level programming language

OCR - Optical Character Recognition

AI - Artificial Intelligence

CNN - Convolutional Neural Network

DL - Deep Learning

ML - Machine Learning

TABLE OF CONTENTS

CHAPTER NO    TITLE

1             Abstract
2             Existing System
3             Introduction
4             Literature Review
5             References

ABSTRACT:
Hand gesture recognition systems have received great attention in recent years because of their manifold applications and their ability to support efficient interaction with machines through human-computer interaction. Due to the effects of lighting and complex backgrounds, most visual hand gesture recognition systems work only in restricted environments. With the rapid development of computer vision, the demand for interaction between humans and machines is becoming more and more extensive. Since hand gestures can express rich information, hand gesture recognition is widely used in robot control, intelligent furniture and other areas. One of the technical possibilities for implementing hand gesture detection systems is the vision-based approach. The dataset contains all the required gestures. Using these features, a hand gesture prediction model is built with OpenCV and Keras. The validation results indicate the precision and accuracy of the proposed model.
EXISTING SYSTEM:
Hand gesture recognition with surface electromyography (sEMG) is indispensable for Muscle-Gesture-Computer Interfaces. The usual focus is on performance evaluation, i.e. the accuracy and robustness of hand gesture recognition. However, to the best of our knowledge, the reliability of such classifiers has not been addressed, possibly due to the lack of consensus on the definition of model reliability in this field. This paper raises a concern about model reliability in sEMG-based hand gesture recognition. By defining the model reliability R as the quality of its uncertainty measures and providing an offline framework to investigate it, we demonstrate that the ECNN has great potential for classifying finger movements.

Drawbacks:

• Accuracy is low.
• Electromyography Signals are used to prepare the dataset.
• It is a complex process.

INTRODUCTION:
Hand gestures are an important part of nonverbal communication and form an integral part of our interactions with the environment. Notably, sign language is a set of hand gestures that is valuable to millions of people with hearing and speech impairments. However, these users experience difficulty in communicating with the outside world, as most people neither understand nor can use sign language. Gesture recognition and classification platforms can aid in translating gestures for those who do not understand sign language. There are two major approaches to the classification of hand gestures. The first approach is the vision-based approach. This involves the use of cameras to acquire the pose and movement of the hand and algorithms to process the recorded images. Although this approach is popular, it is very computationally intensive, as images or videos have to undergo significant preprocessing to segment features such as color, pixel values, and the shape of the hand.

Domain overview:
1.1 Data Science:

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and to apply knowledge and actionable insights from data across a broad range of application domains.

The term "data science" has been traced back to 1974, when Peter Naur proposed it
as an alternative name for computer science. In 1996, the International Federation
of Classification Societies became the first conference to specifically feature data
science as a topic. However, the definition was still in flux.

The term “data science” was popularized in 2008 by D.J. Patil and Jeff Hammerbacher, the pioneer leads of data and analytics efforts at LinkedIn and Facebook. In less than a decade, it has become one of the hottest and most trending professions in the market.

Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.

Data science can be defined as a blend of mathematics, business acumen, tools, algorithms and machine learning techniques, all of which help us in finding out the hidden insights or patterns in raw data that can be of major use in the formation of big business decisions.

Data Scientist:

Data scientists examine which questions need answering and where to find
the related data. They have business acumen and analytical skills as well as the
ability to mine, clean, and present data. Businesses use data scientists to source,
manage, and analyze large amounts of unstructured data.

Required Skills for a Data Scientist:

• Programming: Python, SQL, Scala, Java, R, MATLAB.
• Machine Learning: Natural Language Processing, Classification, Clustering.
• Data Visualization: Tableau, SAS, D3.js, Python, Java, R libraries.
• Big data platforms: MongoDB, Oracle, Microsoft Azure, Cloudera.
1.2 ARTIFICIAL INTELLIGENCE:
Artificial intelligence (AI) refers to the simulation of human intelligence in
machines that are programmed to think like humans and mimic their actions. The
term may also be applied to any machine that exhibits traits associated with a
human mind such as learning and problem-solving.

Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by humans or animals. Leading AI textbooks define the field as the study of “intelligent agents”: any system that perceives its environment and takes actions that maximize its chance of achieving its goals.

Some popular accounts use the term “artificial intelligence” to describe machines that mimic “cognitive” functions that humans associate with the human mind, such as “learning” and “problem solving”; however, this definition is rejected by major AI researchers.

Artificial intelligence is the simulation of human intelligence processes by machines, especially computer systems. Specific applications of AI include expert systems, natural language processing, speech recognition and machine vision.

AI applications include advanced web search engines, recommendation systems (used by YouTube, Amazon and Netflix), understanding human speech (such as Siri or Alexa), self-driving cars (e.g. Tesla), and competing at the highest level in strategic game systems (such as chess and Go). As machines become increasingly capable, tasks considered to require “intelligence” are often removed from the definition of AI, a phenomenon known as the AI effect. For instance, optical character recognition is frequently excluded from things considered to be AI, having become a routine technology.

Artificial intelligence was founded as an academic discipline in 1956, and in the years since has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an “AI winter”), followed by new approaches, success and renewed funding.

AI research has tried and discarded many different approaches during its lifetime, including simulating the brain, modeling human problem solving, formal logic, large databases of knowledge and imitating animal behavior. In the first decades of the 21st century, highly mathematical statistical machine learning has dominated the field, and this technique has proved highly successful, helping to solve many challenging problems throughout industry and academia.

The various sub-fields of AI research are centered around particular goals and
the use of particular tools. The traditional goals of AI research
include reasoning, knowledge representation, planning, learning, natural language
processing, perception and the ability to move and manipulate objects. General
intelligence (the ability to solve an arbitrary problem) is among the field’s long-term
goals.

To solve these problems, AI researchers use versions of search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, probability and economics. AI also draws upon computer science, psychology, linguistics, philosophy, and many other fields.

The field was founded on the assumption that human intelligence “can be so
precisely described that a machine can be made to simulate it”. This raises
philosophical arguments about the mind and the ethics of creating artificial beings
endowed with human-like intelligence.

These issues have been explored by myth, fiction and philosophy since antiquity. Science fiction and futurology have also suggested that, with its enormous potential and power, AI may become an existential risk to humanity.
As the hype around AI has accelerated, vendors have been scrambling to
promote how their products and services use AI. Often what they refer to as AI is
simply one component of AI, such as machine learning.

AI requires a foundation of specialized hardware and software for writing and training machine learning algorithms. No one programming language is synonymous with AI, but a few, including Python, R and Java, are popular.

In general, AI systems work by ingesting large amounts of labeled training data, analyzing the data for correlations and patterns, and using these patterns to make predictions about future states.

In this way, a chatbot that is fed examples of text chats can learn to produce lifelike exchanges with people, or an image recognition tool can learn to identify and describe objects in images by reviewing millions of examples.

AI programming focuses on three cognitive skills: learning, reasoning and self-correction.

Learning processes. This aspect of AI programming focuses on acquiring data and creating rules for how to turn the data into actionable information. The rules, which are called algorithms, provide computing devices with step-by-step instructions for how to complete a specific task.

Reasoning processes. This aspect of AI programming focuses on choosing the right algorithm to reach a desired outcome.

Self-correction processes. This aspect of AI programming is designed to
continually fine-tune algorithms and ensure they provide the most accurate results
possible.

AI is important because it can give enterprises insights into their operations that they may not have been aware of previously and because, in some cases, AI can perform tasks better than humans. Particularly when it comes to repetitive, detail-oriented tasks like analyzing large numbers of legal documents to ensure relevant fields are filled in properly, AI tools often complete jobs quickly and with relatively few errors.

Artificial neural networks and deep learning artificial intelligence technologies are quickly evolving, primarily because AI processes large amounts of data much faster and makes predictions more accurately than humanly possible.

1.3 DEEP LEARNING

Deep learning is a branch of machine learning which is completely based on artificial neural networks; since a neural network mimics the human brain, deep learning is also a kind of mimicry of the human brain. It is on hype nowadays because earlier we did not have that much processing power or that much data. A formal definition of deep learning is: deep learning is a particular kind of machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones. The human brain contains approximately 100 billion neurons, and each neuron is connected to thousands of its neighbours. The question here is how to recreate these neurons in a computer. The answer is an artificial structure called an artificial neural network, made up of nodes or neurons. It has some neurons for input values and some for output values, and in between there may be many interconnected neurons in the hidden layers.

We need to identify the actual problem in order to get the right solution, and it should be understood whether deep learning is a feasible fit for that problem or not. We then need to identify the relevant data corresponding to the actual problem and prepare it accordingly. An appropriate deep learning algorithm is chosen and used while training on the dataset, and final testing is done on the dataset.

Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.
Deep-learning architectures such as deep neural networks, deep belief
networks, deep reinforcement learning, recurrent neural networks and convolutional
neural networks have been applied to fields including computer vision, speech
recognition, natural language processing, machine translation, bioinformatics, drug
design, medical image analysis, material inspection and board game programs,
where they have produced results comparable to and in some cases surpassing
human expert performance.

Artificial neural networks (ANNs) were inspired by information processing and distributed communication nodes in biological systems. ANNs have various differences from biological brains. Specifically, neural networks tend to be static and symbolic, while the biological brain of most living organisms is dynamic (plastic) and analogue.

The adjective "deep" in deep learning refers to the use of multiple layers in
the network. Early work showed that a linear perceptron cannot be a universal
classifier, but that a network with a non-polynomial activation function with one
hidden layer of unbounded width can. Deep learning is a modern variation which is
concerned with an unbounded number of layers of bounded size, which permits
practical application and optimized implementation, while retaining theoretical
universality under mild conditions. In deep learning the layers are also permitted to
be heterogeneous and to deviate widely from biologically
informed connectionist models, for the sake of efficiency, trainability and
understandability, whence the "structured" part.

Deep learning is a class of machine learning algorithms that uses multiple layers to progressively extract higher-level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify concepts relevant to a human such as digits, letters or faces.

Interpretations:
Deep neural networks are generally interpreted in terms of the universal
approximation theorem or probabilistic inference.

The classic universal approximation theorem concerns the capacity of feed-forward neural networks with a single hidden layer of finite size to approximate continuous functions. In 1989, the first proof was published by George Cybenko for sigmoid activation functions, and it was generalised to feed-forward multi-layer architectures in 1991 by Kurt Hornik. Recent work also showed that universal approximation holds for non-bounded activation functions such as the rectified linear unit.
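
Stated compactly (the notation below is introduced here only for illustration and is not taken from the report), the classic theorem says that for a sigmoidal activation function $\sigma$, any continuous function on the unit cube can be uniformly approximated by a single hidden layer of finite width:

$$
\forall f \in C([0,1]^n),\ \forall \varepsilon > 0,\ \exists N \in \mathbb{N},\ v_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^n:\quad
\sup_{x \in [0,1]^n} \Bigl| f(x) - \sum_{i=1}^{N} v_i\,\sigma\!\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon .
$$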

The universal approximation theorem for deep neural networks concerns the capacity of networks with bounded width whose depth is allowed to grow. It was proved that if the width of a deep neural network with ReLU activation is strictly larger than the input dimension, then the network can approximate any Lebesgue-integrable function; if the width is smaller than or equal to the input dimension, then the deep neural network is not a universal approximator.

The probabilistic interpretation derives from the field of machine learning. It features inference, as well as the optimization concepts of training and testing, related to fitting and generalization, respectively. More specifically, the probabilistic interpretation considers the activation nonlinearity as a cumulative distribution function. The probabilistic interpretation led to the introduction of dropout as a regularizer in neural networks. The probabilistic interpretation was introduced by researchers including Hopfield, Widrow and Narendra and popularized in surveys such as the one by Bishop.
Deep learning revolution:
In 2012, a team led by George E. Dahl won the "Merck Molecular Activity
Challenge" using multi-task deep neural networks to predict the biomolecular
target of one drug. In 2014, Hochreiter's group used deep learning to detect off-target
and toxic effects of environmental chemicals in nutrients, household products and
drugs and won the "Tox21 Data Challenge" of NIH, FDA and NCATS.

Significant additional impacts in image or object recognition were felt from 2011 to 2012. Although CNNs trained by back-propagation had been around for decades, and GPU implementations of NNs for years, including CNNs, fast implementations of CNNs on GPUs were needed to progress on computer vision. In 2011, this approach achieved for the first time superhuman performance in a visual pattern recognition contest. Also in 2011, it won the ICDAR Chinese handwriting contest, and in May 2012, it won the ISBI image segmentation contest. Until 2011, CNNs did not play a major role at computer vision conferences, but in June 2012, a paper by Ciresan et al. at the leading conference CVPR showed how max-pooling CNNs on GPU can dramatically improve many vision benchmark records.

In October 2012, a similar system by Krizhevsky et al. won the large-scale ImageNet competition by a significant margin over shallow machine learning methods. In November 2012, Ciresan et al.'s system also won the ICPR contest on analysis of large medical images for cancer detection, and in the following year also the MICCAI Grand Challenge on the same topic. In 2013 and 2014, the error rate on the ImageNet task using deep learning was further reduced, following a similar trend in large-scale speech recognition.

Image classification was then extended to the more challenging task of generating descriptions (captions) for images, often as a combination of CNNs and LSTMs.
Some researchers state that the October 2012 ImageNet victory anchored the
start of a "deep learning revolution" that has transformed the AI industry.

In March 2019, Yoshua Bengio, Geoffrey Hinton and Yann LeCun were awarded
the Turing Award for conceptual and engineering breakthroughs that have made
deep neural networks a critical component of computing.

5. PROPOSED SYSTEM:

We propose a new and robust deep learning model based on a convolutional neural network (CNN) to automatically detect hand gesture movements. We use whole images, so it is not necessary to perform any elaborate pre-processing. A large number of sample images comprising different classes were collected; images of different hand movements were collected for each class and used as the input images. The DL method used in the study is the Convolutional Neural Network (CNN). It is expected that the success of the obtained results will increase further if the CNN method is supported by adding extra feature extraction methods, so that hand gesture movements are detected successfully.
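
To make the proposed pipeline concrete, the following is a minimal sketch of a Keras CNN for multi-class gesture classification. The layer sizes, image size and class count are illustrative assumptions, not the exact configuration used in this work.

```python
# Minimal sketch of a CNN gesture classifier (assumed hyperparameters).
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4          # e.g. ThumbsUp, ThumbsDown, Callme, Looser (assumed)
IMG_SIZE = (128, 128)    # assumed input resolution

model = models.Sequential([
    layers.Input(shape=IMG_SIZE + (3,)),
    layers.Rescaling(1.0 / 255),                  # normalise pixel values to [0, 1]
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                          # regularisation
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```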

5.1 Advantages:

• Accuracy may be improved.
• No signals are required to prepare the dataset.

6. PREPARING DATASET:

This dataset contains train and test image records of the extracted features, which were then classified into a number of classes:
ThumbsUp

ThumbsDown

Callme

Looser

etc
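
A minimal sketch of how such a directory-per-class dataset could be loaded with Keras utilities is shown below. The directory name, image size, batch size and validation split are assumptions for illustration.

```python
# Sketch: load a gesture dataset laid out as dataset/<class_name>/<image>.jpg (assumed).
import tensorflow as tf

IMG_SIZE = (128, 128)
BATCH = 32

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/",
    validation_split=0.2,     # hold out 20% of images for validation (assumed)
    subset="training",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH,
)

print(train_ds.class_names)   # e.g. ['Callme', 'Looser', 'ThumbsDown', 'ThumbsUp']
```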

7. LITERATURE SURVEY

General
A literature review is a body of text that aims to review the critical points of current knowledge on, and/or methodological approaches to, a particular topic. It draws on secondary sources and discusses published information in a particular subject area, sometimes limited to a certain time period.

Its ultimate goal is to bring the reader up to date with the current literature on a topic. It forms the basis for further goals, such as identifying future research that may be needed in the area, and it often precedes a research proposal; it may also be just a simple summary of sources. Usually, it has an organizational pattern and combines both summary and synthesis.

A summary is a recap of important information about the source, while a synthesis is a re-organization or reshuffling of that information. It might give a new interpretation of old material or combine new with old interpretations, or it might trace the intellectual progression of the field, including major debates. Depending on the situation, the literature review may evaluate the sources and advise the reader on the most pertinent or relevant of them.

Hand gesture recognition has long been studied from a human-computer interaction standpoint. Most surveys favour empirical modeling of these complex systems in order to predict the gesture class for a given input, and the use of machine learning for such tasks is a trend that is being observed now. The following surveys help in understanding the past and present perspectives on gesture recognition.

Review of Literature Survey

Title : Hand Gesture Recognition Based on Computer Vision: A Review of Techniques

Author : Munir Oudah, Ali Al-Naji and Javaan Chahl

Year : 2020

Hand gestures are a form of nonverbal communication that can be used in several fields such as communication between deaf-mute people, robot control,
human–computer interaction (HCI), home automation and medical applications.
Research papers based on hand gestures have adopted many different techniques,
including those based on instrumented sensor technology and computer vision. In
other words, the hand sign can be classified under many headings, such as posture
and gesture, as well as dynamic and static, or a hybrid of the two. This paper focuses
on a review of the literature on hand gesture techniques and introduces their merits
and limitations under different circumstances. In addition, it tabulates the
performance of these methods, focusing on computer vision techniques that deal
with the similarity and difference points, technique of hand segmentation used,
classification algorithms and drawbacks, number and types of gestures, dataset used,
detection range (distance) and type of camera used. This paper is a thorough general
overview of hand gesture methods with a brief discussion of some possible
applications.

Title : Hand Gesture Recognition with Skin Detection and Deep Learning Method

Author : Hanwen Huang, Yanwen Chong

Year : 2019

Gesture recognition, although it has been explored for many years, is still a challenging problem. Complex backgrounds, camera angles and illumination conditions make the problem more difficult. Thus, this paper presents a fast and robust method for hand gesture recognition based on RGB video. First, we detect the skin based on its color. Then we extract the contour and segment the hand region. Finally, we recognize the gesture. The experimental results demonstrate that the proposed method recognizes gestures efficiently, with a higher accuracy than the state of the art.
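
For context, a minimal OpenCV sketch of the skin-colour segmentation idea described above is given below. The HSV threshold values and the input file name are illustrative assumptions and normally need tuning for the camera and lighting conditions.

```python
# Sketch: skin-colour segmentation and hand contour extraction (assumed thresholds).
import cv2
import numpy as np

frame = cv2.imread("hand.jpg")                        # assumed input image
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

lower_skin = np.array([0, 30, 60], dtype=np.uint8)    # illustrative lower HSV bound
upper_skin = np.array([20, 150, 255], dtype=np.uint8) # illustrative upper HSV bound
mask = cv2.inRange(hsv, lower_skin, upper_skin)       # binary skin mask

# Clean the mask and keep the largest contour as the hand region.
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    hand = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(hand)
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("hand_detected.jpg", frame)
```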

Title : Hand Gesture Recognition System Using Camera

Author : Viraj Shinde, Tushar Bacchav, Jitendra Pawar

Year : 2014

In this paper, we focus on using pointing behavior for a natural interface. Hand gesture recognition based human-machine interfaces have been developed vigorously in recent years. Due to the effect of lighting and complex backgrounds, most visual hand gesture recognition systems work only in restricted environments. To classify dynamic hand gestures, we developed a simple and fast motion-history-image based method. In recent years, the gesture control technique has become a new developmental trend for many human-based electronics products. This technique lets people control these products more naturally, intuitively and conveniently. In this paper, a fast gesture recognition scheme is proposed as an interface for the human-machine interaction (HMI) of systems. This paper presents some low-complexity algorithms and gestures to reduce the gesture recognition complexity and to be more suitable for controlling real-time computer systems.

Title : Static Hand Gesture Recognition using Convolutional Neural Network with
Data Augmentation

Author : Md. Zahirul Islam, Mohammad Shahadat Hossain

Computers are part and parcel of our day-to-day life and are used in various fields. The interaction of humans and computers is accomplished by traditional input devices like the mouse, keyboard etc. Hand gestures can be a useful medium of human-computer interaction and can make the interaction easier. Gestures vary in orientation and shape from person to person, so non-linearity exists in this problem. Recent research has proved the supremacy of the Convolutional Neural Network (CNN) for image representation and classification. Since CNNs can learn complex and non-linear relationships among images, this paper proposes a static hand gesture recognition method using a CNN. Data augmentation such as re-scaling, zooming, shearing, rotation, and width and height shifting was applied to the dataset. The model was trained on 8000 images and tested on 1600 images, which were divided into 10 classes. The model with augmented data achieved an accuracy of 97.12%, which is nearly 4% higher than the model without augmentation (92.87%).
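
A minimal sketch of the kind of augmentation pipeline this paper describes, using Keras' ImageDataGenerator, is shown below. The parameter values and directory layout are assumptions, not the paper's exact settings.

```python
# Sketch: image augmentation (rescale, rotation, shift, shear, zoom) with Keras.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # re-scaling
    rotation_range=15,        # rotation (degrees, assumed)
    width_shift_range=0.1,    # width shifting
    height_shift_range=0.1,   # height shifting
    shear_range=0.1,          # shearing
    zoom_range=0.2,           # zooming
)

train_gen = train_datagen.flow_from_directory(
    "dataset/train",          # assumed directory layout
    target_size=(128, 128),
    batch_size=32,
    class_mode="categorical",
)
```
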
Title : Real-Time Hand Gesture Recognition Using Finger Segmentation

Author : Zhi-hua Chen, Jung-Tae Kim, Jianning Liang

Year : 2014

Hand gesture recognition is very significant for human-computer interaction. In this work, we present a novel real-time method for hand gesture recognition. In our framework, the hand region is extracted from the background with the background subtraction method. Then, the palm and fingers are segmented so as to detect and recognize the fingers. Finally, a rule classifier is applied to predict the labels of hand gestures. The experiments on a data set of 1300 images show that our method performs well and is highly efficient. Moreover, our method shows better performance than a state-of-the-art method on another data set of hand gestures.
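
The background-subtraction step mentioned above can be sketched with OpenCV as follows; the MOG2 subtractor, webcam index and smoothing are illustrative assumptions, and the finger segmentation and rule classifier from the paper are not reproduced here.

```python
# Sketch: extract the moving hand from a webcam stream via background subtraction.
import cv2

cap = cv2.VideoCapture(0)                              # default webcam (assumed)
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)                  # foreground (hand) mask
    fg_mask = cv2.medianBlur(fg_mask, 5)               # remove salt-and-pepper noise
    cv2.imshow("foreground", fg_mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):              # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```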

8. SYSTEM STUDY
Aim:

Hand gestures are one of the major factors in how our society communicates. There are a lot of sign languages actively used in the world, so the corresponding actions cannot be classified easily by everyone. This project aims to classify hand gestures easily.

Objectives:
The goal is to develop a deep learning model for hand gesture classification using a convolutional neural network algorithm, and to obtain the best possible classification accuracy by comparing CNN architectures.

Scope:

Hand gesture images are collected, and the machine is trained to classify the types of gesture. This project covers different types of gestures such as ThumbsUp, ThumbsDown, Callme, Looser, etc. We train the machine to achieve good accuracy and obtain the best possible outcome; a minimal inference sketch is given below.
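
As an illustration of how a trained model could be used on live camera input, the sketch below loads a saved Keras model and classifies webcam frames. The model file name, input size, class order and scaling are assumptions for illustration only.

```python
# Sketch: real-time gesture prediction from a webcam with a saved Keras model.
import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("gesture_cnn.h5")            # assumed file name
CLASS_NAMES = ["Callme", "Looser", "ThumbsDown", "ThumbsUp"]     # assumed class order

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.resize(frame, (128, 128)).astype("float32")
    img = np.expand_dims(img, axis=0)        # add batch dimension
    # Note: divide img by 255.0 here if the saved model expects [0, 1] inputs
    # (assumed to be handled by a Rescaling layer inside the model).
    probs = model.predict(img, verbose=0)[0]
    label = CLASS_NAMES[int(np.argmax(probs))]
    cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("HandSense", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```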

OUTLINE OF THE PROJECT

Overview of the system:

➢ Define a problem
➢ Gathering image data set
➢ Evaluating algorithms
➢ Detecting results

9. PROJECT REQUIREMENTS

General:
Requirements are the basic constraints that are required to develop a system. Requirements are collected while designing the system. The following are the requirements that are to be discussed:

1. Functional requirements

2. Non-Functional requirements

3. Environment requirements

A. Hardware requirements

B. Software requirements

9.1 Functional requirements:

The software requirements specification is a technical specification of requirements for the software product. It is the first step in the requirements analysis process. It lists the requirements of a particular software system. This project depends on special libraries such as TensorFlow, Keras and Matplotlib.
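
A quick sanity check that these libraries (plus OpenCV, which the report also relies on) are installed can be written as below; the versions printed will simply be whatever is present in the environment.

```python
# Sketch: verify the required libraries are importable and print their versions.
import tensorflow as tf
from tensorflow import keras
import matplotlib
import cv2

print("TensorFlow:", tf.__version__)
print("Keras:", keras.__version__)
print("Matplotlib:", matplotlib.__version__)
print("OpenCV:", cv2.__version__)
```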

9.2 Non-Functional Requirements:

The process involves the following functional steps:

1. Problem definition
2. Preparing the data
3. Evaluating algorithms
4. Improving results
5. Predicting the result
REFERENCES:
• Zhigang, F. Computer gesture input and its application in human computer interaction. Mini Micro Syst. 1999, 6, 418–421.
• Mitra, S.; Acharya, T. Gesture recognition: A survey. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2007, 37, 311–324.
• Ahuja, M.K.; Singh, A. Static vision based hand gesture recognition using principal component analysis. In Proceedings of the 2015 IEEE 3rd International Conference on MOOCs, Innovation and Technology in Education (MITE), Amritsar, India, 1–2 October 2015; pp. 402–406.
