INTRODUCTION
Human Action Recognition (HAR) is a subfield of computer vision and artificial intelligence that
focuses on the identification and classification of human actions, gestures, and movements in digital
images and video sequences. This technology plays a pivotal role in various applications across different
domains, from security and healthcare to entertainment and education. The primary goal of HAR is to
enable machines to understand and interpret human activities, providing valuable insights and
facilitating automation in a wide range of contexts.
BACKGROUND
In the digital age, the ubiquity of cameras and the exponential growth of video content have created a
vast pool of visual data. Harnessing this data for meaningful insights and automation has become a
crucial challenge. Human Action Recognition arises from the need to extract knowledge and context
from this data by interpreting and categorizing the actions performed by individuals or groups of people.
MOTIVATION
The motivation behind Human Action Recognition is multifaceted. It stems from the growing demand
for automated analysis and understanding of human behavior. The rise in surveillance, the expansion of
wearable devices, and the quest for more natural human-computer interactions have fueled interest in
this field. Moreover, applications in healthcare, security, entertainment, and education have the
potential to transform these industries.
PROBLEM STATEMENT
The problem statement of human action recognition in videos and images involves developing computer
vision and machine learning algorithms to automatically identify and classify the actions or activities
performed by humans in visual data.
OBJECTIVE AND SCOPE OF THE PROJECT
The objective of this study is to advance the state-of-the-art in Human Action Recognition, improving
accuracy, speed, and adaptability to different contexts. The scope encompasses the development of
deep learning models, computer vision techniques, and datasets to facilitate action recognition.
Additionally, this research seeks to address real-world challenges in applications like surveillance,
healthcare, entertainment, and education, where accurate action recognition can offer substantial
benefits.
In this era of digital media and technology, Human Action Recognition in Videos and Images presents
an exciting avenue for research and development. The fusion of computer vision, deep learning, and
real-world applications holds the promise of creating smarter, more responsive systems that can
understand, interpret, and interact with human actions in a meaningful and valuable way.
Human-Computer Interaction: Implement the recognition system in applications like gesture control for
computers, augmented reality, or virtual reality.
Evaluation Metrics: Define appropriate metrics to assess the performance of the action recognition
system, including accuracy, precision, recall, and F1 score.
Privacy and Ethics: Address privacy concerns by considering anonymization techniques and ethical
considerations when working with video data.
Adaptability: Design the system to work in different environments and lighting conditions, ensuring
its adaptability to real-world scenarios.
Scalability: Consider the scalability of the system to handle a varying number of actions and
accommodate future expansion.
User Interface: Develop a user-friendly interface for end-users to interact with the recognition system.
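The evaluation metrics listed above can be computed directly from predicted and ground-truth labels. The sketch below uses plain Python and a made-up set of labels for a hypothetical binary "walking" detector; it is an illustration, not part of the project's implementation:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one action class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1

# 1 = "walking", 0 = "not walking" on eight hypothetical test clips
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
```

For multi-class action recognition, the same per-class counts are typically averaged across classes (macro-averaging) or pooled over all classes (micro-averaging).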
Experiment 2: Industrial Survey / Literature Review
Introduction
A literature review, or literature survey, is the part of a scholarly work that surveys current knowledge
on a particular topic, including substantive findings as well as theoretical and methodological
contributions. It is conducted before the project commences to give an idea of the existing systems in the
field and their pros and cons, and it involves studying and reviewing literature relevant to the given
topic. In this survey, research papers on human action recognition and their implementations are
reviewed.
3) Implementation of Human Action Recognition Using Image Parsing Techniques
Author: Soumalya Sen
Human activity recognition plays a significant role in human-to-human interaction and interpersonal
relations because it provides information about the identity of a person, their personality, and their
psychological state; however, it is difficult to extract. The human ability to recognize another person's
activities is one of the main subjects of study in the scientific areas of computer vision and machine
learning. As a result of this research, many applications, including video surveillance systems,
human-computer interaction, and robotics for the characterization of human behavior, require an activity
recognition system. Human activity recognition is an important research direction in image and video
analysis. In the past, a large number of papers have been published on human activity recognition in
video and image sequences. In this paper, we provide a comprehensive survey of recent developments in
these techniques, including methods, systems, and quantitative evaluation of the performance of human
activity recognition. The experimental results show that our method can significantly improve
classification, interpretation, and retrieval performance for video images. The novelty of this paper is
twofold: first, capturing video images of humans; second, identifying the different types of actions
performed by humans.
5) Human Action Recognition Using Smartphone Sensors
Author: Ashim Saha
As smartphones are becoming ubiquitous, many studies using smartphones have been carried out in
recent years. Further, these smartphones are laden with several diverse and sophisticated sensors,
such as a GPS sensor, a vision sensor (camera), an acceleration sensor, an audio sensor (microphone),
a light sensor, and a direction sensor (compass). Activity recognition is one of the potent research
topics, which
can be used to provide effective and adaptive services to users. Our paper is intended to evaluate a
system using smartphone-based acceleration sensors, referred to as accelerometers. To recognize six
different human activities using supervised machine learning classification, accelerometer data
compiled from sixteen different users were collected as per their usual day-to-day routines, consisting
of sitting, standing, lying down, walking, and climbing up and down the staircase. The sample data thus
generated were then aggregated and combined into examples to which supervised machine learning
algorithms were applied to generate predictive models. To address the limitations of laboratory
settings, we used the Physics Toolbox Sensor Suite on the Google Android platform to collect the
time-series data generated by the smartphone accelerometer. This kind of activity prediction model can
be used to provide insightful information about millions of human beings merely by having them carry a
smartphone with them.
Index Terms—Activity Recognition, Machine Learning, Multi-class Classification, Smartphone Time-series
Data, 3-axis Accelerometer.
Author: Chandni J. Dhamsania
Human action recognition is a way of retrieving videos that emerged from Content-Based Video Retrieval
(CBVR). It is a growing area of research in the field of computer vision. Human action recognition has
gained popularity because of its wide applicability in the automatic retrieval of videos of particular
actions using visual features. The most common stages of action recognition include object and human
segmentation, feature extraction, activity detection, and classification. This paper describes
the application and challenges of human action recognition. Features and limitations of various methods
for human action recognition are discussed. This paper presents a survey of different types of actions:
single-person action recognition, two-person or person-object interaction, and multiple-person action
recognition.
7) SCNN: Sequential Convolutional Neural Network for Human Action Recognition in Videos
Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) are two typical kinds of
neural networks. While CNN models have achieved great success in image recognition due to their
strong ability to abstract spatial information at multiple levels, RNN models have not achieved
significant progress in video analysis tasks (e.g., action recognition), although RNNs can inherently
model temporal dependencies in videos. In this work, we propose a Sequential Convolutional Neural
Network, denoted SCNN, to extract effective spatial-temporal features from videos, thus incorporating
the strengths of both convolutional and recurrent operations. Our SCNN model extends the RNN to directly
process feature maps, rather than vectors flattened from feature maps, to preserve the spatial structure
of the inputs. It replaces the full connections of the RNN with convolutional connections to decrease
parameter counts, computational cost, and over-fitting risk. Moreover, we introduce asymmetric
convolutional layers into the SCNN to further reduce parameter counts and computational cost. Our final
SCNN deep architecture for action recognition achieves very good performance on two challenging
benchmarks, UCF-101 and HMDB-51, outperforming many state-of-the-art methods.
8) Real-Time Human Action Recognition Using CNN Over Temporal Images for
Static Video Surveillance Cameras
Authors: Cheng-Bin Jin, Shengzhe Li, Trung Dung Do, and Hakil Kim
Abstract. This paper proposes a real-time human action recognition approach to static video
surveillance systems. This approach predicts human actions using temporal images and convolutional
neural networks (CNN). CNN is a type of deep learning model that can automatically learn features
from training videos. Although state-of-the-art methods have shown high accuracy, they consume
a lot of computational resources. Another problem is that many methods assume exact knowledge
of human positions. Moreover, most current methods build complex handcrafted features for
specific classifiers. Therefore, these kinds of methods are difficult to apply in real-world applications.
In this paper, a novel CNN model based on temporal images and a hierarchical action structure is
developed for real-time human action recognition. The hierarchical action structure includes three
levels: the action layer, the motion layer, and the posture layer. The top layer represents subtle
actions; the bottom layer represents posture. Each layer contains one CNN, which means that this model
has three CNNs working together; the layers are combined to represent many different kinds of action
with a large degree of freedom. The developed approach was implemented and achieved superior performance
on the ICVL action dataset; the algorithm can run at around 20 frames per second.
9) Survey On Feature Extraction Approach for Human Action Recognition in Still
Images and Videos
Authors: Pavan M, Deepika D, Divyashree R
Human Action Recognition (HAR) has been a challenging problem, yet one that needs to be solved.
Recently, the detection and recognition of human actions has found a broad range of applications and
has become popular in the field of computer vision. It mainly focuses on understanding human behaviour
and assigning a label to each action. There are many approaches for recognizing actions in both images
and videos, and it is now time to review these existing approaches in order to help future research.
The main aim of this work is to study the various action recognition techniques used in videos and
images. The paper presents a brief overview of the features of human actions, categorizing them as
still-image-based and video-based. All related datasets are also introduced in this paper, which will be
helpful for future research.
10) Subspace Analysis Methods plus Motion History Image for Human Action
Recognition
This paper proposes a new human action recognition method which deals with the recognition task in
a quite different way compared with traditional methods that use a sequence-matching scheme. Our
method compresses a sequence of an action into a Motion History Image (MHI), from which
low-dimensional features are extracted using subspace analysis methods. Unlike other methods, which
use a sequence consisting of several frames for recognition, our method uses only one MHI per action
sequence. Our method thus avoids the complexity as well as the large computation of
sequence-matching-based methods. Encouraging experimental results on a widely used database
demonstrate the effectiveness of the proposed method.
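The MHI idea can be illustrated in a few lines of NumPy. This is a simplified sketch (frame differencing with a fixed threshold and linear decay over a hypothetical toy sequence), not the paper's exact formulation:

```python
import numpy as np

def motion_history_image(frames, threshold=30, tau=10):
    """Compress a grayscale frame sequence into a single Motion History Image.
    Pixels that moved recently stay bright; older motion decays toward zero."""
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    for prev, curr in zip(frames, frames[1:]):
        motion = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > threshold
        mhi = np.where(motion, tau, np.maximum(mhi - 1, 0))  # set or decay
    return mhi / tau  # normalize to [0, 1] for feature extraction

# Toy sequence: a bright 2x2 "object" moving one pixel right per frame
frames = []
for t in range(5):
    f = np.zeros((8, 8), dtype=np.uint8)
    f[3:5, t:t + 2] = 255
    frames.append(f)
mhi = motion_history_image(frames)
```

The resulting single image then plays the role the paper describes: one compact template per action sequence, on which subspace methods (e.g. PCA) can extract low-dimensional features.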
In video-based human action recognition, continual human motion is a difficult point for applications.
A method of human action recognition based on latent-dynamic conditional random fields (LDCRF) is
presented. Human pose is extracted using a star-form distance descriptor of the human body contour.
Then, an LDCRF model is built over continuous sequences to capture the mapping relation between action
features and action semantics. Compared with traditional CRF and HCRF models, by designing the
affiliation between latent features and human pose, LDCRF models both internal action and external
movement features. In the experiments, the Weizmann action database is used and three experiments are
designed. When a composite continuous sequence is tested, the recognition rate reaches over 90% for all
actions except "skip"; the receiver operating characteristics of the three models show that LDCRF models
have better descriptive capability for internal action and external movement features, while human
action recognition is affected by viewing angle, accessories, and occlusion. The results show that
LDCRF is robust when the human body contour is intact.
15) Review on Recent Advances in Human Action Recognition in
Video Data
Author – Akshita Baisware
AI has achieved new heights in image recognition, human action recognition, and NLP. It has a vast
area of applications, such as IoT, robotics, the biosciences, and surveillance. Video-based human
action recognition has many potential applications, which makes it a most sought-after field for
researchers. With the constant growth of high-performance computing, computer vision, and GPUs,
deep learning-based human activity recognition is one of the constantly evolving and most promising
streams. This review focuses on recent advancements in the field of action recognition based on deep
learning. The present state-of-the-art techniques for action recognition and prediction, as well as the
future scope for research, are discussed in the paper.
16) Human action identification and search in video files
Author: Mirela Kundid
This paper describes an approach for modeling and recognizing human actions within videos.
With millions of videos published almost every day, there are new opportunities for research
in the field of search and recognition within video sequences. Statistical approaches and
approaches based on model description are described in detail in this paper and compared on a series
of videos taken from various online databases (KTH, Weizmann, MSR-Action). There are various
approaches to identifying actions within video sequences. The approaches described within this paper
are based on recognizing actions from a series of images obtained by segmentation and by constructing
Motion History Images (MHI). In this paper, we apply the MHI construction technique to a series of
images obtained from a database used for motion analysis in order to recognize the action within a
video (a human greeting).
17) Human Action Recognition Using Deep Neural Networks
Author: Rashmi R. Koli
Human activities such as body gestures are among the most difficult and challenging subjects for deep
neural networks. Here, human action recognition is essentially human gesture recognition. A gesture is
a movement of body parts that conveys some meaningful message. Gestures are the most suitable and
natural way for humans to interact with computer systems, and thus they build a bridge between humans
and machines. Human activity recognition also provides a platform to interact with deaf and dumb
people. In this research work, we develop a platform for hand movement (gesture) recognition: using a
CNN, we can identify human gestures in an image. As we are aware, there has been a rapid increase in
the number of deaf and dumb people due to several conditions. Since deaf and dumb people cannot
communicate verbally with a typical person, they depend on a form of visual communication: sign
language. Sign language provides a fine communication platform for hearing-impaired people to convey
their thoughts and to interact with others. The purpose of this research is a system for gesture
recognition that can recognize gestures and then convert gesture images into text accordingly. The
system pays special attention to the training component using the CNN algorithm. The concept involves
designing a system that uses deep learning to treat the input as a gesture and then provide
recognizable output as text.
Keywords—Human Action Recognition, Deaf and Dumb, CNN, Hand Gesture.
18) Human Action Invariancies for Human Action Recognition
Author: Nilam Nur Amir Sharif
The uniqueness of the human action shape, or silhouette, can be used for human action recognition.
Acquiring the features of the human silhouette to obtain the concept of human action invariancies has
led to important research in the video surveillance domain. This paper discusses the investigation of
this concept by extracting individual human action features using an integration moment invariant.
Experimental results have shown that human action invariancies are improved, with better recognition
accuracy. This verifies that the integration method of moment invariant is worth exploring for the
recognition of human actions in video surveillance.
Experiment 3: Project Proposal
SYSTEM PLANNING
Below are the steps involved in the System Development Life Cycle. Each phase within the
overall cycle may be made up of several steps.
Human action recognition is a field within computer vision and artificial intelligence that focuses on
understanding and categorizing human movements and actions in videos or images. It has numerous
applications, including surveillance, healthcare, sports analytics, and more.
To achieve accurate human action recognition, several key steps are involved: data collection,
pre-processing, feature extraction, and classification.
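These four stages can be sketched end-to-end. The toy Python pipeline below uses random stand-in data, a deliberately simple motion feature, and a nearest-centroid classifier; it illustrates how data flows between the stages, not a real recognizer:

```python
import numpy as np

def collect(n_clips=20, n_frames=8, h=16, w=16, n_classes=3, seed=0):
    """Stand-in for data collection: random 'videos' with action labels."""
    rng = np.random.default_rng(seed)
    clips = rng.integers(0, 256, size=(n_clips, n_frames, h, w), dtype=np.uint8)
    labels = rng.integers(0, n_classes, size=n_clips)
    return clips, labels

def preprocess(clips):
    """Pre-processing: scale pixel values to [0, 1]."""
    return clips.astype(np.float32) / 255.0

def extract_features(clips):
    """Toy feature: mean absolute frame-to-frame difference per clip."""
    diffs = np.abs(np.diff(clips, axis=1))
    return diffs.reshape(len(clips), -1).mean(axis=1, keepdims=True)

def classify(features, labels):
    """Toy nearest-centroid classifier over the 1-D motion feature."""
    centroids = {c: features[labels == c].mean() for c in np.unique(labels)}
    return np.array([min(centroids, key=lambda c: abs(f - centroids[c]))
                     for f in features.ravel()])

clips, labels = collect()
preds = classify(extract_features(preprocess(clips)), labels)
```

In a real system each stage is replaced with its serious counterpart: labeled video datasets, frame normalization, learned deep features, and a trained neural network classifier.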
Step 3: Architectural Design
Architectural design for human action recognition typically involves computer vision and deep
learning techniques. A common approach is to use Convolutional Neural Networks (CNNs) and
Recurrent Neural Networks (RNNs) in combination.
The design will serve as a blueprint for the system and helps detect problems before these errors
or problems are built into the final system. Professionals create the system design, but must review
their work with the users to ensure the design meets users' needs.
Step 4: Coding and Debugging
Human action recognition involves identifying and categorizing various actions performed by
individuals, such as walking, running, or dancing. This field has applications in surveillance,
healthcare, sports analysis, and more.
In the coding phase, developers typically use programming languages like Python and deep learning
libraries such as TensorFlow or PyTorch to build and train action recognition models. They start by
collecting labeled datasets containing video clips or image sequences of various actions. These
datasets are then preprocessed to extract meaningful features from the data, such as optical flow or
3D joint positions.
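One common shape for such a model is a small CNN that encodes each frame, followed by an RNN over the frame features. The PyTorch sketch below is a hypothetical minimal version (layer sizes, six action classes, and input dimensions are all illustrative assumptions, and the model is untrained):

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Sketch of a common video-action architecture: a small CNN encodes each
    frame, an LSTM models the temporal order, and a linear head classifies."""
    def __init__(self, n_classes=6, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())      # -> 32 features/frame
        self.rnn = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, video):                  # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))  # fold time into batch: (B*T, 32)
        out, _ = self.rnn(feats.view(b, t, -1))
        return self.head(out[:, -1])           # logits from last time step

model = CNNLSTM()
clip = torch.randn(2, 8, 3, 32, 32)   # two dummy 8-frame RGB clips
logits = model(clip)                   # one score per action class
```

Training then proceeds as usual: cross-entropy loss over the logits, an optimizer such as Adam, and batches drawn from the labeled action dataset.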
Step 5: System Testing
System testing of human action recognition involves the evaluation of software or hardware
systems designed to identify and classify human actions from various data sources, such as videos,
images, or sensor data. This critical step ensures the system's accuracy, reliability, and
performance.
Step 6: Maintenance
To maintain progress in human action recognition, continuous data collection, algorithm
development, addressing real-world challenges, resource updates, and interdisciplinary collaboration
are essential. These efforts ensure the continued advancement of action recognition systems for
various applications, from surveillance to healthcare.
There are various software process models, such as:
Prototyping Model
RAD Model
The Spiral Model
The Waterfall Model
The Iterative Model
Of all these process models, we have used the Iterative model for the development of our project.
The Iterative model
Iterative process starts with a simple implementation of a subset of the software requirements and
iteratively enhances the evolving versions until the full system is implemented. At each iteration,
design modifications are made and new functional capabilities are added. The basic idea behind this
method is to develop a system through repeated cycles (iterative) and in smaller portions at a time
(incremental).
The model consists of six distinct stages. In the first, the problem is specified along with the
desired service objectives (goals). In the design stage, the following are produced:
Software architecture
Algorithmic detail
Interface representations
The hardware requirements are also determined at this stage, along with a picture of the overall
system architecture. By the end of this stage, the software engineer should be able to identify the
relationships between the hardware, the software, and the associated interfaces. Any faults in the
specification should ideally not be passed downstream.
4. In the implementation and testing phase, the designs are translated into the software domain.
Detailed documentation from the design phase can significantly reduce the coding effort.
Testing at this stage focuses on making sure that any errors are identified and that the software
meets its required specification.
5. In the integration and system testing phase, all the program units are integrated and tested to
ensure that the complete system meets the software requirements. After this stage the software is
delivered to the customer. [Deliverable: the software product is delivered to the client for
acceptance testing.]
6. The maintenance phase is usually the longest stage of the software lifecycle. In this phase,
the software is updated to meet changing customer needs, adapt to a changing environment, and correct
previously undetected errors.
The waterfall model is the oldest and the most widely used paradigm. However, projects rarely follow
its sequential flow in practice. This is due to the inherent problems associated with its rigid
format, namely: it only incorporates iteration indirectly, so changes may cause considerable confusion
as the project progresses. Observe that feedback loops allow for corrections to be incorporated into
the model.
TIMELINE CHART
A timeline chart is an effective way to visualize a process in chronological order. Since
details are displayed graphically, important points in time can be easily seen and understood.
Often used for managing a project's schedule, timeline charts function as a sort of calendar of
events within a specific period of time.
A timeline chart is constructed with a horizontal axis representing the total time span of the
project, broken down into increments (for example, days, weeks, or months), and a vertical axis
representing the tasks that make up the project (for example, if the project is outfitting your
computer with new software, the major tasks involved might be: conduct research,
choose software, install software). Horizontal bars of varying lengths represent the sequence,
timing, and time span of each task. Using the same example, you would put "conduct research"
at the top of the vertical axis and draw a bar on the graph that represents the amount of time you
expect to spend on the research, then enter the other tasks below the first one, with
representative bars at the points in time when you expect to undertake them.
The bar spans may overlap; for example, you may conduct research and choose software
during the same time span. As the project progresses, secondary bars, arrowheads, or darkened
bars may be added to indicate completed tasks, or the portions of tasks that have been
completed. A vertical line is used to represent the report date.
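The construction described above can be reproduced with matplotlib's `broken_barh`. The tasks, weeks, and report date below are hypothetical, following the software-outfitting example in the text:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical schedule: (task, start week, duration in weeks)
tasks = [("Conduct research", 0, 3),
         ("Choose software", 2, 2),   # overlaps research, as the text notes
         ("Install software", 4, 1)]

fig, ax = plt.subplots()
for row, (name, start, length) in enumerate(tasks):
    ax.broken_barh([(start, length)], (row - 0.4, 0.8))  # one bar per task
ax.set_yticks(range(len(tasks)))
ax.set_yticklabels([t[0] for t in tasks])
ax.invert_yaxis()              # first task at the top, as described above
ax.set_xlabel("Week")
ax.axvline(3, linestyle="--")  # vertical line marking the report date
fig.savefig("timeline.png")
```

Progress marks (secondary or darkened bars for completed portions) can be added the same way, by drawing a second, shorter `broken_barh` span over each task row.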
SYSTEM DESIGN
Introduction:
It is a process of collecting and interpreting facts, identifying problems, and decomposing a
system into its components. System analysis is conducted for the purpose of studying a
system or its parts in order to identify its objectives. It is a problem-solving technique that
improves the system and ensures that all the components of the system work efficiently to
accomplish their purpose.
Analysis specifies what the system should do. It is a process of planning a new business system
or replacing an existing system by defining its components or modules to satisfy the specific
requirements. Before planning, you need to understand the old system thoroughly and
determine how computers can best be used in order to operate efficiently. System Design
focuses on how to accomplish the objective of the system.
BLOCK DIAGRAM
Training Phase
1. Block Based Silhouette Extraction: - Block-based silhouette extraction is a technique used
in computer vision and image processing to identify and extract the silhouette or outline of
objects in an image. The process involves dividing the image into blocks or regions and
analyzing each block to determine if it contains part of an object's silhouette.
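A minimal sketch of this block-based idea, assuming a binary foreground mask (the silhouette) has already been obtained by background subtraction:

```python
import numpy as np

def silhouette_blocks(mask, block=4):
    """Divide a binary foreground mask into block x block regions and flag
    the regions that contain any silhouette pixels."""
    h, w = mask.shape
    flags = np.zeros((h // block, w // block), dtype=bool)
    for i in range(h // block):
        for j in range(w // block):
            flags[i, j] = mask[i*block:(i+1)*block, j*block:(j+1)*block].any()
    return flags

# 16x16 mask with a blob in the upper-left quadrant standing in for a person
mask = np.zeros((16, 16), dtype=bool)
mask[2:8, 3:7] = True
flags = silhouette_blocks(mask, block=4)   # 4x4 grid of occupancy flags
```

The resulting occupancy grid is a coarse, position-tolerant descriptor of the silhouette's shape that can feed the later feature-extraction stages.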
2. Distance Transform Feature: - A distance transform feature is a mathematical transformation
applied to an image or a binary mask that computes the distance of each pixel or point in the
image to a specific target or set of target points. This feature is commonly used in image
processing and computer vision for various applications, such as object recognition, shape
analysis, and image segmentation. It provides valuable information about the spatial
relationship between objects and their proximity to specific reference points.
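For example, SciPy's Euclidean distance transform applied to a toy silhouette mask gives each foreground pixel its distance to the nearest background pixel:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Binary mask: nonzero pixels are foreground; the transform assigns each
# foreground pixel its Euclidean distance to the nearest background pixel.
mask = np.zeros((7, 7), dtype=bool)
mask[1:6, 1:6] = True          # a 5x5 square standing in for a silhouette
dist = distance_transform_edt(mask)
# Pixels deep inside the square get large values; edge pixels get 1;
# background pixels stay 0.
```

Statistics of this map (its maximum, its profile along the skeleton, etc.) are the kind of distance-transform features the text refers to.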
3. Entropy Feature: - Entropy is a statistical measure used in various fields, including
information theory, thermodynamics, and image processing. In image processing, entropy is
a feature that characterizes the amount of information or randomness present in an image. It
is commonly used as a texture and complexity descriptor for images and plays a crucial role
in image analysis and computer vision. The entropy feature quantifies the degree of
uncertainty or disorder within an image.
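A common concrete form is the Shannon entropy of the gray-level histogram: a perfectly flat image scores 0 bits, while a noisy image approaches the 8-bit maximum. A small sketch:

```python
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy (in bits) of an image's gray-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                      # skip empty bins: 0 * log(0) -> 0
    return float(-(p * np.log2(p)).sum())

flat = np.full((32, 32), 128, dtype=np.uint8)                      # one gray level
noisy = np.random.default_rng(0).integers(0, 256, (32, 32), dtype=np.uint8)
# image_entropy(flat) is 0.0; image_entropy(noisy) is close to 8 bits
```

Computed over local windows rather than the whole frame, the same quantity serves as the texture and complexity descriptor mentioned above.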
4. Training using Neural Network Model: - Training using a neural network model is the process
of teaching a neural network to make predictions or classifications by exposing it to a dataset.
During training, the network learns to adjust its internal parameters (weights and biases) in
order to minimize the difference between its predictions and the actual target values in the
dataset. The goal is to enable the network to generalize its learning to make accurate
predictions on new, unseen data.
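A minimal, self-contained illustration of this parameter-adjustment loop: a single sigmoid neuron trained by gradient descent on toy 2-D "action features". Real action recognizers use far larger networks, but the mechanics (predict, measure error, update weights and bias) are the same:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),    # class 0 feature cluster
               rng.normal(2.0, 0.3, (50, 2))])   # class 1 feature cluster
y = np.array([0] * 50 + [1] * 50, dtype=float)

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # forward pass: predictions
    grad_w = X.T @ (p - y) / len(y)          # gradient of cross-entropy loss
    grad_b = (p - y).mean()
    w -= lr * grad_w                          # adjust internal parameters
    b -= lr * grad_b

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
accuracy = (pred == y).mean()                # near-perfect on this toy data
```

The generalization question raised in the text is answered by evaluating the same trained parameters on data held out from this loop, which is exactly what the testing phase below does.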
5. NN Model: - A Neural Network (NN) model, often simply referred to as a neural network, is
a computational model inspired by the structure and function of the human brain. It's a
fundamental building block of deep learning and artificial intelligence, used for a wide range
of tasks, including pattern recognition, classification, regression, and decision-making.
Testing Phase
1. Testing Using Neural Network Model: - Testing using a neural network model is a critical
step in the machine learning and deep learning workflow. It involves evaluating the
performance of a trained neural network on a separate dataset to assess how well it
generalizes to new, unseen data. The testing process provides valuable insights into the
model's accuracy, reliability, and its suitability for real-world applications.
2. Recognized Actions: - A recognized action refers to the identification and classification
of a specific activity, gesture, or behavior based on data or sensory input. This recognition
is typically performed by computer systems, artificial intelligence, or human observers,
depending on the context. Recognized actions can have various applications in fields such
as computer vision, robotics, natural language processing, and human-computer interaction.
System Requirements:
Hardware Requirements:
1. Computer:
A modern desktop or laptop computer with a multi-core processor (Pentium or above
recommended) for smooth development and testing.
2. Memory (RAM):
At least 4GB of RAM is recommended. More RAM will be beneficial, especially when running
the development environment and the application simultaneously.
3. Storage:
A Solid State Drive (SSD) with a minimum of 128GB of storage space. SSDs offer faster read/write
speeds, enhancing the overall performance of the development environment and the storage of
datasets and application data.
4. Display:
A high-resolution monitor (1920x1080 or higher) to accommodate various development tools and
improve productivity.
Software Requirements:-
1. Operating System: You can use Windows, macOS, or Linux for your development
environment. Linux is commonly preferred due to its compatibility with many deep
learning libraries and tools.
2. Python: Python is the primary programming language for most machine learning and
computer vision projects.
3. FFmpeg: If your project involves working with video data, FFmpeg is a powerful tool for video
processing, conversion, and manipulation.
4. Pillow (PIL): This Python imaging library is helpful for working with image data.
Anjuman-i-Islam’s
M.H.SABOO SIDDIK POLYTECHNIC
8, Saboo Siddik Polytechnic Road, Byculla
Mumbai- 400008
INFORMATION TECHNOLOGY
Academic Session 2023-24
Programme DIPLOMA IN INFORMATION TECHNOLOGY
Course Code & Course IF-5I
4 Project Diary
6 Presentation 05
TOTAL 25
PROJECT DIARY FORMAT
Week No :
Activities Planned:
Activities Executed:
PROJECT DIARY FORMAT
Week No :
Activities Planned:
Activities Executed:
PROJECT DIARY FORMAT
Week No :
Activities Planned:
Activities Executed:
PROJECT DIARY FORMAT
Week No :
Activities Planned:
Activities Executed:
PROJECT DIARY FORMAT
Week No :
Activities Planned:
Activities Executed:
Experiment 5: Project Report
1. Certificate
2. Acknowledgement
3. Abstract
4. Content Page
Chapter 1:
5. Introduction and Background of the Industry or User
Based Problem
Chapter 2:
6. Literature Survey for Problem Identification and
Specification
Chapter 3:
7. Proposed Detail Methodology for Solving the identified
Problem with Action Plan
MAHARASHTRA STATE
BOARD OF TECHNICAL EDUCATION
Certificate
This is to certify that Mr. Zaid Ansari, Roll No. 210808, of the fifth semester of the Diploma in
Information Technology Engineering at the institute M.H. Saboo Siddik Polytechnic (code: 0002)
has completed the microproject satisfactorily in the subject CPP (22316) for the academic year
2023-24, as prescribed in the curriculum.
Signature Signature Signature
Project guide H. O. D Principal
SEAL OF
INSTITUTE
ACKNOWLEDGEMENT
We wish to express our profound gratitude to our guide Ms. Sameera Khan who guided us endlessly in
the framing and completion of the micro project.
She guided us on all the main points of the micro-project. We are indebted to her for her constant
encouragement, cooperation, and help. It was her enthusiastic support that helped us overcome
various obstacles in the micro-project.
We are also thankful to our Principal, HOD, faculty members, and classmates of the IT Engineering
department for extending their support and motivation in the completion of this micro-project.
Fazlur Rahman-210806
Zaid Ansari -210808
Owais Ansari -210828
Noorain Don -210831
ABSTRACT
Human action recognition is a fundamental task in computer vision with a wide range of applications,
from surveillance and security to human-computer interaction. This paper presents a comprehensive
review of recent advancements and challenges in the field of human action recognition in videos and
images. We discuss the key components of action recognition, including feature extraction, temporal
modelling, and classification techniques, while highlighting the importance of datasets and evaluation
metrics.
We examine the evolution of feature extraction methods, from traditional handcrafted features to deep
learning-based approaches, and explore how these techniques have improved the accuracy and
robustness of action recognition systems. Temporal modelling, including techniques such as recurrent
neural networks (RNNs) and convolutional neural networks (CNNs), is analysed for its effectiveness in
capturing the temporal dependencies in videos and image sequences.
Furthermore, we delve into the significance of large-scale action recognition datasets and benchmarking
protocols, which have played a pivotal role in advancing the field. We also discuss the latest trends in
leveraging multi-modal information, such as combining visual and depth data, to improve action
recognition accuracy and robustness.
The paper also presents an overview of challenges and open research questions in the domain of action
recognition, including fine-grained action recognition, handling occlusions, and real-time action
recognition in complex environments. We conclude by highlighting potential future directions in the
field, including the integration of explainable AI techniques and the application of action recognition in
healthcare and autonomous systems.
Project Report
2. Acknowledgement
3. Abstract
4. Content Page
Chapter 1:
5. Introduction and Background of the Industry or
User Based Problem
Chapter 2:
6. Literature Survey for Problem Identification and
Specification
Chapter 3:
7. Proposed Detail Methodology for Solving the
identified Problem with Action Plan
8. References and Bibliography
CHAPTER 1
1.1 Introduction
1.2 Background
1.3 Motivation
1.4 Problem Statement
1.5 Objective and Scope of the Project
1.1 INTRODUCTION: AN OVERVIEW OF THE SYSTEM
Human Action Recognition (HAR) is a subfield of computer vision and artificial intelligence that
focuses on the identification and classification of human actions, gestures, and movements in digital
images and video sequences. This technology plays a pivotal role in various applications across different
domains, from security and healthcare to entertainment and education. The primary goal of HAR is to
enable machines to understand and interpret human activities, providing valuable insights and
facilitating automation in a wide range of contexts.
1.2 BACKGROUND
In the digital age, the ubiquity of cameras and the exponential growth of video content have created a
vast pool of visual data. Harnessing this data for meaningful insights and automation has become a
crucial challenge. Human Action Recognition arises from the need to extract knowledge and context
from this data by interpreting and categorizing the actions performed by individuals or groups of people.
1.3 MOTIVATION
The motivation behind Human Action Recognition is multifaceted. It stems from the growing demand
for automated analysis and understanding of human behavior. The rise in surveillance, the expansion of
wearable devices, and the quest for more natural human-computer interactions have fueled interest in
this field. Moreover, applications in healthcare, security, entertainment, and education have the
potential to transform these industries.
1.4 PROBLEM STATEMENT
The problem statement of human action recognition in videos and images involves developing computer
vision and machine learning algorithms to automatically identify and classify the actions or activities
performed by humans in visual data.
1.5 OBJECTIVE AND SCOPE OF THE PROJECT
The objective of this study is to advance the state-of-the-art in Human Action Recognition,
improving accuracy, speed, and adaptability to different contexts. The scope encompasses the
development of deep learning models, computer vision techniques, and datasets to facilitate
action recognition. Additionally, this research seeks to address real-world challenges in
applications like surveillance, healthcare, entertainment, and education, where accurate action
recognition can offer substantial benefits.
In this era of digital media and technology, Human Action Recognition in Videos and Images
presents an exciting avenue for research and development. The fusion of computer vision, deep
learning, and real-world applications holds the promise of creating smarter, more responsive
systems that can understand, interpret, and interact with human actions in a meaningful and
valuable way.
Human-Computer Interaction: Implement the recognition system in applications
like gesture control for computers, augmented reality, or virtual reality.
Evaluation Metrics: Define appropriate metrics to assess the performance of the action
recognition system, including accuracy, precision, recall, and F1 score.
Privacy and Ethics: Address privacy concerns by considering anonymization techniques and
ethical considerations when working with video data.
Adaptability: Design the system to work in different environments and lighting conditions,
ensuring its adaptability to real-world scenarios.
Scalability: Consider the scalability of the system to handle a varying number of actions and
accommodate future expansion.
User Interface: Develop a user-friendly interface for end-users to interact with the recognition
system.
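The evaluation metrics listed above can be computed directly from true and predicted labels. The
following is a minimal sketch for the binary case; the label arrays are hypothetical examples, not
project data:

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = action present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# hypothetical ground-truth vs. predicted labels for six clips
acc, prec, rec, f1 = evaluate([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```

For multi-class action labels, the same quantities are usually computed per class and then
macro-averaged; libraries such as scikit-learn provide ready-made implementations.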
Chapter 2
Review of literature
2.1-Introduction
A literature review, or literature survey, is the section of a scholarly paper that surveys the current
knowledge on a particular topic, including substantive findings as well as theoretical and
methodological contributions. It is conducted before the project commences to give an idea of the
existing systems in the field, along with their pros and cons, and involves the study and review of
relevant literature materials related to the given topic. In this survey, research was done on several
papers on human action recognition and their implementations.
2.4-Implementation of Human Action Recognition using Image Parsing
Techniques
Author – Soumalya Sen
2.6-Human Action Recognition Using Smartphone Sensors
Author – Ashim Saha
As smartphones are becoming ubiquitous, many studies using smartphones have been conducted in
recent years. Further, these smartphones are laden with several diverse and sophisticated sensors,
such as a GPS sensor, a vision sensor (camera), an acceleration sensor, an audio sensor
(microphone), a light sensor, and a direction sensor (compass). Activity recognition is one of the
potent research topics that can be used to provide effective and adaptive services to users. Our paper
is intended to evaluate a system using the smartphone sensor for acceleration, referred to as an
accelerometer, to understand six different human activities using supervised machine learning
classification. To build the model, accelerometer data were collected from sixteen different users
during their usual day-to-day routines, consisting of sitting, standing, lying down, walking, and
climbing up and down the staircase. The sample data thus generated were aggregated and combined
into examples on which supervised machine learning algorithms were applied to generate predictive
models. To address the limitations of laboratory settings, we used the Physics Toolbox Sensor Suite
on the Google Android platform to collect the time-series data generated by the smartphone
accelerometer. This kind of activity prediction model can be used to provide insightful information
about millions of human beings merely by having them carry a smartphone with them.
Index Terms: Activity Recognition, Machine Learning, Multi-class classification, Smartphone
time-series data, 3-axis Accelerometer.
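As a rough sketch of the feature-extraction step such a study might use (the window length and the
mean/standard-deviation features below are illustrative assumptions, not taken from the paper), a
3-axis accelerometer stream can be split into fixed-length windows and summarized per axis before
classification:

```python
import numpy as np

def window_features(acc, win=50):
    """acc: (n_samples, 3) array of 3-axis accelerometer readings.
    Splits the stream into non-overlapping windows of `win` samples and
    returns per-window features: mean and std for each axis (6 values)."""
    n = (len(acc) // win) * win          # drop the incomplete tail window
    windows = acc[:n].reshape(-1, win, 3)
    means = windows.mean(axis=1)
    stds = windows.std(axis=1)
    return np.hstack([means, stds])      # shape: (n_windows, 6)

# hypothetical stream: 100 identical samples -> two windows, zero variance
feats = window_features(np.tile([[0.1, 9.8, 0.3]], (100, 1)))
```

Each row of `feats` would then be one training example for a supervised classifier.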
Author – Chandni J. Dhamsania
Human action recognition is a way of retrieving videos that emerged from Content-Based Video
Retrieval (CBVR). It is a growing area of research in the field of computer vision. Human action
recognition has gained popularity because of its wide applicability in the automatic retrieval of
videos of a particular action using visual features. The most common stages of action recognition
include object and human segmentation, feature extraction, activity detection, and classification.
This paper describes the applications and challenges of human action recognition. The features and
limitations of various methods for human action recognition are discussed. The paper presents a
survey of different types of actions, such as single-person action recognition, two-person or person-
object interaction, and multiple-people action recognition.
2.8-SCNN: SEQUENTIAL CONVOLUTIONAL NEURAL
NETWORK FOR HUMAN ACTION RECOGNITION IN VIDEOS
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are two typical kinds
of neural networks. While CNN models have achieved great success in image recognition due to
their strong ability to abstract spatial information at multiple levels, RNN models have not
achieved significant progress in video-analysis tasks (e.g. action recognition), although RNNs
can inherently model temporal dependencies in videos. In this work, we propose a Sequential
Convolutional Neural Network, denoted SCNN, to extract effective spatial-temporal features
from videos, thus incorporating the strengths of both the convolutional operation and the recurrent
operation. Our SCNN model extends the RNN to directly process feature maps, rather than vectors
flattened from feature maps, to preserve the spatial structure of the inputs. It replaces the full
connections of the RNN with convolutional connections to decrease the number of parameters, the
computational cost, and the risk of over-fitting. Moreover, we introduce asymmetric convolutional
layers into the SCNN to further reduce the number of parameters and the computational cost. Our
final SCNN deep architecture used for action recognition achieves very good performance on two
challenging benchmarks, UCF-101 and HMDB-51, outperforming many state-of-the-art methods.
Author – Cheng-Bin Jin, Shengzhe Li, Trung Dung Do, and Hakil Kim
Abstract: This paper proposes a real-time human action recognition approach for static video
surveillance systems. The approach predicts human actions using temporal images and
convolutional neural networks (CNNs). A CNN is a type of deep learning model that can
automatically learn features from training videos. Although state-of-the-art methods have shown
high accuracy, they consume a lot of computational resources. Another problem is that many
methods assume exact knowledge of human positions. Moreover, most current methods build
complex handcrafted features for specific classifiers. Therefore, these kinds of methods are difficult
to apply in real-world applications. In this paper, a novel CNN model based on temporal images and
a hierarchical action structure is developed for real-time human action recognition. The hierarchical
action structure includes three levels: an action layer, a motion layer, and a posture layer. The top
layer represents subtle actions; the bottom layer represents posture. Each layer contains one CNN,
which means that the model has three CNNs working together; the layers are combined to represent
many different kinds of action with a large degree of freedom. The developed approach was
implemented and achieved superior performance on the ICVL action dataset; the algorithm can run
at around 20 frames per second.
2.10-Survey On Feature Extraction Approach for Human Action Recognition
in Still Images and Videos
Author – Pavan M, Deepika D, Divyashree R
Human Action Recognition (HAR) has been a challenging problem, yet it needs to be solved.
Recently, the detection and recognition of human actions has found a broad range of applications
and has become popular in the field of computer vision. It mainly focuses on understanding human
behaviour and assigning a label to each action. There are many approaches for recognizing actions
from both image-based and video-based data. It is now time to review these existing approaches in
order to help future research. The main aim of this work is to study the various action recognition
techniques for videos and images. The paper presents a brief overview of the features of human
actions, categorized as still-image-based and video-based. All related datasets are also introduced in
this paper, which will be helpful for future research.
2.11-Subspace Analysis Methods plus Motion History Image for Human Action
Recognition
Author – Chunhua Du, Qiang Wu, Jie Yang
This paper proposes a new human action recognition method that handles the recognition task
in a quite different way compared with traditional methods, which use a sequence-matching
scheme. Our method compresses a sequence of an action into a Motion History Image (MHI),
from which low-dimensional features are extracted using subspace analysis methods. Unlike other
methods, which use a sequence consisting of several frames for recognition, our method uses
only one MHI per action sequence for recognition. Our method thus avoids the complexity as
well as the large computational cost of sequence-matching-based methods. Encouraging
experimental results on a widely used database demonstrate the effectiveness of the proposed
method.
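The MHI idea can be sketched in a few lines: each pixel stores how recently motion occurred there,
with fresh motion stamped at a maximum value tau and older motion decaying toward zero. This is an
illustrative reimplementation, not the authors' code, and the threshold and decay values are
assumptions:

```python
import numpy as np

def update_mhi(mhi, prev_frame, curr_frame, tau=30, thresh=25):
    """One update step of a Motion History Image.
    Pixels that changed by more than `thresh` are stamped with `tau`;
    all other pixels decay by 1 toward 0."""
    motion = np.abs(curr_frame.astype(int) - prev_frame.astype(int)) > thresh
    mhi = np.maximum(mhi - 1, 0)   # fade old motion
    mhi[motion] = tau              # stamp fresh motion
    return mhi

# toy 2x2 frames: motion appears only in the bottom-right pixel
prev = np.zeros((2, 2), dtype=np.uint8)
curr = np.array([[0, 0], [0, 200]], dtype=np.uint8)
mhi = update_mhi(np.zeros((2, 2), dtype=int), prev, curr)
```

Accumulating this over a clip yields a single gray-scale image summarizing the whole action, which
is what makes the per-sequence feature extraction described above possible.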
In videos, recognition of continual human motion is a difficult point for applications. A method of
human action recognition based on latent-dynamic conditional random fields (LDCRF) is presented.
Human pose is extracted using a star-form distance descriptor of the human body contour. Then, for
continuous sequences, the LDCRF model is built to capture the mapping relation between action
features and action semantics. Compared with traditional CRF and HCRF models, by designing the
affiliation between latent features and human pose, the LDCRF models both the internal action and
the external movement features. In the experiments, the Weizmann action database is used, and three
experiments are designed. When a composed continuous sequence is tested, the recognition rate
reaches over 90% for all actions except "skip". The receiver operating characteristics of the three
models show that LDCRF models have the better descriptive capability for internal action and
external movement features, even when human action is affected by viewing angle, accessories, and
occlusion. This shows that LDCRF is robust as long as the human body contour is intact.
2.13-Action Recognition by Multiple Features and Hyper-sphere Multi-class
SVM
Author – Jia Liu, Jie Yang, Yi Zhang
In this paper, we propose a novel framework for action recognition based on multiple features
to improve action recognition in videos. The fusion of multiple features is important for
recognizing actions, as a single-feature-based representation is often not enough to capture the
imaging variations (viewpoint, illumination, etc.) and the attributes of individuals (size, age,
gender, etc.). Hence, we use two kinds of features: (i) a quantized vocabulary of local spatio-
temporal (ST) volumes (cuboids and 2-D SIFT), and (ii) higher-order statistical models of
interest points, which aim to capture the global information about the actor. We construct video
representations in terms of local space-time features and global features and integrate such
representations with a hyper-sphere multi-class SVM. Experiments on publicly available datasets
show that the proposed approach is effective. An additional experiment shows that using both
local and global features provides a richer representation of human action compared to
the use of a single feature type.
2.14-Depth Image-based Object Segmentation Scheme for Improving
Human Action Recognition
Author – Sungjoo Park
Human action recognition using the 3D camera for surveillance applications is a
promising alternative approach to the conventional 2D camera based surveillance.
We propose a depth image-based object segmentation scheme for improving
human action recognition. Experimental results show that the average accuracy
of the dangerous event detection is improved by about 15% when using the
proposed object segmentation scheme.
2.15-HUMAN ACTION RECOGNITION USING ROBUST POWER
SPECTRUM FEATURES
Author – Hossein Ragheb, Sergio Velastin
We propose a new method for human action recognition from video streams that is fast and robust
to noise and to large changes in camera view. We start by extracting features in the Fourier domain
once we obtain the bounding boxes containing the silhouettes of a human for a number of video
frames representing a basic action. After preprocessing, we divide each space-time volume into
space-time sub-volumes (STSVs) and compute their corresponding mean power spectra as our
feature vectors. Our features yield high classification performance even with simple distance
measures. We perform an experimental comparison, using the same data, between our method and
two state-of-the-art methods.
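The STSV feature described above can be approximated as follows. This is a simplified sketch: the
2x2x2 grid and the use of the full 3-D FFT are assumptions for illustration, and the paper's
silhouette preprocessing is omitted:

```python
import numpy as np

def stsv_power_features(volume, grid=(2, 2, 2)):
    """Split a space-time volume (t, y, x) into sub-volumes on a regular grid
    and return the mean power spectrum of each sub-volume as the feature vector."""
    feats = []
    for t_part in np.array_split(volume, grid[0], axis=0):
        for y_part in np.array_split(t_part, grid[1], axis=1):
            for sub in np.array_split(y_part, grid[2], axis=2):
                power = np.abs(np.fft.fftn(sub)) ** 2   # power spectrum of the sub-volume
                feats.append(power.mean())
    return np.array(feats)

# toy constant volume: all power sits in the DC component of each sub-volume
feats = stsv_power_features(np.ones((4, 4, 4)))
```

Because the power spectrum discards phase, such features are comparatively insensitive to small
spatial shifts of the silhouette within each sub-volume.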
2.16-Review on Recent Advances in Human Action Recognition
in Video Data
Author – Akshita Baisware
AI has achieved new heights in image recognition, human action recognition, and NLP. It has a vast
area of applications, such as IoT, robotics, the biosciences, and surveillance. Video-based human
action recognition has many potential applications, which makes it a highly sought-after field for
researchers. With the constant growth of high-performance computing, computer vision, and GPUs,
deep learning-based human activity recognition is one of the constantly evolving and most
promising streams. This review focuses on recent advancements in the field of action recognition
based on deep learning. The present state-of-the-art techniques for action recognition and
prediction, as well as the future scope for research, are discussed in the paper.
2.17-Human action identification and search in video files
Author – Mirela Kundid
This paper describes an approach for modeling and recognizing human actions within videos.
With millions of videos published almost every day, there are new opportunities for research
in the field of search and recognition within video sequences. Statistical approaches and
approaches based on the description of a model are described in detail in this paper and compared
on a series of videos taken from various online databases (KTH, Weizmann, MSR-Action). There
are various approaches to identifying actions within video sequences. The approaches described
in this paper are based on recognizing an action from a series of images obtained by segmentation
and on constructing the motion history of the movement (Motion History Images, MHI). In this
paper, the MHI construction technique is applied to a series of images obtained from a database
used for motion analysis in order to recognize an action within a video (a human greeting in a
video).
2.18-Human Action Recognition Using Deep Neural Networks
Author – Rashmi R. Koli
Human activities such as body gestures are among the most difficult and challenging subjects for
deep neural networks. Human action recognition here amounts to human gesture recognition. A
gesture is a movement of body parts that conveys some meaningful message. Gestures are among the
most suitable and natural ways for humans to interact with computer systems, and they thus build a
bridge between humans and machines. Human activity recognition provides a platform to interact
with deaf and mute people. In this research work, we develop a platform for hand-movement
recognition: using a CNN, we can identify human gestures in an image. There has been a rapid
increase in the number of people who are deaf or mute owing to several conditions. Since deaf and
mute people cannot communicate with a typical person by speech, they depend on a form of visual
communication: sign language. Sign language provides a fine communication platform for hearing-
impaired people to convey their thoughts and to interact with an ordinary person. The outcome of
this research is a system that supports gesture recognition: it can recognize gestures and then convert
gesture images into text accordingly. The system pays special attention to the CNN training
component using the CNN algorithm. The concept involves designing a system that uses deep
learning principles to treat the input as a gesture and then provide recognizable output as text.
Keywords: Human Action Recognition, Deaf and Mute, CNN, Hand Gesture.
2.19-Human Action Invariancies for Human Action Recognition
Author – Nilam Nur Amir Sharif
The uniqueness of the human action shape, or silhouette, can be used for human action
recognition. Acquiring the features of the human silhouette to obtain the concept of human action
invariancies has led to important research in the video surveillance domain. This paper discusses
the investigation of this concept by extracting individual human action features using an integrated
moment invariant. Experimental results have shown that human action invariancies are improved,
with better recognition accuracy. This has verified that the integration method of moment invariants
is worth exploring for the recognition of human actions in video surveillance.
Chapter 3
3.1-SYSTEM PLANNING
Below are the steps involved in the System Development Life Cycle. Each phase within the
overall cycle may be made up of several steps.
Human action recognition is a fundamental task in computer vision and artificial intelligence, aimed
at understanding and interpreting human movements from digital data, typically video sequences.
This concept revolves around developing software systems that can automatically detect and
categorize various human actions, such as walking, running, waving, or even complex activities like
dancing or playing sports.
Human action recognition is a field within computer vision and artificial intelligence that focuses on
understanding and categorizing human movements and actions in videos or images. It has numerous
applications, including surveillance, healthcare, sports analytics, and more.
To achieve accurate human action recognition, several key steps are involved. These steps include
data collection, pre-processing, feature extraction, and classification.
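These steps can be sketched end-to-end as a toy pipeline. The motion-energy feature and the
threshold rule below are placeholders chosen purely for illustration, not a real recognizer:

```python
import numpy as np

def preprocess(frames):
    # scale 8-bit pixel values into [0, 1]
    return frames.astype(float) / 255.0

def extract_features(frames):
    # crude motion cue: mean absolute difference between consecutive frames
    return np.array([np.abs(np.diff(frames, axis=0)).mean()])

def classify(features, threshold=0.05):
    # placeholder rule: high motion energy -> "active", otherwise "idle"
    return "active" if features[0] > threshold else "idle"

def recognize(frames):
    """Run the full pipeline: pre-processing -> feature extraction -> classification."""
    return classify(extract_features(preprocess(frames)))

static = np.zeros((4, 8, 8), dtype=np.uint8)                               # no motion
moving = np.array([np.full((8, 8), 255 * (i % 2), dtype=np.uint8) for i in range(4)])
```

In a real system, the data-collection step supplies labeled clips, and the classifier would be a
trained model rather than a fixed threshold.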
Step 3: Architectural Design
Architectural design for human action recognition typically involves computer vision and deep
learning techniques. A common approach is to use Convolutional Neural Networks (CNNs) and
Recurrent Neural Networks (RNNs) in combination..
The design will serve as a blueprint for the system and helps detect problems before these errors
or problems are built into the final system.Professionals create the system design,but must review
their work with the users to ensure the design meets users'needs.
Step 4: Coding and Debugging
Human action recognition involves identifying and categorizing various actions performed by
individuals, such as walking, running, or dancing. This field has applications in surveillance,
healthcare, sports analysis, and more.
In the coding phase, developers typically use programming languages like Python and deep learning
libraries such as TensorFlow or PyTorch to build and train action recognition models. They start by
collecting labeled datasets containing video clips or image sequences of various actions. These
datasets are then preprocessed to extract meaningful features from the data, such as optical flow or
3D joint positions.
Step 5: System Testing
System testing of human action recognition involves the evaluation of software or hardware
systems designed to identify and classify human actions from various data sources, such as videos,
images, or sensor data. This critical step ensures the system's accuracy, reliability, and
performance.
Step 6: Maintenance
To maintain progress in human action recognition, continuous data collection, algorithm
development, addressing real-world challenges, resource updates, and interdisciplinary collaboration
are essential. These efforts ensure the continued advancement of action recognition systems for
various applications, from surveillance to healthcare.
There are various software process models, such as:
The Prototyping Model
The RAD Model
The Spiral Model
The Waterfall Model
The Iterative Model
Of all these process models, we have used the Iterative model for the development of our project.
The Iterative Model
The iterative process starts with a simple implementation of a subset of the software requirements
and iteratively enhances the evolving versions until the full system is implemented. At each
iteration, design modifications are made and new functional capabilities are added. The basic idea
behind this method is to develop a system through repeated cycles (iterative) and in smaller portions
at a time (incremental).
The model consists of six distinct stages.
(a) The problem is specified along with the desired service objectives (goals).
In the design phase, the following are produced:
Software architecture
Algorithmic detail
Interface representations
The hardware requirements are also determined at this stage, along with a picture of
the overall system architecture. By the end of this stage, the software engineer should
be able to identify the relationships between the hardware, the software, and the
associated interfaces. Any faults in the specification should ideally not be passed
downstream.
4. In the implementation and testing phase, the designs are translated into the
software domain.
Detailed documentation from the design phase can significantly reduce the
coding effort.
Testing at this stage focuses on making sure that any errors are identified and
that the software meets its required specification.
5. In the integration and system testing phase, all the program units are integrated and
tested to ensure that the complete system meets the software requirements. After this
stage the software is delivered to the customer. [Deliverable: the software product
is delivered to the client for acceptance testing.]
6. The maintenance phase is usually the longest stage of the software lifecycle. In this phase
the software is updated to correct errors and oversights previously undetected in the testing
phases and to enhance its efficiency.
Advantages of the Iterative Model:
Testing is inherent to every phase of the iterative model.
It is an enforced, disciplined approach.
It is documentation-driven; that is, documentation is produced at every stage.
The waterfall model is the oldest and the most widely used paradigm. However, many
projects rarely follow its sequential flow, due to the inherent problems associated with its
rigid format. Namely:
It incorporates iteration only indirectly, so changes may cause considerable
confusion as the project progresses.
Observe that feedback loops allow for corrections to be incorporated into the model.
TIMELINE CHART
A timeline chart is an effective way to visualize a process in chronological order. Since
details are displayed graphically, important points in time can be easily seen and understood.
Often used for managing a project's schedule, timeline charts function as a sort of calendar of
events within a specific period of time.
A timeline chart is constructed with a horizontal axis representing the total time span of the
project, broken down into increments (for example, days, weeks, or months), and a vertical axis
representing the tasks that make up the project (for example, if the project is outfitting your
computer with new software, the major tasks involved might be: conduct research,
choose software, install software). Horizontal bars of varying lengths represent the sequences,
timing, and time spans of the tasks. Using the same example, you would put "conduct research"
at the top of the vertical axis and draw a bar on the graph that represents the amount of time you
expect to spend on the research, and then enter the other tasks below the first one with
representative bars at the points in time when you expect to undertake them.
The bar spans may overlap, as, for example, you may conduct research and choose software
during the same time span. As the project progresses, secondary bars, arrowheads, or darkened
bars may be added to indicate completed tasks, or the portions of tasks that have been
completed. A vertical line is used to represent the report date.
3.2-SYSTEM DESIGN
Introduction:
System analysis is the process of collecting and interpreting facts, identifying problems, and
decomposing a system into its components. It is conducted for the purpose of studying a
system or its parts in order to identify its objectives. It is a problem-solving technique that
improves the system and ensures that all the components of the system work efficiently to
accomplish their purpose.
Analysis specifies what the system should do. It is a process of planning a new business system
or replacing an existing system by defining its components or modules to satisfy the specific
requirements. Before planning, you need to understand the old system thoroughly and
determine how computers can best be used to operate efficiently. System design then
focuses on how to accomplish the objective of the system.
BLOCK DIAGRAM
Training Phase
1. Block Based Silhouette Extraction: - Block-based silhouette extraction is a technique used
in computer vision and image processing to identify and extract the silhouette or outline of
objects in an image. The process involves dividing the image into blocks or regions and
analyzing each block to determine if it contains part of an object's silhouette.
2. Distance Transform Feature: - A distance transform feature is a mathematical transformation
applied to an image or a binary mask that computes the distance of each pixel or point in the
image to a specific target or set of target points. This feature is commonly used in image
processing and computer vision for various applications, such as object recognition, shape
analysis, and image segmentation. It provides valuable information about the spatial
relationship between objects and their proximity to specific reference points.
3. Entropy Feature: - Entropy is a statistical measure used in various fields, including
information theory, thermodynamics, and image processing. In image processing, entropy is
a feature that characterizes the amount of information or randomness present in an image. It
is commonly used as a texture and complexity descriptor for images and plays a crucial role
in image analysis and computer vision. The entropy feature quantifies the degree of
uncertainty or disorder within an image.
4. Training using Neural Network Model: - Training using a neural network model is the process
of teaching a neural network to make predictions or classifications by exposing it to a dataset.
During training, the network learns to adjust its internal parameters (weights and biases) in
order to minimize the difference between its predictions and the actual target values in the
dataset. The goal is to enable the network to generalize its learning to make accurate
predictions on new, unseen data.
5. NN Model: - A Neural Network (NN) model, often simply referred to as a neural network, is
a computational model inspired by the structure and function of the human brain. It's a
fundamental building block of deep learning and artificial intelligence, used for a wide range
of tasks, including pattern recognition, classification, regression, and decision-making.
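The distance transform and entropy features described above can be sketched on a binary silhouette
mask and a grayscale image respectively. This is an illustrative sketch: it uses the L1 metric for
the distance transform and a histogram-based Shannon entropy, whereas a real system would typically
rely on library routines such as those in SciPy or scikit-image:

```python
import numpy as np

def l1_distance_transform(mask):
    """Distance of each pixel to the nearest background (0) pixel, L1 metric,
    computed with the classic two-pass sweep."""
    h, w = mask.shape
    inf = h + w
    d = np.where(mask == 0, 0, inf)
    for i in range(h):                    # forward pass: top-left to bottom-right
        for j in range(w):
            if i > 0:
                d[i, j] = min(d[i, j], d[i - 1, j] + 1)
            if j > 0:
                d[i, j] = min(d[i, j], d[i, j - 1] + 1)
    for i in range(h - 1, -1, -1):        # backward pass: bottom-right to top-left
        for j in range(w - 1, -1, -1):
            if i < h - 1:
                d[i, j] = min(d[i, j], d[i + 1, j] + 1)
            if j < w - 1:
                d[i, j] = min(d[i, j], d[i, j + 1] + 1)
    return d

def entropy_feature(img):
    """Shannon entropy (bits) of an 8-bit image's gray-level histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

sil = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 0]])       # toy silhouette mask
dt = l1_distance_transform(sil)
flat = np.zeros((4, 4), dtype=np.uint8)                  # uniform image: zero entropy
half = np.array([[0, 255], [0, 255]], dtype=np.uint8)    # two equal levels: 1 bit
```

The distance-transform values and the entropy score would then be concatenated into the feature
vector fed to the neural network model.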
Testing Phase
1) Testing Using Neural Network Model: - Testing using a neural network model is a critical
step in the machine learning and deep learning workflow. It involves evaluating the
performance of a trained neural network on a separate dataset to assess how well it
generalizes to new, unseen data. The testing process provides valuable insights into the
model's accuracy, reliability, and its suitability for real-world applications.
2) Recognized Actions: - A recognized action refers to the identification and classification
of a specific activity, gesture, or behavior based on data or sensory input. This recognition
is typically performed by computer systems, artificial intelligence, or human observers,
depending on the context. Recognized actions can have various applications in fields such
as computer vision, robotics, natural language processing, and human-computer
interaction.
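The training and testing phases above can be illustrated with the smallest possible "model": a
single weight and bias fitted by gradient descent on a toy dataset, then evaluated on held-out
points. This is only a conceptual sketch of parameter adjustment and generalization, not an
action-recognition network:

```python
import numpy as np

# toy data following y = 2x; the first four points train, the last two test
x_train, y_train = np.array([0., 1., 2., 3.]), np.array([0., 2., 4., 6.])
x_test, y_test = np.array([4., 5.]), np.array([8., 10.])

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):                    # training: minimize mean squared error
    err = (w * x_train + b) - y_train
    w -= lr * (err * x_train).mean()     # gradient step on the weight
    b -= lr * err.mean()                 # gradient step on the bias

test_error = np.abs((w * x_test + b) - y_test).mean()   # testing on unseen data
```

The same loop structure (forward pass, error, gradient update, then held-out evaluation) scales up
to the multi-layer networks used for recognizing actions.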
3.3-System Requirements:
Hardware Requirements:
1. Computer:
A modern desktop or laptop computer with a multi-core processor (Pentium or above
recommended) for smooth development and testing.
2. Memory (RAM):
At least 4 GB of RAM is recommended. More RAM will be beneficial, especially when running
the development environment and the application simultaneously.
3. Storage:
Solid State Drive (SSD) with a minimum of 128 GB of storage space. SSDs offer faster read/write
speeds, enhancing the overall performance of the development environment and the storage of
datasets and application data.
4. Display:
A high-resolution monitor (1920x1080 or higher) to accommodate various development tools and
improve productivity.
3.4-Implementation Tools
Software Requirements:-
1. Operating System: You can use Windows, macOS, or Linux for your development
environment. Linux is commonly preferred due to its compatibility with many deep learning
libraries and tools.
2. Python: Python is the primary programming language for most machine learning and
computer vision projects.
scikit-image: This library is useful for various image processing tasks and feature
extraction.
5. Video and Image Processing Libraries:
FFmpeg: If your project involves working with video data, FFmpeg is a powerful
tool for video processing, conversion, and manipulation.
Pillow (PIL): This Python Imaging Library is helpful for working with image data.