INTRODUCTION
Human Action Recognition (HAR) is a subfield of computer vision and artificial intelligence that
focuses on the identification and classification of human actions, gestures, and movements in digital
images and video sequences. This technology plays a pivotal role in various applications across different
domains, from security and healthcare to entertainment and education. The primary goal of HAR is to
enable machines to understand and interpret human activities, providing valuable insights and
facilitating automation in a wide range of contexts.
BACKGROUND
In the digital age, the ubiquity of cameras and the exponential growth of video content have created a
vast pool of visual data. Harnessing this data for meaningful insights and automation has become a
crucial challenge. Human Action Recognition arises from the need to extract knowledge and context
from this data by interpreting and categorizing the actions performed by individuals or groups of people.
MOTIVATION
The motivation behind Human Action Recognition is multifaceted. It stems from the growing demand
for automated analysis and understanding of human behavior. The rise in surveillance, the expansion of
wearable devices, and the quest for more natural human-computer interactions have fueled interest in
this field. Moreover, applications in healthcare, security, entertainment, and education have the
potential to transform these industries.
PROBLEM STATEMENT
The problem statement of human action recognition in videos and images involves developing computer
vision and machine learning algorithms to automatically identify and classify the actions or activities
performed by humans in visual data.
OBJECTIVE AND SCOPE OF THE PROJECT
The objective of this study is to advance the state-of-the-art in Human Action Recognition, improving
accuracy, speed, and adaptability to different contexts. The scope encompasses the development of
deep learning models, computer vision techniques, and datasets to facilitate action recognition.
Additionally, this research seeks to address real-world challenges in applications like surveillance,
healthcare, entertainment, and education, where accurate action recognition can offer substantial
benefits.
In this era of digital media and technology, Human Action Recognition in Videos and Images presents
an exciting avenue for research and development. The fusion of computer vision, deep learning, and
real-world applications holds the promise of creating smarter, more responsive systems that can
understand, interpret, and interact with human actions in a meaningful and valuable way.
Human-Computer Interaction: Implement the recognition system in applications like gesture control for
computers, augmented reality, or virtual reality.
Evaluation Metrics: Define appropriate metrics to assess the performance of the action recognition
system, including accuracy, precision, recall, and F1 score.
Privacy and Ethics: Address privacy concerns by considering anonymization techniques and ethical
considerations when working with video data.
Adaptability: Design the system to work in different environments and lighting conditions, ensuring
its adaptability to real-world scenarios.
Scalability: Consider the scalability of the system to handle a varying number of actions and
accommodate future expansion.
User Interface: Develop a user-friendly interface for end-users to interact with the recognition system.
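The evaluation metrics listed above can be computed directly from predicted and ground-truth labels. The sketch below uses plain Python and a made-up set of labels for a hypothetical binary "walking" detector; it is an illustration, not part of the project's implementation:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one action class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1

# 1 = "walking", 0 = "not walking" on eight hypothetical test clips
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
```

For multi-class action recognition, the same per-class counts are typically averaged across classes (macro-averaging) or pooled over all classes (micro-averaging).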
Experiment 2: Industrial Survey / Literature Review
Introduction
A literature review, or literature survey, is the part of a scholarly work that surveys current knowledge
on a particular topic, including substantive findings as well as theoretical and methodological
contributions. It is conducted before the project commences to give an idea of the existing systems in the
field and their pros and cons, and it involves studying and reviewing literature relevant to the given
topic. In this survey, research papers on human action recognition and their implementations are
reviewed.
3) Implementation of Human Action Recognition Using Image Parsing Techniques
Author: Soumalya Sen
Human activity recognition plays a significant role in human-to-human interaction and interpersonal
relations because it provides information about the identity of a person, their personality, and their
psychological state; however, it is difficult to extract. The human ability to recognize another person's
activities is one of the main subjects of study in the scientific areas of computer vision and machine
learning. As a result of this research, many applications, including video surveillance systems,
human-computer interaction, and robotics for the characterization of human behavior, require an activity
recognition system. Human activity recognition is an important research direction in image and video
analysis. In the past, a large number of papers have been published on human activity recognition in
video and image sequences. In this paper, we provide a comprehensive survey of recent developments in
these techniques, including methods, systems, and quantitative evaluation of the performance of human
activity recognition. The experimental results show that our method can significantly improve
classification, interpretation, and retrieval performance for video images. The novelty of this paper is
twofold: first, capturing video images of humans; second, identifying the different types of actions
performed by humans.
5) Human Action Recognition Using Smartphone Sensors
Author: Ashim Saha
As smartphones are becoming ubiquitous, many studies using smartphones have been carried out in
recent years. Further, these smartphones are laden with several diverse and sophisticated sensors,
such as a GPS sensor, a vision sensor (camera), an acceleration sensor, an audio sensor (microphone),
a light sensor, and a direction sensor (compass). Activity recognition is one of the potent research
topics, which
can be used to provide effective and adaptive services to users. Our paper is intended to evaluate a
system using smartphone-based acceleration sensors, referred to as accelerometers. To recognize six
different human activities using supervised machine learning classification, accelerometer data
compiled from sixteen different users were collected as per their usual day-to-day routines, consisting
of sitting, standing, lying down, walking, and climbing up and down the staircase. The sample data thus
generated were then aggregated and combined into examples to which supervised machine learning
algorithms were applied to generate predictive models. To address the limitations of laboratory
settings, we used the Physics Toolbox Sensor Suite on the Google Android platform to collect the
time-series data generated by the smartphone accelerometer. This kind of activity prediction model can
be used to provide insightful information about millions of human beings merely by having them carry a
smartphone with them.
Index Terms—Activity Recognition, Machine Learning, Multi-class Classification, Smartphone Time-series
Data, 3-axis Accelerometer.
Author: Chandni J. Dhamsania
Human action recognition is a way of retrieving videos that emerged from Content-Based Video Retrieval
(CBVR). It is a growing area of research in the field of computer vision. Human action recognition has
gained popularity because of its wide applicability in the automatic retrieval of videos of particular
actions using visual features. The most common stages of action recognition include object and human
segmentation, feature extraction, activity detection, and classification. This paper describes
the application and challenges of human action recognition. Features and limitations of various methods
for human action recognition are discussed. This paper presents a survey of different types of actions:
single-person action recognition, two-person or person-object interaction, and multiple-person action
recognition.
7) SCNN: Sequential Convolutional Neural Network for Human Action Recognition in Videos
Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) are two typical kinds of
neural networks. While CNN models have achieved great success in image recognition due to their
strong ability to abstract spatial information at multiple levels, RNN models have not achieved
significant progress in video analysis tasks (e.g., action recognition), although RNNs can inherently
model temporal dependencies in videos. In this work, we propose a Sequential Convolutional Neural
Network, denoted SCNN, to extract effective spatial-temporal features from videos, thus incorporating
the strengths of both convolutional and recurrent operations. Our SCNN model extends the RNN to directly
process feature maps, rather than vectors flattened from feature maps, to preserve the spatial structure
of the inputs. It replaces the full connections of the RNN with convolutional connections to decrease
parameter counts, computational cost, and over-fitting risk. Moreover, we introduce asymmetric
convolutional layers into the SCNN to further reduce parameter counts and computational cost. Our final
SCNN deep architecture for action recognition achieves very good performance on two challenging
benchmarks, UCF-101 and HMDB-51, outperforming many state-of-the-art methods.
8) Real-Time Human Action Recognition Using CNN Over Temporal Images for
Static Video Surveillance Cameras
Authors: Cheng-Bin Jin, Shengzhe Li, Trung Dung Do, and Hakil Kim
Abstract. This paper proposes a real-time human action recognition approach to static video
surveillance systems. This approach predicts human actions using temporal images and convolutional
neural networks (CNN). CNN is a type of deep learning model that can automatically learn features
from training videos. Although state-of-the-art methods have shown high accuracy, they consume
a lot of computational resources. Another problem is that many methods assume exact knowledge
of human positions. Moreover, most current methods build complex handcrafted features for
specific classifiers. Therefore, these kinds of methods are difficult to apply in real-world applications.
In this paper, a novel CNN model based on temporal images and a hierarchical action structure is
developed for real-time human action recognition. The hierarchical action structure includes three
levels: the action layer, the motion layer, and the posture layer. The top layer represents subtle
actions; the bottom layer represents posture. Each layer contains one CNN, which means that this model
has three CNNs working together; the layers are combined to represent many different kinds of action
with a large degree of freedom. The developed approach was implemented and achieved superior performance
on the ICVL action dataset; the algorithm can run at around 20 frames per second.
9) Survey On Feature Extraction Approach for Human Action Recognition in Still
Images and Videos
Authors: Pavan M, Deepika D, Divyashree R
Human Action Recognition (HAR) has been a challenging problem, yet one that needs to be solved.
Recently, the detection and recognition of human actions has found a broad range of applications and
has become popular in the field of computer vision. It mainly focuses on understanding human behaviour
and assigning a label to each action. There are many approaches for recognizing actions in both images
and videos, and it is now time to review these existing approaches in order to help future research.
The main aim of this work is to study the various action recognition techniques used in videos and
images. The paper presents a brief overview of the features of human actions, categorizing them as
still-image-based and video-based. All related datasets are also introduced in this paper, which will be
helpful for future research.
10) Subspace Analysis Methods plus Motion History Image for Human Action
Recognition
This paper proposes a new human action recognition method which deals with the recognition task in
a quite different way compared with traditional methods that use a sequence-matching scheme. Our
method compresses a sequence of an action into a Motion History Image (MHI), from which
low-dimensional features are extracted using subspace analysis methods. Unlike other methods, which
use a sequence consisting of several frames for recognition, our method uses only one MHI per action
sequence. Our method thus avoids the complexity as well as the large computation of
sequence-matching-based methods. Encouraging experimental results on a widely used database
demonstrate the effectiveness of the proposed method.
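The MHI idea can be illustrated in a few lines of NumPy. This is a simplified sketch (frame differencing with a fixed threshold and linear decay over a hypothetical toy sequence), not the paper's exact formulation:

```python
import numpy as np

def motion_history_image(frames, threshold=30, tau=10):
    """Compress a grayscale frame sequence into a single Motion History Image.
    Pixels that moved recently stay bright; older motion decays toward zero."""
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    for prev, curr in zip(frames, frames[1:]):
        motion = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > threshold
        mhi = np.where(motion, tau, np.maximum(mhi - 1, 0))  # set or decay
    return mhi / tau  # normalize to [0, 1] for feature extraction

# Toy sequence: a bright 2x2 "object" moving one pixel right per frame
frames = []
for t in range(5):
    f = np.zeros((8, 8), dtype=np.uint8)
    f[3:5, t:t + 2] = 255
    frames.append(f)
mhi = motion_history_image(frames)
```

The resulting single image then plays the role the paper describes: one compact template per action sequence, on which subspace methods (e.g. PCA) can extract low-dimensional features.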
In video-based human action recognition, continual human motion is a difficult point for applications.
A method of human action recognition based on latent-dynamic conditional random fields (LDCRF) is
presented. Human pose is extracted using a star-form distance descriptor of the human body contour.
Then, an LDCRF model is built over continuous sequences to capture the mapping relation between action
features and action semantics. Compared with traditional CRF and HCRF models, by designing the
affiliation between latent features and human pose, LDCRF models both internal action and external
movement features. In the experiments, the Weizmann action database is used and three experiments are
designed. When a composite continuous sequence is tested, the recognition rate reaches over 90% for all
actions except "skip"; the receiver operating characteristics of the three models show that LDCRF models
have better descriptive capability for internal action and external movement features, while human
action recognition is affected by viewing angle, accessories, and occlusion. The results show that
LDCRF is robust when the human body contour is intact.
15) Review on Recent Advances in Human Action Recognition in
Video Data
Author – Akshita Baisware
AI has achieved new heights in image recognition, human action recognition, and NLP. It has a vast
area of applications, such as IoT, robotics, the biosciences, and surveillance. Video-based human
action recognition has many potential applications, which makes it a most sought-after field for
researchers. With the constant growth of high-performance computing, computer vision, and GPUs,
deep learning-based human activity recognition is one of the constantly evolving and most promising
streams. This review focuses on recent advancements in the field of action recognition based on deep
learning. The present state-of-the-art techniques for action recognition and prediction, as well as the
future scope for research, are discussed in the paper.
16) Human action identification and search in video files
Author: Mirela Kundid
This paper describes an approach for modeling and recognizing human actions within videos.
With millions of videos published almost every day, there are new opportunities for research
in the field of search and recognition within video sequences. Statistical approaches and
approaches based on model description are described in detail in this paper and compared on a series
of videos taken from various online databases (KTH, Weizmann, MSR-Action). There are various
approaches to identifying actions within video sequences. The approaches described within this paper
are based on recognizing actions from a series of images obtained by segmentation and by constructing
Motion History Images (MHI). In this paper, we apply the MHI construction technique to a series of
images obtained from a database used for motion analysis in order to recognize the action within a
video (a human greeting).
17) Human Action Recognition Using Deep Neural Networks
Author: Rashmi R. Koli
Human activities such as body gestures are among the most difficult and challenging subjects for deep
neural networks. Here, human action recognition is essentially human gesture recognition. A gesture is
a movement of body parts that conveys some meaningful message. Gestures are the most suitable and
natural way for humans to interact with computer systems, and thus they build a bridge between humans
and machines. Human activity recognition also provides a platform to interact with deaf and dumb
people. In this research work, we develop a platform for hand movement (gesture) recognition: using a
CNN, we can identify human gestures in an image. As we are aware, there has been a rapid increase in
the number of deaf and dumb people due to several conditions. Since deaf and dumb people cannot
communicate verbally with a typical person, they depend on a form of visual communication: sign
language. Sign language provides a fine communication platform for hearing-impaired people to convey
their thoughts and to interact with others. The purpose of this research is a system for gesture
recognition that can recognize gestures and then convert gesture images into text accordingly. The
system pays special attention to the training component using the CNN algorithm. The concept involves
designing a system that uses deep learning to treat the input as a gesture and then provide
recognizable output as text.
Keywords—Human Action Recognition, Deaf and Dumb, CNN, Hand Gesture.
18) Human Action Invariancies for Human Action Recognition
Author: Nilam Nur Amir Sharif
The uniqueness of the human action shape, or silhouette, can be used for human action recognition.
Acquiring the features of the human silhouette to obtain the concept of human action invariancies has
led to important research in the video surveillance domain. This paper discusses the investigation of
this concept by extracting individual human action features using an integration moment invariant.
Experimental results have shown that human action invariancies are improved, with better recognition
accuracy. This verifies that the integration method of moment invariant is worth exploring for the
recognition of human actions in video surveillance.
Experiment 3: Project Proposal
SYSTEM PLANNING
Below are the steps involved in the System Development Life Cycle. Each phase within the
overall cycle may be made up of several steps.
Human action recognition is a field within computer vision and artificial intelligence that focuses on
understanding and categorizing human movements and actions in videos or images. It has numerous
applications, including surveillance, healthcare, sports analytics, and more.
To achieve accurate human action recognition, several key steps are involved: data collection,
pre-processing, feature extraction, and classification.
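These four stages can be sketched end-to-end. The toy Python pipeline below uses random stand-in data, a deliberately simple motion feature, and a nearest-centroid classifier; it illustrates how data flows between the stages, not a real recognizer:

```python
import numpy as np

def collect(n_clips=20, n_frames=8, h=16, w=16, n_classes=3, seed=0):
    """Stand-in for data collection: random 'videos' with action labels."""
    rng = np.random.default_rng(seed)
    clips = rng.integers(0, 256, size=(n_clips, n_frames, h, w), dtype=np.uint8)
    labels = rng.integers(0, n_classes, size=n_clips)
    return clips, labels

def preprocess(clips):
    """Pre-processing: scale pixel values to [0, 1]."""
    return clips.astype(np.float32) / 255.0

def extract_features(clips):
    """Toy feature: mean absolute frame-to-frame difference per clip."""
    diffs = np.abs(np.diff(clips, axis=1))
    return diffs.reshape(len(clips), -1).mean(axis=1, keepdims=True)

def classify(features, labels):
    """Toy nearest-centroid classifier over the 1-D motion feature."""
    centroids = {c: features[labels == c].mean() for c in np.unique(labels)}
    return np.array([min(centroids, key=lambda c: abs(f - centroids[c]))
                     for f in features.ravel()])

clips, labels = collect()
preds = classify(extract_features(preprocess(clips)), labels)
```

In a real system each stage is replaced with its serious counterpart: labeled video datasets, frame normalization, learned deep features, and a trained neural network classifier.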
Step 3: Architectural Design
Architectural design for human action recognition typically involves computer vision and deep
learning techniques. A common approach is to use Convolutional Neural Networks (CNNs) and
Recurrent Neural Networks (RNNs) in combination.
The design will serve as a blueprint for the system and helps detect problems before these errors
or problems are built into the final system. Professionals create the system design, but must review
their work with the users to ensure the design meets users' needs.
Step 4: Coding and Debugging
Human action recognition involves identifying and categorizing various actions performed by
individuals, such as walking, running, or dancing. This field has applications in surveillance,
healthcare, sports analysis, and more.
In the coding phase, developers typically use programming languages like Python and deep learning
libraries such as TensorFlow or PyTorch to build and train action recognition models. They start by
collecting labeled datasets containing video clips or image sequences of various actions. These
datasets are then preprocessed to extract meaningful features from the data, such as optical flow or
3D joint positions.
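One common shape for such a model is a small CNN that encodes each frame, followed by an RNN over the frame features. The PyTorch sketch below is a hypothetical minimal version (layer sizes, six action classes, and input dimensions are all illustrative assumptions, and the model is untrained):

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Sketch of a common video-action architecture: a small CNN encodes each
    frame, an LSTM models the temporal order, and a linear head classifies."""
    def __init__(self, n_classes=6, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())      # -> 32 features/frame
        self.rnn = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, video):                  # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))  # fold time into batch: (B*T, 32)
        out, _ = self.rnn(feats.view(b, t, -1))
        return self.head(out[:, -1])           # logits from last time step

model = CNNLSTM()
clip = torch.randn(2, 8, 3, 32, 32)   # two dummy 8-frame RGB clips
logits = model(clip)                   # one score per action class
```

Training then proceeds as usual: cross-entropy loss over the logits, an optimizer such as Adam, and batches drawn from the labeled action dataset.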
Step 5: System Testing
System testing of human action recognition involves the evaluation of software or hardware
systems designed to identify and classify human actions from various data sources, such as videos,
images, or sensor data. This critical step ensures the system's accuracy, reliability, and
performance.
Step 6: Maintenance
To maintain progress in human action recognition, continuous data collection, algorithm
development, addressing real-world challenges, resource updates, and interdisciplinary collaboration
are essential. These efforts ensure the continued advancement of action recognition systems for
various applications, from surveillance to healthcare.
There are various software process models, such as:
Prototyping Model
RAD Model
The Spiral Model
The Waterfall Model
The Iterative Model
Of all these process models, we have used the Iterative model for the development of our project.
The Iterative model
Iterative process starts with a simple implementation of a subset of the software requirements and
iteratively enhances the evolving versions until the full system is implemented. At each iteration,
design modifications are made and new functional capabilities are added. The basic idea behind this
method is to develop a system through repeated cycles (iterative) and in smaller portions at a time
(incremental).
The model consists of six distinct stages. In the first, the problem is specified along with the
desired service objectives (goals). In the design stage, the following are produced:
Software architecture
Algorithmic detail
Interface representations
The hardware requirements are also determined at this stage, along with a picture of the overall
system architecture. By the end of this stage, the software engineer should be able to identify the
relationships between the hardware, the software, and the associated interfaces. Any faults in the
specification should ideally not be passed downstream.
4. In the implementation and testing phase, the designs are translated into the software domain.
Detailed documentation from the design phase can significantly reduce the coding effort.
Testing at this stage focuses on making sure that any errors are identified and that the software
meets its required specification.
5. In the integration and system testing phase, all the program units are integrated and tested to
ensure that the complete system meets the software requirements. After this stage the software is
delivered to the customer. [Deliverable: the software product is delivered to the client for
acceptance testing.]
6. The maintenance phase is usually the longest stage of the software lifecycle. In this phase,
the software is updated to meet changing customer needs, adapt to a changing environment, and correct
previously undetected errors.
The waterfall model is the oldest and the most widely used paradigm. However, projects rarely follow
its sequential flow in practice. This is due to the inherent problems associated with its rigid
format, namely: it only incorporates iteration indirectly, so changes may cause considerable confusion
as the project progresses. Observe that feedback loops allow for corrections to be incorporated into
the model.
TIMELINE CHART
A timeline chart is an effective way to visualize a process in chronological order. Since
details are displayed graphically, important points in time can be easily seen and understood.
Often used for managing a project's schedule, timeline charts function as a sort of calendar of
events within a specific period of time.
A timeline chart is constructed with a horizontal axis representing the total time span of the
project, broken down into increments (for example, days, weeks, or months), and a vertical axis
representing the tasks that make up the project (for example, if the project is outfitting your
computer with new software, the major tasks involved might be: conduct research,
choose software, install software). Horizontal bars of varying lengths represent the sequence,
timing, and time span of each task. Using the same example, you would put "conduct research"
at the top of the vertical axis and draw a bar on the graph that represents the amount of time you
expect to spend on the research, then enter the other tasks below the first one, with
representative bars at the points in time when you expect to undertake them.
The bar spans may overlap; for example, you may conduct research and choose software
during the same time span. As the project progresses, secondary bars, arrowheads, or darkened
bars may be added to indicate completed tasks, or the portions of tasks that have been
completed. A vertical line is used to represent the report date.
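The construction described above can be reproduced with matplotlib's `broken_barh`. The tasks, weeks, and report date below are hypothetical, following the software-outfitting example in the text:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical schedule: (task, start week, duration in weeks)
tasks = [("Conduct research", 0, 3),
         ("Choose software", 2, 2),   # overlaps research, as the text notes
         ("Install software", 4, 1)]

fig, ax = plt.subplots()
for row, (name, start, length) in enumerate(tasks):
    ax.broken_barh([(start, length)], (row - 0.4, 0.8))  # one bar per task
ax.set_yticks(range(len(tasks)))
ax.set_yticklabels([t[0] for t in tasks])
ax.invert_yaxis()              # first task at the top, as described above
ax.set_xlabel("Week")
ax.axvline(3, linestyle="--")  # vertical line marking the report date
fig.savefig("timeline.png")
```

Progress marks (secondary or darkened bars for completed portions) can be added the same way, by drawing a second, shorter `broken_barh` span over each task row.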
SYSTEM DESIGN
Introduction:
It is a process of collecting and interpreting facts, identifying problems, and decomposing a
system into its components. System analysis is conducted for the purpose of studying a
system or its parts in order to identify its objectives. It is a problem-solving technique that
improves the system and ensures that all the components of the system work efficiently to
accomplish their purpose.
Analysis specifies what the system should do. It is a process of planning a new business system
or replacing an existing system by defining its components or modules to satisfy the specific
requirements. Before planning, you need to understand the old system thoroughly and
determine how computers can best be used in order to operate efficiently. System Design
focuses on how to accomplish the objective of the system.
BLOCK DIAGRAM
Training Phase
1. Block Based Silhouette Extraction: - Block-based silhouette extraction is a technique used
in computer vision and image processing to identify and extract the silhouette or outline of
objects in an image. The process involves dividing the image into blocks or regions and
analyzing each block to determine if it contains part of an object's silhouette.
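A minimal sketch of this block-based idea, assuming a binary foreground mask (the silhouette) has already been obtained by background subtraction:

```python
import numpy as np

def silhouette_blocks(mask, block=4):
    """Divide a binary foreground mask into block x block regions and flag
    the regions that contain any silhouette pixels."""
    h, w = mask.shape
    flags = np.zeros((h // block, w // block), dtype=bool)
    for i in range(h // block):
        for j in range(w // block):
            flags[i, j] = mask[i*block:(i+1)*block, j*block:(j+1)*block].any()
    return flags

# 16x16 mask with a blob in the upper-left quadrant standing in for a person
mask = np.zeros((16, 16), dtype=bool)
mask[2:8, 3:7] = True
flags = silhouette_blocks(mask, block=4)   # 4x4 grid of occupancy flags
```

The resulting occupancy grid is a coarse, position-tolerant descriptor of the silhouette's shape that can feed the later feature-extraction stages.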
2. Distance Transform Feature: - A distance transform feature is a mathematical transformation
applied to an image or a binary mask that computes the distance of each pixel or point in the
image to a specific target or set of target points. This feature is commonly used in image
processing and computer vision for various applications, such as object recognition, shape
analysis, and image segmentation. It provides valuable information about the spatial
relationship between objects and their proximity to specific reference points.
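For example, SciPy's Euclidean distance transform applied to a toy silhouette mask gives each foreground pixel its distance to the nearest background pixel:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Binary mask: nonzero pixels are foreground; the transform assigns each
# foreground pixel its Euclidean distance to the nearest background pixel.
mask = np.zeros((7, 7), dtype=bool)
mask[1:6, 1:6] = True          # a 5x5 square standing in for a silhouette
dist = distance_transform_edt(mask)
# Pixels deep inside the square get large values; edge pixels get 1;
# background pixels stay 0.
```

Statistics of this map (its maximum, its profile along the skeleton, etc.) are the kind of distance-transform features the text refers to.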
3. Entropy Feature: - Entropy is a statistical measure used in various fields, including
information theory, thermodynamics, and image processing. In image processing, entropy is
a feature that characterizes the amount of information or randomness present in an image. It
is commonly used as a texture and complexity descriptor for images and plays a crucial role
in image analysis and computer vision. The entropy feature quantifies the degree of
uncertainty or disorder within an image.
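A common concrete form is the Shannon entropy of the gray-level histogram: a perfectly flat image scores 0 bits, while a noisy image approaches the 8-bit maximum. A small sketch:

```python
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy (in bits) of an image's gray-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                      # skip empty bins: 0 * log(0) -> 0
    return float(-(p * np.log2(p)).sum())

flat = np.full((32, 32), 128, dtype=np.uint8)                      # one gray level
noisy = np.random.default_rng(0).integers(0, 256, (32, 32), dtype=np.uint8)
# image_entropy(flat) is 0.0; image_entropy(noisy) is close to 8 bits
```

Computed over local windows rather than the whole frame, the same quantity serves as the texture and complexity descriptor mentioned above.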
4. Training using Neural Network Model: - Training using a neural network model is the process
of teaching a neural network to make predictions or classifications by exposing it to a dataset.
During training, the network learns to adjust its internal parameters (weights and biases) in
order to minimize the difference between its predictions and the actual target values in the
dataset. The goal is to enable the network to generalize its learning to make accurate
predictions on new, unseen data.
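A minimal, self-contained illustration of this parameter-adjustment loop: a single sigmoid neuron trained by gradient descent on toy 2-D "action features". Real action recognizers use far larger networks, but the mechanics (predict, measure error, update weights and bias) are the same:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),    # class 0 feature cluster
               rng.normal(2.0, 0.3, (50, 2))])   # class 1 feature cluster
y = np.array([0] * 50 + [1] * 50, dtype=float)

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # forward pass: predictions
    grad_w = X.T @ (p - y) / len(y)          # gradient of cross-entropy loss
    grad_b = (p - y).mean()
    w -= lr * grad_w                          # adjust internal parameters
    b -= lr * grad_b

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
accuracy = (pred == y).mean()                # near-perfect on this toy data
```

The generalization question raised in the text is answered by evaluating the same trained parameters on data held out from this loop, which is exactly what the testing phase below does.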
5. NN Model: - A Neural Network (NN) model, often simply referred to as a neural network, is
a computational model inspired by the structure and function of the human brain. It's a
fundamental building block of deep learning and artificial intelligence, used for a wide range
of tasks, including pattern recognition, classification, regression, and decision-making.
Testing Phase
1. Testing Using Neural Network Model: - Testing using a neural network model is a critical
step in the machine learning and deep learning workflow. It involves evaluating the
performance of a trained neural network on a separate dataset to assess how well it
generalizes to new, unseen data. The testing process provides valuable insights into the
model's accuracy, reliability, and its suitability for real-world applications.
2. Recognized Actions: - A recognized action refers to the identification and classification
of a specific activity, gesture, or behavior based on data or sensory input. This recognition
is typically performed by computer systems, artificial intelligence, or human observers,
depending on the context. Recognized actions can have various applications in fields such
as computer vision, robotics, natural language processing, and human-computer interaction.
System Requirements:
Hardware Requirements:
1. Computer:
A modern desktop or laptop computer with a multi-core processor (Pentium or above
recommended) for smooth development and testing.
2. Memory (RAM):
At least 4GB of RAM is recommended. More RAM will be beneficial, especially when running
the development environment and the application simultaneously.
3. Storage:
A Solid State Drive (SSD) with a minimum of 128GB of storage space. SSDs offer faster read/write
speeds, enhancing the overall performance of the development environment and the storage of
datasets and application data.
4. Display:
A high-resolution monitor (1920x1080 or higher) to accommodate various development tools and
improve productivity.
Software Requirements:-
1. Operating System: You can use Windows, macOS, or Linux for your development
environment. Linux is commonly preferred due to its compatibility with many deep
learning libraries and tools.
2. Python: Python is the primary programming language for most machine learning and
computer vision projects.
3. FFmpeg: If your project involves working with video data, FFmpeg is a powerful tool for video
processing, conversion, and manipulation.
4. Pillow (PIL): This Python imaging library is helpful for working with image data.
Anjuman-i-Islam’s
M.H.SABOO SIDDIK POLYTECHNIC
8, Saboo Siddik Polytechnic Road, Byculla
Mumbai- 400008
INFORMATION TECHNOLOGY
Academic Session 2023-24
Programme DIPLOMA IN INFORMATION TECHNOLOGY
Course Code & Course IF-5I
4 Project Diary
6 Presentation 05
TOTAL 25
PROJECT DIARY FORMAT
Week No :
Activities Planned:
Activities Executed:
PROJECT DIARY FORMAT
Week No :
Activities Planned:
Activities Executed:
PROJECT DIARY FORMAT
Week No :
Activities Planned:
Activities Executed:
PROJECT DIARY FORMAT
Week No :
Activities Planned:
Activities Executed:
PROJECT DIARY FORMAT
Week No :
Activities Planned:
Activities Executed:
Experiment 5: Project Report
1. Certificate
2. Acknowledgement
3. Abstract
4. Content Page
Chapter 1:
5. Introduction and Background of the Industry or User
Based Problem
Chapter 2:
6. Literature Survey for Problem Identification and
Specification
Chapter 3:
7. Proposed Detail Methodology for Solving the identified
Problem with Action Plan
MAHARASHTRA STATE
BOARD OF TECHNICAL EDUCATION
Certificate
This is to certify that Mr. Zaid Ansari, Roll No. 210808, of the fifth semester of the Diploma in
Information Technology Engineering at the institute M.H. Saboo Siddik Polytechnic (code: 0002)
has completed the microproject satisfactorily in the subject CPP (22316) for the academic year
2023-24, as prescribed in the curriculum.
Signature Signature Signature
Project guide H. O. D Principal
SEAL OF
INSTITUTE
ACKNOWLEDGEMENT
We wish to express our profound gratitude to our guide Ms. Sameera Khan who guided us endlessly in
the framing and completion of the micro project.
She guided us on all the main points of the micro-project. We are indebted to her for her constant
encouragement, cooperation, and help. It was her enthusiastic support that helped us overcome
various obstacles in the micro-project.
We are also thankful to our Principal, HOD, faculty members, and classmates of the IT Engineering
department for extending their support and motivation in the completion of this micro-project.
Fazlur Rahman-210806
Zaid Ansari -210808
Owais Ansari -210828
Noorain Don -210831
ABSTRACT
Human action recognition is a fundamental task in computer vision with a wide range of applications,
from surveillance and security to human-computer interaction. This paper presents a comprehensive
review of recent advancements and challenges in the field of human action recognition in videos and
images. We discuss the key components of action recognition, including feature extraction, temporal
modelling, and classification techniques, while highlighting the importance of datasets and evaluation
metrics.
We examine the evolution of feature extraction methods, from traditional handcrafted features to deep
learning-based approaches, and explore how these techniques have improved the accuracy and
robustness of action recognition systems. Temporal modelling, including techniques such as recurrent
neural networks (RNNs) and convolutional neural networks (CNNs), is analysed for its effectiveness in
capturing the temporal dependencies in videos and image sequences.
Furthermore, we delve into the significance of large-scale action recognition datasets and benchmarking
protocols, which have played a pivotal role in advancing the field. We also discuss the latest trends in
leveraging multi-modal information, such as combining visual and depth data, to improve action
recognition accuracy and robustness.
The paper also presents an overview of challenges and open research questions in the domain of action
recognition, including fine-grained action recognition, handling occlusions, and real-time action
recognition in complex environments. We conclude by highlighting potential future directions in the
field, including the integration of explainable AI techniques and the application of action recognition in
healthcare and autonomous systems.
Project Report
2. Acknowledgement
3. Abstract
4. Content Page
Chapter 1:
5. Introduction and Background of the Industry or
User Based Problem
Chapter 2:
6. Literature Survey for Problem Identification and
Specification
Chapter 3:
7. Proposed Detail Methodology for Solving the
identified Problem with Action Plan
8. References and Bibliography
CHAPTER 1
1.1 Introduction
1.2 Background
1.3 Motivation
1.4 Problem Statement
1.5 Objective and Scope of the Project
1.1 INTRODUCTION: AN OVERVIEW OF THE SYSTEM
Human Action Recognition (HAR) is a subfield of computer vision and artificial intelligence that
focuses on the identification and classification of human actions, gestures, and movements in digital
images and video sequences. This technology plays a pivotal role in various applications across different
domains, from security and healthcare to entertainment and education. The primary goal of HAR is to
enable machines to understand and interpret human activities, providing valuable insights and
facilitating automation in a wide range of contexts.
1.2 BACKGROUND
In the digital age, the ubiquity of cameras and the exponential growth of video content have created a
vast pool of visual data. Harnessing this data for meaningful insights and automation has become a
crucial challenge. Human Action Recognition arises from the need to extract knowledge and context
from this data by interpreting and categorizing the actions performed by individuals or groups of people.
1.3 MOTIVATION
The motivation behind Human Action Recognition is multifaceted. It stems from the growing demand
for automated analysis and understanding of human behavior. The rise in surveillance, the expansion of
wearable devices, and the quest for more natural human-computer interactions have fueled interest in
this field. Moreover, applications in healthcare, security, entertainment, and education have the
potential to transform these industries.
1.4 PROBLEM STATEMENT
The problem statement of human action recognition in videos and images involves developing computer
vision and machine learning algorithms to automatically identify and classify the actions or activities
performed by humans in visual data.
1.5 OBJECTIVE AND SCOPE OF THE PROJECT
The objective of this study is to advance the state-of-the-art in Human Action Recognition,
improving accuracy, speed, and adaptability to different contexts. The scope encompasses the
development of deep learning models, computer vision techniques, and datasets to facilitate
action recognition. Additionally, this research seeks to address real-world challenges in
applications like surveillance, healthcare, entertainment, and education, where accurate action
recognition can offer substantial benefits.
In this era of digital media and technology, Human Action Recognition in Videos and Images
presents an exciting avenue for research and development. The fusion of computer vision, deep
learning, and real-world applications holds the promise of creating smarter, more responsive
systems that can understand, interpret, and interact with human actions in a meaningful and
valuable way.
Human-Computer Interaction: Implement the recognition system in applications
like gesture control for computers, augmented reality, or virtual reality.
Evaluation Metrics: Define appropriate metrics to assess the performance of the action
recognition system, including accuracy, precision, recall, and F1 score.
Privacy and Ethics: Address privacy concerns by considering anonymization techniques and
ethical considerations when working with video data.
Adaptability: Design the system to work in different environments and lighting conditions,
ensuring its adaptability to real-world scenarios.
Scalability: Consider the scalability of the system to handle a varying number of actions and
accommodate future expansion.
User Interface: Develop a user-friendly interface for end-users to interact with the recognition
system.
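The evaluation metrics listed above can be computed directly from true and predicted labels. The
following is a minimal sketch for the binary case; the label arrays are hypothetical examples, not
project data:

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = action present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# hypothetical ground-truth vs. predicted labels for six clips
acc, prec, rec, f1 = evaluate([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```

For multi-class action labels, the same quantities are usually computed per class and then
macro-averaged; libraries such as scikit-learn provide ready-made implementations.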
Chapter 2
Review of literature
2.1-Introduction
A literature review, or literature survey, is the section of a scholarly paper that surveys the current
knowledge on a particular topic, including substantive findings as well as theoretical and
methodological contributions. It is conducted before the project commences to give an idea of the
existing systems in the field, along with their pros and cons, and involves the study and review of
relevant literature materials related to the given topic. In this survey, research was done on several
papers on human action recognition and their implementations.
2.4-Implementation of Human Action Recognition using Image Parsing
Techniques
Author – Soumalya Sen
2.6-Human Action Recognition Using Smartphone Sensors
Author – Ashim Saha
As smartphones are becoming ubiquitous, many studies using smartphones have been conducted in
recent years. Further, these smartphones are laden with several diverse and sophisticated sensors,
such as a GPS sensor, a vision sensor (camera), an acceleration sensor, an audio sensor
(microphone), a light sensor, and a direction sensor (compass). Activity recognition is one of the
potent research topics that can be used to provide effective and adaptive services to users. Our paper
is intended to evaluate a system using the smartphone sensor for acceleration, referred to as an
accelerometer, to understand six different human activities using supervised machine learning
classification. To build the model, accelerometer data were collected from sixteen different users
during their usual day-to-day routines, consisting of sitting, standing, lying down, walking, and
climbing up and down the staircase. The sample data thus generated were aggregated and combined
into examples on which supervised machine learning algorithms were applied to generate predictive
models. To address the limitations of laboratory settings, we used the Physics Toolbox Sensor Suite
on the Google Android platform to collect the time-series data generated by the smartphone
accelerometer. This kind of activity prediction model can be used to provide insightful information
about millions of human beings merely by having them carry a smartphone with them.
Index Terms: Activity Recognition, Machine Learning, Multi-class classification, Smartphone
time-series data, 3-axis Accelerometer.
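As a rough sketch of the feature-extraction step such a study might use (the window length and the
mean/standard-deviation features below are illustrative assumptions, not taken from the paper), a
3-axis accelerometer stream can be split into fixed-length windows and summarized per axis before
classification:

```python
import numpy as np

def window_features(acc, win=50):
    """acc: (n_samples, 3) array of 3-axis accelerometer readings.
    Splits the stream into non-overlapping windows of `win` samples and
    returns per-window features: mean and std for each axis (6 values)."""
    n = (len(acc) // win) * win          # drop the incomplete tail window
    windows = acc[:n].reshape(-1, win, 3)
    means = windows.mean(axis=1)
    stds = windows.std(axis=1)
    return np.hstack([means, stds])      # shape: (n_windows, 6)

# hypothetical stream: 100 identical samples -> two windows, zero variance
feats = window_features(np.tile([[0.1, 9.8, 0.3]], (100, 1)))
```

Each row of `feats` would then be one training example for a supervised classifier.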
Author – Chandni J. Dhamsania
Human action recognition is a way of retrieving videos that emerged from Content-Based Video
Retrieval (CBVR). It is a growing area of research in the field of computer vision. Human action
recognition has gained popularity because of its wide applicability in the automatic retrieval of
videos of a particular action using visual features. The most common stages of action recognition
include object and human segmentation, feature extraction, activity detection, and classification.
This paper describes the applications and challenges of human action recognition. The features and
limitations of various methods for human action recognition are discussed. The paper presents a
survey of different types of actions, such as single-person action recognition, two-person or person-
object interaction, and multiple-people action recognition.
2.8-SCNN: SEQUENTIAL CONVOLUTIONAL NEURAL
NETWORK FOR HUMAN ACTION RECOGNITION IN VIDEOS
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are two typical kinds
of neural networks. While CNN models have achieved great success in image recognition due to
their strong ability to abstract spatial information at multiple levels, RNN models have not
achieved significant progress in video-analysis tasks (e.g. action recognition), although RNNs
can inherently model temporal dependencies in videos. In this work, we propose a Sequential
Convolutional Neural Network, denoted SCNN, to extract effective spatial-temporal features
from videos, thus incorporating the strengths of both the convolutional operation and the recurrent
operation. Our SCNN model extends the RNN to directly process feature maps, rather than vectors
flattened from feature maps, to preserve the spatial structure of the inputs. It replaces the full
connections of the RNN with convolutional connections to decrease the number of parameters, the
computational cost, and the risk of over-fitting. Moreover, we introduce asymmetric convolutional
layers into the SCNN to further reduce the number of parameters and the computational cost. Our
final SCNN deep architecture used for action recognition achieves very good performance on two
challenging benchmarks, UCF-101 and HMDB-51, outperforming many state-of-the-art methods.
Author – Cheng-Bin Jin, Shengzhe Li, Trung Dung Do, and Hakil Kim
Abstract: This paper proposes a real-time human action recognition approach for static video
surveillance systems. The approach predicts human actions using temporal images and
convolutional neural networks (CNNs). A CNN is a type of deep learning model that can
automatically learn features from training videos. Although state-of-the-art methods have shown
high accuracy, they consume a lot of computational resources. Another problem is that many
methods assume exact knowledge of human positions. Moreover, most current methods build
complex handcrafted features for specific classifiers. Therefore, these kinds of methods are difficult
to apply in real-world applications. In this paper, a novel CNN model based on temporal images and
a hierarchical action structure is developed for real-time human action recognition. The hierarchical
action structure includes three levels: an action layer, a motion layer, and a posture layer. The top
layer represents subtle actions; the bottom layer represents posture. Each layer contains one CNN,
which means that the model has three CNNs working together; the layers are combined to represent
many different kinds of action with a large degree of freedom. The developed approach was
implemented and achieved superior performance on the ICVL action dataset; the algorithm can run
at around 20 frames per second.
2.10-Survey On Feature Extraction Approach for Human Action Recognition
in Still Images and Videos
Author – Pavan M, Deepika D, Divyashree R
Human Action Recognition (HAR) has been a challenging problem, yet it needs to be solved.
Recently, the detection and recognition of human actions has found a broad range of applications
and has become popular in the field of computer vision. It mainly focuses on understanding human
behaviour and assigning a label to each action. There are many approaches for recognizing actions
from both image-based and video-based data. It is now time to review these existing approaches in
order to help future research. The main aim of this work is to study the various action recognition
techniques for videos and images. The paper presents a brief overview of the features of human
actions, categorized as still-image-based and video-based. All related datasets are also introduced in
this paper, which will be helpful for future research.
2.11-Subspace Analysis Methods plus Motion History Image for Human Action
Recognition
Author – Chunhua Du, Qiang Wu, Jie Yang
This paper proposes a new human action recognition method that handles the recognition task
in a quite different way compared with traditional methods, which use a sequence-matching
scheme. Our method compresses a sequence of an action into a Motion History Image (MHI),
from which low-dimensional features are extracted using subspace analysis methods. Unlike other
methods, which use a sequence consisting of several frames for recognition, our method uses
only one MHI per action sequence for recognition. Our method thus avoids the complexity as
well as the large computational cost of sequence-matching-based methods. Encouraging
experimental results on a widely used database demonstrate the effectiveness of the proposed
method.
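The MHI idea can be sketched in a few lines: each pixel stores how recently motion occurred there,
with fresh motion stamped at a maximum value tau and older motion decaying toward zero. This is an
illustrative reimplementation, not the authors' code, and the threshold and decay values are
assumptions:

```python
import numpy as np

def update_mhi(mhi, prev_frame, curr_frame, tau=30, thresh=25):
    """One update step of a Motion History Image.
    Pixels that changed by more than `thresh` are stamped with `tau`;
    all other pixels decay by 1 toward 0."""
    motion = np.abs(curr_frame.astype(int) - prev_frame.astype(int)) > thresh
    mhi = np.maximum(mhi - 1, 0)   # fade old motion
    mhi[motion] = tau              # stamp fresh motion
    return mhi

# toy 2x2 frames: motion appears only in the bottom-right pixel
prev = np.zeros((2, 2), dtype=np.uint8)
curr = np.array([[0, 0], [0, 200]], dtype=np.uint8)
mhi = update_mhi(np.zeros((2, 2), dtype=int), prev, curr)
```

Accumulating this over a clip yields a single gray-scale image summarizing the whole action, which
is what makes the per-sequence feature extraction described above possible.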
In videos, recognition of continual human motion is a difficult point for applications. A method of
human action recognition based on latent-dynamic conditional random fields (LDCRF) is presented.
Human pose is extracted using a star-form distance descriptor of the human body contour. Then, for
continuous sequences, the LDCRF model is built to capture the mapping relation between action
features and action semantics. Compared with traditional CRF and HCRF models, by designing the
affiliation between latent features and human pose, the LDCRF models both the internal action and
the external movement features. In the experiments, the Weizmann action database is used, and three
experiments are designed. When a composed continuous sequence is tested, the recognition rate
reaches over 90% for all actions except "skip". The receiver operating characteristics of the three
models show that LDCRF models have the better descriptive capability for internal action and
external movement features, even when human action is affected by viewing angle, accessories, and
occlusion. This shows that LDCRF is robust as long as the human body contour is intact.
2.13-Action Recognition by Multiple Features and Hyper-sphere Multi-class
SVM
Author – Jia Liu, Jie Yang, Yi Zhang
In this paper, we propose a novel framework for action recognition based on multiple features
to improve action recognition in videos. The fusion of multiple features is important for
recognizing actions, as a single-feature-based representation is often not enough to capture the
imaging variations (viewpoint, illumination, etc.) and the attributes of individuals (size, age,
gender, etc.). Hence, we use two kinds of features: (i) a quantized vocabulary of local spatio-
temporal (ST) volumes (cuboids and 2-D SIFT), and (ii) higher-order statistical models of
interest points, which aim to capture the global information about the actor. We construct video
representations in terms of local space-time features and global features and integrate such
representations with a hyper-sphere multi-class SVM. Experiments on publicly available datasets
show that the proposed approach is effective. An additional experiment shows that using both
local and global features provides a richer representation of human action compared to
the use of a single feature type.
2.14-Depth Image-based Object Segmentation Scheme for Improving
Human Action Recognition
Author – Sungjoo Park
Human action recognition using the 3D camera for surveillance applications is a
promising alternative approach to the conventional 2D camera based surveillance.
We propose a depth image-based object segmentation scheme for improving
human action recognition. Experimental results show that the average accuracy
of the dangerous event detection is improved by about 15% when using the
proposed object segmentation scheme.
2.15-HUMAN ACTION RECOGNITION USING ROBUST POWER
SPECTRUM FEATURES
Author – Hossein Ragheb, Sergio Velastin
We propose a new method for human action recognition from video streams that is fast and robust
to noise and to large changes in camera view. We start by extracting features in the Fourier domain
once we obtain the bounding boxes containing the silhouettes of a human for a number of video
frames representing a basic action. After preprocessing, we divide each space-time volume into
space-time sub-volumes (STSVs) and compute their corresponding mean power spectra as our
feature vectors. Our features yield high classification performance even with simple distance
measures. We perform an experimental comparison, using the same data, between our method and
two state-of-the-art methods.
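The STSV feature described above can be approximated as follows. This is a simplified sketch: the
2x2x2 grid and the use of the full 3-D FFT are assumptions for illustration, and the paper's
silhouette preprocessing is omitted:

```python
import numpy as np

def stsv_power_features(volume, grid=(2, 2, 2)):
    """Split a space-time volume (t, y, x) into sub-volumes on a regular grid
    and return the mean power spectrum of each sub-volume as the feature vector."""
    feats = []
    for t_part in np.array_split(volume, grid[0], axis=0):
        for y_part in np.array_split(t_part, grid[1], axis=1):
            for sub in np.array_split(y_part, grid[2], axis=2):
                power = np.abs(np.fft.fftn(sub)) ** 2   # power spectrum of the sub-volume
                feats.append(power.mean())
    return np.array(feats)

# toy constant volume: all power sits in the DC component of each sub-volume
feats = stsv_power_features(np.ones((4, 4, 4)))
```

Because the power spectrum discards phase, such features are comparatively insensitive to small
spatial shifts of the silhouette within each sub-volume.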
2.16-Review on Recent Advances in Human Action Recognition
in Video Data
Author – Akshita Baisware
AI has achieved new heights in image recognition, human action recognition, and NLP. It has a vast
area of applications, such as IoT, robotics, the biosciences, and surveillance. Video-based human
action recognition has many potential applications, which makes it a highly sought-after field for
researchers. With the constant growth of high-performance computing, computer vision, and GPUs,
deep learning-based human activity recognition is one of the constantly evolving and most
promising streams. This review focuses on recent advancements in the field of action recognition
based on deep learning. The present state-of-the-art techniques for action recognition and
prediction, as well as the future scope for research, are discussed in the paper.
2.17-Human action identification and search in video files
Author – Mirela Kundid
This paper describes an approach for modeling and recognizing human actions within videos.
With millions of videos published almost every day, there are new opportunities for research
in the field of search and recognition within video sequences. Statistical approaches and
approaches based on the description of a model are described in detail in this paper and compared
on a series of videos taken from various online databases (KTH, Weizmann, MSR-Action). There
are various approaches to identifying actions within video sequences. The approaches described
in this paper are based on recognizing an action from a series of images obtained by segmentation
and on constructing the motion history of the movement (Motion History Images, MHI). In this
paper, the MHI construction technique is applied to a series of images obtained from a database
used for motion analysis in order to recognize an action within a video (a human greeting in a
video).
2.18-Human Action Recognition Using Deep Neural Networks
Author – Rashmi R. Koli
Human activities such as body gestures are among the most difficult and challenging subjects for
deep neural networks. Human action recognition here amounts to human gesture recognition. A
gesture is a movement of body parts that conveys some meaningful message. Gestures are among the
most suitable and natural ways for humans to interact with computer systems, and they thus build a
bridge between humans and machines. Human activity recognition provides a platform to interact
with deaf and mute people. In this research work, we develop a platform for hand-movement
recognition: using a CNN, we can identify human gestures in an image. There has been a rapid
increase in the number of people who are deaf or mute owing to several conditions. Since deaf and
mute people cannot communicate with a typical person by speech, they depend on a form of visual
communication: sign language. Sign language provides a fine communication platform for hearing-
impaired people to convey their thoughts and to interact with an ordinary person. The outcome of
this research is a system that supports gesture recognition: it can recognize gestures and then convert
gesture images into text accordingly. The system pays special attention to the CNN training
component using the CNN algorithm. The concept involves designing a system that uses deep
learning principles to treat the input as a gesture and then provide recognizable output as text.
Keywords: Human Action Recognition, Deaf and Mute, CNN, Hand Gesture.
2.19-Human Action Invariancies for Human Action Recognition
Author – Nilam Nur Amir Sharif
The uniqueness of the human action shape, or silhouette, can be used for human action
recognition. Acquiring the features of the human silhouette to obtain the concept of human action
invariancies has led to important research in the video surveillance domain. This paper discusses
the investigation of this concept by extracting individual human action features using an integrated
moment invariant. Experimental results have shown that human action invariancies are improved,
with better recognition accuracy. This has verified that the integration method of moment invariants
is worth exploring for the recognition of human actions in video surveillance.
Chapter 3
3.1-SYSTEM PLANNING
Below are the steps involved in the System Development Life Cycle. Each phase within the
overall cycle may be made up of several steps.
Human action recognition is a fundamental task in computer vision and artificial intelligence, aimed
at understanding and interpreting human movements from digital data, typically video sequences.
This concept revolves around developing software systems that can automatically detect and
categorize various human actions, such as walking, running, waving, or even complex activities like
dancing or playing sports.
Human action recognition is a field within computer vision and artificial intelligence that focuses on
understanding and categorizing human movements and actions in videos or images. It has numerous
applications, including surveillance, healthcare, sports analytics, and more.
To achieve accurate human action recognition, several key steps are involved. These steps include
data collection, pre-processing, feature extraction, and classification.
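These steps can be sketched end-to-end as a toy pipeline. The motion-energy feature and the
threshold rule below are placeholders chosen purely for illustration, not a real recognizer:

```python
import numpy as np

def preprocess(frames):
    # scale 8-bit pixel values into [0, 1]
    return frames.astype(float) / 255.0

def extract_features(frames):
    # crude motion cue: mean absolute difference between consecutive frames
    return np.array([np.abs(np.diff(frames, axis=0)).mean()])

def classify(features, threshold=0.05):
    # placeholder rule: high motion energy -> "active", otherwise "idle"
    return "active" if features[0] > threshold else "idle"

def recognize(frames):
    """Run the full pipeline: pre-processing -> feature extraction -> classification."""
    return classify(extract_features(preprocess(frames)))

static = np.zeros((4, 8, 8), dtype=np.uint8)                               # no motion
moving = np.array([np.full((8, 8), 255 * (i % 2), dtype=np.uint8) for i in range(4)])
```

In a real system, the data-collection step supplies labeled clips, and the classifier would be a
trained model rather than a fixed threshold.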
Step 3: Architectural Design
Architectural design for human action recognition typically involves computer vision and deep
learning techniques. A common approach is to use Convolutional Neural Networks (CNNs) and
Recurrent Neural Networks (RNNs) in combination..
The design will serve as a blueprint for the system and helps detect problems before these errors
or problems are built into the final system.Professionals create the system design,but must review
their work with the users to ensure the design meets users'needs.
Step 4: Coding and Debugging
Human action recognition involves identifying and categorizing various actions performed by
individuals, such as walking, running, or dancing. This field has applications in surveillance,
healthcare, sports analysis, and more.
In the coding phase, developers typically use programming languages like Python and deep learning
libraries such as TensorFlow or PyTorch to build and train action recognition models. They start by
collecting labeled datasets containing video clips or image sequences of various actions. These
datasets are then preprocessed to extract meaningful features from the data, such as optical flow or
3D joint positions.
Step 5: System Testing
System testing of human action recognition involves the evaluation of software or hardware
systems designed to identify and classify human actions from various data sources, such as videos,
images, or sensor data. This critical step ensures the system's accuracy, reliability, and
performance.
Step 6: Maintenance
To maintain progress in human action recognition, continuous data collection, algorithm
development, addressing real-world challenges, resource updates, and interdisciplinary collaboration
are essential. These efforts ensure the continued advancement of action recognition systems for
various applications, from surveillance to healthcare.
There are various software process models, such as:
The Prototyping Model
The RAD Model
The Spiral Model
The Waterfall Model
The Iterative Model
Of all these process models, we have used the Iterative model for the development of our project.
The Iterative Model
The iterative process starts with a simple implementation of a subset of the software requirements
and iteratively enhances the evolving versions until the full system is implemented. At each
iteration, design modifications are made and new functional capabilities are added. The basic idea
behind this method is to develop a system through repeated cycles (iterative) and in smaller portions
at a time (incremental).
The model consists of six distinct stages.
(a) The problem is specified along with the desired service objectives (goals).
In the design phase, the following are produced:
Software architecture
Algorithmic detail
Interface representations
The hardware requirements are also determined at this stage, along with a picture of
the overall system architecture. By the end of this stage, the software engineer should
be able to identify the relationships between the hardware, the software, and the
associated interfaces. Any faults in the specification should ideally not be passed
downstream.
4. In the implementation and testing phase, the designs are translated into the
software domain.
Detailed documentation from the design phase can significantly reduce the
coding effort.
Testing at this stage focuses on making sure that any errors are identified and
that the software meets its required specification.
5. In the integration and system testing phase, all the program units are integrated and
tested to ensure that the complete system meets the software requirements. After this
stage the software is delivered to the customer. [Deliverable: the software product
is delivered to the client for acceptance testing.]
6. The maintenance phase is usually the longest stage of the software lifecycle. In this phase
the software is updated to correct errors and oversights previously undetected in the testing
phases and to enhance its efficiency.
Advantages of the Iterative Model:
Testing is inherent to every phase of the iterative model.
It is an enforced, disciplined approach.
It is documentation-driven; that is, documentation is produced at every stage.
The waterfall model is the oldest and the most widely used paradigm. However, many
projects rarely follow its sequential flow, due to the inherent problems associated with its
rigid format. Namely:
It incorporates iteration only indirectly, so changes may cause considerable
confusion as the project progresses.
Observe that feedback loops allow for corrections to be incorporated into the model.
TIMELINE CHART
A timeline chart is an effective way to visualize a process in chronological order. Since
details are displayed graphically, important points in time can be easily seen and understood.
Often used for managing a project's schedule, timeline charts function as a sort of calendar of
events within a specific period of time.
A timeline chart is constructed with a horizontal axis representing the total time span of the
project, broken down into increments (for example, days, weeks, or months), and a vertical axis
representing the tasks that make up the project (for example, if the project is outfitting your
computer with new software, the major tasks involved might be: conduct research,
choose software, install software). Horizontal bars of varying lengths represent the sequences,
timing, and time spans of the tasks. Using the same example, you would put "conduct research"
at the top of the vertical axis and draw a bar on the graph that represents the amount of time you
expect to spend on the research, and then enter the other tasks below the first one with
representative bars at the points in time when you expect to undertake them.
The bar spans may overlap, as, for example, you may conduct research and choose software
during the same time span. As the project progresses, secondary bars, arrowheads, or darkened
bars may be added to indicate completed tasks, or the portions of tasks that have been
completed. A vertical line is used to represent the report date.
3.2-SYSTEM DESIGN
Introduction:
System analysis is the process of collecting and interpreting facts, identifying problems, and
decomposing a system into its components. It is conducted for the purpose of studying a
system or its parts in order to identify its objectives. It is a problem-solving technique that
improves the system and ensures that all the components of the system work efficiently to
accomplish their purpose.
Analysis specifies what the system should do. It is a process of planning a new business system
or replacing an existing system by defining its components or modules to satisfy the specific
requirements. Before planning, you need to understand the old system thoroughly and
determine how computers can best be used to operate efficiently. System design then
focuses on how to accomplish the objective of the system.
BLOCK DIAGRAM
Training Phase
1. Block Based Silhouette Extraction: - Block-based silhouette extraction is a technique used
in computer vision and image processing to identify and extract the silhouette or outline of
objects in an image. The process involves dividing the image into blocks or regions and
analyzing each block to determine if it contains part of an object's silhouette.
2. Distance Transform Feature: - A distance transform feature is a mathematical transformation
applied to an image or a binary mask that computes the distance of each pixel or point in the
image to a specific target or set of target points. This feature is commonly used in image
processing and computer vision for various applications, such as object recognition, shape
analysis, and image segmentation. It provides valuable information about the spatial
relationship between objects and their proximity to specific reference points.
3. Entropy Feature: - Entropy is a statistical measure used in various fields, including
information theory, thermodynamics, and image processing. In image processing, entropy is
a feature that characterizes the amount of information or randomness present in an image. It
is commonly used as a texture and complexity descriptor for images and plays a crucial role
in image analysis and computer vision. The entropy feature quantifies the degree of
uncertainty or disorder within an image.
4. Training using Neural Network Model: - Training using a neural network model is the process
of teaching a neural network to make predictions or classifications by exposing it to a dataset.
During training, the network learns to adjust its internal parameters (weights and biases) in
order to minimize the difference between its predictions and the actual target values in the
dataset. The goal is to enable the network to generalize its learning to make accurate
predictions on new, unseen data.
5. NN Model: - A Neural Network (NN) model, often simply referred to as a neural network, is
a computational model inspired by the structure and function of the human brain. It's a
fundamental building block of deep learning and artificial intelligence, used for a wide range
of tasks, including pattern recognition, classification, regression, and decision-making.
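The distance transform and entropy features described above can be sketched on a binary silhouette
mask and a grayscale image respectively. This is an illustrative sketch: it uses the L1 metric for
the distance transform and a histogram-based Shannon entropy, whereas a real system would typically
rely on library routines such as those in SciPy or scikit-image:

```python
import numpy as np

def l1_distance_transform(mask):
    """Distance of each pixel to the nearest background (0) pixel, L1 metric,
    computed with the classic two-pass sweep."""
    h, w = mask.shape
    inf = h + w
    d = np.where(mask == 0, 0, inf)
    for i in range(h):                    # forward pass: top-left to bottom-right
        for j in range(w):
            if i > 0:
                d[i, j] = min(d[i, j], d[i - 1, j] + 1)
            if j > 0:
                d[i, j] = min(d[i, j], d[i, j - 1] + 1)
    for i in range(h - 1, -1, -1):        # backward pass: bottom-right to top-left
        for j in range(w - 1, -1, -1):
            if i < h - 1:
                d[i, j] = min(d[i, j], d[i + 1, j] + 1)
            if j < w - 1:
                d[i, j] = min(d[i, j], d[i, j + 1] + 1)
    return d

def entropy_feature(img):
    """Shannon entropy (bits) of an 8-bit image's gray-level histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

sil = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 0]])       # toy silhouette mask
dt = l1_distance_transform(sil)
flat = np.zeros((4, 4), dtype=np.uint8)                  # uniform image: zero entropy
half = np.array([[0, 255], [0, 255]], dtype=np.uint8)    # two equal levels: 1 bit
```

The distance-transform values and the entropy score would then be concatenated into the feature
vector fed to the neural network model.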
Testing Phase
1) Testing Using Neural Network Model: - Testing using a neural network model is a critical
step in the machine learning and deep learning workflow. It involves evaluating the
performance of a trained neural network on a separate dataset to assess how well it
generalizes to new, unseen data. The testing process provides valuable insights into the
model's accuracy, reliability, and its suitability for real-world applications.
2) Recognized Actions: - A recognized action refers to the identification and classification
of a specific activity, gesture, or behavior based on data or sensory input. This recognition
is typically performed by computer systems, artificial intelligence, or human observers,
depending on the context. Recognized actions can have various applications in fields such
as computer vision, robotics, natural language processing, and human-computer
interaction.
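The training and testing phases above can be illustrated with the smallest possible "model": a
single weight and bias fitted by gradient descent on a toy dataset, then evaluated on held-out
points. This is only a conceptual sketch of parameter adjustment and generalization, not an
action-recognition network:

```python
import numpy as np

# toy data following y = 2x; the first four points train, the last two test
x_train, y_train = np.array([0., 1., 2., 3.]), np.array([0., 2., 4., 6.])
x_test, y_test = np.array([4., 5.]), np.array([8., 10.])

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):                    # training: minimize mean squared error
    err = (w * x_train + b) - y_train
    w -= lr * (err * x_train).mean()     # gradient step on the weight
    b -= lr * err.mean()                 # gradient step on the bias

test_error = np.abs((w * x_test + b) - y_test).mean()   # testing on unseen data
```

The same loop structure (forward pass, error, gradient update, then held-out evaluation) scales up
to the multi-layer networks used for recognizing actions.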
3.3-System Requirements:
Hardware Requirements:
1. Computer:
A modern desktop or laptop computer with a multi-core processor (Pentium or above
recommended) for smooth development and testing.
2. Memory (RAM):
At least 4 GB of RAM is recommended. More RAM will be beneficial, especially when running
the development environment and the application simultaneously.
3. Storage:
Solid State Drive (SSD) with a minimum of 128 GB of storage space. SSDs offer faster read/write
speeds, enhancing the overall performance of the development environment and the storage of
datasets and application data.
4. Display:
A high-resolution monitor (1920x1080 or higher) to accommodate various development tools and
improve productivity.
3.4-Implementation Tools
Software Requirements:-
1. Operating System: You can use Windows, macOS, or Linux for your development
environment. Linux is commonly preferred due to its compatibility with many deep learning
libraries and tools.
2. Python: Python is the primary programming language for most machine learning and
computer vision projects.
scikit-image: This library is useful for various image processing tasks and feature
extraction.
5. Video and Image Processing Libraries:
FFmpeg: If your project involves working with video data, FFmpeg is a powerful
tool for video processing, conversion, and manipulation.
Pillow (PIL): This Python Imaging Library is helpful for working with image data.