Object-Based Hybrid Deep Learning Technique For Recognition of Sequential Actions
ABSTRACT Using different objects or tools to perform activities in a step-by-step manner is common practice in many settings, including workplaces, households, and recreational activities. However, this practice can pose hazards if the correct sequence of actions is not followed or if an object or tool is not used at the appropriate step; it must therefore be addressed to ensure safety and efficiency. These issues have garnered significant attention in recent years. Previous research has relied on body keypoints to detect actions, but not the objects or tools used during the activity. The lack of a system that identifies the target objects or tools being used while tasks are performed increases the risk of accidents and mishaps during the process. This study addresses the aforementioned issue by introducing a model that is both efficient and robust. The model uses video data to monitor and identify daily activities, as well as the objects involved in the process, enabling real-time feedback and alerts to enhance safety and productivity. The proposed model separates the overall recognition process into two components. First, it uses the BlazePose architecture for pose estimation and interpolates any undetected or wrong-detected landmarks to improve the precision of the posture estimation; the resulting features are forwarded to a long short-term memory network to identify the actions performed during the activity. Second, the model employs an enhanced YOLOv4 algorithm for object detection to accurately identify the objects used in the course of the activity. The resulting activity recognition model achieves a 95.91% accuracy rate in identifying actions, a mean average precision of 97.68% for detecting objects, and an overall processing rate of 10.47 frames per second.
INDEX TERMS Human activity recognition, long short-term memory (LSTM), object detection, pose
estimation, standard operating procedures (SOPs).
step-by-step processes to complete the task. Human pose estimation (HPE) is a popular research field in computer vision that plays a significant role in activity recognition [1], [2], [3]. The majority of these techniques rely on optical sensors that capture RGB images in order to determine body landmarks and the overall position. It is also possible to combine HPE with other computer vision technologies for 3D animation, fitness, virtual and augmented reality, and rehabilitation [4], [5], [6]. HAR, on the other hand, is a crucial computer vision task that enables machines to examine the body landmarks identified by HPE models and comprehend various human activities [7], [8], [9]. Many researchers have been driven to advance HAR systems in real-world settings by the rapid growth of artificial intelligence, smartphones, and CCTV systems. This drive has been motivated by the role of HAR systems in health, security, and behavioral studies. Some of their applications include patient monitoring systems [10], [11], ambient assisted living (AAL) [12], [13], surveillance systems [14], [15], gesture recognition [16], [17], behavior analysis [18], and a range of healthcare systems [19], [20].

In particular, vision-based human activity recognition systems, which evaluate input in the form of video or images to identify performed activities, are quite complicated. This is because the appearance of the body changes dynamically due to various types of clothing, occlusions caused by viewing angles, background context, etc. [21], and performance degrades when the occlusion is severe. It is also worth noting that the majority of current studies only address the recognition of an action, and none gives insight into the object used during the activity.

Fig. 1 shows some example pictures of confusing cases, where a person performs an action with and without an object, together with their skeleton representation generated from body landmarks. The physical differences between some actions are very small or even identical, making it difficult to identify activities that look identical yet involve interaction with different objects, such as household, recreational, and workplace activities involving machine operation, material movement, maintenance, assembly, product and process design, etc.

Therefore, with the growing popularity of HAR and object detection in the computer vision field, a system that can accurately recognize the action sequence in an activity as well as detect the objects used during the activity would be of profound benefit. This would aid in analyzing and monitoring a person's activity to determine whether they are adhering to the SOPs with the appropriate objects.

The goal of this research is to create an activity recognition model that, from video information, can detect a person's action sequence as well as the objects being used while the activity is performed. To achieve this, the person's pose is first estimated using BlazePose [22], undetected or wrong-detected landmarks are interpolated using a linear interpolation method, and the resulting information is processed by a recurrent neural network that can learn sequential order dependency, known as long short-term memory (LSTM). Object detection is carried out in the second part using an enhanced YOLOv4 algorithm to recognize the object in the person's hand while they are performing the activity. Finally, a lightweight and robust system for recognizing a person's activities is created by combining the two models. Fig. 2 depicts the suggested architecture. Three challenges are addressed in this study: (1) human pose estimation-based action detection using LSTM, (2) an object detection model to detect objects being used in an activity, and (3) an activity recognition model to classify the overall activity.
2) We proposed a technique to improve the accuracy of a person's pose estimation by interpolating the undetected and wrong-detected landmarks.
3) The object detection algorithm is further enhanced by introducing an extra YOLO head to detect the various objects of different shapes and sizes used by the person while performing the activity.
4) An activity recognition model is developed that can recognize the different actions performed within the activity in chronological order, in accordance with the predefined SOPs as well as the object being used.

This paper is organized as follows. A comprehensive literature review of existing related work is provided in Section II. The proposed methodology is described in Section III. Section IV presents the training dataset, experimental results, and discussions. In Section V, the conclusion and future research directions are given.

II. RELATED WORK
Artificial intelligence (AI) models that estimate body keypoints to characterize body position have become a potentially effective tool for assessing human actions. More specifically, convolutional neural networks (CNNs) are frequently used in human pose estimation to forecast a person's position by performing inference on input videos or images [1], [2]. Due to the numerous conceivable human positions, the high degree of freedom, appearance changes such as illumination and clothing, environmental changes, and occlusions, determining precise pixel coordinates of body keypoints is a challenging process [3]. Despite these challenges, a number of reliable models have been developed that function admirably in applications including sports training, rehabilitation, and fall detection [4], [5], [6]. While pose estimation models have been successful in these applications, accurate keypoint identification is still needed to track a person's activity, because engaging in the wrong activity can have adverse effects on production lines.

For body joint coordinate-based action recognition, the human pose estimation problem is formulated as a CNN-based regression problem toward body joints by the holistic model DeepPose [23]. Additionally, it employs a cascade of these regressors to improve the pose estimation. However, regression to XY locations is challenging and raises learning complexity, which inhibits generalization and results in subpar performance in some regions. A real-time multi-person posture estimation architecture designed for desktop settings, called OpenPose [24], was proposed as a solution and is commonly used in the pose estimation community. It generates a feature representation by first analyzing the image using the first 10 layers of the VGG-19 architecture. The captured feature representation is then fed into a two-branch multi-stage CNN to generate part confidence maps and vector fields of part affinities. One branch forecasts a collection of 2D body part confidence maps; the other indicates the relationship of parts through 2D vector fields of part affinities. These two branches are used to carry out K-partite graph matching for multi-person pose estimation. The primary drawbacks of this system are that it processes at only 0.4 frames per second, demands a lot of computational power, and is therefore difficult to apply to real-time videos. A two-step detector-tracker inference pipeline is used by Google's (Mountain View, CA) BlazePose model [22], where the detector is employed on the initial frame (and re-run until a person is detected) and the tracker then follows the person in consecutive frames. To predict heatmaps for each joint, this model employs an encoder-decoder network design followed by another encoder that regresses directly to the coordinates of all joints. It is well suited to estimating human pose for activity recognition due to its lightweight design and real-time inference capability. However, it may fail to detect body landmarks under large changes in appearance, clothing, and occlusions.

Recent advances in effective motion capture technologies and posture assessment algorithms have made it easier to obtain information about human joint coordinates. As a result, joint coordinate-based action recognition using deep learning methods has significantly outperformed previous methods in recent years and has become the standard approach. The recurrent neural network (RNN) [25] is now one of the most used frameworks in joint coordinate-based action recognition because of its ability to analyze sequential data. A hierarchical RNN [26] was proposed to classify activities based on skeleton data. An advanced LSTM network [27] that is fully coupled and includes a regularization strategy was developed to acquire the high-level temporal aspects of skeleton information. All these approaches rely on the RNN architecture and aim to improve action recognition while failing to recognize the object being used. Thus, many significant recognition errors occur among physically similar classes of person activity. The primary cause of these recognition errors is that such activities differ by tiny or similar body movements yet involve interaction with different objects.

Our work belongs to activity recognition, but it focuses on both the body movement of the person and the interacted objects, which has not been considered in the above methods. In this study, we modified YOLO (you only look once) [28] to enhance its ability to detect various objects of different shapes and sizes that are used by individuals while performing activities. The proposed method is a single convolutional network that predicts multiple bounding boxes and class probabilities from a single image frame in a single evaluation. By improving the accuracy of object detection, our model can provide a more comprehensive understanding of the actions being performed. This makes the proposed model suitable for a wide range of applications, including human activity recognition and surveillance.

III. PROPOSED METHODOLOGY
The proposed approach aims to develop a framework that is both lightweight and robust for classifying sequential actions in an activity. This framework focuses on capturing
the radius of a circle that encloses the entire body, and the angle of inclination of the line joining the midpoints of the shoulders and hips [31]. This also helps in tracking extremely complex situations in any kind of person's activity.
The model used an encoder-decoder network architecture to predict heatmaps for every joint of the person, followed by a second encoder that regresses directly to every landmark (joint coordinate). Then, to make the model lightweight enough to run on a low-end computer, the heatmap output is removed during inference, as shown in Fig. 6.

FIGURE 6. Architecture of the landmark detector network.

A list of 33 landmarks is returned by the architecture. Each landmark is represented as x, y, z, and v (the visibility). The coordinates x and y show where a particular joint of the person is located, normalized to the range between 0 and 1 by the image's width and height. z stands for the depth of the landmark, with the depth at the center of the hips as its origin. The term v describes whether or not a landmark is visible in the frame.

The scale and position of the person affect the landmarks that the pose estimation network generates, so the landmarks are transformed to become independent of the position and scale in the frame. Otherwise, the same person performing the same action could produce different landmark values in different frames depending on where they are in the frame. We collect these landmark values and save them as frame values to represent the sequence of events in an activity. For an activity video, V^m = [F_1, F_2, ..., F_n] is a matrix of pose-vectors with K landmarks, where V^m contains n frames of the person conducting the actions. Each frame consists of:

F_i = [l_i^1, l_i^2, ..., l_i^K],  i ∈ [1, n]    (1)

Since our model generates 33 landmarks (K = 33), the resulting vector has a length of 132 landmark values and the format:

F_i = [x_i^1, y_i^1, z_i^1, v_i^1, x_i^2, y_i^2, z_i^2, v_i^2, ..., x_i^33, y_i^33, z_i^33, v_i^33]    (2)

Depending on the recording settings and conditions, landmarks may be undetected or wrong-detected when CNN-based pose estimation models are applied to a video taken by a general camera. Action detection and analysis are negatively impacted by this kind of inaccurate landmark detection. To overcome this issue, we have incorporated interpolation techniques in conjunction with the BlazePose architecture. These techniques play a crucial role in enhancing the accuracy of posture estimation by effectively addressing any undetected or wrong-detected landmarks. Through interpolation, we fill in the gaps and correct any inaccuracies, ultimately boosting the overall precision of the posture estimation process. To do this, we exploit time-series correlations between identical body joints across several frames, because the estimated human pose is a collection of time-series data.

When landmarks in BlazePose cannot be detected, their x and y coordinate values are always 0. In this study, for person w's landmark l_w^f in frame f, if l_w^{f-1} and l_w^{f+1} are detected but l_w^f is not, we label frame f as an "undetected landmark frame" f′:

f′ = f,  where l_w^f = (0, 0), l_w^{f-1} ≠ (0, 0), and l_w^{f+1} ≠ (0, 0)    (3)

Similarly, for person w's landmark l_w^f in frame f, if l_w^{f-1} and l_w^{f+1} are detected but l_w^f is wrong-detected, we label frame f as a "wrong-detected landmark frame" f′′. Here we rely on the difference δ^f, defined as the spatial distance of landmark l_w between the two consecutive frames f − 1 and f, measured in pixels. Because resolutions and frame rates vary with the input video, we do not wish to specify an absolute threshold for δ^f; instead, we set a threshold θ on the ratio between the differences δ^f and δ^{f−1}:

f′′ = f,  where δ^f > θ · δ^{f−1}, l_w^{f-1} ≠ (0, 0), and l_w^{f+1} ≠ (0, 0)    (4)

The percentage of frames flagged as wrong-detected that were not actually wrong-detected was lower when the threshold was set to θ = 3. As a result, we use θ = 3 as the threshold in this study so that only frames that are clearly wrong-detected are interpolated. In this manner, we identify wrong-detected landmark frames according to the relative change of every landmark. Both undetected and wrong-detected landmark frames are then interpolated using the landmark coordinate information of the previous and following frames.
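To make the frame representation of Eqs. (1) and (2) concrete, the following is a minimal sketch of how a 132-value pose vector could be assembled from BlazePose output via the MediaPipe Python package; the helper names (frame_to_pose_vector, video_to_pose_matrix) and the zero-vector convention for missed detections are illustrative assumptions, not the authors' released code.

```python
import cv2
import numpy as np
import mediapipe as mp

mp_pose = mp.solutions.pose

def frame_to_pose_vector(frame_bgr, pose):
    """Return a 132-value vector [x1, y1, z1, v1, ..., x33, y33, z33, v33] (Eq. 2).

    If BlazePose detects no person, a zero vector is returned; the later
    interpolation step treats such frames as undetected landmark frames.
    """
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    result = pose.process(rgb)
    if result.pose_landmarks is None:
        return np.zeros(33 * 4, dtype=np.float32)
    values = []
    for lm in result.pose_landmarks.landmark:  # 33 landmarks, x/y normalized to [0, 1]
        values.extend([lm.x, lm.y, lm.z, lm.visibility])
    return np.asarray(values, dtype=np.float32)

def video_to_pose_matrix(video_path, n_frames=60):
    """Stack per-frame vectors into the matrix V^m = [F_1, ..., F_n] (Eq. 1)."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    with mp_pose.Pose(static_image_mode=False) as pose:
        while len(frames) < n_frames:
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(frame_to_pose_vector(frame, pose))
    cap.release()
    return np.stack(frames) if frames else np.empty((0, 132), dtype=np.float32)
```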
It is crucial to extract the person's coordinate values from the neighboring frames in order to interpolate the missing values. We use linear interpolation to interpolate landmarks for undetected and wrong-detected landmark frames. This is based on the observation that a person's action does not change significantly over a short period of time; in most cases, the undetected or wrong-detected landmark l_w^f will be located close to the midpoint of landmarks l_w^{f−1} and l_w^{f+1}.

For an undetected frame f′, let the landmarks of person w_{f′} in frames f′ − 1 and f′ + 1 be l_w^{f′−1} and l_w^{f′+1}, respectively. We apply linear interpolation to the landmark l_w^{f′} whose x and y coordinate values are both 0:

l_w^{f′} = (l_w^{f′−1} + l_w^{f′+1}) / 2    (5)

For a wrong-detected frame f′′, let the landmarks of person w_{f′′} in frames f′′ − 1 and f′′ + 1 be l_w^{f′′−1} and l_w^{f′′+1}, respectively. We apply the same interpolation to the landmark l_w^{f′′} whose difference δ_{w,l}^{f′′} is larger than θ · δ_{w,l}^{f′′−1}:

l_w^{f′′} = (l_w^{f′′−1} + l_w^{f′′+1}) / 2    (6)

This combination of the BlazePose architecture and the proposed interpolation techniques results in a model that not only provides more reliable estimations of human posture but also exhibits enhanced robustness across diverse scenarios. By successfully handling challenging scenarios and adapting to various body types, clothing variations, and environmental conditions, our model ensures consistent and accurate posture estimations.

The interpolated landmark sequences are then fed to a long short-term memory (LSTM) network [32] to detect the actions performed in the video. The forget gate f_t and the input gate i_t of the LSTM cell are first computed from the current input and the previous hidden state:

f_t = σ(W_f x_t + U_f h_{t−1} + b_f)    (7)

i_t = σ(W_i x_t + U_i h_{t−1} + b_i)    (8)

where x_t denotes the input data; f_t and i_t denote the forget and input gate outputs, respectively; h_{t−1} denotes the previous hidden state; and σ indicates the sigmoid function. Then, the intermediate cell state is calculated by:

c̃_t = tanh(W_c x_t + U_c h_{t−1} + b_c)    (9)

The cell state c_{t−1} and c̃_t are then used to update the cell state c_t:

c_t = f_t · c_{t−1} + i_t · c̃_t    (10)

where · represents element-wise multiplication. The output gate o_t is derived by:

o_t = σ(W_o x_t + U_o h_{t−1} + b_o)    (11)

and the output h_t is obtained as:

h_t = o_t · tanh(c_t)    (12)

To classify the actions, the input video is processed in the form (V^m, F_n, F_i), where V^m is the action video, F_n is the number of frames in the video, and F_i is the vector of coordinate values of the 33 landmarks. It is fed into the first LSTM layer with 64 LSTM units, followed by 128 units in the second layer and 64 units in the third layer. The output of the LSTM layers is passed through two dense layers with 64 and 32 neurons, respectively, for additional encoding, and then to a SoftMax layer, which returns the probability that the input video belongs to a particular action, as shown in Fig. 7. The prediction with the highest probability is then taken as the class of the person's action.
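As a rough illustration of how the flagging and repair rules of Eqs. (3)-(6) could be applied in practice, the sketch below post-processes a sequence of per-frame landmark arrays; the function name, the array layout, and the choice to average the z and visibility channels along with x and y are assumptions made only for this example.

```python
import numpy as np

THETA = 3.0  # ratio threshold used to flag wrong-detected landmarks (Eq. 4)

def interpolate_landmarks(seq):
    """Repair undetected / wrong-detected landmarks in a pose sequence.

    seq: array of shape (n_frames, 33, 4) holding (x, y, z, v) per landmark.
    Returns a copy in which flagged frames are replaced by the midpoint of
    the previous and next frames (Eqs. 5 and 6).
    """
    out = seq.copy()
    n = seq.shape[0]
    for f in range(1, n - 1):
        for k in range(seq.shape[1]):
            prev_xy, cur_xy, nxt_xy = seq[f - 1, k, :2], seq[f, k, :2], seq[f + 1, k, :2]
            if not (np.any(prev_xy != 0) and np.any(nxt_xy != 0)):
                continue  # cannot interpolate without valid neighbouring frames
            undetected = np.all(cur_xy == 0)                      # Eq. (3)
            delta_cur = np.linalg.norm(cur_xy - prev_xy)          # δ^f
            delta_prev = np.linalg.norm(prev_xy - seq[f - 2, k, :2]) if f >= 2 else 0.0
            wrong = delta_prev > 0 and delta_cur > THETA * delta_prev  # Eq. (4)
            if undetected or wrong:
                # Midpoint of neighbours; z and v are averaged as well here,
                # which is an implementation choice for the sketch.
                out[f, k] = (seq[f - 1, k] + seq[f + 1, k]) / 2.0
    return out
```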
FIGURE 7. Proposed architecture of action detection model. Landmark values are the input features of the
action detection network.
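The stacked LSTM classifier of Fig. 7 (three LSTM layers with 64, 128, and 64 units, two dense layers with 64 and 32 neurons, and a SoftMax output over the 27 action classes) can be written, for instance, with the Keras API as below. The optimizer, loss, and epoch count follow the training setup reported later in the experimental results, while details the text does not specify, such as the dense-layer activations, are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES = 60      # F_n: frames per action clip
NUM_FEATURES = 132   # F_i: 33 landmarks x (x, y, z, v)
NUM_CLASSES = 27     # number of action classes

def build_action_model():
    model = models.Sequential([
        # Three stacked LSTM layers (64-128-64 units); the first two return
        # full sequences so the next LSTM layer receives a sequence input.
        layers.LSTM(64, return_sequences=True,
                    input_shape=(NUM_FRAMES, NUM_FEATURES)),
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(64),
        # Two dense encoding layers (64 and 32 neurons); ReLU is an assumption.
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        # SoftMax over the action classes.
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage sketch: each training sample is one 60 x 132 matrix V^m produced by
# the landmark extraction and interpolation steps above.
# model = build_action_model()
# model.fit(train_x, train_y, validation_split=0.2, epochs=150)
```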
method accomplishes a more comprehensive understanding of the activity being performed and improves the accuracy of the analysis.

The input image frame is divided into S × S grids in order to detect the object. If the object's center falls within a grid cell, that grid cell is used to forecast a bounding box:

CS_g^b = P_{g,b} × IoU_{pred}^{truth}    (13)

where CS_g^b is the confidence score of the b-th bounding box in the g-th grid, P_{g,b} represents the class probability value of the b-th bounding box in the g-th grid, and IoU_{pred}^{truth} denotes the intersection over union (IoU) between the ground-truth and predicted bounding boxes of the objects.

The detection model structure consists of four main parts: input terminal, backbone, neck, and head, which help to clearly describe each stage of the suggested method. To ensure the detection of moving and stationary objects, the input image is processed at a resolution of 416 × 416 pixels. DarkNet53 was created as a result of YOLOv3 [34] incorporating the residual module and the properties of the ResNet structure. Based on this, YOLOv4 created CSPDarkNet53, which consists of 5 cross-stage partial (CSP) modules and 72 convolutional layers, exploiting the superior learning capabilities of the CSP network (CSPNet) [35]. By incorporating gradient changes into the feature maps, it minimizes computational bottlenecks and enables the CNN to achieve greater accuracy. Additionally, the initial CSP stages are transformed into the residual layers of the original DarkNet in order to increase accuracy as well as speed. Two convolutional layers and one skip connection are included in each residual module, and a batch normalization layer and a Mish activation function are included in each convolutional layer. The five CSP modules of the CSPDarkNet53 backbone contain residual layers stacked in a 1-2-8-8-4 configuration. SPPNet and PANet are the components of the neck portion. The input feature layer in SPPNet is first convolved three times, and then maximum pooling is performed using max-pooling kernels of different sizes. The pooled outputs are first concatenated and then convolved three more times, which enlarges the receptive field of the network. Following the operations of the backbone and SPPNet, PANet convolves the feature layers and up-samples them, doubling the height and width of the original feature layers.

The feature layer obtained after convolution and up-sampling is concatenated with the feature layer obtained from CSPDarkNet53 to achieve feature fusion, followed by down-sampling: the result is compressed in height and width and stacked with the previous feature layers for even more feature fusion. In contrast to the three detection heads of YOLOv4, the proposed model includes an additional prediction head that enhances the ability to detect extremely small objects, improves the stability of the detection, and mitigates the negative effects of object size variance. The introduced extra head enhances the object detection algorithm by effectively handling scale variations, improving localization accuracy, providing contextual understanding, and enabling accurate classification of objects. These benefits collectively contribute to the algorithm's enhanced performance and accuracy in detecting objects of different shapes and sizes used during activities. Although this additional head incurs higher computational and memory costs, it results in better detection performance due to the utilization of low-level yet high-resolution feature maps. The model structure is shown in Fig. 8. Finally, to improve mAP and object detection, the head-anchor-based detection network model is used. The loss function used in the training phase of the object detection model mainly includes the bounding box location loss (L_BIoU), the confidence loss (L_conf), and the classification loss (L_cl), as defined below:

L = L_BIoU + L_conf + L_cl    (14)
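A small numeric sketch of the confidence score in Eq. (13): the class probability of a predicted box is scaled by its IoU with the ground truth. The (x1, y1, x2, y2) corner convention and the example values are assumptions used only for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def confidence_score(class_prob, pred_box, truth_box):
    """Eq. (13): CS = P_{g,b} * IoU between prediction and ground truth."""
    return class_prob * iou(pred_box, truth_box)

# Example: a prediction that overlaps the ground truth fairly well (~0.61)
print(confidence_score(0.9, (50, 50, 150, 150), (60, 60, 160, 160)))
```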
FIGURE 8. Enhanced object detection model for identifying objects used by a person during an action.
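As an illustration of the SPP-style neck block described above (three convolutions, parallel max-pooling with different kernel sizes, concatenation, then three more convolutions), the following Keras sketch uses the 5/9/13 pooling kernels that are conventional in YOLOv4; the kernel sizes, filter counts, and activation placement are assumptions, since the text only states that differently sized max-pooling kernels are used.

```python
import tensorflow as tf
from tensorflow.keras import layers

def mish(x):
    # Mish activation: x * tanh(softplus(x))
    return x * tf.math.tanh(tf.math.softplus(x))

def conv_bn_mish(x, filters, kernel_size):
    """Convolution + batch normalization + Mish, mirroring the backbone's conv blocks."""
    x = layers.Conv2D(filters, kernel_size, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation(mish)(x)

def spp_block(x, filters=512, pool_sizes=(5, 9, 13)):
    """SPP neck block: three convolutions, parallel max-pools, concat, three convolutions."""
    x = conv_bn_mish(x, filters, 1)
    x = conv_bn_mish(x, filters * 2, 3)
    x = conv_bn_mish(x, filters, 1)
    pools = [layers.MaxPooling2D(pool_size=p, strides=1, padding="same")(x)
             for p in pool_sizes]
    x = layers.Concatenate()([x] + pools)  # fuse multi-scale context
    x = conv_bn_mish(x, filters, 1)
    x = conv_bn_mish(x, filters * 2, 3)
    x = conv_bn_mish(x, filters, 1)
    return x

# Example usage on a backbone feature map (the spatial size is illustrative):
# features = spp_block(tf.keras.Input(shape=(13, 13, 1024)))
```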
TABLE 1. Activities and corresponding chronological order of actions and objects used.

Algorithm 1 Person's Activity Recognition
Define the expected action sequence of each activity as a list of strings.
Define the action and object combination condition.
Input: Read a video V^m(F_n, F_i), where F_n represents the sequence of frames, F_i represents the 132 landmark values of the i-th frame, and n ≥ 60.
Output: A person's activity with its action sequence and the object being utilized.
1. Initialization: Action detection
2. Loop over the expected actions in the sequence.
3. For each expected action, read the first 60 frames, F_n = 60.
4. Check if the previous 10 frames are the same, F_n[−10:] is same, then
5. If res > T, where res is the normalized output vector with probabilities of each possible outcome and threshold T = 0.6, then
6. Check condition: action sequence (Table 1)
7. If the sequence of the action is true
8. Initialization: Object detection
9. If the previous 10 frames detect the same object, F_n[−10:] is same, then
10. Check condition: action-object combination (Table 1)
11. If the combination condition is true
12. Output: action, then action++
13. Output: activity
14. Else, output an error message "wrong object detected".
15. Else, output an appropriate error message "wrong action sequence: expected action_sequence[i]".
16. Close video
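A compact Python sketch of the control flow in Algorithm 1 is given below. The functions detect_action and detect_object are hypothetical placeholders for the LSTM action classifier and the enhanced YOLOv4 detector, and the ACTIVITY definition stands in for one row of Table 1; the frame-windowing details are simplified relative to the pseudocode.

```python
# Hypothetical activity definition in the spirit of Table 1:
# each step pairs an expected action with the object that must be in use.
ACTIVITY = [("open bottle", "bottle"), ("pour water", "bottle"), ("close bottle", "bottle")]
THRESHOLD = 0.6   # minimum action probability (T in Algorithm 1)
WINDOW = 10       # number of consecutive frames that must agree

def recognize_activity(frames, detect_action, detect_object):
    """Walk through the expected action sequence, checking actions and objects.

    detect_action(clip)  -> (label, probability) from the LSTM classifier.
    detect_object(frame) -> detected object label from the object detector.
    """
    step = 0
    recent_actions, recent_objects = [], []
    for frame_idx, frame in enumerate(frames):
        if step >= len(ACTIVITY):
            break
        expected_action, expected_object = ACTIVITY[step]
        label, prob = detect_action(frames[max(0, frame_idx - 59): frame_idx + 1])
        recent_actions = (recent_actions + [label])[-WINDOW:]
        recent_objects = (recent_objects + [detect_object(frame)])[-WINDOW:]
        stable_action = len(recent_actions) == WINDOW and len(set(recent_actions)) == 1
        stable_object = len(recent_objects) == WINDOW and len(set(recent_objects)) == 1
        if stable_action and prob > THRESHOLD:
            if label != expected_action:
                return f"wrong action sequence: expected {expected_action}"
            if stable_object:
                if recent_objects[-1] != expected_object:
                    return "wrong object detected"
                step += 1                      # action and object both match
                recent_actions, recent_objects = [], []
    return "activity recognized" if step == len(ACTIVITY) else "activity incomplete"
```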
The second task involves detecting the object used during the actions. The final task is to recognize the activity based on the sequence of actions. Despite the abundance of available online datasets for data acquisition, most of them focus solely on action detection and disregard the objects utilized during the actions and the sequence of the actions in the activity. Therefore, it becomes challenging to acquire a dataset for this kind of task. In this context, this research uses our own video and image dataset. We have gathered an extensive collection of 243 videos depicting 27 distinct actions, where each action entails the use of an object. These actions are performed in a sequence with varying objects, forming distinct activities. As elaborated in Table 1, five activities were utilized, each with a distinct chronological order of actions and the corresponding objects used during these activities. The term "action" here refers to the movement of the body while using an object, whereas "activity" refers to the complete work being carried out. Given that each action is composed of a sequence of frames, we compiled F_n = 60 frames for each action while developing the proposed action detection model.

To develop our model, we utilize an approach that focuses solely on the objects being utilized by individuals during actions. We treat the person's hand and the object as one entity, disregarding any similar objects of the same class in the same frame that are not being used during the activity. The dataset particulars for the object detection model are given in Table 2.

B. EVALUATION METRICS
The performance of the proposed models was validated using a number of performance indicators, such as accuracy, precision, recall, and F1 score. These performance measurements are calculated using four parameters: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The aforementioned performance metrics are defined as follows.

1) ACCURACY
The ratio of correctly detected activities to the total data:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (22)

2) PRECISION
The ratio of a person's activities correctly detected to the total videos detected as that activity:

Precision = TP / (TP + FP)    (23)

3) RECALL
The ratio of videos correctly detected as an activity to the total videos of that activity:

Recall = TP / (TP + FN)    (24)

4) F1 SCORE
The harmonic mean of precision and recall, which effectively summarizes the model performance:

F1 score = 2 × (Precision × Recall) / (Precision + Recall)    (25)
5) AP
The area under the precision-recall curve, denoted average precision (AP), is defined as follows:

AP = \int_0^1 P(r) dr    (26)

where P and r are the precision and recall, respectively; both take values between 0 and 1. Finally, after calculating the AP values of the activities, the mean average precision (mAP) is calculated as follows:

mAP = (AP_1 + AP_2 + ... + AP_n) / n    (27)
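For completeness, a small sketch of the metrics in Eqs. (22)-(27); the per-class AP values passed to the mAP helper are assumed to come from a separate precision-recall computation.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from raw counts (Eqs. 22-25)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1

def mean_average_precision(ap_values):
    """mAP as the mean of the per-class average precisions (Eq. 27)."""
    return sum(ap_values) / len(ap_values)

# Example: counts from a hypothetical confusion matrix
print(classification_metrics(tp=90, tn=50, fp=5, fn=10))
```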
C. ACTION DETECTION RESULT
A collection of F_n = 60 frames, each of which contains F_i = 132 landmark values, is obtained from each action video using our pose estimation and landmark extraction approach. Before feeding these values to the LSTM network for action detection, the entire video dataset was split into training and test datasets in an 8:2 ratio. We used the Adam optimizer [36] to train our network for 150 epochs in an effort to reduce the loss. The categorical cross-entropy loss function is used since the action detection model has twenty-seven classes. The action detection model achieved a test accuracy of 95.91% after training. Fig. 9 shows the normalized confusion matrix generated from the predictions made by the proposed action detection model on the test dataset. The results indicate that the model achieved high accuracy in recognizing the majority of the actions. However, some similar actions, such as opening or closing a bottle and wearing socks or shoes, were sometimes misclassified as false positives. This is likely due to the almost identical nature of these actions.

FIGURE 9. Normalized confusion matrix created using the predictions of the proposed action detection model on the test dataset.

TABLE 2. Description of the object detection dataset.

To evaluate the quality of our model, we used OpenPose and DeepPose as standard references and trained two models, one with and the other without the proposed interpolation technique, using different recurrent neural networks, i.e., GRU (gated recurrent units) and LSTM, as shown in Table 3. Although the OpenPose model shows slightly better performance than the other estimation models, our approach with both networks runs much faster than the rest. This is because the proposed model only employs a two-step detector-tracker inference pipeline, where the detector only runs on the first frame or until a person's face is detected, and the tracker is then used to track the person in consecutive frames. To forecast heatmaps for all landmarks, we additionally employ a compact encoder-decoder network design, followed by another encoder that regresses directly to landmark coordinates, allowing the model to be lighter and run faster in real-time inference. Also, the model trained with the interpolation technique performs better, as it uses well-interpolated landmarks for undetected and wrong-detected landmark frames. Furthermore, LSTM performs slightly better than GRU with the different pose estimation algorithms, whereas GRU has a simpler structure: it has only two gates (reset and update gates) and uses fewer training parameters. Consequently, GRU consumes less memory, executes faster, and trains faster than LSTM, whereas LSTM achieves better accuracy on datasets with longer sequences. The output results of the proposed action detection model are shown in Fig. 10.

D. OBJECT DETECTION RESULT
Using the dataset listed in Table 2, the performance of the object detection model for the suggested person activity recognition system was evaluated. Before feeding the datasets into our object detection model, we randomly divided the data into 80% for training and split the remaining data into 10% for validation and 10% for testing. The input images are also resized to 416 × 416 before being passed into training.
FIGURE 10. Results of action detection using LSTM network and interpolated body landmarks obtained from pose estimation network.
TABLE 3. Performance comparison of various action detection models.

After training for 500 epochs with the Adam optimizer to reduce the overall loss, and with an initial learning rate of 0.0001, the proposed object detection model achieves an overall mAP of 97.68% for detecting the objects being used while performing the actions. Fig. 11 shows the detection of the object being utilized by the person using the enhanced YOLOv4, disregarding any similar objects of the same class in the same frame that are not being used during the activity. For example, when the person is putting on the right sock, the model does not detect the left sock. This is because we consider the person's hand and the object being used as a single entity. Similarly, when the person is loading clothes into the washing machine, the model does not detect other objects such as the washing machine lid or buttons, as they are not relevant to the action.

A performance comparison of the different object detection models is shown in Table 4. It is clear that at IoU = 0.5, Faster R-CNN has a higher mAP but the lowest FPS of all the models, which reflects the common trait of two-stage detection algorithms: higher detection accuracy but weaker real-time performance. Meanwhile, the FPS and mAP of our model are reasonably high when compared to the other algorithms. Although our model is a little slower than the original YOLOv4 due to the extra computational load from the additional head, it delivers superior object detection performance for every frame in the video. This is due to the advantage of having an extra head that allows the model to detect objects of varying sizes with better accuracy.
FIGURE 11. Object detection results using enhanced YOLOv4 algorithm to identify objects used during actions.
TABLE 4. Performance comparison with other object detection models.

TABLE 5. Comparisons on various activity recognition models.
FIGURE 12. The output of the proposed activity recognition model. The model identifies different actions that are performed in a chronological order
and the objects utilized during each action.
The goal of this research is to recognize a person's activities by detecting action sequences and interactive objects in real time. Thus, we require a model that can quickly identify a person's actions and detect objects. According to the experimental findings, the enhanced YOLOv4 model combined with the proposed action detection model achieves a higher FPS and a reasonably high mAP, suggesting that this model is more suitable for recognition problems.
Furthermore, it is worth noting that running the action detection and object detection models independently allows them to maximize their processing capabilities. Conversely, when these two models are integrated, there is an additional coordination overhead, resulting in a slight decrease in frames per second (fps) compared to individual execution. Nonetheless, the integration offers the advantage of precise activity recognition by incorporating both actions and objects, thereby enabling a more profound comprehension of the activity at hand.

V. CONCLUSION
The proposed model incorporated a lightweight, CNN-optimized, top-down human pose estimation architecture to find the body landmarks from a sequence of frames, followed by interpolation to enhance the accuracy of pose estimation for undetected or wrong-detected landmarks. The transformed landmark values were then fed to multiple layers of an LSTM network, culminating in a SoftMax layer to predict the person's actions. Additionally, an object detection model was developed by enhancing YOLOv4 to detect the object used during the actions. Finally, the proposed activity recognition algorithm integrated these two models to create a real-time, lightweight, and robust activity recognition model. Our model achieved 95.91% accuracy in recognizing actions and 97.68% mAP for detecting the object used during the actions, with an overall FPS of 10.47. This model can help monitor and inspect human activities that follow a chronological order of actions when interacting with different objects within the activity. In manufacturing and assembly, our activity recognition model can be utilized to ensure that workers follow predefined sequences when using tools and components, boosting efficiency and quality control. In sports analysis, it can accurately track players' movements, recognize the techniques and equipment used, and provide valuable insights for coaching and strategic analysis. In healthcare and rehabilitation, it can assist in monitoring patients' activities during therapy and offer real-time feedback to improve outcomes. In industrial environments, it can analyze workers' actions and equipment interactions to ensure safety compliance.

In the future, we plan to enhance the proposed method to recognize activity in industrial working environments and detect additional objects such as helmets, gloves, masks, and shoes to ensure individual safety and prevent industrial accidents. Additionally, we aim to enhance the fps of our model without compromising accuracy by exploring model optimization techniques, leveraging hardware acceleration, considering algorithmic improvements, and upgrading hardware infrastructure.

REFERENCES
[1] Q. Wu, Y. Wu, Y. Zhang, and L. Zhang, "A local–global estimator based on large kernel CNN and transformer for human pose estimation and running pose measurement," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2022.
[2] F. Rustam, A. A. Reshi, I. Ashraf, A. Mehmood, S. Ullah, D. M. Khan, and G. S. Choi, "Sensor-based human activity recognition using deep stacked multilayered perceptron model," IEEE Access, vol. 8, pp. 218898–218910, 2020.
[3] C. Xu, D. Chai, J. He, X. Zhang, and S. Duan, "InnoHAR: A deep neural network for complex human activity recognition," IEEE Access, vol. 7, pp. 9893–9902, 2019.
[4] T. Zebin, P. J. Scully, N. Peek, A. J. Casson, and K. B. Ozanyan, "Design and implementation of a convolutional neural network on an edge computing smartphone for human activity recognition," IEEE Access, vol. 7, pp. 133509–133520, 2019.
[5] Y. Li, C. Wang, Y. Cao, B. Liu, J. Tan, and Y. Luo, "Human pose estimation based in-home lower body rehabilitation system," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Glasgow, U.K., Jul. 2020, pp. 1–8.
[6] W. Liu, X. Liu, Y. Hu, J. Shi, X. Chen, J. Zhao, S. Wang, and Q. Hu, "Fall detection for shipboard seafarers based on optimized BlazePose and LSTM," Sensors, vol. 22, no. 14, pp. 5449–5466, Jul. 2022.
[7] M. Abbas and R. L. B. Jeannès, "Exploiting local temporal characteristics via multinomial decomposition algorithm for real-time activity recognition," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–11, 2021.
[8] W. Huang, L. Zhang, W. Gao, F. Min, and J. He, "Shallow convolutional neural networks for human activity recognition using wearable sensors," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–11, 2021.
[9] Y. Zhang, G. Tian, S. Zhang, and C. Li, "A knowledge-based approach for multiagent collaboration in smart home: From activity recognition to guidance service," IEEE Trans. Instrum. Meas., vol. 69, no. 2, pp. 317–329, Feb. 2020.
[10] N. A. Capela, E. D. Lemaire, and N. Baddour, "Feature selection for wearable smartphone-based human activity recognition with able bodied, elderly, and stroke patients," PLoS ONE, vol. 10, no. 4, pp. 1–18, Apr. 2015.
[11] A. Prati, C. Shan, and K. I.-K. Wang, "Sensors, vision and networks: From video surveillance to activity recognition and health monitoring," J. Ambient Intell. Smart Environ., vol. 11, no. 1, pp. 5–22, Jan. 2019.
[12] S. Sankar, P. Srinivasan, and R. Saravanakumar, "Internet of Things based ambient assisted living for elderly people health monitoring," Res. J. Pharmacy Technol., vol. 11, no. 9, pp. 3900–3904, Dec. 2018.
[13] E. Zdravevski, P. Lameski, V. Trajkovik, A. Kulakov, I. Chorbev, R. Goleva, N. Pombo, and N. Garcia, "Improving activity recognition accuracy in ambient-assisted living systems by automated feature engineering," IEEE Access, vol. 5, pp. 5262–5280, 2017.
[14] X. Ji, J. Cheng, W. Feng, and D. Tao, "Skeleton embedded motion body partition for human action recognition using depth sequences," Signal Process., vol. 143, pp. 56–68, Feb. 2018.
[15] A. Jalal, Y.-H. Kim, Y.-J. Kim, S. Kamal, and D. Kim, "Robust human activity recognition from depth video using spatiotemporal multi-fused features," Pattern Recognit., vol. 61, pp. 295–308, Jan. 2017.
[16] C. Xu, L. N. Govindarajan, and L. Cheng, "Hand action detection from ego-centric depth sequences with error-correcting Hough transform," Pattern Recognit., vol. 72, pp. 494–503, Dec. 2017.
[17] O. K. Oyedotun and A. Khashman, "Deep learning in vision-based static hand gesture recognition," Neural Comput. Appl., vol. 28, no. 12, pp. 3941–3951, Apr. 2016.
[18] L. Pigou, A. van den Oord, S. Dieleman, M. Van Herreweghe, and J. Dambre, "Beyond temporal pooling: Recurrence and temporal convolutions for gesture recognition in video," Int. J. Comput. Vis., vol. 126, nos. 2–4, pp. 430–439, Oct. 2016.
[19] J. Qi, P. Yang, M. Hanneghan, S. Tang, and B. Zhou, "A hybrid hierarchical framework for gym physical activity recognition and measurement using wearable sensors," IEEE Internet Things J., vol. 6, no. 2, pp. 1384–1393, Apr. 2019.
[20] C. Aviles-Cruz, E. Rodriguez-Martinez, J. Villegas-Cortez, and A. Ferreyra-Ramirez, "Granger-causality: An efficient single user movement recognition using a smartphone accelerometer sensor," Pattern Recognit. Lett., vol. 125, pp. 576–583, Jul. 2019.
[21] I. Jegham, A. B. Khalifa, I. Alouani, and M. A. Mahjoub, "Vision-based human action recognition: An overview and real world challenges," Forensic Sci. Int., Digit. Invest., vol. 32, Mar. 2020, Art. no. 200901.
[22] V. Bazarevsky, I. Grishchenko, K. Raveendran, T. Zhu, F. Zhang, and M. Grundmann, "BlazePose: On-device real-time body pose tracking," 2020, arXiv:2006.10204.
[23] A. Toshev and C. Szegedy, "DeepPose: Human pose estimation via deep neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Columbus, OH, USA, Jun. 2014, pp. 1653–1660.
[24] Z. Cao, G. Hidalgo, T. Simon, S. Wei, and Y. Sheikh, "OpenPose: Realtime multi-person 2D pose estimation using part affinity fields," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 1, pp. 172–186, Jan. 2021.
[25] W. Li, L. Wen, M. Chang, S. N. Lim, and S. Lyu, "Adaptive RNN tree for large-scale human action recognition," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 1453–1461.
[26] Y. Du, W. Wang, and L. Wang, "Hierarchical recurrent neural network for skeleton based action recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Boston, MA, USA, Jun. 2015, pp. 1110–1118.
[27] W. Zhu, C. Lan, J. Xing, W. Zeng, Y. Li, L. Shen, and X. Xie, "Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks," in Proc. 30th AAAI Conf. Artif. Intell., Phoenix, AZ, USA, Feb. 2016, pp. 12–17.
[28] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 779–788.
[29] W. Liu, Z. Liu, Y. Li, H. Wang, C. Yang, D. Wang, and D. Zhai, "An automatic loose defect detection method for catenary bracing wire components using deep convolutional neural networks and image processing," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–14, 2021.
[30] L. Pishchulin, E. Insafutdinov, S. Tang, B. Andres, M. Andriluka, P. Gehler, and B. Schiele, "DeepCut: Joint subset partition and labeling for multi person pose estimation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 4929–4937.
[31] J. Taylor, J. Shotton, T. Sharp, and A. Fitzgibbon, "The vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Providence, RI, USA, Jun. 2012, pp. 103–110.
[32] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[33] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," 2020, arXiv:2004.10934.
[34] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," 2018, arXiv:1804.02767.
[35] C. Wang, H. Mark Liao, Y. Wu, P. Chen, J. Hsieh, and I. Yeh, "CSPNet: A new backbone that can enhance learning capability of CNN," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Seattle, WA, USA, Jun. 2020, pp. 1571–1580.
[36] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980.

Dr. Huang is a fellow of IET, CACS, TFSA, and the International Association of Grey System and Uncertain Analysis. He was a recipient of the 2021 Outstanding Research Award from the Ministry of Science and Technology, Taiwan. He serves as the IEEE SMCS VP for Conferences and Meetings and the Chair of the IEEE SMCS Technical Committee on Intelligent Transportation Systems. He was the IEEE SMCS BoG, the President of the Taiwan Association of Systems Science and Engineering, the Chair of the IEEE SMCS Taipei Chapter and the IEEE CIS Taipei Chapter, and the CEO of the Joint Commission of Technological and Vocational College Admission Committee, Taiwan.

SATCHIDANAND KSHETRIMAYUM received the B.Tech. degree in computer science and engineering from the National Institute of Technology Manipur, India, and the M.Tech. degree in operations research from the National Institute of Technology Durgapur, India. He is currently pursuing the Ph.D. degree with the Department of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan. His current research interests include human activity recognition (HAR), computer vision, deep learning, and image processing.