Recognizing Human Activities by Key Frame in Video Sequences
Guojian Cheng
School of Computer, Xian Shiyou University, Xian, P.R.China
Email: [email protected]
I. INTRODUCTION
Human activity recognition is one of the most appealing, yet challenging, problems in computer vision [1]. Reliable and effective solutions to this problem would be highly useful in many areas, such as behavioral biometrics [2], content-based video analysis [3], and security and surveillance [4,5]. However, current approaches to these problems are very limited, and understanding human behavior by computers remains an open problem.
Human activity recognition has been a widely studied topic, but the solutions proposed so far remain limited in scope.

Manuscript received September 21, 2009; revised December 5, 2009; accepted February 10, 2010.

Based on "Key-frame based Human Activity Recognition," by Hao Zhang, Zhijing Liu, Haiyong Zhao, and Guojian Cheng, which appeared in the Proceedings of the 2009 WASE Global Congress on Science Engineering, Taiyuan, China, December 25-27, 2009. © 2009 WASE.

This research was supported by funds from the National Natural Science Foundation of China (NSFC) under Grant 40872087.

*Corresponding author.
Figure 1. The flowchart of activity recognition.

Let $w(t)$ and $h(t)$ denote the width and height of the bounding box of the silhouette in frame $t$. The aspect ratio is

$$ r = \frac{w(t)}{h(t)}. \qquad (1) $$
We find that activities such as walking, running, and bending are clearly periodic, as shown in Fig. 3(a,b,c). The minima of $r$ mark the separation points between two consecutive cycles, and the maxima correspond to the key frames.
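As a concrete illustration (our own sketch, not the authors' code), the fragment below locates cycle boundaries and key frames as the minima and maxima of the aspect-ratio series; the use of SciPy's find_peaks and the min_cycle_len parameter are our assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def key_frames_from_aspect_ratio(r, min_cycle_len=10):
    """Cycle boundaries are the minima of r; key frames are its maxima.

    r is the series r[t] = w(t) / h(t) of bounding-box aspect ratios.
    min_cycle_len (an assumed lower bound on frames per cycle) suppresses
    spurious neighbouring extrema caused by noise.
    """
    r = np.asarray(r, dtype=float)
    minima, _ = find_peaks(-r, distance=min_cycle_len)  # cycle separators
    maxima, _ = find_peaks(r, distance=min_cycle_len)   # key frames
    return minima, maxima

# Toy usage: a noisy periodic aspect-ratio signal (about 5 cycles).
t = np.arange(200)
r = 0.5 + 0.2 * np.sin(2 * np.pi * t / 40) + 0.01 * np.random.randn(200)
cycle_bounds, key_frames = key_frames_from_aspect_ratio(r)
```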
Feature Extraction
Given a binary silhouette $f(x, y)$, its Radon transform is

$$ T_{Rf}(\rho, \theta) = \mathrm{R}\{ f(x, y) \} = \iint f(x, y)\, \delta(x \cos\theta + y \sin\theta - \rho)\, dx\, dy, $$

and the R transform is defined as

$$ \Re_f(\theta) = \int T_{Rf}^{2}(\rho, \theta)\, d\rho, \qquad (2) $$

where

$$ \delta(x) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{otherwise.} \end{cases} $$
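For concreteness, here is a minimal sketch of computing this descriptor, assuming scikit-image's radon routine and our own choices of angular sampling and normalization (the paper does not specify an implementation):

```python
import numpy as np
from skimage.transform import radon

def r_transform(silhouette, n_angles=180):
    """R transform of a binary silhouette (Eq. 2): the Radon transform
    squared and integrated over rho, evaluated at each angle theta."""
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = radon(silhouette.astype(float), theta=theta, circle=False)
    rf = np.sum(sinogram ** 2, axis=0)  # integrate T_Rf^2 over rho
    return rf / rf.max()                # normalized, a common choice for scale invariance

# Toy usage: an upright rectangle standing in for a human silhouette.
img = np.zeros((64, 64))
img[10:54, 26:38] = 1.0
descriptor = r_transform(img)  # one feature vector per frame
```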
Let $A = (a_1, \dots, a_m)$ and $B = (b_1, \dots, b_n)$ be two feature sequences and let $d(i, j)$ be the local distance between $a_i$ and $b_j$. The cumulative distance $D$ is computed by the dynamic programming recurrence

$$ D(1, 1) = d(1, 1), $$
$$ D(i, j) = d(i, j) + \min\{ D(i-1, j-1),\ D(i-1, j),\ D(i, j-1) \}, $$

and the distance between the two sequences is

$$ \mathrm{DIST}(A, B) = D(m, n). $$
With the projection angle $\theta$ sampled every $u$ degrees over $[0^\circ, 180^\circ)$, each frame is represented by a feature vector of dimension

$$ v = \frac{180}{u}. $$
The matching procedure returns the index (activitynumber) of the activity whose model yields the minimum cumulative distance.
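The recurrence can be implemented directly. The sketch below is our own illustration: dtw_distance realizes D and DIST for two sequences of per-frame descriptors, and classify plays the role of activitynumber above by returning the index of the best-matching gallery activity; the Euclidean local cost d(i, j) and the gallery layout are assumptions.

```python
import numpy as np

def dtw_distance(A, B):
    """DIST(A, B) = D(m, n) from the recurrence above.
    A, B: sequences of feature vectors (e.g. R transform descriptors)."""
    m, n = len(A), len(B)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = np.linalg.norm(np.asarray(A[i - 1]) - np.asarray(B[j - 1]))
            D[i, j] = d + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[m, n]

def classify(probe, gallery):
    """Return the index (activitynumber) of the gallery activity whose
    key-frame sequence has the minimum DTW distance to the probe."""
    return int(np.argmin([dtw_distance(probe, g) for g in gallery]))
```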
Experiment Data
In the experiments, the number of key frames in the six types of activities is shown in Table I. Each category is divided into two parts, one for modeling and one for testing.

The resultant silhouettes contain holes and intrusions due to imperfect background subtraction, shadows, and color similarities with the background. To train the activity models, holes, shadows, and other noise are removed manually; these synthetic data are taken as the ground truth data.
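The paper performs this cleanup manually; purely for illustration, a comparable automatic pass can be sketched with standard morphological operations (SciPy here; the function and its parameters are our own, not the authors' procedure):

```python
import numpy as np
from scipy import ndimage

def clean_silhouette(mask, open_iters=1):
    """Fill holes and strip small intrusions from a binary silhouette."""
    mask = ndimage.binary_fill_holes(mask)                      # interior holes
    mask = ndimage.binary_opening(mask, iterations=open_iters)  # thin noise
    labels, num = ndimage.label(mask)                           # connected parts
    if num > 1:  # keep only the largest component (the person)
        sizes = ndimage.sum(mask, labels, range(1, num + 1))
        mask = labels == (np.argmax(sizes) + 1)
    return mask
```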
TABLE I.
THE NUMBER OF KEY FRAMES IN THE SIX TYPES OF ACTIVITIES

Activity   Number of frames   Total key frames   Gallery frames   Probe frames
Walk       -                  125                -                109
Run        -                  85                 -                78
Bend       -                  126                -                113
Jump       -                  38                 -                32
Crouch     -                  16                 -                -
Faint      -                  13                 -                -
TABLE II.
RECOGNITION RATES FOR THE SIX TYPES OF ACTIVITIES

Activity type   Recognition rate (%)
Walk            97.25
Run             96.15
Bend            100
Jump            93.75
Crouch          100
Faint           100
Figure 6. The R transform of the walking silhouette for the different data cases.
The raw data include such cases as disjoint silhouettes, silhouettes with holes, and silhouettes with missing parts. Compared with the ground truth data, they are incomplete data.

Shadows and other noise may add extra parts to the human silhouette and thus induce redundant data.

Both the incomplete data and the redundant data are of low quality, and they are therefore used to test the performance of the R transform. Fig. 6 shows some such examples.
To test the robustness of the R transform, we select frames that carry less feature information rather than the key frames shown in Fig. 4; that is, we substitute the frames near a key frame for the key frame itself. These artificially generated data are defined as the key-frame-loss data.
Fig. 6 shows the R transform of the walking shape for the different data cases. For the incomplete data and the ground truth data the R transform is similar, but the R transform of the redundant data varies significantly at the peak of the curve. In fact, the R transform is sensitive to the redundant parts added to the silhouette.
Comparison
As shown in Table III, our method not only outperforms the other three methods but also has three advantages, as follows. First, it has lower computational complexity: only the key frames extracted from the video sequences are used in feature matching, so it costs less time than Chen's work [11], which matches the features of two whole sequences. Second, its representation is simple: although Oikonomopoulos's work [12] represents activities with a detailed codebook, that representation is more complex and ambiguous than our R transform descriptor. Finally, our method is general and applies to a wide range of activity types.
ACKNOWLEDGMENT
The authors would like to thank CASIA to provide
activity database and the anonymous reviewers for their
constructive comments.
REFERENCES
[1] P. Turaga, R. Chellappa, V. S. Subrahmanian, and O. Udrea, "Machine recognition of human activities: A survey," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 11, pp. 1473-1488, November 2008.
[2] S. Sarkar, P. J. Phillips, Z. Liu, I. R. Vega, P. Grother, and K. W. Bowyer, "The Human ID gait challenge problem: Data sets, performance, and analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 2, pp. 162-177, February 2005.
[3] Y. Rui, T. S. Huang, and S. F. Chang, "Image retrieval: Current techniques, promising directions and open issues," J. Visual Commun. Image Represent., vol. 10, no. 1, pp. 39-62, March 1999.