Finding Periodicity in Space and Time
Proceedings of the International Conference on Computer Vision, Bombay, India, January 4-7, 1998
spectral energy at the highest amplitude frequency and its multiples, and the sum of the energy at the frequencies halfway between. Besides the value of the periodicity measure itself, there is no checking of the signal harmonicity along the curve, which is a weakness of the method. The periodicity measure for an entire sequence is the maximum of p_f averaged among pixels whose highest power spectrum values appear at the same frequency. The final periodicity measure is used to distinguish periodic and non-periodic motion by thresholding.
In [3], flow-based algorithms are used to transform the image sequence so that the object in consideration is stabilized at the center of the image frame. Flow magnitudes in tessellated frame areas of periodic motion are then used as feature vectors for motion classification. In this paper, we show that flow-based methods are very sensitive to noise.
This work differs from the above in the following ways: 1) the harmonic relationship among spectral peaks is explicitly verified; 2) a more accurate measure of periodicity in the form of harmonic energy ratios is proposed; 3) multiple fundamentals can be extracted along a temporal line; 4) the values of fundamental frequencies are used in processing to help distinguish the periodicity of different activities; 5) regions of periodicity are actually segmented; and 6) the proposed algorithm does not use optical flow, and is robust to noise.

Figure 1: Frames 20, 40, 60, and 80 of the 97-frame Walker sequence, with frame size 320 x 240.

Figure 2: Head and ankle level XT slices of the Walker sequence. (a) Head level. (b) Ankle level. As it is, the periodicity in (b) is difficult to characterize.

2 Method
The algorithm for periodicity detection and segmentation consists of two stages: (1) object tracking by frame alignment; (2) simultaneous detection and segmentation of regions of periodicity. Object tracking is by itself a research area. Decoupling object tracking and periodicity detection conceptually modularizes the analysis and allows the use of other tracking algorithms.

Throughout this section, an image sequence Walker will be used to illustrate the technical points. More challenging examples are given in Section 3.

2.1 Frame Alignment

In this work, two types of image sequences are considered for frame alignment. In practice, a large number of image sequences can be categorized into one of these two types: (I) the area of interest, typically a moving object, is as a whole stationary with respect to the camera, but the background can be moving; (II) little ego-motion is involved, and each moving object as a whole moves approximately frontoparallel to the camera, along a straight line and at a constant speed.

Four frames of a sequence with a person walking across the image plane are shown in Figure 1. This is a typical type II sequence. Although there are no re-occurring scenes, we experience the notion of repetitiveness when viewing the sequence. This is due to our ability to fixate on the moving person, so that the person appears to be walking in place. The effect of fixating can be accomplished computationally by realigning the image frames. Obviously, frame alignment is not necessary for type I sequences; it is in fact a process of transforming type II sequences into type I.

In the following, the term data cube is used to refer to the 3-D (X: horizontal; Y: vertical; T: temporal) data volume formed by stacking all the frames in a sequence, one in front of the other. The XT and YT slices of the data cube reveal the temporal behavior usually hidden from the viewer. Figure 2 shows the head and ankle level XT slices of the Walker sequence. In (a), the head leaves a non-periodic straight track, while the walking ankles in (b) make a crisscross periodic pattern. As it is, the periodicity in (b) is difficult to characterize. It will be shown that frame alignment transforms the data into a form in which periodicity can be easily detected and measured.

To align a sequence to a particular moving object, the trajectory of the object is first detected. A filtering method similar to the one in [8] is used here to avoid the noise sensitivity of the optical flow based methods (demonstrated in Section 3). Applying 1-D median filtering along the temporal dimension of the sequence (filter length 11 was used for Walker) yields a sequence that contains mostly the background. The difference sequence between the original and the background contains mainly the moving objects. Since the object trajectories in consideration are approximately linear, the projections of the trajectories onto the XT and YT planes (averaged XT and YT images of the difference sequence) are straight lines. These lines can be detected via a Hough transform to give the X or Y positions of the moving objects in each frame. We call these position values alignment indices. The averaged XT image of the Walker difference sequence and the line found by the Hough transform method are shown in Figure 3. Each horizontal line represents a frame, and the diagonal white line marks the object X location in each frame. Note that multiple object trajectories can be detected simultaneously using this procedure, as will be shown in Section 3.1.
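The following sketch reconstructs this first stage under stated assumptions: NumPy, SciPy, and scikit-image are used, the foreground threshold on the difference images is a simple global rule of thumb, and names such as alignment_indices are ours rather than the paper's.

```python
import numpy as np
from scipy.ndimage import median_filter
from skimage.transform import hough_line, hough_line_peaks

def alignment_indices(seq, median_len=11):
    """Estimate the object's X position in every frame (alignment indices).

    Assumes `seq` is a (T, H, W) grayscale array. The temporal median
    filter length of 11 matches the value quoted for the Walker sequence.
    """
    # Temporal median filtering suppresses the moving object, leaving background.
    background = median_filter(seq.astype(float), size=(median_len, 1, 1))
    diff = np.abs(seq.astype(float) - background)

    # Average over Y to obtain the XT image: one row per frame, one column per X.
    xt = diff.mean(axis=1)                       # shape (T, W)
    xt_bin = xt > (xt.mean() + 2.0 * xt.std())   # crude foreground threshold (assumption)

    # A linear trajectory appears as a straight line in the XT image; find it by Hough.
    hspace, angles, dists = hough_line(xt_bin)
    _, peak_angles, peak_dists = hough_line_peaks(hspace, angles, dists, num_peaks=1)
    theta, rho = peak_angles[0], peak_dists[0]

    # skimage parameterizes a line as rho = col*cos(theta) + row*sin(theta),
    # where rows index frames (t) and columns index X; solve for X at each t.
    # (Assumes the detected line is not purely horizontal in the XT image.)
    t = np.arange(xt.shape[0])
    x_of_t = (rho - t * np.sin(theta)) / np.cos(theta)
    return np.rint(x_of_t).astype(int)
```

With num_peaks greater than one in hough_line_peaks, several lines can be returned at once, which is how the multiple-trajectory case of Section 3.1 could be handled.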
Figure 3: (a) Averaged XT image of the Walker sequence after background removal. (b) Line found in (a) by using a Hough transform method.

Figure 4: (a) Average XY image of the aligned Walker difference sequence. The area of interest is clearly shown. (b) Aligned and cropped original sequence, with splits near the center of the frames to show the inside of the data cube.
Using the alignment indices, the image frames in a sequence are shifted so that the object appears at a specified position in the XY plane. After alignment, the object should appear to be moving in place. This in effect is equivalent to fixating on an object when viewing a sequence in which the object's position changes frame by frame. The aligned sequences are passed to the second stage of the algorithm.
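A minimal sketch of the alignment itself, under the same (T, H, W) assumptions; the reference column x_ref and the use of np.roll are our choices, not specified by the paper:

```python
import numpy as np

def align_frames(seq, x_indices, x_ref=None):
    """Shift each frame horizontally so the tracked object stays at column x_ref.

    `seq` is (T, H, W); `x_indices` holds the per-frame object X positions
    from the trajectory detection step. Wrap-around from np.roll is
    acceptable here because the aligned cube is cropped afterwards.
    """
    if x_ref is None:
        x_ref = seq.shape[2] // 2          # default: center the object
    aligned = np.empty_like(seq)
    for t, x in enumerate(x_indices):
        aligned[t] = np.roll(seq[t], x_ref - int(x), axis=1)
    return aligned
```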
Figure 5: Signals and their power spectra along temporal lines (columns in images). (a1) and (b1): head and ankle level XT slices of the aligned and cropped Walker sequence. (a2) and (b2): each column is the 1-D power spectrum of the corresponding column in (a1) and (b1). (c1) and (c2): details along the white vertical lines in (b1) and (b2); the axes are gray scale vs. time (by frame number) in (c1), and power spectrum vs. frequency in (c2). Periodicity in (b1) is reflected by the spectral harmonic peaks in (b2).

2.2 Finding Regions of Periodicity
In the second stage, 1-D Fourier transforms are performed along the temporal dimension of an aligned sequence. The spectral harmonic peaks are detected and used to compute the temporal signal harmonic energy. A periodicity template is generated using the extracted fundamental frequencies and the ratios between the harmonic energy and the total energy at each frame pixel location. The original sequence is then masked for regions of periodicity.

To save computation and storage, an aligned sequence can be cropped to limit processing to the area of interest. The cropping does not affect the periodicity detection. The location and size of the cropping window can be estimated from the average XY image of the aligned difference sequence. Figure 4 shows such an XY image of the Walker sequence and the aligned and cropped original sequence, with splits near the center of the frames to show the inside of the data cube.

Now consider an aligned and cropped data cube. Frame pixels with the same X and Y locations form straight lines in the cube; call these lines the temporal lines. If the cropped frame size is Nx by Ny, then there are Nx x Ny temporal lines in the data cube. In the aligned sequence, the object of interest moves in place. If the object is moving cyclically in any manner, the periodicity will be reflected in some of the temporal lines. Figure 5 (a1) and (b1) show the head and the ankle level XT slices of 64 frames (frames 17 to 80) of the data cube in Figure 4 (b). Each column in the images is a temporal line. These images are the aligned and cropped version of the two XT slices in Figure 2. Columns in Figure 5 (a2) and (b2) are the 1-D power spectra of the corresponding columns in (a1) and (b1), normalized among all temporal lines in the data cube. Figure 5 (c1) and (c2) show details along the white vertical lines in (b1) and (b2). While the head level slice in (a1) shows no harmonicity, the periodicity of the moving ankles in (b1) is reflected by the spectral harmonic peaks in (c2). We refer to the spectral energy corresponding to the harmonic peaks as the temporal harmonic energy, and propose using the temporal harmonic energy ratio, which is the ratio between the harmonic energy and the total energy along a temporal line, as a measure of temporal periodicity at the corresponding frame pixel location.
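As an illustration of the spectra in Figure 5 (a2) and (b2), the following sketch takes one FFT along the temporal axis of the aligned and cropped cube; the zero-meaning and Gaussian taper described in the next paragraph are folded in, with an assumed taper width since the paper does not quote one:

```python
import numpy as np
from scipy.signal.windows import gaussian

def temporal_power_spectra(cube, taper_std_frac=0.125):
    """Power spectrum of every temporal line of a (T, H, W) data cube.

    Each temporal line is zero-meaned and Gaussian tapered before the FFT.
    The taper width is an assumption, not a value from the paper.
    Returns an array of shape (T//2 + 1, H, W): one spectrum per pixel.
    """
    T = cube.shape[0]
    lines = cube.astype(float) - cube.mean(axis=0, keepdims=True)  # zero-mean in time
    window = gaussian(T, std=taper_std_frac * T)[:, None, None]    # Gaussian taper
    spectra = np.abs(np.fft.rfft(lines * window, axis=0)) ** 2
    return spectra / spectra.max()   # normalize among all temporal lines
```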
For spectral harmonic peak detection, we adapt the 2-D peak detection algorithm in [6] to 1-D signals. The signal along a temporal line is first zero-meaned and Gaussian tapered, and then its power spectrum is computed via a fast Fourier transform. To locate the harmonic peaks, local maxima of the power spectrum are found using a size-7 neighborhood and excluding values below 10% of the entire spectral range. A local maximum marks the location of a spectral harmonic peak when its frequency is either a fundamental or a harmonic. A fundamental is defined as a frequency that can be used to linearly express the frequencies of some other local maxima. A harmonic is a frequency that can be expressed as a linear combination of some fundamentals. Starting from the lowest frequency to the highest, each local maximum is checked first for its harmonicity (whether its frequency can be expressed as a linear combination of the existing fundamentals), and then for its fundamentality (whether multiples of its frequency, combined with multiples of existing fundamentals, coincide with the frequency of another local maximum). A tolerance of one sample point is used in the frequency matching. Note that multiple fundamental frequencies can exist along a temporal line.
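A sketch of this peak classification for one temporal line is given below. The harmonicity and fundamentality tests are simplified to integer multiples of single fundamentals (a special case of the linear-combination rule in the text), so it is an approximation rather than a faithful reimplementation; the size-7 neighborhood, the 10% floor, and the one-bin tolerance follow the text.

```python
import numpy as np
from scipy.signal import argrelmax

def classify_peaks(spectrum, neighborhood=7, floor_frac=0.10, tol=1):
    """Split local maxima of a 1-D power spectrum into fundamentals and harmonics.

    `spectrum` is the power spectrum of one temporal line.
    Returns (fundamental_bins, harmonic_bins) as lists of FFT bin indices.
    """
    order = neighborhood // 2
    floor = spectrum.min() + floor_frac * (spectrum.max() - spectrum.min())
    maxima = [k for k in argrelmax(spectrum, order=order)[0] if spectrum[k] > floor]

    fundamentals, harmonics = [], []
    for k in sorted(maxima):
        # Harmonicity (simplified): is k, within tol bins, an integer multiple
        # of an existing fundamental?
        if any(min(abs(k - m * f) for m in range(1, k // f + 2)) <= tol
               for f in fundamentals if f > 0):
            harmonics.append(k)
            continue
        # Fundamentality (simplified): do integer multiples of k coincide with
        # another local maximum?
        multiples = {m * k for m in range(2, spectrum.size // max(k, 1) + 1)}
        if any(abs(p - q) <= tol for p in multiples for q in maxima if q != k):
            fundamentals.append(k)
    return fundamentals, harmonics
```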
Figure 6: (a) Temporal harmonic energy ratio values of the aligned Walker sequence. A high value indicates more periodic energy at the location. (b) Using the alignment indices, the four frames in Figure 1 are masked by the template shown in (a) and then stacked together.

Due to the nature of the temporal signal and the effect of the Gaussian taper, a spectral harmonic peak usually does not appear as a single impulse. In this work, a peak support region is determined by growing outward from the detected peak location along the frequency axis until the spectral value falls below 5% of the spectrum range. After the spectral peaks and their supports are identified, it is straightforward to compute the harmonic energy ratio associated with a fundamental frequency and its harmonics.
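Continuing the sketch, the support-growing step and the resulting temporal harmonic energy ratio might look as follows; the 5% support threshold is from the text, while the helper names are ours:

```python
import numpy as np

def peak_support(spectrum, peak, support_frac=0.05):
    """Indices of the support around `peak`: grow outward along the frequency
    axis until the spectral value drops below 5% of the spectrum range."""
    floor = spectrum.min() + support_frac * (spectrum.max() - spectrum.min())
    lo = hi = peak
    while lo > 0 and spectrum[lo - 1] >= floor:
        lo -= 1
    while hi < spectrum.size - 1 and spectrum[hi + 1] >= floor:
        hi += 1
    return np.arange(lo, hi + 1)

def harmonic_energy_ratio(spectrum, peak_bins):
    """Ratio of the energy in all harmonic-peak supports to the total energy."""
    if not peak_bins:
        return 0.0
    support = np.unique(np.concatenate([peak_support(spectrum, p) for p in peak_bins]))
    total_energy = spectrum.sum()
    return spectrum[support].sum() / total_energy if total_energy > 0 else 0.0
```

The ratio computed per temporal line from the classified peaks is what populates the periodicity template described next.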
The peak detection technique discussed above fails when a temporal line contains only one sinusoidal signal, which produces a single spectral peak. However, this situation arises only when the edge of a moving object has a sinusoidal profile. An example is a vertical sine grating pattern translating horizontally, frontoparallel to the camera, at a constant speed. Natural edges, patterns, and surfaces hardly ever have such a profile. Therefore, higher harmonics usually accompany the fundamentals of the temporal signals.

By applying the peak detection procedure to all temporal lines in a data cube, the periodicity template of the aligned sequence is built: the fundamental frequencies and the corresponding temporal harmonic energy ratio values are registered at each pixel location in a frame-sized array of data structures. At places where no periodicity is found, the template entry is zero. Under circumstances such as a noisy background, some speckles may appear in the template. Simple morphological closing and opening operations can be applied to remove the speckles.
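A minimal template builder, assuming the helper functions sketched above and using SciPy's grayscale morphology for the speckle cleanup (the diameter-3 circular structuring element is approximated here by a 3x3 cross):

```python
import numpy as np
from scipy.ndimage import grey_closing, grey_opening

def periodicity_template(cube):
    """Per-pixel temporal harmonic energy ratios and fundamentals of a cube.

    Relies on temporal_power_spectra, classify_peaks, and
    harmonic_energy_ratio from the earlier sketches.
    """
    spectra = temporal_power_spectra(cube)          # shape (F, H, W)
    _, H, W = cube.shape
    ratio = np.zeros((H, W))
    fundamentals = np.empty((H, W), dtype=object)   # list of bins per pixel
    for y in range(H):
        for x in range(W):
            s = spectra[:, y, x]
            funds, harms = classify_peaks(s)
            fundamentals[y, x] = funds
            if funds:
                ratio[y, x] = harmonic_energy_ratio(s, funds + harms)

    # One grayscale closing and one opening remove speckles; the footprint
    # approximates a circular structuring element of diameter 3.
    footprint = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=bool)
    ratio = grey_opening(grey_closing(ratio, footprint=footprint), footprint=footprint)
    return ratio, fundamentals
```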
Figure 6 (a) shows the temporal harmonic energy ratio values of the Walker sequence after one closing and one opening operation with a circular structuring element of diameter 3. The larger the energy ratio value, the more periodic energy there is at the location. As expected, the brightest region is the wedge shape created by the walking legs. The head, the shoulder, and the outline of the backpack are detected because the walker bounces. The hands appear at the front of the body since in most parts of the sequence the walker was fixing his gloves and moving his hands in a rather periodic manner. Note that the moving background and parts of the walker do not appear in the template since there is no periodicity present in those areas.

Using the alignment indices generated at the first stage, the periodicity template of a sequence can be used to mask the original sequence for the regions of periodicity in each frame. Figure 6 (b) shows the four frames in Figure 1 after they are masked and then stacked together.
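For completeness, masking a frame of the original sequence with the template reduces to shifting the template back by that frame's alignment index and thresholding the energy ratio; this assumes the template has been padded back to the full frame size, and the threshold of 0 is a placeholder rather than a value from the paper:

```python
import numpy as np

def mask_frame(frame, template, x_index, x_ref, threshold=0.0):
    """Mask one original frame with the periodicity template.

    The template lives in aligned coordinates, so it is shifted back by the
    frame's alignment index before being applied.
    """
    shifted = np.roll(template, int(x_index) - x_ref, axis=1)
    return np.where(shifted > threshold, frame, 0)
```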
Since the non-periodic activities of the background do not light up in the templates, it is clear that the sequence cropping for efficient computation does not affect the processing results.

Figure 7: Left column: frames 40, 61, and 88 of the Trio sequence. Right column: frames in the left column masked by the periodicity templates.

3 Examples

In addition to the Walker sequence, four examples are used here to demonstrate the effectiveness of the proposed algorithm: Trio, Dog, Wheels, and Jumping Jack. The Walker and Trio sequences were recorded by a hand-held consumer-grade camcorder. The Dog and Wheels sequences were taken by the same camera set on a tripod. The Jumping Jack sequence was recorded by a fixed Betacam camera in an indoor setting. Except for the Jumping Jack, none of the subjects in the sequences was aware of the filming; hence the activities are natural and exhibit natural irregularities. All original sequences have a frame size of 320 x 240.

These examples are used to demonstrate 1) the effectiveness of the new algorithm in finding and characterizing periodicity in various settings; 2) the robustness of the algorithm under noisy conditions; and 3) the noise sensitivity of optical flow based estimation methods, which have been used for trajectory detection in many existing works, but are avoided by the method proposed here.
Figure 8: (a) Averaged XT image of the Trio sequence after background removal. (b) Lines found in (a) by using the Hough transform method.

3.1 Trio

Trio is a 156-frame sequence of three people walking and passing each other. Frames 40, 61, and 88 of the sequence are shown in the left column of Figure 7. As in the Walker example, the averaged XT image is computed after the background removal. The lines in the XT image are detected via a Hough transform. Figure 8 shows the averaged XT image and the detected lines. These lines provide the alignment indices of each object. Note that the alignment indices of the three objects are estimated simultaneously.

To generate the periodicity templates, the original sequence is aligned and cropped for each moving person. All aligned sequences contain 64 frames. Figure 9 shows example frames of each aligned sequence and the harmonic energy ratio values of the periodicity templates. Again, the goal here is not to segment out the people, but to detect and characterize regions of periodicity, such as the legs, the arms, the outline of the bouncing head and shoulders, and even the dangling straps of the backpack. Finally, the templates are used to mask the original sequence. Examples are shown in the right column of Figure 7.

Notice that, besides the center person, there is a second or even a third person passing through in all three aligned sequences. However, these passersby have no effect on the results of periodicity detection since they are one-time events on a temporal line, and therefore do not contribute to the temporal harmonic energy. The Trio example demonstrates that the proposed algorithm is well suited for the detection of multiple periodicities, even under the circumstances of temporary object occlusion.

Figure 9: Example frames of the aligned sequences and the harmonic energy ratio values of the periodicity templates for each individual of the Trio sequence. First two columns: example frames. Right column: harmonic energy ratio values.

3.2 Dog

Dog is a 104-frame sequence in which a person walks two dogs in front of a picket fence. Figure 10 (a) shows frame 46 of the original sequence, and (b) shows frame 13 of the 64-frame aligned sequence. Images (c1) and (c2) show the first and second fundamental frequencies in the periodicity template, while (e1) and (e2) are the corresponding harmonic energy ratios. Note that there are double fundamentals at many pixel locations.

The complication here is the picket fence. In the original sequence, the fence is part of the fixed background, exhibiting pure spatial periodicity. However, when the sequence is aligned to the person and the dogs, the fence starts to move in the background, leaving a periodic signature on many temporal lines. As shown in (c1) and (e1), the fence area lights up in the periodicity template.

Figure 10 (d) shows the fundamentals with value 0.875, which is the temporal frequency of the fence in the aligned sequence. The fundamental frequency values are used to extract the fence. Figure 10 (f) shows the harmonic energy ratios in the template after the fence frequency is removed. The fence region of the frame in (a) is shown in (g), while other regions of periodicity are shown in (h).
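The fence removal is a direct use of the fundamental frequencies stored in the template: entries whose fundamental matches the fence frequency are separated from the rest. A hypothetical sketch (the tolerance is illustrative, and bin indices from the earlier sketch would first be converted to frequencies, e.g. with np.fft.rfftfreq):

```python
import numpy as np

def split_by_fundamental(ratio, fundamentals, target=0.875, tol=0.02):
    """Separate template regions whose fundamental matches `target` (e.g. the
    fence frequency) from the remaining periodic regions.

    `fundamentals` is an (H, W) object array of per-pixel fundamental
    frequency lists; `ratio` is the harmonic energy ratio template.
    """
    H, W = ratio.shape
    matches = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            matches[y, x] = any(abs(f - target) <= tol for f in fundamentals[y, x])
    fence_regions = np.where(matches, ratio, 0.0)
    other_regions = np.where(~matches & (ratio > 0), ratio, 0.0)
    return fence_regions, other_regions
```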
3.3 Wheels

The examples shown so far all involve walking. However, the algorithm is not limited to periodicity caused by human activities, but works in general for any periodic space-time phenomenon.

Wheels is a 64-frame sequence of a car passing by a building. Near the top of the building, two spinning wheels are connected by a figure-8 belt. One side of the belt is patterned and appears periodic. Every region with periodicity should be captured: the hub caps, the wheels, and one side of the belt. As shown in Figure 11, the algorithm accomplishes just that.

3.4 Jumping Jack

There is no translatory motion in the Jumping Jack sequence, and the background is smooth. This sequence and its noisy versions (corrupted by additive Gaussian white noise (AGWN) of variance 100 and 400) are used to demonstrate the robustness of the new algorithm in the presence of noise, and also to show the noise sensitivity of optical flow based motion estimation. The length of the sequences used here is 128 frames, due to the cycle of the jumping motion.
Figure 10: Dog sequence. (a) Frame 46 of original sequence. (b) Frame 13 of aligned sequence. (c1) and (c2): first and second fundamental frequencies in the periodicity template. (e1) and (e2): harmonic energy ratios corresponding to the frequencies in (c1) and (c2). (d) Fundamentals with the fence frequency. (f) Harmonic energy ratios after the fence frequency is removed. (g) Frame 46 masked to show the fence region. (h) Frame 46 masked to show other regions of periodicity.

Most of the related work uses flow-based methods to locate moving objects in a sequence. However, the noise sensitivity of flow-based methods can be a drawback. The optical flow magnitudes shown here were obtained using the hierarchical least-squares algorithm [9], which is based on a gradient approach described in [10, 11]. Two pyramids are built, one for each of the two consecutive frames, and motion parameters are progressively refined by residual motion estimation from the coarse images to the higher resolution images. This algorithm is representative of existing optical flow estimation techniques. The optical flow magnitudes of Jumping Jack frame 61 are shown in the second row of Figure 12. Given a clean input, the flow magnitudes can be used to segment the moving object. However, the algorithm is mostly ineffective under the noisy conditions.

The third row of Figure 12 shows the 57th TY (not YT!) image of each sequence, revealing the tracks left by the right hand and leg. The rows in these images are temporal lines, and the corresponding power spectra are shown in the fourth row of the figure. The periodicity templates can be found in the bottom row. Although the noise causes some degradation in the arm regions, the templates are well preserved overall. The reason why the proposed algorithm is not affected by large amounts of white noise in the input is that white noise only contributes to the relatively smooth part of the power spectrum. As long as the noise energy is not so high that it overwhelms the spectral harmonic peaks, the algorithm works.

3.5 Walker

The detection results of the Walker sequence were shown in Section 2. Here we show the results from noisy inputs (the original sequence corrupted by AGWN of variance 100 and 400), using 64 frames. The resulting periodicity templates in Figure 13 show that, unlike optical flow based methods, the proposed algorithm is robust in the presence of noise.

4 Discussion

Compared to the method used in [4], the periodicity measure proposed here in the form of the temporal harmonic energy ratio is a more accurate and more reliable measure of periodicity.
5 Summary

A new algorithm for finding periodicity in space and time is presented. The algorithm consists of two main parts: 1) object tracking by frame alignment, which transforms the data into a form in which periodicity can be easily detected and measured; 2) Fourier spectral harmonic peak detection and energy computation to identify regions of periodicity and measure their strength. This method allows simultaneous detection, segmentation, and characterization of spatiotemporal periodicity, and is computationally efficient. The effectiveness of the technique and its robustness to noise compared with optical flow based methods are demonstrated using a variety of real-world video examples.

Periodicity templates are proposed as a new way of characterizing spatiotemporal periodicity. The templates contain information such as the fundamental frequencies and the temporal harmonic energy ratios at each frame pixel location. The periodicity templates and the template generating algorithm are useful tools for applications such as action recognition, video databases, and video surveillance.

Acknowledgments

The authors would like to thank Aaron Bobick and Sandy Pentland for insightful discussions, Jim Davis for the Jumping Jack sequence, and John Wang for the hierarchical optical flow estimation program.

References

[1] D.D. Hoffman and B.E. Flinchbaugh. The interpretation of biological motion. Biological Cybernetics, pages 195-204, 1982.

[2] M. Allmen and C.R. Dyer. Cyclic motion detection using spatiotemporal surfaces and curves. In Proc. ICPR, pages 365-370, 1990.

[3] R. Polana and R. Nelson. Low level recognition of human motion. In IEEE Workshop on Motion of Non-rigid and Articulated Objects, pages 77-82, Austin, TX, Nov. 11-12, 1994.

[4] R. Polana and R.C. Nelson. Detecting activities. In Proc. CVPR, pages 2-7, New York, NY, June 1993.

[5] A.F. Bobick and J.W. Davis. Real-time recognition of activity using temporal templates. In Proc. Third IEEE Workshop on Applications of Computer Vision, pages 39-42, Sarasota, FL, Dec. 1996.

[6] F. Liu and R.W. Picard. Periodicity, directionality, and randomness: Wold features for image modeling and retrieval. IEEE Trans. Pattern Analysis and Machine Intelligence, 18(7):722-733, July 1996.

[7] H. Wold. A Study in the Analysis of Stationary Time Series. Almqvist & Wiksell, Stockholm, 1954.

[8] S.A. Niyogi and E.H. Adelson. Analyzing gait with spatiotemporal surfaces. In IEEE Workshop on Motion of Non-rigid and Articulated Objects, pages 64-69, Austin, TX, Nov. 11-12, 1994.

[9] J.Y.A. Wang. Layered Image Representation: Identification of Coherent Components in Image Sequences. PhD thesis, Dept. of EECS, MIT, Cambridge, MA, Sept. 1996.

[10] J. Bergen et al. Hierarchical model-based motion estimation. In Proc. ECCV, pages 237-252, 1992.

[11] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proc. Image Understanding Workshop, pages 121-131, 1981.