
Delft University of Technology

Angle-insensitive Human Motion and Posture Recognition Based on 4D Imaging Radar and Deep Learning Classifiers

Zhao, Yubin; Yarovoy, Alexander; Fioranelli, Francesco

DOI
10.1109/JSEN.2022.3175618
Publication date
2022
Document Version
Accepted author manuscript
Published in
IEEE Sensors Journal

Citation (APA)
Zhao, Y., Yarovoy, A., & Fioranelli, F. (2022). Angle-insensitive Human Motion and Posture Recognition
Based on 4D imaging Radar and Deep Learning Classifiers. IEEE Sensors Journal, 22(12), 12173-12182.
https://doi.org/10.1109/JSEN.2022.3175618

Important note
To cite this publication, please use the final published version (if applicable).
Please check the document version above.

Copyright
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent
of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Takedown policy
Please contact us and provide details if you believe this document breaches copyrights.
We will remove access to the work immediately and investigate your claim.

This work is downloaded from Delft University of Technology.


For technical reasons the number of authors shown on this cover page is limited to a maximum of 10.

Angle-insensitive Human Motion and Posture Recognition Based on 4D Imaging Radar and Deep Learning Classifiers

Yubin Zhao, Alexander Yarovoy, Fellow, IEEE, Francesco Fioranelli, Senior Member, IEEE

Abstract— The need for technologies for Human Activity Recognition (HAR) in home environments is becoming more and more urgent because of the aging population worldwide. Radar-based HAR typically uses micro-Doppler signatures as one of the main data representations, in conjunction with classification algorithms often inspired by deep learning methods. One of the limitations of this approach is the challenging classification of movements at unfavorable aspect angles (i.e., close to 90°) and of static postures in between continuous sequences of activities. To address this problem, a hierarchical processing and classification pipeline is proposed to fully exploit all the information available from millimeter-wave (mm-wave) 4D imaging radars, specifically the azimuth and elevation information in conjunction with the more conventional range, Doppler, received power, and time features. The proposed pipeline uses the two complementary data representations of Point Cloud (PC) and spectrogram, and its performance is validated on an experimental dataset with 6 activities performed by 8 participants. The results show good performance of the proposed pipeline compared with alternative baseline approaches in the literature, and the effect of key parameters such as the amount of training data, signal-to-noise levels, and virtual aperture size is investigated. A leave-one-subject-out test is also applied to study the impact of body characteristics on the generalizability of the trained classifiers.

Index Terms— Human Activity and Posture Recognition, Imaging Radar, Multiple Angle Classifier, Deep Learning

I. INTRODUCTION

The increasingly aging population and the prevalence of non-communicable diseases have made it more and more urgent in recent years to be able to monitor patient health at home. This raises the importance of indoor human activity recognition (HAR) enabling automatic monitoring systems to potentially improve life quality, reduce hospitalization, and, most importantly, provide timely help in case of emergencies, such as serious fall or stroke events [1], [2].

From a technical perspective, automated HAR was originally mostly based on visual aids [3] or wearable sensors [4], [5]. However, both types of sensors - cameras of various kinds and inertial measurement units - exhibit inherent limitations [6], such as poor functionality in darkness or intense light conditions as well as potential privacy issues for cameras, and end-users' inconvenience in carrying or wearing sensors, which might lead to poor compliance and usage. On the other hand, radar is gaining attention for its potential advantages: it provides consistent sensing quality regardless of light conditions, and works contactless, without end-users needing to carry, wear, or interact with the sensor [7], [8].

The basic principle of radar-based HAR is that each human activity has unique kinematic patterns, and such patterns can be represented by measurable features such as the velocity of different human body parts along with the physical extent of their movements, namely micro-Doppler signatures [9], [10]. Radar-based HAR methods are generally data-driven and can be divided into two categories in terms of the applied data representations and corresponding classification algorithms. To the first category belong studies where handcrafted features are extracted from the pre-processed radar data and then used together with supervised Machine Learning algorithms. For instance, studies such as [11]–[13] used handcrafted features and machine learning algorithms (e.g., Support Vector Machine, SVM) to classify different human activities, intended as individual motions performed by human subjects (e.g., walking, sitting still, boxing, and so on). To the second category belong other studies where radar data are represented as image-like or video-like inputs and processed by means of Deep Learning algorithms. For example, spectrogram images were used as the input to Deep Convolutional Neural Network models [14]–[16], and sequences of spectrogram images were processed by Long-Short Time Memory models, a variant of recurrent neural networks [17], [18].

(This work was in part supported by the NWO KLEIN (M1) RAD-ART project awarded to F. Fioranelli. Y. Zhao, A. Yarovoy and F. Fioranelli are with the Faculty of Electrical Engineering, Mathematics and Computer Science, TU Delft, Delft, The Netherlands; e-mail: [email protected].)


Radar-based HAR approaches that depend strongly on the Doppler information suffer from two inherent limitations. Firstly, the classified movements are typically artificially limited to be performed along the line-of-sight direction of the radar, or at a small aspect angle, so that the Doppler information remains representative enough. The second limitation is that static postures, on their own or in between continuous sequences of movements, are rarely investigated, since their micro-Doppler signatures are not easily distinguishable from the static clutter.

To overcome the first limitation, the usage of distributed or even multistatic radar systems has been proposed, in order to simultaneously sample and reconstruct the micro-Doppler signatures from different aspect angles [18]–[22]. While effective, this approach requires the usage of multiple nodes, with an increase in complexity of the overall system and the need to cope with the synchronization of data from the different sensor nodes. Another approach, recently proposed thanks to the availability of millimeter-wave (mm-wave) MIMO (Multiple Input Multiple Output) radars, is based on the use of 4D imaging radar that exploits azimuth and elevation resolution capabilities to attain additional spatial information on the subject's posture [23], [24]. While promising, this approach has not yet been investigated in detail in the literature, and there is scope to define effective classification processing pipelines that can take advantage of mm-wave 4D imaging capabilities for HAR.

To address the aforementioned radar-based HAR issues of unfavorable orientations (in terms of aspect angles with respect to the radar's line-of-sight) and static human postures, this work proposes a radar-based classification pipeline that exploits the richer information provided by mm-wave 4D imaging radar. The main contributions of the proposed pipeline are as follows.

• Unlike in other studies such as [23], where the radar point clouds are treated as images, the proposed pipeline is designed with the goal of exploiting all six intrinsic 'features' obtained by imaging radar, namely range, azimuth, elevation, Doppler, received power, and time information, rather than focusing on just one specific data representation.

• The hierarchical structure of the proposed pipeline simplifies the task of HAR in a multi-angle scenario with the help of a designated neural network, T-Net, and achieves classification of both static postures and dynamic motions.

• The pipeline is designed to be robust to noisy and limited amounts of radar data by replacing the Max Pooling layer in T-Net and PointNet with Average Pooling, and by deliberately using lighter-weight neural networks.

To validate the performance of the proposed pipeline, a custom experimental dataset is collected, including 8 human subjects performing 6 in-place activities (4 motions and 2 postures) at 5 different orientations. The measurement was conducted in an office-like room to simulate an indoor real-life environment. The proposed pipeline attains a classification accuracy of 87.1%, which is significantly higher than the state-of-the-art alternatives applied to the same dataset. The proposed pipeline also shows robust results in case of low signal-to-noise ratio (SNR), varying dimensions of the virtual apertures (i.e., the number of array channels that worsen or improve angular resolutions), and in a leave-one-subject-out test used to validate performance for unseen individuals.

The rest of the paper is organized as follows. Section II describes the proposed method. Section III presents the measured dataset for validating the performance of the proposed method. Section IV discusses the attained results for the proposed method and its comparison with the state of the art. Finally, conclusions are drawn in Section V.

II. DESCRIPTION OF THE PROPOSED PIPELINE AND COMPARATIVE BASELINES

Radar conventionally used in HAR (i.e., radar with a single receiver and operating at relatively low carrier frequencies in the 5.8 or 24 GHz ISM bands) generates data from which information related to four intrinsic features of the object can be extracted: range as 1D spatial information, Doppler proportional to the target's radial velocity, received power proportional to the Radar Cross Section of the object, and the temporal relations from the movements of the body parts. The usage of these four features or representations of the data has been thoroughly analyzed in the literature, and it appears that state-of-the-art research mostly relies on the Doppler information [25]–[28].

To overcome the inherent limitations of Doppler information, additional intrinsic features must be introduced. 4D imaging radar at mm-wave frequencies can provide an estimation of the spatial occupancy of the human body in height and width in different positions. Specifically, the usage of multiple channels in an antenna array allows estimating the angles of arrival of the targets. At mm-wave frequencies, human bodies are perceived as extended targets, with multiple scatterers generated by each moving body part forming the so-called point clouds (PCs) [23]. This scattering behavior, combined with the angular estimation capabilities in both azimuth and elevation, enables a new, broader 'feature space' to explore for radar-based HAR. However, when operating at mm-wave, a disadvantage to account for is that the detection range is shorter than at lower frequencies conventionally used for HAR, such as the ISM bands of 2.4, 5.8, and 24 GHz, because of the higher propagation losses. Nevertheless, current mm-wave systems at 60-77 GHz show good detection and classification capabilities at the ranges of interest for HAR applications.

The overview of the proposed method to exploit all these features is given in Figure 1. Specifically, this method exploits the six intrinsic features of range, azimuth, elevation, Doppler, received power, and time by combining both PCs and spectrograms as the input data representations. The hierarchical structure of the classification pipeline includes the so-called 'orientation classification module' to first classify which orientation the human subject is facing (e.g., 0, 45, 90, 135, 180°). Then, based on the predictions made by this module, the 'PC classification module' predicts which posture or motion pair the input belongs to (more details about the definition of motion pair are given later in this section).


Fig. 1. Overview of the proposed classification pipeline, where the main contributions are the parallel processing and fusion of PCs and
spectrograms, and the usage of T-Net to obtain angular orientation insensitivity. Besides the data generation and pre-processing operations (top
box), the three main modules include the orientation classification, the PC classification, and the spectrogram classification.

Finally, for those inputs predicted as a motion pair, the 'spectrogram classification module' is utilized to classify which specific motion the input sample belongs to, e.g., bending over or standing up from bending, or sitting down or standing up from sitting. The descriptions of the main modules are as follows.

1) The data generation module starts with measuring experimental data via the imaging radar. Given the measured raw imaging radar data, the signal processing flow involves two parallel branches. (a) A 2D Fast Fourier Transform (FFT) is applied on the fast-time and slow-time domains to estimate range and Doppler spectra; then a 2D FFT is also applied on the virtual channel domain to obtain the azimuth and elevation information. PCs are generated by applying the order-statistic constant false alarm rate (OS-CFAR) detector in the range-Doppler domain, subsequently detecting local peak values in the azimuth domain and the global peak value in the elevation domain. Each coherent processing interval (frame duration) is 100 ms. PCs from 20 frames are aggregated to represent one segment of activity, since the PC generated from a single coherent processing interval is typically not dense enough to represent the shape of a human body [23]. (b) Spectrograms are generated by applying a Short Time Fourier Transform (STFT) on the slow-time axis, aggregated over the range bins where the human subject is present. The specific parameters used for the data generation module are listed in Table I.

TABLE I
PARAMETER SPECIFICS USED IN THE DATA GENERATION MODULE.

Parameter                                       Value
Number of range and Doppler FFT points          256
Number of azimuth FFT points                    128
Number of elevation FFT points                  64
Number of reference cells in range domain       8
Number of reference cells in Doppler domain     4
Number of guard cells in range domain           12
Number of guard cells in Doppler domain         4
Scaling factor for OS-CFAR algorithm            6.3
Window length for STFT                          128
Overlapped window length for STFT               127
Power threshold below the maximum power
level per spectrogram                           40 dB

2) The orientation classification module includes a so-called T-Net architecture modified from [29]. T-Net was originally used to transform the input PC to achieve angle insensitivity: it outputs a 3×3 transformation matrix that is multiplied with the input PC such that rotated PCs are transformed to the same aspect angle. Essentially, the feature learning layers within T-Net can learn the geometric characteristics of PCs, which are related to the human orientation. Therefore, in the proposed implementation, the multilayer perceptron (MLP) layers in the original T-Net are modified to be compatible with a classification task, and the max pooling layer is replaced with an average pooling layer to achieve more robustness to noisy data. Based on the predictions of human orientation, the multi-angle human activity classification problem is simplified to a uni-angle problem.
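To make the two branches of the data generation module 1) more concrete, the following is a minimal Python sketch of the per-frame processing with the FFT and STFT settings of Table I. The array shapes, the simple power threshold standing in for the OS-CFAR detector, and the assumed range bins of the subject are illustrative choices for this sketch, not the authors' implementation.

import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(0)
# Illustrative raw data cube for one 100 ms frame:
# (128 chirps, 256 fast-time samples, 12 x 16 = 192 virtual channels).
frame = rng.standard_normal((128, 256, 192)) + 1j * rng.standard_normal((128, 256, 192))

# Branch (a): range profiles and range-Doppler map (256-point FFTs as in Table I).
range_profiles = np.fft.fft(frame, n=256, axis=1)                       # fast time -> range
rd_map = np.fft.fftshift(np.fft.fft(range_profiles, n=256, axis=0), 0)  # slow time -> Doppler

# Stand-in for the OS-CFAR detector on the channel-averaged range-Doppler power map.
# Each detected cell would then receive azimuth/elevation estimates via 128/64-point
# angle FFTs over the virtual channels, forming the per-frame PC aggregated over 20 frames.
power = np.mean(np.abs(rd_map) ** 2, axis=2)
detections = np.argwhere(power > 6.3 * np.median(power))  # 6.3: Table I scaling factor

# Branch (b): spectrogram via an STFT on slow time, summed over the range bins where the
# subject is present (window 128, overlap 127 as in Table I). In practice the slow-time
# signal spans many frames; a single frame is used here only for brevity.
subject_bins = slice(48, 64)  # assumed range bins around the subject
slow_time = range_profiles[:, subject_bins, :].sum(axis=(1, 2))
_, _, spec = stft(slow_time, nperseg=128, noverlap=127, return_onesided=False)
spec_db = 20 * np.log10(np.abs(spec) + 1e-12)
spec_db = np.maximum(spec_db, spec_db.max() - 40)  # clip 40 dB below the peak (Table I)

Similarly, the replacement of max pooling with average pooling in the T-Net of module 2) (and likewise in the PointNet-based PC classification module described next) can be sketched in PyTorch as below. The layer widths, the per-point input features, and the number of classes are illustrative assumptions rather than the authors' exact architecture.

import torch
import torch.nn as nn

class AvgPoolPointNetHead(nn.Module):
    """Simplified PointNet/T-Net-style classifier for point clouds of shape (B, C, N).

    The only conceptual change with respect to the original networks [29] is the
    symmetric function: per-point features are aggregated by a mean over the N points
    instead of a max, which the paper argues is more robust to noisy radar detections.
    """

    def __init__(self, in_dim: int = 3, n_classes: int = 5):
        super().__init__()
        # Shared per-point MLP, implemented as 1x1 convolutions along the point axis.
        self.point_mlp = nn.Sequential(
            nn.Conv1d(in_dim, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, n_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        feats = self.point_mlp(points)  # (B, 128, N) per-point features
        pooled = feats.mean(dim=2)      # average pooling instead of feats.max(dim=2).values
        return self.classifier(pooled)  # class logits, e.g. over the 5 orientations

# Example: two PCs with 512 detections and (x, y, z) per point, 5 orientation classes.
logits = AvgPoolPointNetHead()(torch.randn(2, 3, 512))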


3) For each predicted orientation of the human subject, the PC classification module is then used. This module is adapted from PointNet [29] by proposing a new global symmetric function, average pooling, to replace the original max pooling in order to be robust to noise. The PC classification module makes a prediction according to the spatial distribution of PCs aggregated over 20 frames. As previously mentioned, without this aggregation, the sparsity of the radar PC over one or a few frames would make the classification task for human movements or postures too challenging. An undesired consequence of this aggregation over time is that dynamic activities and motions that are implicitly shorter than the duration of 20 frames result in very similar PCs, for example sitting down & standing up from sitting, or bending over & standing up from bending. Therefore, the output of this module may include classes that are possible motion pairs (e.g., (a) sitting down on a chair or standing up from sitting, and (b) bending over or standing up from bending), or static postures such as (c) sitting still and (d) standing still. These are summarized in Table II.

4) The last module is the spectrogram classification module. This module takes the predicted class from the previous module and uses the spectrogram as the input to AlexNet [30] to recognize the individual motion class within the motion pair (a) or (b). The choice of AlexNet [30] is due to its relatively simple structure and easier convergence compared to deeper neural networks, but it can theoretically be replaced by other spectrogram-based classification approaches from the literature [25], [26] if desired. Therefore, the final classification output of the proposed pipeline is a combination of modules using both PCs and spectrograms as input data representations.

TABLE II
LIST OF MOTIONS, MOTION PAIRS, AND POSTURES FOR THE MEASURED DATASET. MOTION PAIRS AND POSTURES REPRESENT THE INTERMEDIATE OUTPUT CLASSES (a-b-c-d) OF THE PC CLASSIFICATION MODULE, WHEREAS INDIVIDUAL MOTIONS AND POSTURES REPRESENT THE FINAL OUTPUT CLASSES OF THE PROPOSED PIPELINE (1-6).

Motion pair a: 1. Sitting down; 2. Standing up from sitting
Motion pair b: 3. Bending over; 4. Standing up from bending
Posture c: 5. Sitting still
Posture d: 6. Standing still

As claimed in [31], a HAR pipeline should robustly achieve angle-insensitive HAR given training data collected at multiple orientations, or even, ideally, given training data collected at one orientation only. Thus, two definitions are given for how to use a classification pipeline in terms of different combinations of training data with respect to human orientation: a classifier trained with data from one human orientation and tested with multiple orientations is termed a SAC (Single Angle Classifier), whereas a MAC (Multiple Angle Classifier) is trained with data collected at multiple orientations. It should be noted that the orientation classification module becomes insignificant for SAC cases, and is thus bypassed in the proposed pipeline. The Angle Sensitivity Matrix (ASM) and Angle Sensitivity Vector (ASV) are used as metrics to evaluate the classification performance of the SAC and MAC, respectively, as proposed in [31]. The former has the two dimensions of training and test orientations, while the latter is compacted from a matrix into a vector, as data from all orientations are used for training together. The work in [31] is used as a comparative baseline to evaluate the proposed pipeline and is named baseline-1.

Furthermore, additional comparisons are made with other pioneering studies investigating the feasibility of applying imaging radar for radar-based HAR. In particular, [23] used 'snapshots' of PCs aggregated over the interval of activity as the input to their designed deep convolutional neural network. Their method essentially treats imaging radar-based HAR as a 2D image classification problem, unlike the proposed pipeline that processes the 3D coordinates of the detected PCs. It is crucial to determine whether the full use of all the intrinsic features provided by 4D imaging radar helps to achieve better HAR performance, so [23] is used as a replica of the orientation classification module, the PC classification module, and the entire pipeline for comparison. This comparative architecture is named baseline-2. Furthermore, since the classifier in [23] is not explicitly compared with other image-based classifiers in the literature, four of them - namely, VGG [32], ResNet [33], DenseNet [34] and ViT [35] - are also used for comparison in this paper. They are named baseline-3 to baseline-6.

III. DATASET DESCRIPTION

To validate the performance of the proposed pipeline, a mm-wave MIMO FMCW radar developed by Texas Instruments (cascaded AWR2243 radar) was used to collect an experimental dataset. Mm-wave FMCW MIMO radar has been a popular choice for short-range applications such as indoor HAR thanks to its flexibility, low cost, and small physical size as an off-the-shelf product, mostly driven by the technological development in the automotive sector. Specifically, the cascaded AWR2243 radar operates at 79 GHz with 12 transmitters and 16 receivers. Using the MIMO configuration, the radar produces a virtual array with an aperture of 43λ × 3λ (see Figure 2), where λ is the center wavelength of the transmitted signal. For the purpose of comparative studies, we mostly inherited the FMCW waveform parameters from [23], which are as follows: chirp duration 63 µs, chirp slope 60 MHz/µs, 128 chirps per frame, frame period 100 ms, frequency bandwidth 2.84 GHz, and A/D sampling rate 2.7 MHz. The radar specifications derived from these parameters are: range resolution of 52.8 mm, azimuth resolution of 1.4 degrees (at broadside), elevation resolution of 18 degrees (at broadside), and velocity resolution of ±0.0286 m/s. These resolutions are expected to provide sufficient information on human dynamics as well as body shapes.
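As a quick sanity check, the quoted range and angular resolutions follow to first order from these waveform and aperture parameters. The snippet below uses the standard approximations (range resolution c/2B, and roughly λ/D radians of angular resolution for an aperture of extent D); small deviations from the quoted values are expected, since the exact figures depend on windowing and the precise array geometry.

import math

c = 3e8             # speed of light, m/s
bandwidth = 2.84e9  # chirp bandwidth, Hz

range_res = c / (2 * bandwidth)  # ~0.0528 m, i.e. the quoted 52.8 mm
az_res = math.degrees(1 / 43)    # 43*lambda aperture -> ~1.33 deg (quoted: 1.4 deg)
el_res = math.degrees(1 / 3)     # 3*lambda aperture  -> ~19.1 deg (quoted: 18 deg)
print(range_res, az_res, el_res)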


Fig. 2. Visualization of the virtual channels in the used AWR2243 imaging radar, where the array and subarrays are defined according to the number of channels used for the subsequent signal processing.

TABLE III
BODY CHARACTERISTICS OF THE 8 PARTICIPANTS.

Subject index   1     2     3     4     5     6     7     8     Mean±Std
Height (cm)     180   168   170   180   185   178   177   177   176.9±5.5
Weight (kg)     75    70    70    70    95    82    72    72    75.7±8.8

The dataset was collected in an office-like room with tables, chairs, and cabinets to simulate a real-life indoor environment. The radar was placed at 0.75 m height from the ground to illuminate the whole human body in the field-of-view. A chair was placed 2.7 m away from the radar in the Y-axis direction, and participants performed activities around it. Overall, 6 activities were included in the dataset, consisting of 4 of the most common daily motions and 2 postures that can be viewed as the transitional states between such motions (Table II). The measurements include eight human subjects, whose body characteristics are very similar, as shown in Table III. During the measurements, postures and motions were recorded separately. Specifically, a complete time interval for each measurement was 2 minutes. During this time, the human subjects were asked to perform either one static posture, e.g., sitting still on the chair, or a motion pair, e.g., sitting down and standing up from sitting, for a period of approximately 2 seconds for each individual motion. Labels were then generated manually for these data by visual segmentation. The measured dataset consists of 2,239 samples and is divided into 80% for training and 20% for testing. In addition to the processing chain mentioned in Section II, snapshots of the front view of the aggregated PCs as in [23] are also stored for comparative studies.

To further examine the performance of the proposed pipeline in more challenging conditions, some additional datasets were generated based on the measured one. Specifically, "noisy data" were generated by adding white Gaussian noise to the measured raw data (i.e., prior to any processing steps to generate spectrograms or PCs), while assuming the original measured data to be noise-free. Based on the amount of added noise, the final obtained SNR values are 20 dB, 18 dB, 15 dB, 13 dB, 10 dB and 8 dB. Moreover, "small-aperture datasets" were generated by selecting the raw data of only a subset of virtual channels for subsequent signal processing (essentially subarray-1 to subarray-4 shown in Figure 2). This was done to test the effect of reduced angular resolutions on the generated PCs and on the subsequent classification.

IV. EXPERIMENTAL RESULTS AND DISCUSSION

A. Results of the Proposed Pipeline

This section presents the results of the proposed pipeline classifying the 6 activities. It should be noted that training and testing are independently repeated 5 times to increase the reliability of the results, so the results presented in this section express the values averaged over 5 realizations.

The orientation classification module provides very accurate predictions of the human subjects' orientation, with an accuracy of 97.5%, which significantly simplifies the task for the PC and spectrogram classification modules. The PC classification module also makes promising predictions, with an average accuracy of 97.9% (defined as the sum of the true positive samples for classes a-d divided by the total number of samples in Table IV). The spectrogram classification module, however, has less favorable performance (83.8%), computed as the sum of the true positive samples for classes 1-2 and 3-4 divided by the number of true samples for classes a and b, respectively, in Table IV. Specifically, the binary classification accuracy for sitting down/standing up from sitting is 85.7%, and the binary classification accuracy for bending over/standing up from bending is 81.0%. These numbers are lower than the results in some state-of-the-art studies; for example, almost 100% classification accuracy for sitting down and standing up from sitting was achieved in [27], and more than 90% in [28]. However, it should be noted that this difference in performance is related to the fact that these studies did not consider the multi-angle scenario, as more than 65% of the misclassifications of classes 1-4 are actually caused by the samples where the human subjects performed motions at 90° orientation. Considering all the above, each stage of the hierarchical pipeline exhibits satisfactory performance, with classification accuracy between 81.0% and 99.0%, leading to an overall average classification accuracy of 87.1% for the proposed pipeline classifying 6 activities.

TABLE IV
CONFUSION MATRIX FOR THE PROPOSED PIPELINE TO CLASSIFY 6 ACTIVITIES, WHERE VALUES WERE AVERAGED FROM 5 INDEPENDENT REALIZATIONS.

True\Pred.    1     2     3     4     5     6
a  1          347   73    1     0     0     0
a  2          48    413   0     0     0     0
b  3          0     2     324   76    0     3
b  4          0     1     77    332   0     0
c  5          0     5     0     3     432   24
d  6          0     5     0     5     9     443
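The stage-wise accuracies quoted above can be approximately re-derived from the averaged confusion matrix of Table IV by grouping the six final classes into the intermediate classes a-d of Table II, as in the short check below; the residual differences with respect to the quoted numbers come from the rounding of the averaged counts.

import numpy as np

# Rows/columns follow Table IV: true/predicted classes 1..6, averaged over 5 runs.
cm = np.array([
    [347,  73,   1,   0,   0,   0],
    [ 48, 413,   0,   0,   0,   0],
    [  0,   2, 324,  76,   0,   3],
    [  0,   1,  77, 332,   0,   0],
    [  0,   5,   0,   3, 432,  24],
    [  0,   5,   0,   5,   9, 443],
])
groups = {"a": [0, 1], "b": [2, 3], "c": [4], "d": [5]}

# PC classification module: a sample is correct if it lands in the right group a-d.
pc_correct = sum(cm[np.ix_(idx, idx)].sum() for idx in groups.values())
print(pc_correct / cm.sum())  # ~0.978, i.e. the reported ~97.9% up to rounding

# Spectrogram module: within groups a and b, the correct individual motion must be chosen.
blocks = [cm[np.ix_(groups[g], groups[g])] for g in ("a", "b")]
print(sum(b.trace() for b in blocks) / sum(b.sum() for b in blocks))  # ~0.838, the reported 83.8%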


B. Results of Ablation Study

Through the removal of specific modules from the proposed pipeline, this section studies their individual contribution. For example, the PC classification module alone could be adapted to directly predict which activity the subject is performing; or, in case the orientation classification module is removed, the PC classification module and spectrogram classification module together could accept input samples from all orientations. Table V shows the classification results attained in these ablation studies as mean accuracy/F1-score and their standard deviation over five different realizations, where each realization includes an independent training and testing. It is reasonable to conclude that each classification module makes a crucial contribution to the overall performance of the proposed pipeline. Meanwhile, the results attained with only one classification module on its own are significantly lower than the others, as the accuracy and F1-score in the first two rows are significantly lower than their counterparts in the other rows. This not only demonstrates the advantage of the hierarchical structure, but also reveals the significance of the two complementary data representations that exploit all available features of mm-wave 4D imaging radar.

TABLE V
SUMMARY OF THE ACCURACY AND F1-SCORE RESULTS FROM THE ABLATION STUDY WITH THE DIFFERENT MODULES IN THE PROPOSED PIPELINE.

Used Modules                                 Accuracy        F1
PC classification                            74.3% ± 1.1%    74.1% ± 1.2%
Spec. classification                         73.5% ± 3.2%    74.4% ± 1.9%
Ori. classification & spec. classification   77.1% ± 2.5%    76.4% ± 2.1%
PC classification & spec. classification     80.9% ± 1.0%    80.7% ± 1.2%
Ori. classification & PC classification      81.9% ± 1.8%    81.4% ± 1.2%
Full pipeline                                87.1% ± 1.2%    86.7% ± 1.3%

C. Results vs the Comparative Baseline Approaches

This subsection presents the results comparing the proposed pipeline with the alternative baselines from the literature described in Section II, together with the ASM and ASV metrics from [31] as a function of training and testing angles for the different human activities¹.

¹Neural networks in [31] and [23] are implemented by the authors and trained from scratch, as the pre-trained models or training data were not made publicly available.

When using the proposed pipeline as a SAC, the orientation classification module becomes pointless, since only the data of one orientation can be used for training. As a result, only the PC and the spectrogram classification modules are utilized from the pipeline. The corresponding ASM and ASV, in comparison with those of baseline-1 [31], are given in Figures 3 and 4, and Table VI. It should be noted that, since the conventional spectrogram-based classifiers lack the necessary spatial information to classify static postures, only four motions are considered in our implementation of baseline-1.

Fig. 3. ASVs of the proposed pipeline and the baseline-1 [31] as a function of orientation.

Fig. 4. ASMs of (a) the proposed pipeline for 4 motions, (b) the baseline-1 [31] for 4 motions, and (c) the proposed pipeline for 6 activities. The horizontal axis is the test angle, the vertical axis is the training angle, and from top to bottom and from left to right the values are 0, 45, 90, 135 and 180°, respectively.

Three main points are drawn from this analysis as follows:

• Figure 3 shows that the closer the human orientation is to 90°, the worse the results are, for all methods considered. This is presumably due to the fact that movements with a torso orientation of 90° do not generate representative Doppler features for classification.

• Figure 4 shows that the proposed pipeline outperforms the method in [31] for the seen test data, as shown by the cells on the diagonal. Yet, the proposed pipeline appears to be more sensitive to angle variations than Yang's method [31] given test data collected at an orientation close to that of the training data (e.g., the cells on the non-diagonal upper left or bottom right parts). Meanwhile, for those results obtained from a test angle far from the angle of the training data (e.g., the cells on the upper right or bottom left parts), the proposed pipeline again shows superiority thanks to the use of additional spatial information.

• Following the definitions of quantitative metrics for angle sensitivity in [31], the mean value (X̄) and L2-distance (||v||₂) of the ASV and ASM are reported in Table VI. As can be seen, the mean accuracy of the proposed pipeline is significantly higher than that of baseline-1, and the L2-distance is smaller. Therefore, quantitatively speaking, the proposed pipeline is better at classifying motions and/or postures than baseline-1 in terms of both MAC and SAC.
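Under one plausible reading of the metrics of [31] (X̄ as the mean of the per-angle accuracies, and ||v||₂ as the L2-norm of their deviations from that mean, so that smaller values indicate less angle sensitivity), the quantities of Table VI can be computed as sketched below. Both the metric definition and the accuracy values used here are assumptions of this illustration, not reproduced from [31].

import numpy as np

def angle_sensitivity(accuracies):
    """Mean accuracy and L2 spread of a vector (ASV) or flattened matrix (ASM) of
    per-angle accuracies. The spread definition is an assumed reading of [31]."""
    v = np.asarray(accuracies, dtype=float).ravel()
    return v.mean(), np.linalg.norm(v - v.mean())

# Illustrative ASV of a MAC evaluated at the 5 test orientations (0..180 deg).
mean_acc, l2_spread = angle_sensitivity([0.88, 0.86, 0.76, 0.85, 0.87])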


TABLE VI
QUANTITATIVE METRICS INSPIRED FROM BASELINE-1 [31]: COMPARISON OF RESULTS USING THE PROPOSED PIPELINE AND THEIR CLASSIFICATION APPROACH.

                                  MAC-based results    SAC-based results
Method           Activity         X̄       ||v||₂      X̄       ||v||₂
proposed         4 motions        82.1%   9.4%        47.7%   10.5%
proposed         6 activities     83.8%   7.4%        50.8%   9.9%
baseline-1 [31]  4 motions        76.3%   10.8%       45.6%   10.9%

Table VII shows the results in terms of accuracy and F1-scores for the different baselines from the literature [23], [32]–[35] and the proposed pipeline. The proposed pipeline shows a lead of at least 12.2% in accuracy and 13.1% in F1-score compared with baseline-2 to baseline-6 when classifying the 6 activities end-to-end. Furthermore, the accuracy advantage of the proposed pipeline is at least 22.4% for classifying human orientations, and at least 11.5% when the image classifier baselines are used as a replica of the proposed PC classification module. All these results appear to suggest that fully exploiting all the information provided by an imaging radar, e.g., using spectrograms and the PC coordinates together as input to the classifiers, is more helpful in achieving good classification results than treating PCs as 'snapshot' images, as the latter approach implicitly causes a loss of information.

TABLE VII
CLASSIFICATION ACCURACY AND F1-SCORE OF THE PROPOSED PIPELINE VS BASELINES IN THE LITERATURE FOR THE TASKS OF ORIENTATION CLASSIFICATION, PC-BASED CLASSIFICATION, AND OVERALL HAR WITH 6 CLASSES. Diff. INDICATES PERFORMANCE DIFFERENCES COMPARED TO THE PROPOSED PIPELINE.

Method       Task                  Acc.     Diff.     F1       Diff.
proposed     HAR                   87.1%    0%        86.7%    0%
baseline-2   HAR                   74.9%    -12.2%    73.6%    -13.1%
baseline-3   HAR                   70.1%    -17.0%    68.9%    -17.8%
baseline-4   HAR                   65.3%    -21.8%    64.5%    -22.2%
baseline-5   HAR                   62.7%    -24.4%    61.9%    -24.8%
baseline-6   HAR                   58.4%    -28.7%    58.2%    -28.5%
proposed     Ori. classification   97.5%    0%        97.3%    0%
baseline-2   Ori. classification   75.1%    -22.4%    73.9%    -23.4%
baseline-3   Ori. classification   53.4%    -44.1%    52.8%    -44.5%
baseline-4   Ori. classification   56.3%    -41.2%    56.4%    -40.9%
baseline-5   Ori. classification   62.8%    -34.7%    61.5%    -34.8%
baseline-6   Ori. classification   62.1%    -35.4%    61.6%    -35.7%
proposed     PC classification     99.0%    0%        97.2%    0%
baseline-2   PC classification     86.0%    -13.0%    85.8%    -11.4%
baseline-3   PC classification     87.5%    -11.5%    87.4%    -9.8%
baseline-4   PC classification     83.5%    -15.5%    82.9%    -14.3%
baseline-5   PC classification     80.2%    -18.8%    79.9%    -17.3%
baseline-6   PC classification     81.8%    -17.2%    81.4%    -15.8%

Fig. 5. Classification accuracy of the proposed pipeline and two selected baselines with respect to a varying number of training samples.

Radar data measurement is generally more complex and time-consuming than using vision-based sensors, thus limiting the typical radar-based HAR dataset to be much smaller than computer vision datasets. To be more specific, Yang's work [31] and Kim's work [23] included 60 and 288 samples per activity, respectively, whereas our dataset includes roughly 300 training samples per activity. Therefore, an important comparison concerns the classification pipeline performance with respect to a limited number of training samples. Figure 5 presents the test accuracy of the baselines and the proposed pipeline trained with only a randomly selected percentage of the training data available from the measurements. The results in Figure 5 show that the performance of the proposed pipeline shrinks sharply from 87.1% to 74.0% accuracy for 6 activities, and from 83.0% to 73.5% for 4 motions, as the number of training samples decreases. Nevertheless, the test accuracy remains on average higher than or approximately equal to the state-of-the-art baselines in [23], [31] as the amount of training data is reduced, up to the point of using only 20% of the training data (equivalent to 358 training samples in total).

To conclude, in terms of the different quantitative metrics presented in this subsection, the proposed pipeline in general has significantly superior performance over the baselines in 1) angle insensitivity, 2) classification accuracy, and 3) robustness against a limited number of training samples.

D. Results for Different SNR

This section focuses on one of the most influential parameters in radar systems, the SNR. Because of the random nature of additive noise, data generation, training, and testing are repeated for five independent realizations and averaged for every considered SNR value, as described in Section III. Figure 6 shows the average test accuracy and the standard deviation of these five realizations for varying SNR levels. As can be seen, the classification performance decreases almost linearly with decreasing SNR levels, and the accuracy drops to nearly 50% for an SNR of 8 dB, which is approximately 35% lower than with the measured data (assumed to be noise-free in this evaluation). These results suggest that noisy data could significantly undermine the performance of the proposed pipeline. Last and most importantly, the performance gain due to the proposed replacement of max pooling with average pooling is clearly shown by comparing the blue curve with the red one in Figure 6. This appears to indicate that average pooling as a symmetric function fits the task of processing noisy radar data better than the original max pooling in [29].

For data-driven classification methods, it is also interesting to evaluate whether differences in SNR between training and test data could influence classification performance.
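The noisy datasets of Section III (white Gaussian noise added to the raw data, with the measured data taken as noise-free) can be emulated as in the sketch below; taking the mean power of the measured cube as the signal power is an assumption of this illustration.

import numpy as np

def add_awgn(raw, snr_db, rng=None):
    """Add complex white Gaussian noise so that the output has the target SNR,
    treating the measured data itself as noise-free (as assumed in Section III)."""
    if rng is None:
        rng = np.random.default_rng()
    noise_power = np.mean(np.abs(raw) ** 2) / (10 ** (snr_db / 10))
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(raw.shape) + 1j * rng.standard_normal(raw.shape)
    )
    return raw + noise

raw = np.ones((128, 256, 192), dtype=complex)  # placeholder raw data cube
noisy_sets = {snr: add_awgn(raw, snr) for snr in (20, 18, 15, 13, 10, 8)}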


In this case, we can cross-validate the results of 'training-with-measured-data' and 'testing-with-noisy-data', and vice versa. The results are listed in Table VIII. These results show that radar-based PCs and spectrograms in noisy conditions are unlikely to be sufficiently representative for classifying human body shapes and postures, hence additional work would be needed to make the pipeline more robust to severe degradation in SNR.

Fig. 6. Classification accuracy with respect to varying SNR levels. Average pooling is used in the proposed pipeline, whereas max pooling refers to the original PointNet and its transformation network T-Net in [29].

TABLE VIII
CROSS-EXAMINATION OF THE ROBUSTNESS OF THE TRAINED PIPELINE GIVEN TESTING DATA OF UNSEEN SNR LEVELS, WHERE meas. INDICATES THE ORIGINAL MEASURED DATASET THAT IS ASSUMED TO BE NOISE-FREE.

Train SNR   Test SNR   Acc.     Diff.     F1       Diff.
meas.       meas.      87.1%    0%        86.7%    0%
meas.       20 dB      67.8%    -19.3%    67.4%    -19.3%
meas.       18 dB      65.3%    -21.8%    64.6%    -22.1%
meas.       15 dB      60.0%    -27.1%    58.7%    -28.0%
meas.       13 dB      55.3%    -31.8%    52.8%    -33.9%
meas.       10 dB      52.8%    -34.3%    49.8%    -36.9%
meas.       8 dB       47.8%    -39.3%    44.4%    -42.3%
20 dB       meas.      69.0%    -18.1%    68.7%    -18.0%
18 dB       meas.      50.1%    -37.0%    50.4%    -36.3%
15 dB       meas.      49.6%    -37.5%    50.8%    -35.9%
13 dB       meas.      41.8%    -45.3%    43.2%    -43.5%
10 dB       meas.      44.9%    -42.2%    45.1%    -43.6%
8 dB        meas.      31.4%    -55.7%    30.8%    -55.9%

E. Results for Variations in MIMO Aperture

Using an imaging radar with a relatively smaller MIMO aperture can be economical and power-efficient, but this comes at the cost of reduced angular resolutions. The results obtained from four separate pairs of training and test datasets are given in Table IX, where the definitions of subarray-1 to subarray-4 are visualized in Figure 2. Moreover, we investigate the results when training the proposed pipeline with the data obtained from one of the virtual radar apertures, but testing with the data from a different one. Hence, the cross-examination results of training on the large-aperture dataset and testing on the small-aperture dataset, and vice versa, are also given in Table IX. The following findings are drawn from this analysis. First, since the largest decrease in performance is as small as 9.2% in accuracy and 9.1% in F1-score, it is reasonable to conclude that the proposed pipeline has promising robustness to be applied on different imaging radars, i.e., with different and often smaller apertures. Secondly, as the classification accuracy continuously drops from the first to the fifth row in Table IX, it is clearly established that the classification results are significantly influenced by the MIMO aperture size. Thirdly, the significance of the consistency between training and test data is highlighted by the huge accuracy drop shown in the cross-examination evaluations, suggesting that simply re-using data collected with a different radar may not be an optimal data augmentation method without some manipulation either on the data or on the classification pipeline. Finally, in the cross-examination evaluation, the combination of the full array and subarray-2 always provides substantially better results. This could be the result of our signal processing chain, i.e., local peak value detection is utilized in the azimuth domain but not in the elevation domain, so that the PCs generated by virtual arrays with the same horizontal (i.e., azimuthal) MIMO aperture are similar.

TABLE IX
PERFORMANCE OF THE PROPOSED PIPELINE WITH DATA WITH VARYING MIMO APERTURE, I.E., DIFFERENT ANGULAR RESOLUTIONS. CROSS-EXAMINATION RESULTS OF TRAINING & TESTING WITH DIFFERENT APERTURES ARE ALSO SHOWN.

Train array   Test array   Acc.     Diff.     F1       Diff.
array         array        87.1%    0%        86.7%    0%
subarray-1    subarray-1   83.6%    -3.5%     83.4%    -3.3%
subarray-2    subarray-2   83.3%    -3.8%     83.0%    -3.7%
subarray-3    subarray-3   79.4%    -7.7%     79.2%    -7.5%
subarray-4    subarray-4   77.8%    -9.3%     77.5%    -9.1%
array         subarray-1   26.4%    -60.7%    25.7%    -61.0%
array         subarray-2   58.7%    -29.4%    58.8%    -27.9%
array         subarray-3   27.4%    -59.7%    26.8%    -59.9%
array         subarray-4   31.6%    -55.5%    31.8%    -54.9%
subarray-1    array        28.1%    -59.0%    26.3%    -60.4%
subarray-2    array        72.8%    -14.3%    67.8%    -18.9%
subarray-3    array        26.8%    -60.3%    25.9%    -60.8%
subarray-4    array        29.0%    -58.1%    23.6%    -63.1%

F. Results of Leave-One-Subject-Out Test

It is expected that different human participants have their own body size and height, and thus exhibit specific characteristics in their kinematic patterns when performing daily activities. In the interest of training a generally-applicable classification pipeline, it is significant to learn whether the body physical characteristics and the kinematic patterns can be directly linked by the proposed pipeline. In an ideal situation where human subjects share similar physical characteristics, like in our dataset (see Table III), and assuming that the kinematic patterns are very similar across individuals of similar body shape and size, these patterns could be learnt by observing a small set of individuals.
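The leave-one-subject-out protocol of this subsection corresponds to a standard grouped split, sketched below with scikit-learn; the random features, the labels, and the nearest-neighbour classifier standing in for the actual pipeline are placeholders of this illustration, not the paper's code.

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((2239, 16))      # placeholder features, one vector per sample
y = rng.integers(0, 6, size=2239)        # 6 activity classes
subject = rng.integers(1, 9, size=2239)  # subject index 1..8

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subject):
    clf = KNeighborsClassifier().fit(X[train_idx], y[train_idx])  # stand-in model
    scores.append(clf.score(X[test_idx], y[test_idx]))
print(np.mean(scores))  # per-left-out-subject accuracies, as plotted in Figure 7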


Fig. 7. Classification accuracy of the PC classification module (blue bars), the spectrogram classification module (red bars), and the entire pipeline (orange bars) in the leave-one-subject-out test, where the horizontal axis represents the index of the left-out subject.

However, as can be seen in Figure 7, in the case of classifying activities of an unseen human subject, the mean accuracy drops to 63.5%, which is approximately 20% lower than the results in Section IV-A. To be more specific, the mean PC classification accuracy and the mean spectrogram classification accuracy are approximately 80% and 71%, showing decreases of more than 17% and 12%, respectively.

Firstly, this finding fits the typical patterns of leave-one-subject-out tests also observed in other HAR studies with different data [19], [20]. Secondly, these values suggest that, despite very similar body characteristics, individuals still have distinct kinematic patterns due to their specific way of moving when performing daily activities. These differences are what cause the performance of the proposed pipeline to suffer when a leave-one-subject-out test is applied. In addition, as shown in Figure 7, the lowest accuracy is attained for subject 6 (height 178 cm, weight 82 kg) instead of subject 5, whose body characteristics (height 185 cm, weight 95 kg) can be seen as an outlier. This again appears to show that similarity of kinematic patterns does not necessarily exist even for people who have very similar body shape and size, suggesting the importance of having as much diversity as possible in the training data in order to obtain a more generally-applicable classification pipeline.

V. CONCLUSION

This work presents a pipeline for recognizing human motions and static postures performed toward multiple orientations using a 4D imaging radar. The proposed pipeline starts with the data generation module, including two parallel processing chains to generate PCs and spectrograms as input data representations. Through the combination of PCs and spectrograms to represent the varying human body shapes and kinematic patterns, respectively, the information provided by 4D mm-wave imaging radar in terms of range, Doppler, azimuth, elevation, received power, and time is fully exploited. The proposed hierarchical classification pipeline attains an accuracy of 87.1%, significantly outperforming state-of-the-art methods that only utilize either point clouds or spectrograms in isolation. It is also demonstrated that the proposed pipeline has substantial robustness against possible unfavorable conditions such as low SNR levels, a limited amount of training data, or relatively poor angular resolutions. The possibility of transferring trained models amongst different people is also demonstrated with a leave-one-subject-out test, even if additional work is needed to increase performance in this case. The data generation module can be further improved by introducing automated segmentation methods such that each data sample contains no redundant information but just the desired activity on its own. Furthermore, architectures such as recurrent neural networks (e.g., GRU or LSTM cells) could be further integrated into the pipeline to model temporal connections between activities performed at different time steps one after the other. Moreover, the pipeline should be validated with additional data in terms of the number of participants and types of activities. More participants, where possible with diverse body size and height, will help explore the generalization capabilities of the proposed pipeline to different physical characteristics. More activities, such as falling to the ground or other critical activities, will support the application of the proposed pipeline in a more realistic HAR setting.

ACKNOWLEDGMENT

The authors are grateful to all the volunteers who helped with the data collection.

REFERENCES

[1] S. A. Shah and F. Fioranelli, "RF sensing technologies for assisted daily living in healthcare: A comprehensive review," IEEE Aerospace and Electronic Systems Magazine, vol. 34, no. 11, pp. 26–44, 2019.
[2] C. S. Florence, G. Bergen, A. Atherly, E. Burns, J. Stevens, and C. Drake, "Medical costs of fatal and nonfatal falls in older adults," Journal of the American Geriatrics Society, vol. 66, no. 4, pp. 693–698, 2018.
[3] N. Lu, Y. Wu, L. Feng, and J. Song, "Deep learning for fall detection: Three-dimensional CNN combined with LSTM on video kinematic data," IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 1, pp. 314–323, 2018.
[4] S. C. Mukhopadhyay, "Wearable sensors for human activity monitoring: A review," IEEE Sensors Journal, vol. 15, no. 3, pp. 1321–1330, 2014.
[5] T. R. Bennett, J. Wu, N. Kehtarnavaz, and R. Jafari, "Inertial measurement unit-based wearable computers for assisted living applications: A signal processing perspective," IEEE Signal Processing Magazine, vol. 33, no. 2, pp. 28–35, 2016.
[6] K. Chaccour, R. Darazi, A. H. El Hassani, and E. Andres, "From fall detection to fall prevention: A generic classification of fall-related systems," IEEE Sensors Journal, vol. 17, no. 3, pp. 812–822, 2016.
[7] C. Li, Z. Peng, T.-Y. Huang, T. Fan, F.-K. Wang, T.-S. Horng, J.-M. Munoz-Ferreras, R. Gomez-Garcia, L. Ran, and J. Lin, "A review on recent progress of portable short-range noncontact microwave radar systems," IEEE Transactions on Microwave Theory and Techniques, vol. 65, no. 5, pp. 1692–1706, 2017.
[8] J. A. Nanzer, "A review of microwave wireless techniques for human presence detection and classification," IEEE Transactions on Microwave Theory and Techniques, vol. 65, no. 5, pp. 1780–1794, 2017.
[9] F. Fioranelli, J. Le Kernec, and S. A. Shah, "Radar for health care: Recognizing human activities and monitoring vital signs," IEEE Potentials, vol. 38, no. 4, pp. 16–23, 2019.
[10] D. Tahmoush, "Review of micro-Doppler signatures," IET Radar, Sonar & Navigation, vol. 9, no. 9, pp. 1140–1146, 2015.


Yubin Zhao received the B.S. degree in electrical engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 2019, and the M.S. degree in electrical engineering from Delft University of Technology, Delft, The Netherlands, in 2022.
Alexander Yarovoy (F'15) graduated from Kharkov State University, Ukraine, in 1984 with the Diploma with honor in radiophysics and electronics. He received the Candidate Phys. & Math. Sci. and Doctor Phys. & Math. Sci. degrees in radiophysics in 1987 and 1994, respectively. In 1987, he joined the Department of Radiophysics at Kharkov State University as a Researcher and became a Full Professor there in 1997. From September 1994 to 1996, he was a Visiting Researcher with the Technical University of Ilmenau, Ilmenau, Germany. Since 1999, he has been with Delft University of Technology, Delft, The Netherlands, where, since 2009, he has led the Chair of Microwave Sensing, Systems and Signals. He has authored or coauthored more than 450 scientific or technical papers, six patents, and fourteen book chapters. His main research interests are in high-resolution radar, microwave imaging, and applied electromagnetics (in particular, UWB antennas). He received the European Microwave Week Radar Award for the paper that best advances the state of the art in radar technology in 2001 (together with L. P. Ligthart and P. van Genderen) and in 2012 (together with T. Savelyev). In 2010, together with D. Caratelli, he received the Best Paper Award of the Applied Computational Electromagnetics Society (ACES). He served as an Associate Editor of the International Journal of Microwave and Wireless Technologies from 2011 to 2018 and as a Guest Editor of five special issues of the IEEE Transactions and other journals. He served as the General TPC Chair of the 2020 European Microwave Week (EuMW'20), as the Chair and TPC Chair of the 5th European Radar Conference (EuRAD'08), and as the Secretary of the 1st European Radar Conference (EuRAD'04). He also served as the Co-Chair and TPC Chair of the Xth International Conference on GPR (GPR2004). From 2008 to 2017, he served as the Director of the European Microwave Association (EuMA).
IEEE Sensors Journal, vol. 22, no. 9, pp. 8648–8662, 2022. (GPR2004). In the period 2008 to 2017, he served as Director of the
[28] Q. An, S. Wang, A. Hoorfar, W. Zhang, H. Lv, S. Li, and J. Wang, European Microwave Association (EuMA).
“Range-max enhanced ultra-wideband micro-Doppler signatures of
behind wall indoor human activities,” 2020. [Online]. Available:
Francesco Fioranelli (Senior Member, IEEE) received the Ph.D. degree from Durham University, Durham, UK, in 2014. He is currently a tenured Assistant Professor at TU Delft, The Netherlands. He was an Assistant Professor with the University of Glasgow (2016–2019) and a Research Associate at University College London (2014–2016). His research interests include the development of radar systems and automatic classification for human signature analysis in healthcare and security, the detection and classification of drones and UAVs, automotive radar, and wind farm and sea clutter.