
Article

Performance Analysis of Data Augmentation Approaches for Improving Wrist-Based Fall Detection System

Yu-Chen Tu, Che-Yu Lin, Chien-Pin Liu and Chia-Tai Chan *

Department of Biomedical Engineering, National Yang Ming Chiao Tung University, Taipei City 112, Taiwan;
[email protected] (Y.-C.T.); [email protected] (C.-Y.L.); [email protected] (C.-P.L.)
* Correspondence: [email protected]; Tel.: +886-2-28267371

Abstract: The aging of society is a global concern nowadays. Falls and fall-related injuries can influence the elderly's daily living, including physical damage, psychological effects, and financial problems. A reliable fall detection system can trigger an alert immediately when a fall event happens to reduce the adverse effects of falls. Notably, the wrist-based fall detection system provides the most acceptable placement for the elderly; however, its performance is the worst due to the complexity of modeling hand movements. Many recent works have implemented deep learning technology in wrist-based fall detection systems to address this weakness, but class imbalance and data scarcity issues occur. In this study, we analyze different data augmentation methodologies to enhance the performance of wrist-based fall detection systems using deep learning technology. Based on the results, the conditional diffusion model is an ideal data augmentation approach, which improves the F1 score by 6.58% when trained with only 25% of the actual data, and the synthetic data maintain high quality.

Keywords: data augmentation; deep learning technology; wrist-based fall detection; wearable sensor

Academic Editor: Ewa Korzeniewska
Received: 25 February 2025; Revised: 20 March 2025; Accepted: 27 March 2025; Published: 29 March 2025
Citation: Tu, Y.-C.; Lin, C.-Y.; Liu, C.-P.; Chan, C.-T. Performance Analysis of Data Augmentation Approaches for Improving Wrist-Based Fall Detection System. Sensors 2025, 25, 2168. https://doi.org/10.3390/s25072168
Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
Health issues are worthy of notice in an aged society [1–3]. There were about 506 million people over 65 in 2008, and this number is estimated to grow to 1.3 billion by 2040 [4]. In addition, according to the 2021 Current Population Survey's Annual Social and Economic Supplement, reported by the U.S. Census Bureau, about 28% of U.S. older adults live alone [5]. Falls and fall-related injuries are the most critical health issues for older adults, especially for those living alone. In the United States, around 25% of older adults report a fall, and 10% of fall events lead to a serious injury every year [6]. Moreover, older adults who live alone have a 6% higher incidence of falls over time compared with those who live with others, such as their family or health care professionals [7]. The consequences of falls include physical damage, psychological effects, and financial problems. Physical damage, such as fractures and bruises [8,9], may reduce functional independence and increase the chance of becoming a recurrent faller, or even of death. As to psychological effects, fear of falling again makes older adults lose confidence in their safety, which may cause a decline in their willingness to engage in activities of daily living (ADLs) [10]. As regards the financial problems, the overall medical cost of fatal and nonfatal falls is approximately $50 billion every year, resulting in substantial burdens for society [11].

In recent years, plenty of studies have put effort into developing effective fall detection systems to minimize the negative influence of falls. The aim of a fall detection system is to send an immediate alert when falls occur. Implementing one or multiple



inertial measurement units (IMUs) in fall detection systems is the most common technique
for acquiring daily motion information. An IMU consists of a tri-axial accelerometer, a
gyroscope, and a magnetometer. The accelerometer provides the acceleration reference
when moving, the magnetometer provides a heading reference, and the gyroscope measures
how the orientation changes with time [12–14]. Chen et al. [15] used an IMU device worn
on the waist to detect fall events. The threshold-based method was applied to determine
the impact phase of falls. Also, considering the possibility that older adults could not
report their self-situation when the falls occurred, a network of fixed motes in the home
environment was used to target the location of victims. Hussain et al. [16] used the SisFall
dataset [17], a public dataset that acquires motion data from an IMU worn on subjects’
waists, to accomplish an activity-aware fall detection system. The machine learning-based
method was employed to detect the occurrences of falls and to recognize the ADLs before
the falls. The result of the system using the k-nearest neighbors (KNNs) classifier achieved
the highest 99.80% accuracy and the best 96.82% accuracy in recognizing multiple falling
activities using random forest (RF).
The placement of the IMU also plays a vital role in developing a functional fall
detection system [18–21]. Several details should be considered comprehensively, such as
the performance and the acceptance. Chai et al. [18] evaluated the performance of a fall
detection system using different sensor fusions. The best accuracy of 94.10% occurs when fusing the IMUs worn on the chest, elbows, wrists, thighs, and ankles, a total of nine IMUs. As for a single IMU, the highest accuracy of 92.51% occurs when the sensor is worn on the chest, whereas the sensors worn on the wrists yield the worst accuracy of 78.83%. Kangas et al. [19] aimed to assess different low-complexity
fall detection algorithms using a tri-axial accelerometer attached at the waist, wrist, and
head. The results suggested that the sensor placement should be at the waist or head to
obtain the most sensitive fall detection. Although it is reasonable that the ideal position
must be located at the trunk due to the usual stability associated with maintaining the
upper body balance [22], users might not widely accept it because of discomfort. Hence,
a wrist-based fall detection system seems the most acceptable for older adults owing to its user-friendliness and the lower stigma of using a medical device [23,24], but its poor performance must be addressed. The reason for the poor performance of wrist-based fall detection systems is the high complexity of wrist motion modeling [25,26]. Hence, many researchers have focused on implementing deep learning technology in wrist-based fall detection systems due to its ability to extract features automatically [27,28]. However, fall events are rare in daily life, meaning that a deep learning-based fall detection system faces a data imbalance issue, which may result in bias and weaken the performance of deep learning technology in detecting falls.
This study aims to analyze the performance of different data augmentation approaches
for a wrist-based fall detection system, including data transformation, synthetic minority
over-sampling technique, autoencoder, variational autoencoder, and denoising diffusion
probabilistic model. The UP-Fall Detection Dataset [29] is employed to validate different
approaches to data augmentation since it provides a useful resource to fairly compare
different data augmentation methods. The wrist-based IMU signals, especially the fall
event signals, are expanded through the data augmentation algorithm. In addition, the
influence on the performance of detection and the quality of synthetic data, such as the
Train-on-Synthetic-Test-on-Real (TSTR) score and divergence comparison, is discussed in
this paper.

2. Related Work
Five classic data augmentation algorithms are utilized to expand the volume of training data. In this section, the data augmentation algorithms are described briefly.

2.1. Data Transformation
Data transformation is the most straightforward data augmentation methodology. By tuning the physical properties of IMU signals, including rotation, permutation, time-warping, magnitude-warping, jittering, and scaling [30], data transformation can synthetically generate a large quantity of training data, expecting to solve the class imbalance issue, as shown in Figure 1.

Figure 1. The illustration of various data transformation methods [30].
2.2. Synthetic Minority Over-Sampling Technique (SMOTE)
Synthetic Minority Over-sampling Technique (SMOTE) [31] is a common approach to synthesizing new samples based on the distribution of minority classes. The SMOTE process has three steps to expand the data volume. Firstly, each sample in the minority class computes its k nearest neighbors using the Euclidean distance, commonly k = 3. Second, one sample is randomly chosen from these nearest neighbors, and new samples are generated following Equation (1):

$$X_{\text{new}} = X_{\text{chosen}} + \left| X_{\text{nearest}} - X_{\text{chosen}} \right| \times \delta, \quad \delta \in [0, 1], \qquad (1)$$

where X_chosen denotes the initially selected sample, and X_nearest denotes the sample selected from the k nearest neighbors. Finally, all samples in the minority class will be chosen, and the above steps will be repeated to expand the minority class.
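To make Equation (1) concrete, the following minimal NumPy sketch generates one synthetic minority sample from a chosen sample and one of its nearest neighbors. It is an illustration only (array values and names are hypothetical, not from the original paper).

```python
import numpy as np

def smote_sample(x_chosen: np.ndarray, x_nearest: np.ndarray,
                 rng: np.random.Generator) -> np.ndarray:
    """Generate one synthetic sample following Equation (1)."""
    delta = rng.uniform(0.0, 1.0)                       # delta in [0, 1]
    return x_chosen + np.abs(x_nearest - x_chosen) * delta

# Toy usage with two flattened minority-class feature vectors.
rng = np.random.default_rng(0)
x_chosen = np.array([0.1, -0.3, 0.5])
x_nearest = np.array([0.2, -0.1, 0.4])
x_new = smote_sample(x_chosen, x_nearest, rng)
```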

2.3. Autoencoder (AE)
Autoencoder (AE) [32] is an unsupervised deep learning algorithm. Typically, an autoencoder model is composed of two neural network models, namely, an encoder and a decoder. Firstly, the encoder reduces the dimension of the input data into latent vectors, which are a compressed, low-dimensional representation of the input data. Then, the decoder reconstructs the data based on the latent vectors as the output data. Finally, the AE model will minimize the error between input and output data to generate a clean signal.
2.4. Variational Autoencoder (VAE)
Variational Autoencoder (VAE) [33] is an advanced model of AE, which also consists of an encoder and a decoder. Unlike AE, a VAE model limits the encoding stage, making the latent vectors follow the Gaussian distribution. Additionally, because its mean value and standard deviation can parameterize the Gaussian distribution, we can theoretically generate whatever we want through the VAE model.
2.5. Denoising Diffusion Probabilistic Model (DDPM)
Denoising Diffusion Probabilistic Model (DDPM), or Diffusion Model [34], is a novel generative model. Two main processes are designed to generate samples based on training data: the forward diffusion process and the reverse diffusion process. First, the forward diffusion process is defined as a Markov chain; namely, the training data are iteratively corrupted with Gaussian noise during the forward diffusion stage. Notably, when the iteration step is close to infinity, the noisy data can be seen as an isotropic Gaussian distribution. Second, the aim of the reverse diffusion process is to train a denoising model that iteratively denoises the Gaussian-distributed latent vectors back to clean data.

3. Materials and Methods
This section provides the proposed validation method for different data augmentation approaches. It begins with a description of the dataset used in this study and the preprocessing techniques applied. The details of various data augmentation methods used for comparison are presented, outlining their implementation. Finally, the effectiveness of the proposed augmentation approach is assessed by evaluating its impact on fall detection performance using commonly applied evaluation metrics, which are discussed at the end of this section.

3.1. Dataset
In this study, the UP-Fall Detection Dataset [29] is adopted to evaluate the effectiveness of different data augmentation techniques. The dataset includes six types of activities of daily living (ADLs) and five distinct types of falls, with each activity being performed three times. It was collected from 17 healthy young participants, nine males and eight females, aged between 18 and 24 years. Data acquisition was conducted using a multimodal approach, incorporating wearable sensors, ambient sensors, and vision-based equipment, as illustrated in Figure 2b.

Figure 2. Distribution of the sensors. (a) Wearable sensors and EEG headset located on the human body. (b) Layout of the context-aware sensors and camera views [29].

Specifically, five Mbientlab MetaSensor (published by Mbientlab, San Francisco, CA, USA) wearable devices were used to capture raw motion data from a three-axis accelerometer and a three-axis gyroscope at a sampling frequency of 18.4 Hz. These sensors were placed on the left wrist, under the neck, inside the right pants pocket, at the center of the waist (attached to a belt), and on the left ankle, as shown in Figure 2a. Only data collected from the sensor on the left wrist were utilized in this study to enhance user convenience in the daily use of the wrist-based fall detection system.

3.2. Data Preprocessing


To prepare the dataset for analysis, missing data were first removed. Specifically, trials
2 and 3 for subject 8 in activity 11 were excluded due to unknown equipment errors. After
data cleaning, a sliding window technique was applied for segmentation. Fall samples are
divided into 100 reading windows (approximately 5.43 s) with a 50% overlap to preserve
temporal continuity. For ADL events, a different preprocessing approach was employed
based on sample duration. If an ADL event sample contains fewer than 1000 readings
(approximately 54.35 s), it is processed using the same sliding window technique as the
fall samples. For longer ADL recordings, the event is divided into ten equal segments, and
100 reading windows are randomly extracted from each segment to ensure balanced data
representation. After segmentation, the dataset contained a total of 2282 ADL windows
and 756 fall windows. To normalize the input features, all segmented windows underwent
min–max normalization, scaling the values to a range between −1 and 1.
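A minimal sketch of the segmentation and scaling steps described above; it assumes each cleaned recording is already loaded as a NumPy array of shape (num_readings, 6), and per-channel min–max scaling is an assumption (the paper does not state whether normalization is per channel or per window).

```python
import numpy as np

WINDOW = 100   # readings per window (about 5.43 s at 18.4 Hz)
STRIDE = 50    # 50% overlap

def sliding_windows(signal: np.ndarray) -> list[np.ndarray]:
    """Segment a (num_readings, 6) recording into 100-reading windows with 50% overlap."""
    return [signal[s:s + WINDOW] for s in range(0, len(signal) - WINDOW + 1, STRIDE)]

def segment_long_adl(signal: np.ndarray, rng: np.random.Generator) -> list[np.ndarray]:
    """For ADL recordings with at least 1000 readings: split into ten equal parts and
    draw one random 100-reading window from each part."""
    windows = []
    for part in np.array_split(signal, 10):
        start = rng.integers(0, len(part) - WINDOW + 1)
        windows.append(part[start:start + WINDOW])
    return windows

def minmax_scale(window: np.ndarray) -> np.ndarray:
    """Min-max normalization of each window to the range [-1, 1]."""
    lo, hi = window.min(axis=0), window.max(axis=0)
    return 2.0 * (window - lo) / (hi - lo + 1e-8) - 1.0
```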

3.3. Implementation and Configuration


To address the data imbalance issue, each data augmentation method is specifically
designed to generate a greater number of synthetic fall data than ADL data. This approach
ensures that, when the real and synthetic datasets are combined, the number of fall data
matches that of ADL, thereby achieving a balanced distribution for the training dataset.
Furthermore, to maintain fairness in the evaluation process, all augmentation methods
generate a number of synthetic samples equal to the size of the real training dataset. The following
section details the implementation of different data augmentation methods.
First, three common data transformation techniques—rotation, scaling, and permutation—
are employed to augment both ADL and fall signals. In the rotation process, the three-axis
acceleration and the three-axis angular velocity signals are rotated independently in a
randomly chosen direction. For scaling, the signals are either expanded or compressed by
a randomly selected factor ranging from 0 to 2, adjusting the signal magnitude accordingly.
The permutation transformation involves segmenting each signal into four equal-length
parts and randomly rearranging their positions to introduce variability. These transfor-
mations are applied separately to generate augmented versions of the signals and are
individually evaluated to assess their impact on the fall detection system. Finally, the
results obtained from the three different transformation methods are averaged to determine
their overall effectiveness in enhancing data augmentation.
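A minimal sketch of the three transformations as described above (random rotation, scaling by a factor in [0, 2], and four-segment permutation). The per-window layout of 100 readings by 6 channels with the accelerometer in the first three columns is an assumption.

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)

def rotate(window: np.ndarray) -> np.ndarray:
    """Rotate the 3-axis acceleration and 3-axis angular velocity independently."""
    out = window.copy()
    out[:, :3] = window[:, :3] @ Rotation.random().as_matrix().T
    out[:, 3:] = window[:, 3:] @ Rotation.random().as_matrix().T
    return out

def scale(window: np.ndarray) -> np.ndarray:
    """Expand or compress the signal magnitude by a random factor in [0, 2]."""
    return window * rng.uniform(0.0, 2.0)

def permute(window: np.ndarray) -> np.ndarray:
    """Split the window into four equal-length segments and shuffle their order."""
    segments = np.array_split(window, 4, axis=0)
    order = rng.permutation(4)
    return np.concatenate([segments[i] for i in order], axis=0)

augmented = [f(rng.normal(size=(100, 6))) for f in (rotate, scale, permute)]
```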
As for the SMOTE, the number of nearest neighbors k is set to 5 as the default value,
and the random state is fixed at 42 to ensure reproducibility of SMOTE. To maintain a
consistent number of synthetic samples, the sampling strategy is designed to match the
total number of synthetic data with that of the real data. Additionally, this approach ensures
that the combined dataset of real and synthetic samples maintains an equal distribution
between fall and ADL data.
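A sketch of this configuration using the widely used imbalanced-learn implementation of SMOTE; the paper does not state which implementation was used, and the flatten/reshape step and variable names are assumptions.

```python
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(42)
X_windows = rng.normal(size=(200, 100, 6))   # stand-in for real training windows
y = np.array([0] * 150 + [1] * 50)           # 0 = ADL, 1 = fall (imbalanced)

X_flat = X_windows.reshape(len(X_windows), -1)        # SMOTE works on flat feature vectors
smote = SMOTE(k_neighbors=5, random_state=42)          # settings noted in the text
X_res, y_res = smote.fit_resample(X_flat, y)           # balances fall count to ADL count
X_res = X_res.reshape(-1, 100, 6)                       # back to window shape
```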
The autoencoder, composed of an encoder and a decoder, is illustrated in Figure 3. The
encoder compresses input signals into a latent space representation that preserves essential
information. It consists of three convolutional layers (Conv1D) with 1 × 3 filters, each
followed by a max-pooling layer (MaxPool1D) with a 1 × 2 kernel, creating an alternating
structure. The decoder mirrors this arrangement with three transposed convolutional layers
(Transposed Conv1D) and three up-sampling layers (Upsample), ensuring a balanced and
symmetric architecture. Its objective is to reconstruct the original signals while retaining
key features from the encoded representation.
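Before the exact layer configuration (Figure 3 and Tables 1 and 2 below), a compact PyTorch sketch of this encoder–decoder layout is shown. It is not the authors' implementation; the ReLU activations and upsampling target sizes are assumptions chosen so the intermediate shapes match the Output_Size column of Table 1.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(6, 64, kernel_size=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(128, 256, kernel_size=3), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(size=21), nn.ConvTranspose1d(256, 128, kernel_size=3), nn.ReLU(),
            nn.Upsample(size=47), nn.ConvTranspose1d(128, 64, kernel_size=3), nn.ReLU(),
            nn.Upsample(size=98), nn.ConvTranspose1d(64, 6, kernel_size=3),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# Shape check: a batch of 8 windows, 6 channels x 100 readings, is reconstructed to the same size.
out = ConvAutoencoder()(torch.randn(8, 6, 100))   # -> torch.Size([8, 6, 100])
```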
Figure 3. Autoencoder structure overview.

The detailed architecture is presented in Table 1, where the numerical values correspond to the parameters of different layers: Conv1D (number of output channels, kernel size), MaxPool1D (kernel size), Transposed Conv1D (number of output channels, kernel size), and Upsample (size). The model is trained using the Adam optimizer with a learning rate of 3 × 10−4 to maintain stable learning dynamics. The mean squared error (MSE) is adopted as the loss function, with a batch size of 64, and training is conducted for a total of 100 epochs. The hyperparameters listed in Table 2 represent the ranges and values explored during the tuning process to identify the optimal configuration for the model.

Table 1. Architecture of the autoencoder.

Module | Layer | Output_Size
Encoder | Conv1D (64, 3) | (64, 98)
        | MaxPool1D (2) | (64, 49)
        | Conv1D (128, 3) | (128, 47)
        | MaxPool1D (2) | (128, 23)
        | Conv1D (256, 3) | (256, 21)
        | MaxPool1D (2) | (256, 10)
Decoder | Upsample (28) | (256, 21)
        | Transposed Conv1D (128, 3) | (128, 23)
        | Upsample (61) | (128, 47)
        | Transposed Conv1D (64, 3) | (64, 49)
        | Upsample (126) | (64, 98)
        | Transposed Conv1D (6, 3) | (6, 100)

Conv1D—one-dimensional convolutional layer; MaxPool1D—one-dimensional max-pooling layer; Transposed Conv1D—one-dimensional transposed convolutional layer; Upsample—one-dimensional up-sampling layer.

Table 2. List of tried hyperparameters of autoencoder.

Hyperparameter | Searchspace
Learning rate | Float{10−5, 10−4, 3 × 10−4, 5 × 10−4, 10−3, 5 × 10−3}
Epochs | Int{50, 100, 300, 500}
Conv1D layers | Int{2, 3, 4, 5}
Conv1D (number of output channels) | Int{32, 64, 128, 256, 512}
Conv1D (kernel size) | Int{1, 3, 5}

The VAE follows an encoder–decoder structure and introduces probability and randomness, allowing the generation of new samples within a continuous latent space. This characteristic makes it well suited for generative modeling tasks. This architecture is illustrated in Figure 4 and detailed in Table 3.

Figure 4. Variational autoencoder (VAE) structure overview.

Table 3. The architecture of the variational autoencoder (VAE).

Module | Layer | Output_Size
Encoder | Conv1D (64, 5) | (64, 100)
        | Conv1D (64, 5) | (64, 100)
        | Conv1D (64, 3) | (64, 100)
        | Conv1D (64, 3) | (64, 100)
        | MaxPool1D (2) | (64, 50)
Bottleneck | Linear (128) | (64, 128)
           | Linear (50) | (64, 50)
Decoder | Upsample (100) | (64, 100)
        | Conv1D (64, 3) | (64, 100)
        | Conv1D (64, 3) | (64, 100)
        | Conv1D (64, 5) | (64, 100)
        | Conv1D (6, 5) | (6, 100)

Conv1D—one-dimensional convolutional layer; MaxPool1D—one-dimensional max-pooling layer; Linear—linear transformation layer; Upsample—one-dimensional up-sampling layer.

In this framework, the encoder compresses input data into a latent distribution, learning to represent it using a mean (μ) and standard deviation (σ). It consists of four convolutional layers, each with 64 filters, followed by a max-pooling layer with a pooling factor of 2. At the bottleneck, two fully connected layers transform the latent features into the mean and standard deviation, defining a probabilistic latent space. The decoder starts with an upsampling layer of size 100, followed by four convolutional layers that reconstruct the data from the latent representation. The model is trained using the Adam optimizer with a learning rate of 5 × 10−4 and a batch size of 64. A customized loss function, combining mean squared error (MSE) and Kullback–Leibler (KL) divergence, is applied to balance reconstruction accuracy and latent space regularization. The model is trained for a total of 100 epochs. Table 4 presents the hyperparameter ranges designed and explored during tuning to determine the most effective model configuration.

Table 4. List of tried hyperparameters of VAE.

Hyperparameter | Searchspace
Learning rate | Float{10−5, 10−4, 3 × 10−4, 5 × 10−4, 10−3, 5 × 10−3}
Epochs | Int{50, 100, 300, 500}
Conv1D layers | Int{2, 3, 4, 5}
Conv1D (number of output channels) | Int{32, 64, 128, 256, 512}
Conv1D (kernel size) | Int{1, 3, 5}
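The probabilistic bottleneck and customized loss can be sketched as follows. This is a generic VAE reparameterization and MSE + KL objective, not the authors' exact code; the feature and latent dimensions (128 and 50) follow the Linear layers in Table 3, and everything else is assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAEBottleneck(nn.Module):
    """Maps encoder features to (mu, log-variance) and samples a latent vector z."""
    def __init__(self, feat_dim: int = 128, latent_dim: int = 50) -> None:
        super().__init__()
        self.to_mu = nn.Linear(feat_dim, latent_dim)
        self.to_logvar = nn.Linear(feat_dim, latent_dim)

    def forward(self, h: torch.Tensor):
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        return z, mu, logvar

def vae_loss(x_hat: torch.Tensor, x: torch.Tensor,
             mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Customized loss: reconstruction (MSE) plus KL divergence to a standard Gaussian."""
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```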

To implement and configure DDPM, a simplified U-Net [35] structure is employed to


enhance the diffusion model for data augmentation. U-Net is a widely used architecture
in diffusion models due to its effective skip connections, which allow feature maps from
corresponding encoder layers to be directly copied and concatenated into the decoder.
This mechanism helps retain spatial details that might be lost during decoding, thereby improving the quality of generated data. Additionally, U-Net restores extracted features to their original dimensions through an up-sampling process, enabling the model to capture fine details and enhance the accuracy and diversity of generated samples.

Due to these advantages, U-Net has been extensively applied in various fields, including data synthesis [36] and medical image segmentation [37]. However, its original design is primarily intended for high-dimensional image processing in computer vision, whereas wearable sensor data are comparatively lower in dimensionality and follow a simpler structure. To better accommodate the characteristics of inertial sensor data, we adopt a modified U-Net architecture specifically designed for time-series data with shorter timestamps and fewer channels, as illustrated in Figure 5.

Figure 5. The architecture of the diffusion model.

The proposed structure consists of both a down-sampling and an up-sampling process. The down-sampling process incorporates a convolutional layer (Conv1D), a self-attention mechanism, and a 1 × 2 max-pooling layer to progressively extract features. In the up-sampling process, we employ skip connections to restore lost information; each stage consists of a Conv1D, a self-attention mechanism, and an up-sampling layer with a scaling factor of 2. Each Conv1D layer comprises a 1 × 3 convolutional kernel, followed by group normalization and a Gaussian Error Linear Unit (GELU) activation function, ensuring stable and efficient learning. The model is trained with 1000 diffusion timesteps, utilizing the Adam optimizer with a learning rate of 3 × 10−4. Training is performed with a batch size of 64 over 300 epochs. Table 5 outlines the hyperparameter search space considered in this study for identifying the most suitable model setup.

Table 5. List of tried hyperparameters of diffusion model.

Hyperparameter | Searchspace
Learning rate | Float{10−5, 10−4, 3 × 10−4, 5 × 10−4, 10−3, 5 × 10−3}
Diffusion timesteps | Int{100, 300, 500, 1000}
Epochs | Int{100, 300, 500}
Down and up-sampling layers | Int{2, 3, 4}
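The training objective of the diffusion model can be summarized with the schematic below: clean windows are corrupted in the forward process and a network is trained to predict the added noise. The U-Net itself is omitted (a stand-in is used), and the linear beta schedule is an assumption; only the 1000 timesteps follow the text.

```python
import torch
import torch.nn.functional as F

T = 1000                                              # diffusion timesteps (as in the text)
betas = torch.linspace(1e-4, 0.02, T)                 # assumed linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)        # cumulative product of (1 - beta_t)

def diffusion_loss(denoiser, x0: torch.Tensor) -> torch.Tensor:
    """One DDPM training step: corrupt clean windows x0 and predict the injected noise."""
    t = torch.randint(0, T, (x0.size(0),))
    noise = torch.randn_like(x0)
    ab = alphas_bar[t].view(-1, 1, 1)                 # broadcast over (channels, length)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise  # forward diffusion q(x_t | x_0)
    return F.mse_loss(denoiser(x_t, t), noise)        # reverse model learns to denoise

# Example with a trivial stand-in network; a real run would pass the U-Net described above.
dummy_denoiser = lambda x_t, t: torch.zeros_like(x_t)
loss = diffusion_loss(dummy_denoiser, torch.randn(8, 6, 100))
```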

3.4. Evaluation Methodology


In this study, three evaluation metrics are employed to assess the effectiveness of
data augmentation: fall detection performance, Train on Synthetic Test on Real (TSTR)
score, and divergence comparison ratio. These metrics evaluate the impact of synthetic data augmentation on classification performance, the model's ability to generalize from synthetic to real data, and the degree of similarity between synthetic and real data.

3.4.1. Fall Detection Performance


To assess the effectiveness of the diffusion model-based data augmentation method in
fall detection, synthetic data are integrated with real training data to train the fall detection
model. The model architecture, as shown in Figure 6, consists of three convolutional
blocks, followed by two fully connected layers and a softmax activation function. Each
convolutional block includes a one-dimensional convolutional layer (Conv1D) with 64
filters of size 1 × 3, batch normalization (BN) to stabilize training, a rectified linear unit
(ReLU) activation function for non-linearity, and max pooling (MP) with a 1 × 2 filter
to reduce spatial dimensions. Additionally, a dropout layer is incorporated to mitigate

overfitting and improve generalization.
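The convolutional block structure just described can be sketched as follows. It is a simplified rendering, not the authors' code: the padding, dropout rate, and fully connected width are assumptions, and softmax is left to the loss function.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Conv1D (1x3 filters) + batch normalization + ReLU + 1x2 max pooling, as described."""
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(),
        nn.MaxPool1d(2),
    )

class FallDetector(nn.Module):
    def __init__(self, n_classes: int = 2) -> None:
        super().__init__()
        self.features = nn.Sequential(conv_block(6, 64), conv_block(64, 64), conv_block(64, 64))
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Dropout(0.5),
            nn.Linear(64 * 12, 64), nn.ReLU(),
            nn.Linear(64, n_classes),        # softmax is applied in the cross-entropy loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

logits = FallDetector()(torch.randn(8, 6, 100))   # -> torch.Size([8, 2])
```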

Figure 6. The architecture of fall detection system.

During training, the Adam optimizer is employed with cross-entropy as the loss function. The learning rate is set to 3 × 10−4, and the model is trained using a batch size of 64 for 300 epochs. To evaluate its performance, four commonly used metrics—accuracy, precision, recall, and F1 score—are used to quantify the model's ability to distinguish between fall events and ADL.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (2)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (3)$$

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (4)$$

$$\text{F1-score} = 2 \times \frac{\text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \qquad (5)$$

To assess the model's ability to generalize to unseen data, the leave-one-group-out cross-validation (LOGO-CV) method was employed. This validation approach simulates a realistic scenario where the model is tested on subjects whose data are not included in the training set. The dataset, consisting of 17 subjects, is divided into four groups: three groups containing four subjects each and one group containing five subjects. During training, data from three groups are used to train the model, while the remaining group is held out for testing. This process is repeated until all groups have been used for testing, ensuring a robust evaluation of the model's performance across different subjects.

The Train-on-Synthetic-Test-on-Real (TSTR) score is used to evaluate the effectiveness of synthetic data in fall detection tasks, specifically, its ability to differentiate between ADL and fall events. This metric assesses how well a model trained exclusively on synthetic data can generalize to real-world data. To compute the TSTR score, a fall detection model is first trained using only synthetic data and subsequently tested on real data. The model architecture and parameters follow the design specifications outlined in Section 3.4.1.

Additionally, the leave-one-group-out cross-validation (LOGO-CV) method is applied to


ensure a fair and unbiased evaluation, preventing data leakage across the training and
testing phases.
A higher TSTR score indicates that the synthetic data closely resemble real data,
effectively enabling the deep learning model to generalize well to real-world scenarios.
This suggests that the synthetic data capture essential features necessary for fall detec-
tion. However, an excessively high TSTR score may indicate a lack of diversity in the
generated data, potentially leading to model overfitting. Striking a balance between data
similarity and diversity is crucial to ensuring the robustness and generalizability of the fall
detection model.
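As a concrete illustration of the grouping and TSTR protocol described above, the sketch below uses scikit-learn's LeaveOneGroupOut split and F1 score. The stand-in data, the random-forest placeholder for the CNN classifier, and the mapping of the 17 subjects into four groups are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier   # stand-in for the CNN classifier
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(340, 600))             # flattened windows (stand-in features)
y = rng.integers(0, 2, size=340)            # 0 = ADL, 1 = fall
subjects = rng.integers(1, 18, size=340)    # subject ID (1-17) for each window
groups = np.minimum((subjects - 1) // 4, 3) # four groups; the last one holds five subjects

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))
print(np.mean(scores))   # average F1 over the four held-out groups
```

For the TSTR score, the same loop is run with the training indices drawn only from synthetic windows and the test indices from real windows.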

3.4.2. Divergence Comparison


The divergence comparison aims to assess the relationship between real and synthetic
data, as well as the distinction between ADL and fall events within the synthetic data. To
evaluate the quality of the generated data, inner-class and outer-class dispersion measures
were employed, which quantify the distribution of data within and between classes.
Inner-class dispersion measures the degree of variation within each class by calculating
the sum of squared deviations of individual data points from their respective class mean.
The total inner-class dispersion matrix is obtained as a weighted sum of the dispersion
matrices of all classes. Mathematically, it is represented as

$$S_W = \sum_{i=1}^{M} P(\Omega_i)\, S_W^{(i)} = \sum_{i=1}^{M} P(\Omega_i)\, \frac{1}{N_i} \sum_{k=1}^{N_i} \left(X_k^{(i)} - m^{(i)}\right) \left(X_k^{(i)} - m^{(i)}\right)^{T} \qquad (6)$$

where $S_W^{(i)}$ denotes the covariance matrix of the class $\Omega_i$, and $m^{(i)}$ represents the mean of
that class. A lower inner-class dispersion indicates that the data points within a class are
more tightly clustered around their mean, suggesting higher consistency and quality in the
generated samples.
On the other hand, outer-class dispersion quantifies the degree of separation between
different classes by measuring the average squared distance between their means. This
metric captures how distinct the classes are in the feature space. It is defined as:

$$S_B = \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} P(\Omega_i)\, P(\Omega_j)\, S_B^{(ij)} = \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} P(\Omega_i)\, P(\Omega_j) \left(m^{(i)} - m^{(j)}\right) \left(m^{(i)} - m^{(j)}\right)^{T} \qquad (7)$$

A higher outer-class dispersion value indicates greater separation between classes, sug-
gesting that the generated data preserve clear distinctions between different activity types.
To further analyze data quality, a ratio is computed to assess the relationship between
two different groups of samples. The ratio is defined as

$$\text{ratio} = \frac{S_B}{S_W} \qquad (8)$$

In this study, two key relationships are examined. The first considers the distinction
between fall and ADL signals within the synthetic dataset. A higher ratio in this context
indicates greater class separability, suggesting improved differentiation between ADL and
fall signals. The second relationship evaluates the similarity between real and synthetic
data. For this analysis, an equal number of samples from each class, both real and synthetic, are randomly selected. A lower ratio in this case suggests that the synthetic data closely
resemble the real data, reflecting higher realism in the generated samples.
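A NumPy sketch of Equations (6)–(8) on flattened feature windows is given below. Reducing the dispersion matrices to scalars via the trace is an assumption made here because the text reports scalar inner/outer values; the sample arrays are illustrative.

```python
import numpy as np

def dispersion_ratio(samples: list[np.ndarray]) -> float:
    """Compute trace(S_B) / trace(S_W) for per-class sample matrices of shape (n_i, d),
    following Equations (6)-(8)."""
    n_total = sum(len(s) for s in samples)
    priors = [len(s) / n_total for s in samples]
    means = [s.mean(axis=0) for s in samples]

    # Inner-class dispersion: prior-weighted sum of per-class covariance matrices (Eq. 6).
    s_w = sum(p * np.cov(s, rowvar=False, bias=True) for p, s in zip(priors, samples))
    # Outer-class dispersion: prior-weighted pairwise outer products of mean differences (Eq. 7).
    s_b = 0.5 * sum(
        priors[i] * priors[j] * np.outer(means[i] - means[j], means[i] - means[j])
        for i in range(len(samples)) for j in range(len(samples))
    )
    return float(np.trace(s_b) / np.trace(s_w))       # Eq. (8)

# Illustrative use: synthetic fall vs. ADL feature vectors.
rng = np.random.default_rng(0)
ratio = dispersion_ratio([rng.normal(0.0, 1.0, (50, 20)), rng.normal(0.3, 1.0, (60, 20))])
```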

4. Results and Discussions


Table 6 presents the optimized hyperparameters that were explored during the tun-
ing process to determine the most effective configurations for the autoencoder, VAE, and
diffusion model. The selection of these hyperparameters was based on iterative experi-
mentation aimed at achieving the best balance between model performance and training
stability. All experimental results and performance improvements discussed in this study
were obtained using data augmentation methods implemented with these optimized
hyperparameter configurations.

Table 6. Summary of the optimized hyperparameters for the autoencoder, VAE, and diffusion model.

Model Hyperparameter Value


Learning rate 3 × 10−4
Epochs 100
Autoencoder Conv1D layers 3
Conv1D (number of output channels) 64, 128, and 256, respectively, in three layers
Conv1D (kernel size) 3
Learning rate 5 × 10−4
Epochs 100
VAE Conv1D layers 4
Conv1D (number of output channels) 64
Conv1D (kernel size) 3 and 5, respectively, in four layers
Learning rate 3 × 10−4
Diffusion timesteps 1000
Diffusion model
Epochs 300
Down and up-sampling layers 2

4.1. Impact of Data Augmentation on Fall Detection Performance


The performance evaluation of the fall detection system, trained with various data
augmentation techniques, is presented in Table 7. The evaluation is conducted using both
the complete dataset and a reduced dataset containing only 25% of the real training data
to simulate the scarcity of data in the real world. The process involved using real data to
apply different data augmentation methods, implementing the fall detection classifier, and
testing the system to obtain the final results.

Table 7. Performance comparison of the fall detection system using different data augmentation
methods (values in %).

100% Real 25% Real


Model
Accuracy Precision Recall F1 Accuracy Precision Recall F1
BL 92.94 91.64 78.59 83.87 88.08 75.26 79.97 76.70
DTF 92.93 90.88 79.66 84.14 90.73 80.30 83.83 81.45
SMOTE 92.89 89.09 81.08 84.44 90.39 77.35 86.42 81.44
AE 92.85 91.03 78.91 83.68 91.54 82.12 83.43 82.42
VAE 93.24 91.03 80.64 84.62 91.29 83.14 82.59 81.84
DM 93.30 85.79 87.05 86.00 91.75 82.17 86.07 83.28

The diffusion model-based data augmentation demonstrated the most significant


improvement in fall detection performance among all methods. When trained on the full

dataset, this method achieved the highest accuracy of 93.30% and an F1 score of 86.00%,
outperforming other augmentation techniques. Even in the case where only 25% of the
real training data are available, the diffusion model continued to yield superior results,
attaining an accuracy of 91.75% and an F1 score of 83.28%. Compared with the baseline
method (BL), which relies solely on real data without any augmentation, the diffusion
model led to notable improvements. Specifically, it increases the F1 score by 2.13% when
trained on the full dataset and by 6.58% when trained with only 25% of the real data.
This notable improvement can be attributed to the ability of diffusion models to
generate high-quality, diverse synthetic samples. By leveraging probabilistic modeling,
diffusion models learn the underlying data distribution. Through an iterative process of
progressively adding and removing noise, they generate diverse and realistic synthetic
samples that enhance the training set. This diversity enables the classifier to learn more
generalizable and discriminative features, leading to improved performance, especially
when real data are limited. These results highlight the strong potential of diffusion models
for augmenting sensor data in fall detection applications.

4.2. Quality Assessment of Synthetic Data


The effectiveness of a fall detection system depends not only on the quantity of training
data but also on their quality. Therefore, evaluating the quality of synthetic data generated
by the diffusion model is essential. In this section, three key criteria are used to assess
data quality: visualization of synthetic sensor signals, the Train-on-Synthetic-Test-on-Real
(TSTR) score, and divergence analysis.

4.2.1. Visualization of Sensor Signals


Figures 7 and 8 display the accelerometer and gyroscope signals for real (left) and synthetic (right) data generated by the diffusion model for ADL and fall events, respectively. Each plot represents a three-axis sensor with 100 readings captured over a time window of approximately 5.43 s.

Figure 7. Accelerometer and gyroscope signals from the ADL events. (a) Real; (b) Synthetic.

Figure 8. Accelerometer and gyroscope signals from the fall events. (a) Real; (b) Synthetic.

The diffusion model successfully captures and reproduces important signal characteristics of different ADL activities. For instance, activities such as walking exhibit periodic patterns reflected in the generated synthetic signals. Similarly, a key feature of fall events is the sudden impact, causing a sharp peak in the sensor readings. To illustrate this characteristic, the highest acceleration value in Figure 8 is marked with an "×." The diffusion model effectively learns this pattern and generates synthetic signals that resemble real fall events.

4.2.2. Train on Synthetic Test on Real Score

Table 8 presents the TSTR evaluation results for various data augmentation methods. Data transformation (DTF) and SMOTE exhibit the highest performance, indicating that both methods generate synthetic data that resemble real-world sensor signals. However, the exceptionally high similarity between synthetic and real data suggests a potential risk of overfitting, which may limit the model's ability to generalize beyond the training dataset. As shown in Table 7, the fall detection performance with DTF and SMOTE shows slight improvement compared with the baseline, but they do not achieve the best results among all methods. In contrast, autoencoder (AE) and VAE yield significantly lower performance, suggesting that the synthetic data generated by AE and VAE may not sufficiently capture the underlying characteristics of real sensor signals, leading to suboptimal performance in distinguishing between ADL and fall events.

Table 8. The comparison of TSTR score by different data augmentation methods (values in %).

Model Accuracy Precision Recall F1


DTF 99.64 98.92 99.64 99.28
SMOTE 96.69 89.39 98.41 93.67
AE 53.13 53.48 51.52 52.44
VAE 59.30 57.60 75.38 64.95
DM 91.30 75.51 96.45 84.67

The diffusion model demonstrates a balance between synthetic data fidelity and
diversity, achieving an accuracy of 91.30% and an F1 score of 84.67%. While its performance
is slightly lower than that of DTF and SMOTE, it offers a significant advantage in mitigating
overfitting by generating more diverse synthetic samples. This ability to generate diverse

yet realistic samples enhances the model's capacity to generalize across different activity
scenarios, making it a promising approach for improving fall detection performance in
real-world applications.

4.2.3. Divergence Comparison on Synthetic Data


Table 9 presents a comparison of divergence between synthetic data for ADL and fall
events, as well as between real and synthetic data generated by different data augmentation
methods. Regarding the divergence between fall and ADL synthetic signals, the diffusion
model exhibited an inner-class divergence of 0.0022 and an outer-class divergence of
0.00009, resulting in a ratio of 0.0400. This ratio is the highest among all data augmentation
methods evaluated. A higher ratio indicates a more significant distinction between the
two activity classes, suggesting that synthetic fall data generated by the diffusion model
are significantly different from synthetic ADL data. This distinct separation allows the fall
detection system to effectively learn the differences between these activities, ultimately
enhancing its ability to classify falls accurately.

Table 9. Divergence analysis of ADL and fall synthetic signals and the comparison between real and
synthetic data.

Fall/ADL Real/Synthetic
Model
Inner Outer Ratio Inner Outer Ratio
DTF 0.0023 0.00003 0.0130 5.50319 0.04927 0.00895
SMOTE 0.0019 0.00006 0.0308 5.55463 0.04824 0.00868
AE 0.0018 0.00001 0.0056 5.70100 0.03291 0.00577
VAE 0.0003 0.000002 0.0062 5.43478 0.03999 0.00736
DM 0.0022 0.00009 0.0404 5.45769 0.02520 0.00462

In the comparison between real and synthetic data, the diffusion model achieved
inner-class and outer-class divergence values of 5.45769 and 0.02520, respectively, yielding
the lowest ratio of 0.00462 among all methods. A lower ratio means a closer resemblance
between synthetic and real data, suggesting that the diffusion model generates highly
realistic synthetic signals. This substantial similarity makes diffusion-based synthetic data
a valuable resource for training fall detection models, especially in scenarios where real
sensor data are scarce. The synthetic data enhance model generalization and improve classi-
fication accuracy by supplementing the available real data. As a result, the diffusion model
significantly reduces the reliance on extensive sensor data collection, thereby minimizing
the associated costs in terms of workforce, time, and resources.

4.3. Computational Cost of the Data Augmentation Methods


The performance of fall detection systems can be enhanced through the application of
data augmentation techniques. However, these methods often introduce substantial com-
putational costs, resulting in longer preprocessing times and higher hardware requirements.
As a result, careful consideration of computational efficiency is crucial when choosing
appropriate data augmentation approaches.
Table 10 presents the computational costs of the evaluated augmentation methods,
including both computation time and the number of model parameters. The computation
time shown in Table 10 is measured from the beginning of the augmentation process
through the completion of model training, encompassing the entire leave-one-group-out
evaluation (with four groups in total).

Table 10. Comparison of computational costs for different data augmentation methods.

Model Computation Time (s) Parameter (Byte)


DTF 195.91 -
SMOTE 111.30 -
AE 250.81 207,128
VAE 283.76 462,144
DM 1165.32 21,115,928

Among the methods, the diffusion model exhibits the longest computation time
due to its large number of parameters and the complexity of the forward and reverse
diffusion processes. Although it demonstrates an outstanding ability to generate high-
quality and diverse sensor data, the diffusion model requires the most substantial
computational resources.

In contrast, traditional methods such as data transformation (DTF) and SMOTE require the
shortest time to complete the augmentation and classification process. However, because these
methods rely on simple statistical operations, the synthetic data they produce may lack fidelity
and deviate from real sensor data, which could negatively impact system performance.
Autoencoder and VAE demonstrate moderate computational demands. Autoencoders compress
and reconstruct data through neural networks, while VAEs add complexity by learning
probability distributions. While the data generated by these methods may not achieve the same
level of realism as diffusion models, they offer a reasonable balance between computational
efficiency and augmentation effectiveness, making them suitable alternatives when
computational resources are limited.

4.4. The Influence of Real-to-Synthetic Data Ratio on Fall Detection Performance

Figure 9 illustrates the fall detection performance when trained with varying proportions of real
and diffusion model-augmented synthetic data. The x-axis represents the ratio of synthetic data
to real data. For example, if the total available real data are 100 samples, using 50% of the real
data while generating 200% synthetic data means that 50 real samples are used to train the
diffusion model, and a total of 150 samples (50 real and 100 synthetic) are used to train the fall
detection model.

Figure 9. Fall detection performance with various ratios of real and diffusion model-augmented
synthetic data.
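The real-to-synthetic composition described in the example above can be restated as a small helper; the function below is hypothetical and only mirrors the arithmetic (the fraction of real data used first, then the amount of synthetic data generated relative to the real data actually used).

```python
def training_set_sizes(n_real_total: int, real_frac: float, synth_frac: float):
    """Hypothetical helper mirroring the ratio example in the text.

    real_frac:  fraction of the available real data used for training (0.50 for 50%).
    synth_frac: synthetic data generated relative to the real data used (2.00 for 200%).
    """
    n_real = int(n_real_total * real_frac)   # also the data used to train the diffusion model
    n_synth = int(n_real * synth_frac)
    return n_real, n_synth, n_real + n_synth

# 100 available real windows, 50% real, 200% synthetic -> (50, 100, 150)
print(training_set_sizes(100, 0.50, 2.00))
```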
The results indicate that as the proportion of real data decreases, the F1 score of the fall
detection model also declines. Regardless of the real-to-synthetic data ratio, applying diffusion
model-based data augmentation consistently leads to better F1 scores compared with not using
it. When the proportion of real data is at least 25%, the synthetic data effectively supplement the
training set, achieving performance levels comparable with those obtained using an equivalent
amount of real data. However, in cases where only 10% of the real data are available, the F1
score remains consistently low, around 73%, suggesting that the diffusion model’s effectiveness
is still constrained by the quantity of training data in extremely low-data scenarios. Despite this
limitation, the diffusion model demonstrates a strong capability to generate high-quality and
diverse synthetic data, which can effectively serve as a substitute for real data, yielding results
close to those achieved with real samples.
Figure 10 further highlights the improvement achieved by applying the diffusion model-based
data augmentation compared with the baseline, which does not utilize augmentation. The
enhancement is more pronounced when the proportion of real data is lower. Specifically, when
only 25% of real data are used, generating 400% synthetic data improves the F1 score by 8.17%.
In contrast, when training with 100% real data, the improvement is only 2.59% with the addition
of 300% synthetic data. These findings underscore the substantial effectiveness of diffusion
model-based data augmentation, particularly in scenarios where data are scarce. By reducing the
need for large amounts of real data, this approach alleviates the burden of extensive data
collection and annotation, making it a valuable tool for improving fall detection performance in
resource-constrained environments.

Figure 10. Optimal fall detection performance across different proportions of real and diffusion
model-based synthetic data.
5. Conclusions
Falls among the elderly pose a significant global health concern, making the devel-
opment of a reliable fall detection system essential to mitigating the risks associated with
fall-related injuries. A wrist-based fall detection system provides an ideal solution for
the elderly due to its user-friendliness. However, the high complexity of hand motion
modeling increases the difficulty of developing a reliable wrist-based fall detection system.
In addition, collecting and annotating sensor data for fall detection is expensive and labor-
intensive, leading to data scarcity and class imbalance issues. These challenges hinder
the effectiveness of wrist-based fall detection systems using deep learning technology, as
insufficient diverse training data limit model performance.
To address this problem, we compare several data augmentation methodologies, and
a conditional diffusion model-based data augmentation method is recommended. By itera-
tively adding and removing noise from sensor signals, the diffusion model demonstrates
a remarkable ability to generate high-quality and diverse synthetic data. Additionally,
by incorporating class information into the generation process, this method effectively
captures distinctive characteristics of each activity class, ensuring that the synthetic data
closely align with the corresponding real activity patterns. As a result, the wrist-based
fall detection model significantly enhances its ability to differentiate fall events from other
activities accurately.
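For readers less familiar with the mechanism summarized here, the conditional generator follows the standard noise-prediction objective of denoising diffusion models [34], with the activity label supplied to the denoiser at every step. The sketch below is a generic illustration under that assumption; it does not reproduce the network architecture or noise schedule used in this work.

```python
import torch
import torch.nn.functional as F

def conditional_ddpm_loss(denoiser, x0, labels, alphas_cumprod):
    """One noise-prediction training step for a class-conditional diffusion model (sketch).

    x0: clean sensor windows, shape (batch, channels, length).
    labels: activity-class labels used to condition the denoiser.
    alphas_cumprod: precomputed cumulative noise schedule, shape (T,).
    """
    t = torch.randint(0, alphas_cumprod.shape[0], (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise  # forward (noising) process
    predicted_noise = denoiser(x_t, t, labels)               # reverse process is learned
    return F.mse_loss(predicted_noise, noise)
```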
The results demonstrate that the conditional diffusion model-based data augmentation
method outperforms other approaches, achieving the highest accuracy and F1 score in fall
detection. When applied to the entire dataset, the method improves the F1 score by 2.59%
compared with training without data augmentation. More notably, when the real training
data are reduced to only 25%, the method still yields an F1 score improvement of 8.17%.
These findings highlight the effectiveness of the diffusion model in compensating for data
scarcity, providing a practical solution for maintaining fall detection system performance
with limited real data. The success of this approach underscores its strong potential for real-
world applications, further improving the reliability of wrist-based fall detection systems
using deep learning technology.

Author Contributions: Conceptualization, Y.-C.T.; methodology, Y.-C.T., C.-Y.L., C.-P.L. and C.-T.C.;
software, Y.-C.T. and C.-Y.L.; validation, Y.-C.T., C.-Y.L., C.-P.L. and C.-T.C.; formal analysis, Y.-C.T.
and C.-Y.L.; writing—original draft preparation, Y.-C.T. and C.-Y.L.; writing—review and editing,
C.-T.C.; supervision, C.-T.C.; project administration, C.-T.C. All authors have read and agreed to the
published version of the manuscript.

Funding: This research was funded by the National Science and Technology Council under grant num-
ber NSTC 112-2221-E-A49-013-MY2, and the APC was funded by NSTC 112-2221-E-A49-013-MY2.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: The original data presented in the study are openly available in UP-fall
detection dataset at: https://sites.google.com/up.edu.mx/har-up/ (accessed on 25 February 2025).

Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Cutler, D.M.; Poterba, J.M.; Sheiner, L.M.; Summers, L.H.; Akerlof, G.A. An aging society: Opportunity or challenge? Brook. Pap.
Econ. Act. 1990, 1990, 1–73. [CrossRef]
2. Moye, J.; Marson, D.C.; Edelstein, B. Assessment of capacity in an aging society. Am. Psychol. 2013, 68, 158. [CrossRef] [PubMed]
3. Harper, S. Economic and social implications of aging societies. Science 2014, 346, 587–591.
4. Kaye, A.D.; Baluch, A.; Scott, J.T. Pain management in the elderly population: A review. Ochsner J. 2010, 10, 179–187. [PubMed]
5. United States Census Bureau Census Bureau releases new estimates on America’s families and Living Arrangements. Census.
Gov. Retr. August 2021, 30, 2022.
6. Bergen, G.; Stevens, M.R.; Kakara, R.; Burns, E.R. Understanding modifiable and unmodifiable older adult fall risk factors to
create effective prevention strategies. Am. J. Lifestyle Med. 2021, 15, 580–589.
7. Lee, H.; Lim, J.H. Living alone, environmental hazards, and falls among US older adults. Innov. Aging 2023, 7, igad055. [CrossRef]
8. Terroso, M.; Rosa, N.; Torres Marques, A.; Simoes, R. Physical consequences of falls in the elderly: A literature review from 1995
to 2010. Eur. Rev. Aging Phys. Act. 2014, 11, 51–59. [CrossRef]
9. Donald, I.P.; Bulpitt, C.J. The prognosis of falls in elderly people living at home. Age Ageing 1999, 28, 121–125.
10. Shumway-Cook, A.; Ciol, M.A.; Gruber, W.; Robinson, C. Incidence of and risk factors for falls following hip fracture in
community-dwelling older adults. Phys. Ther. 2005, 85, 648–655. [CrossRef]
11. Florence, C.S.; Bergen, G.; Atherly, A.; Burns, E.; Stevens, J.; Drake, C. Medical costs of fatal and nonfatal falls in older adults. J.
Am. Geriatr. Soc. 2018, 66, 693–698. [CrossRef]
12. Martini, D.; Pettigrew, N.; Wilhelm, J.; Parrington, L.; King, L. Wearable Sensors for Vestibular Rehabilitation: A Pilot Study. J.
Physiother. Res 2021, 5, 31.
13. Guo, Z.; Zhang, Z.; An, K.; He, T.; Sun, Z.; Pu, X.; Lee, C. A Wearable Multidimensional Motion Sensor for AI-Enhanced VR
Sports. Research 2023, 6, 0154. [CrossRef]
14. Seel, T.; Raisch, J.; Schauer, T. IMU-based joint angle measurement for gait analysis. Sensors 2014, 14, 6891–6909. [CrossRef]
15. Chen, J.; Kwong, K.; Chang, D.; Luk, J.; Bajcsy, R. Wearable sensors for reliable fall detection. In Proceedings of the 2005 IEEE
Engineering in Medicine and Biology 27th Annual Conference, Shanghai, China, 17–18 January 2006; pp. 3551–3554.
16. Hussain, F.; Hussain, F.; Ehatisham-ul-Haq, M.; Azam, M.A. Activity-aware fall detection and recognition based on wearable
sensors. IEEE Sens. J. 2019, 19, 4528–4536. [CrossRef]
17. Sucerquia, A.; López, J.D.; Vargas-Bonilla, J.F. SisFall: A fall and movement dataset. Sensors 2017, 17, 198. [CrossRef] [PubMed]
18. Chai, X.; Wu, R.; Pike, M.; Jin, H.; Chung, W.-Y.; Lee, B.-G. Smart wearables with sensor fusion for fall detection in firefighting.
Sensors 2021, 21, 6770. [CrossRef]
19. Kangas, M.; Konttila, A.; Lindgren, P.; Winblad, I.; Jämsä, T. Comparison of low-complexity fall detection algorithms for body
attached accelerometers. Gait Posture 2008, 28, 285–291. [CrossRef]
20. Kangas, M.; Konttila, A.; Winblad, I.; Jamsa, T. Determination of simple thresholds for accelerometry-based parameters for fall
detection. In Proceedings of the 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology
Society, Lyon, France, 22–26 August 2007; pp. 1367–1370.
21. Bourke, A.K.; O’brien, J.; Lyons, G.M. Evaluation of a threshold-based tri-axial accelerometer fall detection algorithm. Gait Posture
2007, 26, 194–199. [CrossRef]
22. Rucco, R.; Sorriso, A.; Liparoti, M.; Ferraioli, G.; Sorrentino, P.; Ambrosanio, M.; Baselice, F. Type and location of wearable sensors
for monitoring falls during static and dynamic tasks in healthy elderly: A review. Sensors 2018, 18, 1613. [CrossRef]
23. Pannurat, N.; Thiemjarus, S.; Nantajeewarawat, E. Automatic fall monitoring: A review. Sensors 2014, 14, 12900–12936. [CrossRef]
[PubMed]
24. Yuan, J.; Tan, K.K.; Lee, T.H.; Koh, G.C.H. Power-efficient interrupt-driven algorithms for fall detection and classification of
activities of daily living. IEEE Sens. J. 2014, 15, 1377–1387.
25. Noury, N.; Rumeau, P.; Bourke, A.K.; ÓLaighin, G.; Lundy, J. A proposal for the classification and evaluation of fall detectors.
Irbm 2008, 29, 340–349. [CrossRef]
26. De Quadros, T.; Lazzaretti, A.E.; Schneider, F.K. A movement decomposition and machine learning-based fall detection system
using wrist wearable device. IEEE Sens. J. 2018, 18, 5082–5089.
27. Mauldin, T.R.; Canby, M.E.; Metsis, V.; Ngu, A.H.; Rivera, C.C. SmartFall: A smartwatch-based fall detection system using deep
learning. Sensors 2018, 18, 3363. [CrossRef]
28. Kraft, D.; Srinivasan, K.; Bieber, G. Deep learning based fall detection algorithms for embedded systems, smartwatches, and IoT
devices using accelerometers. Technologies 2020, 8, 72. [CrossRef]
29. Martínez-Villaseñor, L.; Ponce, H.; Brieva, J.; Moya-Albor, E.; Núñez-Martínez, J.; Peñafort-Asturiano, C. UP-fall detection dataset:
A multimodal approach. Sensors 2019, 19, 1988. [CrossRef]
30. Um, T.T.; Pfister, F.M.; Pichler, D.; Endo, S.; Lang, M.; Hirche, S.; Fietzek, U.; Kulić, D. Data augmentation of wearable sensor
data for parkinson’s disease monitoring using convolutional neural networks. In Proceedings of the 19th ACM International
Conference on Multimodal Interaction, Glasgow, UK, 13–17 November 2017; pp. 216–220.
31. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell.
Res. 2002, 16, 321–357.
32. Zhai, J.; Zhang, S.; Chen, J.; He, Q. Autoencoder and its various variants. In Proceedings of the 2018 IEEE International Conference
on Systems, Man, and Cybernetics (SMC), Miyazaki, Japan, 7–10 October 2018; pp. 415–419.
33. Kingma, D.P. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114.
34. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851.
35. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of
the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich,
Germany, 5–9 October 2015; Proceedings, Part III 18, 2015. pp. 234–241.
36. Zhang, K.; Li, Y.; Liang, J.; Cao, J.; Zhang, Y.; Tang, H.; Fan, D.-P.; Timofte, R.; Gool, L.V. Practical blind image denoising via
Swin-Conv-UNet and data synthesis. Mach. Intell. Res. 2023, 20, 822–836.
37. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image
segmentation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 205–218.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
