An Explainable Method for Cost Efficient Fall Detection
Abstract—Human fall detection is a crucial topic to study, since there are many cases of falls at hospitals, homes and retirement homes. In fact, falls are very costly, especially for elderly people and persons with special needs, since they may cause death or serious injuries that require instant medical intervention. In order to prevent further repercussions after this type of accident, modern automated fall detection methods are presented as effective alerting systems that are widely used for emerging healthcare applications. In this study, we present a multi-view-based fall detection method that runs in real time using only a CPU, and consequently it can be deployed in hospitals, retirement homes and cribs without any financial burden related to expensive hardware. Indeed, a lightweight human pose estimator has been adopted in order to detect human body key-points from two different views, so as to overcome the problem of image depth ambiguity. Then, we extract a few explainable features, based on the automatically detected key-points, that are associated with confirmed descriptors of posture and balance biomechanics. The extracted features are thereafter fed into a machine learning classifier in order to predict whether there is a fall or not. The proposed method has been tested on a challenging public dataset, and the preliminary results show its effectiveness compared to other relevant state-of-the-art methods.

Index Terms—Fall detection, pose estimation, explainable features, multi-view, healthcare

I. INTRODUCTION

In recent years, human motion analysis has been explored in various applications and many effective tools have been developed to accurately analyze and interpret human behaviour [31]. In fact, by extracting significant characteristics, human behaviour can be analyzed and understood very well, without the need for sensors or markers that limit the movements of persons [13]. In particular, human fall detection is a crucial topic to study since there are many cases of falls at hospitals, homes and retirement homes. Across the globe, according to the World Health Organization (WHO), falls are the second leading cause of unintentional injury deaths worldwide, and each year there are over 37 million falls [1]. Indeed, the repercussions are generally injuries that require medical intervention, hospitalization and sometimes death (approximately 684,000 deaths each year [1]). Thus, several approaches have been proposed in the scientific literature in order to automatically solve the problem of fall detection [32]. Overall, although marker-based methods have proved to be accurate for detecting falls, their effectiveness decreases remarkably when markers are placed incorrectly on the body, in addition to being impractical and difficult to install in the environment or to carry around all day. Thus, recent works are focusing on non-wearable methods, notably those based only on video sequences [33]. These methods can be classified into three main groups. The first group consists of methods based on hand-crafted feature extraction followed by a supervised classification procedure. However, methods belonging to this first class are generally not sufficiently accurate, and the extraction of hand-crafted features heavily relies on human experience while not being thorough enough to cover all the different cases. The second group consists of end-to-end deep learning-based methods. These methods promise very good results but, on the other hand, they learn the fall prediction from the annotated data, without explicit regularization and with the risk of overfitting. The third main group consists of hybrid methods that are based on deep learned features that are thereafter processed using supervised classifiers. Nevertheless, the methods composing this last group also face the risk of overfitting, in addition to the black-box aspect of the extracted features.

In this study, we are interested in the first class, which is based on investigating hand-crafted features, given that fall detection systems should be designed to perform an increasingly complex task in environments shared with humans, and thus the need for explainable models will persist. However, this problem remains difficult to solve for two major reasons. The first reason is related to the lack of representative datasets, since most of the available datasets are recorded in laboratories and do not reflect the real use case, and consequently the presented solutions cannot generalize well. In addition, by addressing real-world situations, we may face other difficulties such as occlusion, variation in human clothing, lighting conditions and depth ambiguities in images. The second reason is the problem
of fall detection solutions deployment. Generally, the solutions presented in the literature cannot be deployed in hospitals, homes or retirement homes, and this is generally due to the use of expensive or impractical hardware. In order to address the above-mentioned problems, we propose a multi-view-based fall detection method that runs in real time, using only a CPU and two regular RGB cameras. Thus, the proposed method can be deployed anywhere without any financial burden related to expensive hardware. Indeed, we start by estimating human poses in videos using a lightweight human pose estimator. Then, we extract explainable features, according to relevant posture and balance biomechanics patterns, based exclusively on a few human body keypoints. These features are thereafter filtered before being fed to a supervised classifier in order to predict falls. In addition to being cost efficient, the main contribution of the method resides in investigating interpretable hand-crafted features while exploring multiple cameras in order to overcome the issues of perspective ambiguity and occlusions.
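For concreteness, the following minimal Python sketch illustrates one possible implementation of this keypoint-to-feature stage, assuming the lightweight estimator is MediaPipe BlazePose [6] and that the explainable descriptors include a bounding-box height/width ratio, a horizontal offset between the hip and ankle mid-points (a crude proxy for the projection of the centre of gravity) and a trunk angle. The exact feature definitions and the way the two views are fused are illustrative assumptions, not the authors' reference implementation.

import numpy as np
import mediapipe as mp

mp_pose = mp.solutions.pose  # e.g. pose1 = mp_pose.Pose(static_image_mode=False) per camera

def extract_landmarks(frame_rgb, pose):
    """Run BlazePose on one RGB frame and return an (N, 2) array of
    normalized (x, y) landmark coordinates, or None if no person is found."""
    result = pose.process(frame_rgb)
    if result.pose_landmarks is None:
        return None
    return np.array([[lm.x, lm.y] for lm in result.pose_landmarks.landmark])

def posture_features(landmarks):
    """Hypothetical explainable descriptors computed from 2D keypoints:
    height/width ratio of the keypoint bounding box, horizontal offset between
    the hip mid-point (rough centre-of-gravity proxy) and the ankle mid-point,
    and trunk angle with respect to the vertical axis."""
    xs, ys = landmarks[:, 0], landmarks[:, 1]
    width = max(xs.max() - xs.min(), 1e-6)
    height = max(ys.max() - ys.min(), 1e-6)
    hwr = height / width

    l_hip, r_hip = landmarks[23], landmarks[24]        # BlazePose hip indices
    l_ankle, r_ankle = landmarks[27], landmarks[28]    # BlazePose ankle indices
    hip_mid = (l_hip + r_hip) / 2.0
    ankle_mid = (l_ankle + r_ankle) / 2.0
    cog_offset = abs(hip_mid[0] - ankle_mid[0])        # horizontal CoG projection offset

    l_sh, r_sh = landmarks[11], landmarks[12]          # BlazePose shoulder indices
    shoulder_mid = (l_sh + r_sh) / 2.0
    trunk = shoulder_mid - hip_mid
    angle = np.degrees(np.arctan2(abs(trunk[0]), abs(trunk[1]) + 1e-6))

    return np.array([hwr, cog_offset, angle])

def multi_view_features(frame_view1, frame_view2, pose1, pose2):
    """Concatenate per-view descriptors (one assumed fusion strategy)."""
    feats = []
    for frame, pose in ((frame_view1, pose1), (frame_view2, pose2)):
        lms = extract_landmarks(frame, pose)
        feats.append(posture_features(lms) if lms is not None else np.zeros(3))
    return np.concatenate(feats)   # fed afterwards to a supervised classifier

In such a setting, one pose instance per camera would be created once and reused across frames, and the concatenated vector would play the role of the explainable feature set fed to the supervised classifier.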
The rest of this paper is organized as follows. In Section 2, we briefly review the related work on fall detection based on video analysis. In Section 3, we describe the proposed method. Experimental results are discussed in Section 4 to demonstrate the effectiveness of the proposed method. Finally, in Section 5, we conclude the proposed work and present some ideas for future investigations.
II. RELATED WORK

Due to the advances in technology, cameras have become accessible to everyone and can be installed everywhere, from hospitals and airports to retirement homes. In particular, this has made solving the problem of fall detection more practical than using other sensors. Thus, various methods have been proposed in order to solve the fall detection problem and, as described in the previous section, these methods can be grouped into three main groups: hand-crafted-feature-based methods, methods based on end-to-end deep learning models, and hybrid methods.

A. Hand-crafted-feature-based methods

Within the framework of fall detection based on hand-crafted features, Gunale et al. [2] have implemented a fall detection method based on background subtraction followed by feature extraction and classification stages. In fact, they have proposed to form silhouettes using visual features such as the Motion History Image (MHI), aspect ratio and orientation angle, and then tested different classifiers such as SVM, KNN and decision trees. Similarly, Feng et al. [24] have adopted background subtraction as a preprocessing step, while combining deep belief networks and restricted Boltzmann machines for the classification stage. Differently, Zhao et al. [3] have used the Microsoft Kinect sensor in order to capture RGB-D images as well as the person's position and skeleton joints. The provided data is thereafter investigated in order to compute the person's velocity and to determine whether it is abnormal or not. If an abnormal velocity is detected, the body joints are checked to determine whether they are on the ground or not before concluding that a fall has occurred. Likewise, Yang et al. [4] have detected falls through head tracking with a dense Spatio-Temporal Context (STC) algorithm based on Kinect RGB-D images. Other studies have relied on human pose estimators to provide the positions of human body joints used for fall detection. For example, in the work of [5], a video-based fall detection method was proposed using a pose estimator in the first step, followed by a classification step that assigns a sequence of images to two main classes, fall/not fall, using a fully convolutional neural network. Similarly, the authors in [17] have utilized a human pose estimator in order to detect human body joints in RGB video frames. In fact, the extracted joints are used to build spatial and temporal features classified by a Long Short-Term Memory (LSTM) recurrent neural network with the aim of detecting falls.

B. End-to-end deep learning-based methods

Within the context of automated fall detection methods based on end-to-end deep learning models, Lu et al. [25] have designed an effective fall detector deep learning model based on the combination of a 3D convolutional neural network and an LSTM recurrent architecture. The presented model has been trained on various kinematic video data and has been shown to be capable of automatically extracting spatial and temporal features. Similarly, Abobakr et al. [21] have investigated an end-to-end deep learning model for fall detection, taking videos recorded using a Kinect RGB-D sensor as inputs. In fact, the proposed model uses two different types of neural networks: convolutional neural networks followed by LSTM recurrent neural networks. Differently, the fall detection system introduced in [23] has relied on multi-stream 3D convolutional neural networks for identifying human falls. In fact, the proposed method fuses multiple consecutive images in a first stage, then feeds them into a multi-stream 3D convolutional neural network in order to predict one of the following classes: standing, falling, fallen and others. Furthermore, in their work, Li et al. [27] have adopted a convolutional neural network model similar to AlexNet [28] in order to detect falls in a video surveillance environment.

C. Hybrid methods

Hybrid methods are based on extracting features using deep learning architectures before classifying inputs according to the extracted features using machine learning classifiers. For instance, Chhetri et al. [22] have proposed an automated vision-based fall detection system that uses optical flow for data pre-processing, followed by a convolutional neural network for feature extraction and classification. In this work, transfer learning was adopted as the fine-tuning technique for the convolutional neural network. Likewise, Anishchenko [29] has also adopted transfer learning to train the AlexNet [28] convolutional neural network architecture to solve the problem of fall detection. Similarly, in [30] a wheelchair fall detector system has been introduced based on a low-cost, lightweight inertial sensing method relying on a hybrid scheme and an unsupervised One-Class SVM (OCSVM) for detecting cases leading to a fall. However, hybrid methods face the risk of overfitting, in addition to the black-box aspect of the extracted features.
Furthermore, Fig. 5 shows the evolution of the loss as a function of the training and test data for 50 epochs and, as we can clearly see, the training and test losses are very close and converge towards zero. Thus, we can conclude that there is no over-fitting problem. Besides, in order to identify how many successive frames are needed to make the best decision, we have tested MLP-19 (an MLP with 19 neurons in the hidden layer) on all the videos. Each time, we have made a decision based on n successive images (from 1 to 6 images). Moreover, in order to test the effectiveness of the proposed features, we have carried out several tests. For each test, we subtract one feature to see the impact of that feature on the model performance. Table I shows that all features have an influence on prediction. The importance of the features is ranked in ascending order as follows: HWR, GPC, Angle, AMV and "Previous angle".

Fig. 4: Confusion matrix for MLP-19 video tests.
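As a rough illustration of this evaluation protocol, the short Python sketch below (written with scikit-learn, which is an assumption, since the excerpt does not name the library) shows how an MLP with 19 hidden neurons could be combined with a majority vote over n successive frames and with a leave-one-feature-out test. The feature names are taken from the text, while the voting rule and the data layout are hypothetical.

import numpy as np
from sklearn.neural_network import MLPClassifier

FEATURES = ["HWR", "GPC", "Angle", "AMV", "Previous angle"]  # names quoted in the text

def frame_vote(clf, frames, n):
    """Decide fall (1) / no fall (0) from n successive per-frame feature
    vectors by majority vote over the per-frame predictions (assumed rule)."""
    preds = clf.predict(frames[:n])
    return int(preds.sum() * 2 > n)

def leave_one_feature_out(X_train, y_train, X_test, y_test):
    """Retrain MLP-19 with each feature removed in turn and report accuracy,
    mimicking the per-feature effectiveness test summarized in Table I."""
    scores = {}
    for i, name in enumerate(FEATURES):
        keep = [j for j in range(X_train.shape[1]) if j != i]
        clf = MLPClassifier(hidden_layer_sizes=(19,), max_iter=2000, random_state=0)
        clf.fit(X_train[:, keep], y_train)
        scores[name] = clf.score(X_test[:, keep], y_test)
    return scores  # a larger drop when a feature is removed suggests higher importance

In this reading, the accuracy drop observed when each feature is removed is what would yield the importance rankings quoted above.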
B. SVM classifier

For the SVM classifier, we have performed the same tests as for the MLP. In fact, we have tested the whole method on videos, looking for the best number of successive images, and we have finally investigated the effectiveness of the features by subtracting, each time, one feature from the five we set.

Fig. 7 shows the results of the SVM test with 4 successive images using all 5 features. As a result, angles have no influence on prediction (subtracting this feature has improved the results), and for the rest of the features, their importance is ranked in ascending order as follows: HWR, AMV, Previous angle, GPC.

TABLE II: LR feature effectiveness test.

Furthermore, Table III shows the best results, while including the effective and ineffective predictive features of each model. As we can see, the SVM classifier has achieved the best results, followed by the MLP, and LR comes last with poor results compared to the other models.
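A comparison in the spirit of Table III could be reproduced with a sketch such as the following, where the MLP, SVM and LR classifiers are instantiated with illustrative scikit-learn settings; the kernel and other hyper-parameters are not given in the excerpt and are assumptions.

from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Candidate models evaluated on the same feature vectors (illustrative settings).
models = {
    "MLP-19": MLPClassifier(hidden_layer_sizes=(19,), max_iter=2000, random_state=0),
    "SVM": SVC(kernel="rbf", random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}

def compare(models, X_train, y_train, X_test, y_test):
    """Fit each classifier on standardized features and report test accuracy,
    in the spirit of the Table III comparison."""
    results = {}
    for name, model in models.items():
        pipe = make_pipeline(StandardScaler(), model)
        pipe.fit(X_train, y_train)
        results[name] = pipe.score(X_test, y_test)
    return results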
Fig. 10: An illustrative sample of a human pose estimator failure case: (a) View 1, (b) View 2.

V. CONCLUSION

In recent years, human motion analysis has been explored in various applications and many effective tools have been developed to accurately analyze human behaviour. In fact, by extracting significant characteristics, human behaviour can be analyzed and understood very well, without the need for sensors or markers that limit the movements of persons. In particular, human fall detection is a crucial topic to study since there are many cases of falls at hospitals, homes and retirement homes. In this work, we have proposed an effective human fall detection method that is easy to deploy and does not require expensive equipment to operate. In fact, we have presented a study analyzing the effectiveness of five main features for multi-view-based fall detection, and this study led to the conclusion that the Height/Width ratio and the projection of the body's center of gravity are the most important features of the entire list presented, while being clearly explainable. Moreover, we have evaluated the proposed fall detection method with three relevant machine learning classifiers (MLP, SVM and LR), while comparing the effectiveness of the features by eliminating one feature at a time, before rigorously examining the quantitative results as well as the qualitative ones. In further work, we aim to develop a multi-person fall detection method that uses person tracking and matching techniques in order to solve the problems encountered in this work. We can also test other recent multi-person pose estimators that can estimate human body pose with similar performance to the BlazePose estimator.

ACKNOWLEDGMENT

This work was funded by the Tunisian Ministry of Higher Education and Scientific Research (MESRS) and the French Ministry of Foreign Affairs and Ministry of Higher Education under the PHC Utique program in the CMCU project number 23G1411. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of MESRS.
REFERENCES

[1] World Health Organization (WHO). Falls. 11 November 2022. [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/falls
[2] Gunale, K., & Mukherji, P. (2018). Indoor human fall detection system based on automatic vision using computer vision and machine learning algorithms. J. Eng. Sci. Technol., 13(8), 2587-2605.
[3] Zhao, F., Cao, Z.-G., Xiao, Y., Mao, J., & Yuan, J. (2018). Real-time detection of fall from bed using a single depth camera. IEEE Transactions on Automation Science and Engineering, 1-15. doi: 10.1109/TASE.2018.2861382
[4] Yang, L., Ren, Y., Hu, H., & Tian, B. (2015). New fast fall detection method based on spatio-temporal context tracking of head by using depth images. Sensors, 15(9), 23004-23019. doi: 10.3390/s150923004
[5] Chen, Z., Wang, Y., & Yang, W. (2021). Video based fall detection using human poses. doi: https://doi.org/10.48550/arXiv.2107.14633
[6] Bazarevsky, V., Grishchenko, I., Raveendran, K., Zhu, T., Zhang, F., & Grundmann, M. (2020). BlazePose: On-device real-time body pose tracking. doi: https://doi.org/10.48550/arXiv.2006.10204
[7] Samaan, G., Wadie, A., Attia, A., Asaad, A., Kamel, A., Slim, S., Abdallah, M., & Cho, Y.-I. (2022). MediaPipe's landmarks with RNN for dynamic sign language recognition. Electronics, 11, 3228. doi: https://doi.org/10.3390/electronics11193228
[8] Alsawadi, M., El-kenawy, E.-S., & Rio, M. (2022). Using BlazePose on spatial temporal graph convolutional networks for action recognition. Computers, Materials & Continua, 74. doi: 10.32604/cmc.2023.032499
[9] Maaoui, H., Elaoud, A., & Barhoumi, W. (2023). An accurate random forest-based action recognition technique using only velocity and landmarks' distances. In International Conference on Information and Knowledge Systems (pp. 129-144). Cham: Springer Nature Switzerland. doi: https://doi.org/10.1007/978-3-031-51664-1_9
[10] Gutierrez-Gallego, J., Rodriguez, V., & Martín, S. (2022). Fall detection system based on far infrared images. 1-7. doi: 10.1109/TAEE54169.2022.9840598
[11] Liu, W., Liu, X., Hu, Y., Shi, J., Chen, X., Zhao, J., Wang, S., & Hu, Q. (2022). Fall detection for shipboard seafarers based on optimized BlazePose and LSTM. Sensors, 22(14), 5449. doi: https://doi.org/10.3390/s22145449
[12] Hernandez-Mendez, S., Maldonado-Mendez, C., Marin-Hernandez, A., & Rios-Figueroa, H. V. (2017). Detecting falling people by autonomous service robots: A ROS module integration approach. In Proceedings of the 2017 International Conference on Electronics, Communications and Computers (CONIELECOMP), Cholula, Mexico, 22-24 February 2017, pp. 1-7. doi: 10.1109/CONIELECOMP.2017.7891823
[13] Elaoud, A., Barhoumi, W., Zagrouba, E., & Agrebi, B. (2020). Skeleton-based comparison of throwing motion for handball players. Journal of Ambient Intelligence and Humanized Computing, 11, 419-431. doi: https://doi.org/10.1007/s12652-019-01301-6
[14] Bourke, A. K., & Lyons, G. M. (2008). A threshold-based fall-detection algorithm using a bi-axial gyroscope sensor. Medical Engineering & Physics, 30, 84-90. doi: 10.1016/j.medengphy.2006.12.001
[15] Yajai, A., Rodtook, A., Chinnasarn, K., & Rasmequan, S. (2015). Fall detection using directional bounding box. In 12th International Joint Conference on Computer Science and Software Engineering (JCSSE), 52-57. doi: 10.1109/JCSSE.2015.7219769
[16] Martínez-Villaseñor, L., Ponce, H., Brieva, J., Moya-Albor, E., Núñez-Martínez, J., & Peñafort-Asturiano, C. (2019). UP-Fall Detection Dataset: A multimodal approach. Sensors (Basel), 19(9), 1988. doi: 10.3390/s19091988. PMID: 31035377; PMCID: PMC6539235.
[17] Heinrich, C., Koita, S., Taufeeque, M., Spicher, N., & Deserno, T. (2021). Abstract: Multi-camera, multi-person, and real-time fall detection using long short term memory. doi: 10.1007/978-3-658-33198-6_29
[18] Espinosa, R., Ponce, H., Gutiérrez, S., Martínez-Villaseñor, L., Brieva, J., & Moya-Albor, E. (2019). A vision-based approach for fall detection using multiple cameras and convolutional neural networks: A case study using the UP-Fall detection dataset. Computers in Biology and Medicine, 115, 103520. doi: https://doi.org/10.1016/j.compbiomed.2019.103520
[19] Richmond, S. B., Fling, B. W., Lee, H., & Peterson, D. S. (2021). The assessment of center of mass and center of pressure during quiet stance: current applications and future directions. Journal of Biomechanics, 123, 110485. doi: https://doi.org/10.1016/j.jbiomech.2021.110485
[20] Morasso, P. (2020). Centre of pressure versus centre of mass stabilization strategies: the tightrope balancing case. Royal Society Open Science, 7(9), 200111. doi: https://doi.org/10.1098/rsos.200111
[21] Abobakr, A., Hossny, M., Abdelkader, H., & Nahavandi, S. (2018). RGB-D fall detection via deep residual convolutional LSTM networks. In Digital Image Computing: Techniques and Applications (DICTA). IEEE. doi: 10.1109/DICTA.2018.8615759
[22] Chhetri, S., Alsadoon, A., Al-Dala'in, T., Prasad, P. W. C., Rashid, T. A., & Maag, A. (2021). Deep learning for vision-based fall detection system: Enhanced optical dynamic flow. Computational Intelligence, 37(1), 578-595. doi: https://doi.org/10.48550/arXiv.2104.05744
[23] Alanazi, T., & Muhammad, G. (2022). Human fall detection using 3D multi-stream convolutional neural networks with fusion. Diagnostics, 12(12), 3060. doi: https://doi.org/10.3390/diagnostics12123060
[24] Feng, P., Yu, M., Naqvi, S. M., & Chambers, J. A. (2014, August). Deep learning for posture analysis in fall detection. In 2014 19th International Conference on Digital Signal Processing (pp. 12-17). IEEE, Canada. doi: 10.1109/ICDSP.2014.6900806
[25] Lu, N., Wu, Y., Feng, L., & Song, J. (2018). Deep learning for fall detection: Three-dimensional CNN combined with LSTM on video kinematic data. IEEE Journal of Biomedical and Health Informatics, 23(1), 314-323. doi: 10.1109/JBHI.2018.2808281
[26] Elaoud, A., Barhoumi, W., Drira, H., & Zagrouba, E. (2019, February). Weighted linear combination of distances within two manifolds for 3D human action recognition. In VISIGRAPP (5: VISAPP). doi: 10.5220/0007369006930703
[27] Li, X., Pang, T., Liu, W., & Wang, T. (2017). Fall detection for elderly person care using convolutional neural networks. In 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). IEEE. doi: 10.1109/CISP-BMEI.2017.8302004
[28] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In International Conference on Neural Information Processing Systems, pp. 1097-1105. doi: https://doi.org/10.1145/3065386
[29] Anishchenko, L. (2018). Machine learning in video surveillance for fall detection. In 2018 Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT), Yekaterinburg, Russia, pp. 99-102. doi: 10.1109/USBEREIT.2018.8384560
[30] Sheikh, S. Y., & Jilani, M. T. (2023). A ubiquitous wheelchair fall detection system using low-cost embedded inertial sensors and unsupervised one-class SVM. Journal of Ambient Intelligence and Humanized Computing, 14, 147-162. doi: https://doi.org/10.1007/s12652-021-03279-6
[31] Elaoud, A., Barhoumi, W., Drira, H., & Zagrouba, E. (2020). Modeling trajectories for 3D motion analysis. In Computer Vision, Imaging and Computer Graphics Theory and Applications: 14th International Joint Conference, VISIGRAPP 2019, Prague, Czech Republic, February 25-27, 2019, Revised Selected Papers 14 (pp. 409-429). Springer International Publishing. doi: https://doi.org/10.1007/978-3-030-41590-7_17
[32] Bhola, G., & Vishwakarma, D. K. (2024). A review of vision-based indoor HAR: state-of-the-art, challenges, and future prospects. Multimedia Tools and Applications, 83(1), 1965-2005.
[33] Yao, L., Yang, W., & Huang, W. (2022). A fall detection method based on a joint motion map using double convolutional neural networks. Multimedia Tools and Applications, 1-18.