
2020 International Conference on Decision Aid Sciences and Application (DASA)
978-1-7281-9677-0/20/$31.00 ©2020 IEEE | DOI: 10.1109/DASA51403.2020.9317243

Improving Efficiency of Self-care Classification Using PCA and Decision Tree Algorithm

Muhammad Syafrudin, Department of Industrial and Systems Engineering, Dongguk University, Seoul, Republic of Korea ([email protected])
Ganjar Alfian, Industrial AI Research Center, Nano Information Technology Academy, Dongguk University, Seoul, Republic of Korea ([email protected])
Norma Latif Fitriyani, Department of Industrial and Systems Engineering, Dongguk University, Seoul, Republic of Korea ([email protected])
Abdul Hafidh Sidiq, Industrial Engineering, Gunadarma University, Depok, Indonesia ([email protected])
Tjahjanto Tjahjanto, Faculty of Information Technology, Universitas Budi Luhur, Jakarta, Indonesia ([email protected])
Jongtae Rhee, Department of Industrial and Systems Engineering, Dongguk University, Seoul, Republic of Korea ([email protected])

Abstract—Self-care classification for children with physical disabilities remains an important and challenging issue, and it normally requires the support of occupational therapists to make decisions. Data-driven decision making has been widely adopted to support such decisions with the help of expert systems and machine learning algorithms. In this study, we developed an efficient self-care classification model based on principal component analysis (PCA) and a decision tree (DT). PCA is used to extract the significant features, while the DT is used to build the classification model. We measured several metrics to evaluate the performance of the proposed model against other models and previous study results. Based on 10-fold cross-validation, the proposed model outperformed the other models and previous study results, achieving an accuracy of 94.29%. Furthermore, PCA-based feature extraction improved the models' performance, with an average accuracy improvement of 1.7% compared with classifiers without PCA-based feature extraction. Finally, it is expected that the outcomes of this study can assist occupational therapists in improving the efficiency of self-care classification and children's therapy.

Keywords—self-care classification, machine learning, feature extraction, PCA, decision tree

This work was supported by the Cooperative Research Program of the Center for Companion Animal Research (Project No. PJ0139862020), Rural Development Administration, Republic of Korea.

I. INTRODUCTION

A physical disability is a condition of the body that impairs the capability to sustain continued physical or mental effort and limits a person's physical abilities [1]. The two most important causes of physical and motor disabilities are trauma and complications during childbirth or genetic disorders. Such a disability can be a long-term impairment and may limit an individual's activities [2]. In practice, an expert occupational therapist is required to diagnose a physical disability, since the procedure is complex and the characteristics of exceptional children must be gathered. The International Classification of Functioning, Disability and Health: Children and Youth Version (ICF-CY) is a conceptual framework provided by the World Health Organization (WHO) to categorize self-care issues [3].

Artificial intelligence (AI) has gained tremendous attention in the health domain and has been applied as a powerful tool for analyzing cognitive psychology data [4] and combating the COVID-19 outbreak [5]. Machine learning is one AI application that has been successfully applied to self-care classification to reduce the cost and time of classifying self-care issues [6,7]. In addition, previous studies have demonstrated that machine learning algorithms, i.e., artificial neural networks (ANN) [8], fuzzy neural networks (FNN) [9], and hybrid autoencoders [10], have been employed effectively as decision tools for self-care classification.

The decision tree (DT) is a machine learning method that has been successfully applied in the health area for predicting in-hospital cardiac arrest [11], length of stay in the emergency department [12], and the risk of patient falls [13]. In machine learning, an appropriate feature extraction method may also influence classification performance. Previous researchers have demonstrated the positive impact of principal component analysis (PCA)-based feature extraction on improving classification performance for cardiac ailment prediction [14], brain disease diagnosis [15], and early diabetic retinopathy detection [16].

Therefore, we propose a self-care classification model based on the integration of PCA and DT for children with physical disabilities. To the best of our knowledge, this is the first time PCA and DT have been employed together to enhance the performance of a self-care prediction model. PCA is used to reduce and extract the significant features, while DT is utilized to learn and generate the prediction model from the PCA-based extracted features. It is expected that the outcomes of our study can assist occupational therapists in enhancing the efficiency of self-care classification and children's therapy.

II. MATERIAL AND METHOD

A. Dataset

We utilized the publicly available ICF-CY-based self-care dataset, namely SCADI, which was gathered by Zarchi et al. (2018) from health centers in Yazd, Iran [8]. The original data comprise 70 subjects, 29 activities based on the ICF-CY standards, and 205 input attributes. In this dataset, the self-care activities are grouped into eight categories: washing oneself, toileting, caring for body parts, dressing, drinking, eating, looking after one's safety, and looking after one's health. Of the input attributes, 203 (29×7) are related to self-care activities and the remaining two are age and gender. Following previous studies [9,10], we converted the original multi-class output into a binary classification problem. The final classification output is whether a self-care issue is present in the subject (1) or not present (0).


After data preprocessing, the final dataset comprises 70 subjects, with 16 and 54 subjects labeled with the absence (0) and presence (1) of a self-care problem, respectively.

B. Proposed Model

Fig. 1 shows the flow diagram of our proposed model for self-care classification. First, we load the SCADI dataset and apply 10-fold cross-validation to obtain a general model and prevent overfitting. The dataset is randomly divided into 10 folds; nine subsets of the data are used as the training set and the remaining one as the test set. Then, PCA is applied to extract the best features from the training set. Finally, the decision tree (DT) classifier is employed to learn and generate the classification model from the PCA-based extracted features. Once the learning process is done, the trained DT model is used to classify the self-care problem on the test set. This procedure (with different combinations of training and test sets) is repeated 10 times, which gives 10 results for each metric. The final estimated performance for each metric is the average over the 10 runs.

Fig. 1. The flow diagram of the proposed model for self-care classification.
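To make the workflow concrete, the following is a minimal sketch of the pipeline in Python with scikit-learn (the library the paper reports using). It is an illustration, not the authors' code: the file name scadi.csv, the label column name Classes, and the rule used to binarize the labels are assumptions, and scikit-learn's DecisionTreeClassifier implements an optimized CART rather than C4.5.

```python
# Minimal sketch of the PCA + decision tree workflow described above.
# Assumptions (not from the paper): the SCADI data are stored in
# "scadi.csv", the label column is named "Classes", and the value
# "class7" denotes "no self-care problem". Adjust to your copy of the data.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

df = pd.read_csv("scadi.csv")
X = df.drop(columns=["Classes"]).values
# Binary target: 1 = self-care problem present, 0 = not present.
y = (df["Classes"] != "class7").astype(int).values

# PCA-based feature extraction followed by a decision tree classifier.
model = Pipeline([
    ("pca", PCA(n_components=56)),       # component size analyzed in Section III
    ("dt", DecisionTreeClassifier(random_state=0)),
])

# 10-fold cross-validation; report the average of each metric over the folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(model, X, y, cv=cv,
                        scoring=["accuracy", "precision", "recall", "f1"])
for name in ["accuracy", "precision", "recall", "f1"]:
    print(f"{name}: {scores['test_' + name].mean():.4f}")
```

Fitting PCA inside the pipeline ensures the components are learned only from each training fold, matching the procedure described above.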

Principal component analysis (PCA) is a technique used for reducing the dimensionality of data [17]. PCA captures the main variability in the data by projecting the high-dimensional data onto a lower dimension; the small variability is discarded, and the dimension is reduced by finding a new set of features that is smaller than the original set. In practice, PCA converts a matrix of N features into a new dataset with fewer than N features. In PCA, the number of components (features) is an important parameter and needs to be tuned to obtain the best result. We present the analysis of the PCA component size in the results and discussions section.

After the features have been extracted by PCA, we utilize the decision tree (DT) classifier to classify the self-care problem. A DT is a tree-like classification model that lays out all potential decision options and their related outcomes. We used the C4.5 version of the DT algorithm developed by Ross Quinlan to create the decision tree [18]. Algorithm 1 describes the detailed procedure of the C4.5 decision tree.

Algorithm 1: Decision tree (C4.5)
Input: data D
Output: Tree
1.  Create node N
2.  Obtain the feature f that best distinguishes the instances in D
3.  Label node N with f
4.  for each output i of N
5.      Let D_i be the set of instances in D that satisfy output i
6.      if D_i is empty then
7.          Assign a leaf labeled with the majority class in D to node N
8.      else
9.          Assign the node returned by C4.5(D_i) to node N
10.     end if
11. end for
12. return N

C. Experimental Setup and Performance Metrics

Machine learning models were applied to detect the self-care problem from a public dataset. All classifiers and the feature extraction method were implemented in Python 3.6.9 using the scikit-learn 0.22.2 library [19]. All experiments were performed on a Windows machine with an Intel Core i7 processor and 16 GB of memory. The default configurations provided by scikit-learn were used for all classification models in this study, i.e., naive Bayes (NB), k-nearest neighbors (KNN), multi-layer perceptron (MLP), support vector machine (SVM), and DT, so that the present research can be replicated by future researchers with minimal tuning. We used four performance metrics, namely precision, recall, F1-score, and accuracy, to assess and validate the performance of our model against other well-known classification models (NB, KNN, MLP, and SVM). For each model's predictions, four different outcomes exist: TP (true positive), TN (true negative), FP (false positive), and FN (false negative). TP and TN outputs are correctly classified, whereas an FP output is classified as positive (1) when it is actually negative (0), and an FN output is classified as negative (0) when it is actually positive (1). In our study, the absence (negative) and presence (positive) of self-care issues are considered. We employed the following four performance metrics: accuracy, which is computed as


Accuracy = (TP + TN) / (TP + TN + FP + FN),    (1)

precision, which is computed as

Precision = TP / (TP + FP),    (2)

recall, which is computed as

Recall = TP / (TP + FN),    (3)

and F1-score, which is computed as

F1-score = 2 × (Precision × Recall) / (Precision + Recall).    (4)
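As a quick illustration of formulas (1)-(4), the short snippet below computes all four metrics directly from confusion-matrix counts; the counts used in the example are arbitrary values, not results from the paper.

```python
# Illustrative computation of accuracy, precision, recall and F1-score
# from confusion-matrix counts. The counts below are arbitrary example
# values, not results reported in the paper.
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# e.g. accuracy 0.8125, precision 0.8889, recall 0.8000, F1 0.8421
print(classification_metrics(tp=8, tn=5, fp=1, fn=2))
```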
Furthermore, we applied 10-fold cross-validation to all classification models and report the average result for each performance metric.
III. RESULTS AND DISCUSSIONS
Table I shows the performance of the classification models on self-care classification in terms of precision, recall, F1-score, and accuracy. The machine learning algorithms NB, KNN, MLP, and SVM are compared with the proposed model based on PCA and the decision tree. To provide a fair comparison, we used the default parameter settings provided by the scikit-learn library for all classification models in this study. In this scenario, for the proposed model the original features of the dataset are extracted using PCA and the decision tree is employed as the classification model. The results show that our model outperformed the other classification models, reaching 98.33%, 95.96%, and 94.29% for precision, F1-score, and accuracy, respectively.

TABLE I. PERFORMANCE OF MACHINE LEARNING ALGORITHMS ON SELF-CARE CLASSIFICATION.

Classification model | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%)
NB             | 91.31 | 83.00  | 85.16 | 78.57
KNN            | 91.19 | 95.00  | 91.66 | 87.14
MLP            | 93.81 | 93.33  | 92.27 | 88.57
SVM            | 87.86 | 100.00 | 93.17 | 88.57
Proposed model | 98.33 | 94.33  | 95.96 | 94.29

Notes: The best performing model is highlighted (underlined) in the original table.
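For readers who wish to reproduce a comparison of this kind, the sketch below evaluates the baseline classifiers with scikit-learn's default settings under 10-fold cross-validation. It assumes X and y prepared as in the earlier loading sketch; exact scores depend on the fold assignment and will not necessarily match Table I.

```python
# Illustrative baseline comparison with default scikit-learn settings.
# Assumes X (features) and y (binary labels) were prepared as in the
# earlier loading sketch; scores depend on the random fold assignment.
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

classifiers = {
    "NB": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "MLP": MLPClassifier(random_state=0),   # may warn about convergence on small data
    "SVM": SVC(),
    "DT": DecisionTreeClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy = {acc.mean():.4f}")
```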
We also investigated the impact of the PCA feature size on the decision tree's accuracy. Fig. 2 shows that the optimal accuracy of 94.29% was achieved when the feature size was set to 56. We then used this number to obtain the optimum performance of the proposed model on self-care classification.

Fig. 2. Impact of the feature size of PCA on the decision tree accuracy performance.
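A component-size sweep along the lines of Fig. 2 can be sketched as follows. Again this is an illustration under the same assumptions about X and y, not the authors' code, and the resulting curve will depend on the cross-validation split.

```python
# Sweep the PCA component size and record the cross-validated accuracy
# of the PCA + decision tree pipeline, mirroring the analysis in Fig. 2.
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for n in range(2, 63, 2):            # candidate component sizes
    pipe = Pipeline([
        ("pca", PCA(n_components=n)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ])
    results[n] = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()

best_n = max(results, key=results.get)
print(f"best component size: {best_n} (accuracy {results[best_n]:.4f})")
```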

Fig. 3 shows the impact of utilizing PCA for feature extraction on the accuracy of the classification algorithms. The results show that with PCA-based feature extraction, all the machine learning algorithms improved compared with using all features as input, except for NB and SVM. Our results confirm that extracting features with PCA improved the accuracy of the classifiers, especially the decision tree, with an average accuracy improvement of 1.7% compared with the classifiers without PCA-based feature extraction.

Fig. 3. Impact of utilizing the PCA technique for feature extraction on the classification accuracy.

Fig. 4 shows the impact of PCA compared with two feature selection methods, chi-squared and extra-trees, on the decision tree's accuracy. We optimized the number of selected features for chi-squared and found that the optimum accuracy was achieved when the number of features was set to 22, while extra-trees finds the optimal number of selected features automatically (63 in our case). The results show that PCA delivered the best accuracy, with up to an 8.57% improvement over the original (all-features) result, whereas the other feature selection techniques considered in this study, chi-squared and extra-trees [20], yielded only small accuracy improvements. Thus, PCA was the most suitable option for the decision tree algorithm, providing the best accuracy for self-care classification.

Fig. 4. Impact of PCA and other methods on the decision tree classification performance.
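A comparison in the spirit of Fig. 4 could be sketched as below, pairing the decision tree with chi-squared selection (22 features, as reported above) and extra-trees-based selection. This is an illustrative sketch rather than the authors' implementation; note that the chi-squared test requires non-negative feature values.

```python
# Illustrative comparison of PCA against chi-squared and extra-trees
# based feature selection for the decision tree, mirroring Fig. 4.
# Assumes X and y as before; chi2 requires non-negative feature values.
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, chi2, SelectFromModel
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

dt = DecisionTreeClassifier(random_state=0)
candidates = {
    "all features": Pipeline([("dt", dt)]),
    "PCA (56)": Pipeline([("pca", PCA(n_components=56)), ("dt", dt)]),
    "chi-squared (22)": Pipeline([("sel", SelectKBest(chi2, k=22)), ("dt", dt)]),
    "extra-trees": Pipeline([
        ("sel", SelectFromModel(ExtraTreesClassifier(n_estimators=100,
                                                     random_state=0))),
        ("dt", dt),
    ]),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, pipe in candidates.items():
    acc = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()
    print(f"{name}: mean accuracy = {acc:.4f}")
```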


In addition, we compared the proposed model (PCA-DT) with the results of previous studies. The performance comparison is summarized in Table II. Souza et al. (2019) reported that fuzzy neural networks (FNN) achieved a test accuracy of 85.11% [9], while Putatunda (2020) combined autoencoders and deep neural networks, called Care2Vec, and achieved an accuracy of 91.43% [10]. Overall, the proposed model outperformed all previous study results, i.e., FNN and Care2Vec, with an accuracy of 94.29% and an average accuracy improvement of as much as 6.02% over the previous results.

TABLE II. BENCHMARK WITH PREVIOUS STUDY RESULTS.

Study | Validation method | Accuracy (%)
FNN [9] | Hold-out | 85.11
Care2Vec [10] | Tenfold cross-validation | 91.43
Proposed model | Tenfold cross-validation | 94.29

Notes: The best performing model is highlighted (underlined) in the original table.

The experimental and comparison results indicate that self-care classification can be improved by utilizing the PCA and decision tree model. With the proposed model, occupational therapists can speed up the self-care classification process and thus improve its effectiveness for children's therapy.

IV. CONCLUSIONS

In this study, we proposed an efficient self-care classification model by integrating principal component analysis (PCA) and the decision tree (DT) algorithm. PCA was used to reduce the feature dimension, while the DT was applied to build a robust classification model. A standardized self-care binary-classification dataset was used to measure the performance of the proposed model against other models and previous study results. The experimental results showed that the proposed model outperformed the other classification models in terms of accuracy, precision, and F1-score. We also presented an analysis of the PCA feature size for improving DT performance and found that the optimal feature size is 56. In addition, PCA was superior in improving DT accuracy, by up to 8.57%, compared with the chi-squared and extra-trees methods. Furthermore, the comparison study revealed that the proposed model is superior to previous study results. This study is expected to help occupational therapists efficiently classify the presence or absence of self-care problems among children with disabilities, thus improving self-care therapy. Since the proposed model was tested on only a single dataset, it will be necessary to further examine its generality once other standardized self-care activity datasets become available.

ACKNOWLEDGMENT

This paper is a tribute made out of deep respect for a wonderful person, colleague, advisor, and supervisor, Yong-Han Lee (1965–2017).

REFERENCES

[1] R. Lucas-Carrasco, E. Eser, Y. Hao, K. M. McPherson, A. Green, and L. Kullmann, "The Quality of Care and Support (QOCS) for people with disability scale: Development and psychometric properties," Research in Developmental Disabilities, vol. 32, no. 3, pp. 1212–1225, May 2011.
[2] R. Lewis Brown and R. J. Turner, "Physical Disability and Depression: Clarifying Racial/Ethnic Contrasts," Journal of Aging and Health, vol. 22, no. 7, pp. 977–1000, Oct. 2010.
[3] World Health Organization, International Classification of Functioning, Disability and Health: Children and Youth Version: ICF-CY. Geneva: World Health Organization, 2007.
[4] M. B. Jamshidi, N. Alibeigi, N. Rabbani, B. Oryani, and A. Lalbakhsh, "Artificial Neural Networks: A Powerful Tool for Cognitive Science," in 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Nov. 2018, pp. 674–679.
[5] M. Jamshidi et al., "Artificial Intelligence and COVID-19: Deep Learning Approaches for Diagnosis and Treatment," IEEE Access, vol. 8, pp. 109581–109595, 2020.
[6] Y.-L. Yeh, T.-H. Hou, and W.-Y. Chang, "An intelligent model for the classification of children's occupational therapy problems," Expert Systems with Applications, vol. 39, no. 5, pp. 5233–5242, Apr. 2012.
[7] M. Syafrudin et al., "A Self-Care Prediction Model for Children with Disability Based on Genetic Algorithm and Extreme Gradient Boosting," Mathematics, vol. 8, no. 9, p. 1590, Sep. 2020.
[8] M. S. Zarchi, S. M. M. Fatemi Bushehri, and M. Dehghanizadeh, "SCADI: A standard dataset for self-care problems classification of children with physical and motor disability," International Journal of Medical Informatics, vol. 114, pp. 81–87, Jun. 2018.
[9] P. V. C. Souza et al., "Using hybrid systems in the construction of expert systems in the identification of cognitive and motor problems in children and young people," in 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), New Orleans, LA, USA, Jun. 2019, pp. 1–6.
[10] S. Putatunda, "Care2Vec: a hybrid autoencoder-based approach for the classification of self-care problems in physically disabled children," Neural Computing and Applications, May 2020.
[11] H. Li et al., "Decision tree model for predicting in-hospital cardiac arrest among patients admitted with acute coronary syndrome," Clinical Cardiology, vol. 42, no. 11, pp. 1087–1093, Nov. 2019.
[12] M. A. Rahman, B. Honan, T. Glanville, P. Hough, and K. Walker, "Using data mining to predict emergency department length of stay greater than 4 hours: Derivation and single-site validation of a decision tree algorithm," Emergency Medicine Australasia, vol. 32, no. 3, pp. 416–421, Jun. 2020.
[13] H. Jung, H.-A. Park, and H. Hwang, "Improving Prediction of Fall Risk Using Electronic Health Record Data With Various Types and Sources at Multiple Times," CIN: Computers, Informatics, Nursing, vol. 38, no. 3, pp. 157–164, Mar. 2020.
[14] C. V. Verma and S. M. Ghosh, "Dimensionality Reduction Using PCA Algorithm for Improving Accuracy in Prediction of Cardiac Ailments in Diabetic Patients," in Proceedings of International Conference on Wireless Communication, vol. 36, H. Vasudevan, Z. Gajic, and A. A. Deshmukh, Eds. Singapore: Springer Singapore, 2020, pp. 443–452.
[15] Z. Li, J. Fan, Y. Ren, and L. Tang, "A novel feature extraction approach based on neighborhood rough set and PCA for migraine rs-fMRI," Journal of Intelligent & Fuzzy Systems, vol. 38, no. 5, pp. 5731–5741, May 2020.
[16] T. R. Gadekallu et al., "Early Detection of Diabetic Retinopathy Using PCA-Firefly Based Deep Learning Model," Electronics, vol. 9, no. 2, p. 274, Feb. 2020.
[17] I. T. Jolliffe, Principal Component Analysis, 2nd ed. New York: Springer, 2002.
[18] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. Burlington, MA, USA: Morgan Kaufmann, 2011.
[19] F. Pedregosa et al., "Scikit-learn: Machine Learning in Python," Journal of Machine Learning Research, vol. 12, no. 85, pp. 2825–2830, 2011.
[20] P. Geurts, D. Ernst, and L. Wehenkel, "Extremely randomized trees," Machine Learning, vol. 63, no. 1, pp. 3–42, Apr. 2006.
