
Feature Selection based on F-score for Enhancing CTG Data Classification


2019 IEEE International Conference on Cybernetics and Computational Intelligence (IEEE CYBERNETICSCOM),

Banda Aceh - Indonesia, August 22-24, 2019

Feature Selection based on F-score for Enhancing CTG Data Classification
Nina Sevani1, Indra Hermawan2, Wisnu Jatmiko3
Faculty of Computer Science
Universitas Indonesia
Depok, Indonesia
email: [email protected], [email protected], [email protected]

Abstract—The large number of features complicates the manual interpretation of Cardiotocography (CTG) data. Feature selection methods are therefore useful for selecting the relevant features and reducing the complexity of the interpretation. Reducing the complexity also shortens the computation time and improves the accuracy of the classification and prediction results. This study proposes a statistical approach that uses a feature selection method based on the F-score. The method aims to handle imbalanced data with multi-class output. In this method, the features are assessed individually and rated by their F-score. The features with an F-score above the average are chosen as the relevant features. We use a Support Vector Machine (SVM) as the classifier to evaluate the F-score method. The experiment also employs other datasets to test the compatibility of the F-score method. Scalability and stability testing are conducted to evaluate its performance. The experimental results show that the F-score method can be implemented successfully. For the CTG dataset, the accuracy of the classifier improves from 94.35% with 21 features to 99.91% with eight relevant features. Accuracy is also maintained or improved on all the other datasets in the experiment.

Keywords— feature selection; statistical approach; CTG

I. INTRODUCTION

The Cardiotocography (CTG) signal consists of two signals, the Uterine Contraction (UC) signal and the Fetal Heart Rate (FHR) signal. CTG is one of the non-invasive methods for monitoring and predicting the fetal condition. CTG has several features, such as acceleration, deceleration, and variability, along with their derivatives. Generally, those features are used for predicting fetal status [1]. Nowadays, the interpretation of CTG signals is conducted manually. However, manual interpretation is prone to subjectivity, inconsistency, and inter-observer and intra-observer disagreement [2][3]. The large number of CTG features itself can complicate the manual interpretation process. These problems can be addressed by applying feature selection methods to select the relevant features, that is, the features that describe the data properly. Using the relevant features helps the physician read and interpret CTG signal data and has been proven to increase the performance of the prediction results [4][5]. Feature selection methods also reduce the complexity and speed up the computation [6][7].

Generally, feature selection methods can be grouped into three categories: wrapper, filter, and hybrid [7][8]. In the wrapper category, the quality of the feature selection method depends on the classifier. This dependence affects the search space size and the computing time, especially for data with many features. In the filter category, the performance of the feature selection method is not affected by the classifier, which results in faster computing time. Filter methods select the features by considering the characteristics of the data, regardless of the evaluation criteria [9]. The last category, hybrid, combines the filter and wrapper categories. Hybrid methods interact with the classifier but do not evaluate the feature set iteratively [8]. Several factors can be used to choose a feature selection method, such as simplicity, stability, number of reduced features, computational requirement, and classification accuracy [7].

The statistical approach to feature selection works by considering the characteristics of the data. This approach is used for flat data in which there is a relation between features. Previous research on CTG generally used an evolutionary algorithm as the feature selection method [2][3][6][10][11]. Those algorithms do not accommodate a statistical approach as part of their fitness function.

According to [12], the F-score method is a fundamental, simple, and effective approach, and applying the F-score provides adequate classification accuracy. The F-score, which is built on a statistical approach, can be used for imbalanced data with multi-class output [10]. The F-score is a filter method that assesses each feature individually [8]. It selects the subset of features based on a threshold without relying on the classifier. The method uses the ranking value to measure how well each feature discriminates between the target classes. It thereby optimizes the feature space of the dataset and removes the irrelevant features. In this research, we propose an algorithm that uses the F-score as the feature selection method.

This paper uses a Support Vector Machine (SVM) as the classifier to test the selected features. SVM is a widely used classifier that provides good prediction accuracy [2][10]. It is flexible and able to handle high-dimensional data [11]. The Radial Basis Function (RBF) kernel is applied to the CTG data because the data is not fully linearly separable [2][10]. Accuracy, stability, and scalability testing are carried out to evaluate the performance of the F-score method. In addition to CTG, the F-score is also applied to various other datasets, including discrete data.

978-1-7281-0867-4/19/$31.00 ©2019 IEEE
Authorized licensed use limited to: Universitas Indonesia. Downloaded on February 28,2023 at 03:55:00 UTC from IEEE Xplore. Restrictions apply.
II. PROPOSED METHOD

In the process of feature selection, the features are ranked by their F-score, and the relevant features are those that describe the CTG data clearly. The advantages of the F-score include its simplicity, its ability to handle high-dimensional data, and its suitability for continuous signals. Using the relevant features is expected to eliminate bias in the interpretation of CTG signals and to enhance the accuracy of the classification. Figure 1 shows the general steps of the proposed method, which uses the F-score as the feature selection method and SVM as the classifier. The detailed process of the F-score feature selection method, marked by the red square, is illustrated in Figure 2.

[Figure 1: Dataset -> Feature Selection Method (F-score) -> Selected Features -> Classifier (SVM) -> Performance Testing (Stability, Scalability, Accuracy)]
Fig. 1. The general steps of the proposed method

Figure 2 shows that the proposed method takes both the instances and the features of the dataset in the initial step. The dataset was obtained from the UCI Machine Learning Repository [13][14]. The next step is the computational process, in which the F-score of each feature is computed. The proposed method also computes the mean F-score over all the features, which is set as the threshold for choosing the relevant features. The selected features, which satisfy the threshold condition, become the input for the classifier, while the unselected features are ignored. From the classifier output, the accuracy is calculated using the confusion matrix.

A. Dataset

The CTG dataset used in this paper was obtained from the UCI Machine Learning Repository [13][14]. The UCI Machine Learning Repository is a website that provides a collection of databases that the public can access for free. The CTG dataset consists of 2126 samples. It is divided into three output classes: Normal, Suspect, and Pathological. There are 1655 samples in the Normal class, 176 samples in the Pathological class, and 295 samples in the Suspect class. The CTG dataset has 21 features. Those features are acceleration, deceleration, variability, and their derivatives. The derived features include the mean value of long-term variability, the number of prolonged decelerations per second, the mean value of short-term variability, and the percentage of time with abnormal long-term variability.

Acceleration refers to the CTG signal above the FHR baseline. It indicates fetal alertness. It commonly happens in early labor and is associated with fetal movement or uterine contractions [1]. Deceleration is the opposite of acceleration. It refers to the CTG signal below the FHR baseline and indicates a fetal disorder. Based on its shape and its association with contractions, deceleration is divided into early, late, variable, and prolonged deceleration [1]. Variability is the most important feature for determining the current fetal condition. Short-term variability (STV) and long-term variability (LTV) are the commonly used variability features. STV refers to beat-to-beat variability, which describes the differences between successive beats. LTV indicates FHR changes at 3 to 5 cycles per minute in response to uterine contractions or fetal movement [1].
B. F-Score
The F-score is a feature selection method based on a statistical approach. It ranks the features by assessing each feature individually [8][12]. A higher F-score value means a more relevant feature. The F-score is used here because the CTG dataset is continuous and imbalanced. The ranking is used to choose the subset of features. Equation (1) is the F-score formula used in this study:

$F(x_i) = \dfrac{\sum_{j=1}^{c} n_j (\mu_j - \mu)^2 / (c - 1)}{\sum_{j=1}^{c} (n_j - 1)\,\sigma_j^2 / (n - c)}$ (1)

Here $x_i$ refers to the i-th feature of the CTG data, $n_j$ is the total number of instances of class j, and $n$ is the total number of instances over all classes. The variable $\mu$ is the mean value of the feature over all classes, $\mu_j$ is the mean value of the feature in the j-th class, $\sigma_j$ is the standard deviation of the feature in class j, and $c$ is the number of output classes. In this research, $c$ is set to 3.
A feature is selected when its F-score satisfies the threshold condition, where the threshold is the mean F-score over all the features [12]. A feature whose F-score is greater than the threshold is chosen as a relevant feature, whereas a feature whose F-score is lower than the threshold is regarded as irrelevant. The selected features are then used as the input of the classifier and tested with SVM. Figure 3 illustrates the F-score scheme. Suppose that there are n features for each of the m instances in the dataset. The F-score value is calculated for each feature by
using Equation (1).
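The scoring and thresholding steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the F-score takes the one-way ANOVA form (between-class variance over within-class variance), and the function names `f_score` and `select_features`, as well as the toy data, are hypothetical.

```python
from statistics import mean


def f_score(values, labels):
    """F-score of one feature, assuming the one-way ANOVA form.

    Assumes at least two classes, more instances than classes, and a
    non-zero within-class variance.
    """
    n = len(values)
    classes = sorted(set(labels))
    c = len(classes)
    overall_mean = mean(values)
    between = 0.0  # sum over classes of n_j * (mu_j - mu)^2
    within = 0.0   # sum over classes of (n_j - 1) * sigma_j^2
    for cls in classes:
        group = [v for v, y in zip(values, labels) if y == cls]
        mu_j = mean(group)
        between += len(group) * (mu_j - overall_mean) ** 2
        within += sum((v - mu_j) ** 2 for v in group)
    return (between / (c - 1)) / (within / (n - c))


def select_features(columns, labels):
    """Keep the features whose F-score exceeds the mean F-score."""
    scores = {name: f_score(col, labels) for name, col in columns.items()}
    threshold = mean(list(scores.values()))
    selected = [name for name, s in scores.items() if s > threshold]
    return scores, threshold, selected


# Toy data (illustrative only): three classes, two features.
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
columns = {
    "informative": [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8],
    "noise": [3.0, 7.0, 5.0, 5.0, 3.0, 7.0, 7.0, 5.0, 3.0],
}
scores, threshold, selected = select_features(columns, labels)
print(selected)
```

In the toy data, the class means of the first feature are far apart, so it scores well above the mean threshold, while the second feature has identical class means and scores zero.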
Fig. 2. The detailed process of the F-score feature selection method

[Figure 3: a data matrix with rows Instance 1 to Instance m and columns Feature 1 to Feature n; an F-score is computed for each of Feature 1 to Feature n, followed by the mean value of the F-score for all the features.]
Fig. 3. The illustration of F-score method scheme.

TABLE I. F-SCORE FROM EIGHT SELECTED FEATURES

Feature Name                                   F-score
Number of accelerations per second             196.03
Number of prolonged decelerations per second   505.85
Time with abnormal STV (in percentage)         343.82
Time with abnormal LTV (in percentage)         345.16
Mode of histogram                              275.12
Mean of histogram                              297.63
Median of histogram                            248.77
Variance of histogram                          150.80

C. Support Vector Machine (SVM)

Generally, SVM is used for binary classification and multi-class classification. SVM works by dividing the classes with a surface called a hyperplane. SVM determines the hyperplane by maximizing the margin between the classes. Equation (2) expresses the formula for the hyperplane:

$w \cdot x + b = 0$ (2)

where $w$ is a normal vector to the hyperplane, $b$ is a scalar, and $x$ is the input. Lagrange multipliers are used to maximize the margin around the hyperplane. Equation (3) is the formulation for the hyperplane optimization:

$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{l} \alpha_i \left( y_i (w \cdot x_i + b) - 1 \right)$ (3)

Here $i = 1, 2, \ldots, l$ and $\alpha_i$ is the i-th Lagrange multiplier. The CTG data is assumed not to be fully linearly separable. The RBF kernel is therefore used to replace the dot product between input vectors when maximizing the margin [2][10]. The RBF kernel also leads to a low computational cost. The formulation of the RBF is expressed by Equation (4). The values of ɣ and σ² are set to 90 and 0.4, respectively [10].

$K(x_i, x_j) = \exp\left( -\dfrac{\|x_i - x_j\|^2}{2\sigma^2} \right)$ (4)

III. EXPERIMENT RESULT AND DISCUSSION

This experiment employs three datasets as benchmarks to test the performance of the F-score method [13][14]. Those datasets are the iris, liver disorder, and breast cancer datasets. The iris dataset consists of 150 instances, the liver disorder dataset has 345 instances, and the breast cancer dataset has 699 instances.

A. Selected Features

For the CTG dataset, the experiment is conducted with 10-fold cross-validation. Eight relevant features are obtained to describe the dataset. The features cover prolonged deceleration, the number of accelerations, and the variability. Prolonged deceleration is an isolated deceleration lasting more than 2 minutes, which leads to bradycardia or tachycardia. Bradycardia and tachycardia are fetal disorders related to an abnormal baseline heart rate [1]. For the variability features, the frequent appearance of abnormal STV and LTV indicates that there is a problem with the fetus. The presence of accelerations indicates the absence of acidosis in the fetus.

The selected features satisfy the condition that their F-score value is greater than the mean F-score value, which equals 147.537743. Table I shows the F-score values of the eight selected features. The selected features in Table I show that the number of accelerations, the duration of the decelerations, and the variability are selected as the relevant factors for the CTG dataset. This result is in line with [1].

B. SVM Performance with Selected Features

The automated classification of fetal status based on CTG can be assessed through its performance. This research uses SVM as the classifier to test the performance of the selected features in predicting fetal status. The RBF kernel is used with the SVM because the CTG dataset is not fully linearly separable. The experimental results show that using the selected features improves the accuracy of the fetal status classification.

The accuracy measures the overall effectiveness of a classifier [12]. In this research, the accuracy is calculated from the confusion matrix. The elements of the confusion matrix count the agreement between the expert annotation and the classification result. The accuracy can be written as

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$ (5)

where TP refers to True Positive, TN is True Negative, FP is False Positive, and FN is False Negative. The accuracy is 94.35% when all the features are included in the experiment and rises to 99.91% when the experiment employs only the selected features. The accuracy is also maintained or improved in the experiments on the benchmark datasets. The accuracy values are shown in Table II.

The experiment on the CTG dataset yields the highest accuracy improvement compared to the other datasets, as shown in Table II. The accuracy rises by more than 5% for the CTG dataset, while for the other datasets it rises by about 2% at most. This result shows that the F-score as a feature selection method is suitable for SVM as the classifier in the case of the CTG dataset. The experimental results also show that using the selected features can improve the accuracy of the classifier, specifically for datasets with many features such as the CTG dataset. The accuracy obtained in this study is better than the accuracy reported in the previous study [12]. Based on this outcome, we conclude that the F-score is an appropriate method for the CTG dataset.

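The accuracy computation from a confusion matrix can be illustrated with a short sketch. For a multi-class problem such as the three CTG classes, the overall accuracy is the sum of the diagonal (correctly classified instances) divided by the total number of instances, which reduces to (TP + TN) / (TP + TN + FP + FN) in the binary case. The helper names and the example labels below are hypothetical and are not results from the paper.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are the annotated classes, columns are the predicted classes."""
    index = {cls: i for i, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, predicted in zip(y_true, y_pred):
        matrix[index[truth]][index[predicted]] += 1
    return matrix


def accuracy_from_confusion(matrix):
    """Correctly classified instances (diagonal) over all instances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Illustrative labels only: N = Normal, S = Suspect, P = Pathological.
y_true = ["N", "N", "N", "S", "S", "P"]
y_pred = ["N", "N", "S", "S", "S", "P"]
matrix = confusion_matrix(y_true, y_pred, ["N", "S", "P"])
accuracy = accuracy_from_confusion(matrix)
print(matrix, accuracy)
```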
C. Stability and Scalability Testing

Stability and scalability are two of the challenges related to feature selection methods [8]. A good feature selection method has to improve the performance of the classifier even when it is used on different types of datasets with various amounts of data. Stability testing examines the ability of the F-score method when it is implemented on different datasets. Scalability testing examines the ability of the F-score method to handle various amounts of data. Table II shows the performance of the classification using the full feature set and using the features chosen by the F-score feature selection method for all the benchmark datasets.

TABLE II. PERFORMANCE COMPARISON

Dataset          Number of Class   Original Features   All Features Accuracy   Selected Feature   Selected Features Accuracy
CTG              3                 21                  94.35                   8                  99.91
Iris             2                 3                   100                     2                  100
Liver Disorder   2                 6                   97.98                   2                  99.71
Breast Cancer    3                 10                  98.14                   3                  100

The stability testing shows that the F-score method can be implemented on all the benchmark datasets. The F-score method also successfully selects the relevant features. It can improve the accuracy of the classifier in predicting fetal status, and accuracy does not decrease on any of the benchmark datasets. For the liver disorder and breast cancer datasets, the improvement is about 2%, which is lower than for the CTG dataset. We suspect that this result is related to the number of features in each dataset. As shown in Table II, the liver disorder and breast cancer datasets have fewer original features than the CTG dataset. Nevertheless, this hypothesis still needs further investigation.

The scalability testing is carried out by dividing the instances in each dataset into several portions. Figure 4 to Figure 7 show the results of the scalability testing with various portions of instances for all the datasets.

Fig. 4. Scalability testing result for CTG dataset.
Fig. 5. Scalability testing result for iris dataset.
Fig. 6. Scalability testing result for liver disorder dataset.
Fig. 7. Scalability testing result for breast cancer dataset.

The results of the scalability testing show that the F-score method can handle various amounts of data for all the datasets used in this study. Although the accuracy differs between portions of instances, the difference is not significant. The tendency is that, when all the features are used, the accuracy for a large number of instances is lower than for a small number of instances. For the datasets with a small number of instances, the accuracy of the classifier shows no significant differences: whether all the features or only the features selected by the F-score method are used, the accuracy is already high.

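One way to prepare the portions of instances used in such a scalability test is sketched below. The paper does not state how the portions were drawn, so the class-stratified sampling, the fractions, the function name `sample_portion`, and the toy label counts are all assumptions made for illustration; the classifier evaluation itself is left as a comment.

```python
import random
from collections import defaultdict


def sample_portion(labels, fraction, seed=0):
    """Return sorted indices of a class-stratified random subset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        # Keep at least one instance per class so no class disappears.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy imbalanced labels (illustrative only, not the CTG class counts).
labels = ["Normal"] * 80 + ["Suspect"] * 14 + ["Pathological"] * 6
sizes = {}
for fraction in (0.25, 0.5, 0.75, 1.0):
    subset = sample_portion(labels, fraction)
    sizes[fraction] = len(subset)
    # Here one would train and evaluate the classifier on `subset`,
    # once with all features and once with the selected features.
print(sizes)
```

Stratifying by class keeps the minority classes represented in the small portions, which matters for imbalanced data; plain random sampling would be a simpler alternative.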
IV. CONCLUSION

This study applied the F-score method to assess the relevance between the features and the data. The relevant features for describing the CTG data are determined by ranking, with the mean F-score used as the threshold. The experimental results show that the F-score method is successfully implemented on the CTG dataset and on three other benchmark datasets. Using the F-score method improves the accuracy of the fetal status prediction. In the future, the performance of the F-score method combined with other classifiers will be tested. Further investigation will be conducted to assess the correlation between the number of features in the dataset and the accuracy of the classifier. The use of the F-score as a statistical approach in the fitness function of wrapper or hybrid feature selection methods also needs to be explored further.

ACKNOWLEDGMENT

The dataset was obtained from the UCI Machine Learning Repository. The authors are very grateful for the support of the data. This research is funded by the grant from Konsorsium Riset Unggulan Perguruan Tinggi (KRUPT) NKB-1070/UN2.R3.1/HKP.05.00/2019.

REFERENCES

[1] R. K. Freeman, M. P. Nageotte, T. J. Garite, and L. A. Miller, Fetal Heart Rate Monitoring, 4th ed., Lippincott Williams & Wilkins: USA, 2012, pp. 85-111.
[2] H. Ocak, “A Medical Decision Support System Based on Support Vector Machine and the Genetic Algorithm for The Evaluation of Fetal Well-Being,” J Med Syst, 37: 9913, pp. 1-9, 2013.
[3] Z. Comert, A. F. Kocamaz, and V. Subha, “Prognosis Model Based on Image-Based Time-Frequency Features and Genetic Algorithm for Fetal Hypoxia Assessment,” Computers in Biology and Medicine, 99, pp. 85-97, 2018.
[4] V. Chudacek et al., “Assessment of Features for Automatic CTG Analysis Based on Expert Annotation,” Proc. 33rd Annual International Conference of the IEEE EMBS, Sept. 2011, pp. 6051-6054.
[5] A. Georgieva, S. J. Payne, M. Moulden, C. W. G. Redman, “Artificial Neural Networks Applied to Fetal Monitoring in Labour,” Neural Comput. Applic., 22, pp. 85-93, 2013.
[6] L. Xu, C. W. G. Redman, S. J. Payne, A. Georgieva, “Feature Selection Using Genetic Algorithms for Fetal Heart Rate Analysis,” Physiological Measurement, 35, pp. 1357-1371, 2014.
[7] G. Chandrashekar and F. Sahin, “A Survey on Feature Selection Methods,” Computers and Electrical Engineering, 40, pp. 16-28, 2014.
[8] J. Li et al., “Feature Selection: A Data Perspective,” ACM Computing Surveys, Vol. 50, No. 6, Article 94, pp. 94:1-94:45, 2017.
[9] A. A. Nadri, F. Rad, and H. Parvin, “A Framework for Categorize Feature Selection Algorithms for Classification and Clustering,” Bulletin de la Societe Royale des sciences de Liege, Vol. 85, pp. 850-862, 2016.
[10] S. Ravindran, A. B. Jambek, H. Muthusamy, and S-C. Neoh, “A Novel Clinical Decision Support System Using Improved Adaptive Genetic Algorithm for the Assessment of Fetal Well-Being,” Computational and Mathematical Methods in Medicine, Vol. 2015, pp. 1-11, 2015.
[11] Z. Chen, T. Lin, N. Tang, and X. Xia, “A Parallel Genetic Algorithm Based Feature Selection and Parameter Optimization for Support Vector Machine,” Scientific Programming, Volume 2016, pp. 1-10, 2016.
[12] S. Gunes, K. Polat, and S. Yosunkaya, “Multi-class f-score Feature Selection Approach to Classification of Obstructive Sleep Apnea Syndrome,” Expert System with Applications, 37, pp. 998-1004, 2010.
[13] D. Dua and C. Graff, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml], University of California, School of Information and Computer Science: Irvine, CA, 2019.
[14] K. P. Bennet and O. L. Mangasarian, “Robust Linear Programming Discrimination of Two Linearly Inseparable Sets,” Optimization Methods and Software, 1, pp. 23-34, 1992.
