Rahim 2021
Abstract
Background: Several hematological indices have already been proposed to discriminate between iron deficiency anemia (IDA) and β-thalassemia trait (βTT). This study compared the diagnostic performance of different hematological discrimination indices with that of decision trees and support vector machines in discriminating IDA from βTT, and used multidimensional scaling and cluster analysis to group methods with similar performance. In addition, decision trees were used to determine the diagnostic classification scheme of patients.
Methods: This cross-sectional study included 1178 patients with hypochromic microcytic anemia (708 patients with βTT and 470 patients with IDA). The diagnostic performance of 43 hematological discrimination indices was compared with that of classification tree algorithms and support vector machines in discriminating IDA from βTT. Moreover, multidimensional scaling and cluster analysis were used to identify homogeneous subgroups of discrimination methods with similar performance.
Results: All the classification tree algorithms except the LOTUS tree algorithm showed acceptable accuracy measures for discrimination between IDA and βTT in comparison with the hematological discrimination indices. The CRUISE and C5.0 tree algorithms had better diagnostic performance and efficiency than the other discrimination methods. The AUC values of the CRUISE and C5.0 tree algorithms were 0.940 and 0.999, respectively, indicating excellent diagnostic accuracy of these models. Moreover, the CRUISE and C5.0 tree algorithms showed that mean corpuscular volume can be considered the main variable in discrimination between IDA and βTT.
Conclusions: The CRUISE and C5.0 tree algorithms, as powerful data mining methods, can be used along with other laboratory parameters to develop accurate differential methods for the discrimination of IDA and βTT. In addition, multidimensional scaling and cluster analysis can be considered appropriate techniques for identifying discrimination indices with similar performance in future hematological studies.
Keywords: Diagnosis, Classification tree algorithms, Hematological discrimination indices, Iron deficiency anemia
(IDA), β‐thalassemia trait (βTT), CRUISE tree algorithm, C5.0 tree algorithm
Background
Microcytic anemia is the most common form of anemia, as a predominant hematologic disorder. IDA and βTT are the two common types of microcytic anemia disorders
*Correspondence: [email protected]
2 Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
Full list of author information is available at the end of the article
© The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line
to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this
licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Rahim et al. BMC Medical Informatics and Decision Making (2021) 21:313 Page 2 of 13
[1, 2]. The discrimination between IDA and βTT is a vital issue in hematology studies [3, 4]. IDA is a prevalent disorder worldwide, and βTT is, in turn, predominant in the Mediterranean region [5–10].
The discrimination between these two hematologic disorders is necessary to prevent iron overload and its complications caused by misdiagnosis and inaccurate treatment, and to determine the prenatal causes of hemoglobin chain disorders. However, the differential diagnosis of IDA from βTT is a major challenge because the two disorders present similar experimental conditions [3, 11, 12].
In addition to complete blood count (CBC), different tests have been used to differentiate between IDA and βTT precisely; however, they are time-consuming and expensive. The definitive diagnosis of βTT is based on an increase in HbA2 (hemoglobin A2), whereas that of IDA is based on an increase in TIBC (total iron binding capacity) and a decrease in serum iron and serum ferritin [4, 11, 13–16].
Due to the importance of discriminating between these types of anemia, various studies have been conducted since 1973 to identify appropriate, rapid, and low-cost differential indices for discriminating between IDA and βTT [17–41]. A gap in the literature on hematological indices is that each index includes only one or a few specific blood parameters. In addition, some indices, like Nishad [33] and Matos and Carvalho [41], are based on a parametric statistical model such as discriminant analysis. However, this parametric model requires several assumptions (multivariate normality and equality of covariance matrices), and violation of these assumptions affects the results [42].
Recently, the accessibility of powerful statistical software programs has paved the way for the application of advanced statistical models such as data mining techniques in the differential diagnosis of IDA from βTT. However, few studies have employed such advanced statistical methods and data mining techniques for differential diagnosis of hematological data [40, 43–52]. Therefore, this study was intended to compare tree algorithms, as powerful machine-learning methods, and support vector machines (SVM) with hematological indices in differentiation between IDA and βTT. Tree-based methods can determine homogeneous subgroups of patients needing different treatment strategies or diagnostic tests, making these methods useful for subgroup analysis [53–56].
The tree-based methods are nonparametric and need no assumptions about the functional form of the data. Besides, they can deal with high-dimensional datasets, high-order interactions, and nonlinear relationships. These methods are invariant to monotone transformations of predictor variables, and are robust to outliers, missing values, and multicollinearity. These algorithms can identify the cutoff points of important predictors to discriminate the patients. In addition, tree algorithms are easy to interpret as they display results graphically, making the results understandable without requiring statistical experience. These methods can also assist the clinician in decision making [57–62].
The CART (Classification and Regression Tree) algorithm is the best-known classic tree algorithm [63], though it suffers from some problems like greediness and bias in split rule selection. Tree generation in CART is based on a greedy search algorithm, and this search cannot find a global optimum [64]. The splitting method in CART is biased toward independent variables with more distinct values [65, 66]. Several tree algorithms have been proposed to solve the problems of the CART algorithm. The Evtree algorithm (Evolutionary learning of globally optimal classification and regression trees) [64] has been proposed to solve the greediness problem. Tree algorithms like Quick, Unbiased and Efficient Statistical Tree (QUEST) [67], Classification Rule with Unbiased Interaction Selection and Estimation (CRUISE) [68], Generalized, Unbiased, Interaction Detection and Estimation (GUIDE) [69], Conditional Inference Trees (Ctree) [70], and Logistic Tree with Unbiased Selection (LOTUS) [62] have, in turn, been suggested to solve the problem of bias in split rule selection.
This study aimed to compare the diagnostic performance of the CART algorithm, the tree algorithms proposed to remedy its disadvantages, and SVM with that of hematological discrimination indices in discriminating between IDA and βTT, using accuracy measures such as true positive rate (TPR or sensitivity), true negative rate (TNR or specificity), false positive rate (FPR), false negative rate (FNR), accuracy, Youden’s index, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), negative likelihood ratio (NLR), diagnostic odds ratio (DOR), F-measure, and area under the curve (AUC).
Besides, multidimensional scaling and cluster analysis were applied to extract homogeneous subgroups of hematological discriminating indices and classification tree algorithms with a similar performance according to the accuracy measures used.
Methods
Sample and disease type
This study included 1178 patients with hypochromic microcytic anemia from Boghrat clinical center in Tehran, Iran. CBC analysis of EDTA-K2 anti-coagulated blood samples was performed using Sysmex kx-21
automated hematology analyzer to measure hematological parameters such as Hb (Hemoglobin), HCT (hematocrit), MCV (Mean Corpuscular Volume), MCH (Mean Corpuscular Hemoglobin), MCHC (Mean Corpuscular Hemoglobin Concentration), RBC (Red Blood Cell Count) and RDW (Red Blood Cell Distribution Width). In addition, HbA2, TIBC, serum iron and serum ferritin were measured for all patients.
Inclusion criteria
Patients with hypochromic microcytic anemia (MCV < 80 fL, MCH < 27 pg) and Hb < 12 g/dl for women or Hb < 13 g/dl for men were included in the study. Among them, 708 patients were diagnosed as βTT with HbA2 > 3.5%, and 470 patients were diagnosed as IDA with serum ferritin < 15 ng/ml according to the World Health Organization (WHO) [71, 72].
Exclusion criteria
Patients with simultaneous presentation of both diseases, severe anemia (Hb < 8 g/dl), anemia due to chronic disease, infectious disease, chronic inflammation, pregnancy or other hemoglobinopathies were excluded.
Statistical analysis
Descriptive statistics and univariate analysis
Descriptive statistics (mean, standard deviation, median and interquartile range) were evaluated for different blood parameters. Normality of data was assessed using the Shapiro–Wilk test. The Mann–Whitney U test was also used to compare the hematological parameters of the two groups (IDA and βTT). P < 0.05 was considered to be statistically significant.
Hematological discriminating indices for discriminating between IDA and βTT
Hematological indices for discrimination between IDA and βTT were computed for each patient according to their formula and cutoff. These indices with their formulas are shown in Additional file 1: Table S1.
Classification algorithms
Classification tree algorithms (CART [63], QUEST [67], CRUISE [68], J48 [73], GUIDE [69], Ctree [70], Evtree [64], C5.0 [74], and LOTUS [62]) and SVM [75] were used to discriminate IDA from βTT.
Accuracy measures
Diagnostic performance of discrimination indices was compared with classification tree algorithms using accuracy measures such as sensitivity, specificity, FPR, FNR, PPV, NPV, Youden’s index (sensitivity + specificity – 1), accuracy, PLR, NLR, DOR, F-measure and AUC. A discrimination method with sensitivity, specificity, PPV, NPV, Youden’s index, accuracy, F-measure and AUC near 1 provided better performance. Likewise, a discrimination method with PLR > 10, NLR < 0.1 and high DOR showed good performance for discriminating between IDA and βTT [76, 77]. Receiver operating characteristic (ROC) curve analysis was used to compute the AUC and to compare the AUC values of the discrimination methods [78].
Multidimensional scaling
The multidimensional scaling method was used to create a map based on the Euclidean distance for showing similarity or dissimilarity between observations. This map can be in one, two, three or more dimensions. A smaller distance between two observations indicates greater similarity, and vice versa. This study used a two-dimensional map to show similarity/dissimilarity among pairs of discrimination methods through accuracy measures such as sensitivity, specificity, PPV, NPV, Youden’s index, accuracy, PLR, NLR, F-measure, and AUC [79].
Cluster analysis
Cluster analysis is a method for extracting homogeneous subgroups of observations. Different algorithms have been proposed for cluster analysis. This study used a complete-linkage hierarchical algorithm to determine homogeneous subgroups of methods with a similar diagnostic performance using accuracy measures. The optimal number of groups of methods with a similar diagnostic performance was assessed using 30 appropriate measures. Finally, the optimal number was selected based on the majority rule [80].
Software programs and checklists
Data analysis was done using R 4.0.0. Package epiR and package pROC were used to compute the accuracy measures and ROC curve analysis, respectively. Classification tree algorithms like CART, J48, Ctree, Evtree, and C5.0 were fitted using packages rpart, RWeka, party, evtree, and C50, respectively. Software for tree algorithms like QUEST, CRUISE, GUIDE, and LOTUS was obtained from http://pages.stat.wisc.edu/~loh/research.html. The SVM algorithm and multidimensional scaling method were fitted using package e1071 and package MASS, respectively. The optimal number of clusters, or homogeneous groups of diagnostic discrimination methods with a similar diagnostic performance, was determined using the package NbClust. This study was also conducted based on the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies and
the Standards for Reporting Studies of Diagnostic Accuracy (STARD). These checklists can be obtained from www.equator-network.org.
Results
This study included 1178 patients with hypochromic microcytic anemia (708 patients with βTT and 470 patients with IDA) to compare the diagnostic performance of hematological discrimination indices with classification tree algorithms and SVM in discriminating IDA from βTT. Data balance was, in turn, assessed using Shannon entropy [81, 82]. Additional file 1: Table S2 shows the descriptive statistics of hematological parameters by type of hypochromic microcytic anemia (IDA and βTT). According to this table, all variables differed significantly between the groups (P < 0.001). The CRUISE, C5.0, CART, and GUIDE algorithms can calculate the normalized importance (%) of each predictor variable. These algorithms gave a similar ranking of the importance of the hematological parameters. In this study, the normalized importance of variables was reported based on the classification tree algorithms with the best diagnostic performance (CRUISE and C5.0 algorithms). These algorithms showed that MCV and HCT had the highest and lowest importance for discrimination between IDA and βTT, respectively (Additional file 1: Table S2).
Figures 1 and 2 indicated that all predictor variables except HCT and RDW can be used to split the nodes of the trees. The first split of all tree-based methods except Evtree, Ctree, and LOTUS was based on MCV, with similar splitting rules. The GUIDE and CART algorithms showed the same tree structure.
Additional file 1: Table S3 displays the values of accuracy measures such as sensitivity, specificity, FPR, FNR,
Fig. 1 Tree structure of classification tree algorithms such as J48, CART, GUIDE, QUEST, and CRUISE (red: β-thalassemia trait and green: iron deficiency anemia)
Fig. 2 Tree structure of classification tree algorithms such as Evtree, Ctree, LOTUS, and C5.0 (red: β-thalassemia trait and green: iron deficiency
anemia)
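The trees in Figs. 1 and 2 are built from threshold rules on single blood parameters (for example, a cutoff on MCV). As a rough illustration of how a CART-style greedy search chooses one such cutoff, the sketch below scans the candidate thresholds of one predictor and keeps the one with the lowest weighted Gini impurity. The function names `gini` and `best_split`, the eight MCV values and their labels are hypothetical and are not taken from the study data; the tree software used in the study does considerably more than this single step.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (cutoff, weighted_gini) for the best single threshold on one predictor."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(xs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # equal values cannot be separated by a threshold
        left, right = ys[:i], ys[i:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            # place the cutoff midway between the two neighbouring values
            best_cut, best_score = (xs[i - 1] + xs[i]) / 2.0, score
    return best_cut, best_score


# Hypothetical toy data (illustration only, not patient data).
mcv = [60.1, 62.5, 64.0, 66.2, 70.3, 72.8, 74.1, 76.5]
label = ["bTT"] * 4 + ["IDA"] * 4

cutoff, impurity = best_split(mcv, label)
print(cutoff, impurity)
```

A full tree repeats this search within each resulting subgroup, which is why the search is greedy: each cutoff is the best one locally, with no guarantee that the whole tree is optimal.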
PPV and NPV for each discrimination method (Additional file 1: Table S3).
Additional file 1: Table S3 indicated that none of the discrimination methods were fully specific for discrimination between IDA and βTT. This table showed that the Janel index and the CRUISE tree algorithm had the lowest FPR (and the highest TNR and PPV). In turn, the lowest TNR belonged to the Telmissani–MCHD index, while the lowest PPV was related to the Bessman (RDW) index. The Shine and Lal index and the Roth index showed perfect TPR (100%) and NPV (100%) as compared to other discrimination methods. Also, these indices showed the lowest FNR and the highest FPR. The lowest TPR (the highest FNR) was related to the Bessman (RDW) index, while the lowest NPV belonged to the Pornprasert (MCHC) index. All tree classification algorithms and SVM showed good performance for discriminating between IDA and βTT based on accuracy measures like TPR, TNR, PPV and NPV in comparison to other hematological discrimination methods (Additional file 1: Table S3).
The values of accuracy measures such as Youden’s index, accuracy, PLR, NLR, and DOR for each discrimination method are shown in Table 1. According to this table, the highest Youden’s index/accuracy belonged to the CRUISE and C5.0 tree algorithms, while the lowest Youden’s index/accuracy was for the MCHC index. Also, the highest DOR/F-measure belonged to the CRUISE and C5.0 tree algorithms, whereas the Roth index and Bessman (RDW) index had the lowest DOR/F-measure. Table 1 indicated that only the CRUISE tree algorithm had PLR > 10, and that the discrimination methods with NLR < 0.1 were all tree algorithms except the LOTUS tree algorithm and indices such as Shine and Lal, Bordbar, Sehgal, and Kerman I.
The AUC of each discrimination method for discrimination between IDA and βTT is shown in Table 2. The ROC analysis showed that the CRUISE and C5.0 tree algorithms had the highest AUC. According to the AUC, the CRUISE and C5.0 tree algorithms indicated excellent diagnostic accuracy, whereas the MCHC index could not be useful for discrimination between
Table 1 Youden’s index, accuracy, positive likelihood ratio (PLR), negative likelihood ratio (NLR) and diagnostic odds ratio (DOR) of
each hematological index and classification tree algorithm for differentiation between iron deficiency anemia (IDA) and β‐thalassemia
trait (βTT) with their 95% confidence interval
Discriminant method Youden’s Index (%) Accuracy (%) PLR NLR DOR
CART/GUIDE 81.58 (76.26–86.05) 91.51 (89.77–93.04) 7.39 (5.83–9.37) 0.07 (0.05–0.09) 114.12 (75.09–173.43)
J48 85.12 (80.27–89.07) 93.38 (91.81–94.73) 8.41 (6.54–10.81) 0.04 (0.03–0.06) 219.56 (133.69–360.55)
QUEST 79.96 (74.43–84.65) 90.49 (88.67–92.11) 7.37 (5.79–9.36) 0.09 (0.07–0.11) 86.09 (58.24–127.27)
CRUISE 88.03 (83.52–91.59) 94.57 (93.12–95.79) 11.09 (8.28–14.86) 0.04 (0.02–0.05) 311.63 (184.41–526.62)
Ctree 81.23 (75.85–85.75) 91.26 (89.49–92.81) 7.47 (5.88–9.49) 0.07 (0.05–0.09) 105.13 (69.81–158.29)
Evtree 83.49 (78.44–87.66) 92.61 (90.97–94.04) 7.65 (6.02–9.72) 0.05 (0.03–0.07) 169.18 (106.14–269.64)
C50 87.81 (83.34–91.34) 94.65 (93.21–95.87) 9.97 (7.58–13.12) 0.03 (0.02–0.04) 374.66 (212.03–662.03)
LOTUS 39.46 (31.61–46.93) 70.97 (68.28–73.55) 2.085 (1.84–2.37) 0.38 (0.33–0.44) 5.49 (4.26–7.085)
SVM 67.93 (61.37–73.78) 84.38 (82.18–86.41) 4.76 (3.92–5.78) 0.17 (0.14–0.21) 27.86 (20.30–38.24)
England and Fraser (E&F) 48.82 (41.84–55.22) 71.65 (68.98–74.21) 5.097 (3.96–6.56) 0.44 (0.40–0.49) 11.44 (8.33–15.70)
RBC 54.89 (47.68–55.22) 79.03 (76.59–81.32) 2.80 (2.44–3.22) 0.21 (0.18–0.26) 13.28 (9.98–17.68)
Mentzer 71.25 (64.95–76.81) 86.33 (84.24–88.24) 4.99 (4.10–6.06) 0.13 (0.11–0.16) 37.66 (26.96–52.60)
Srivastava 58.83 (51.84–65.18) 78.44 (75.98–80.76) 4.74 (3.83–5.86) 0.30 (0.26–0.34) 15.70 (11.62–21.20)
Shine and Lal (S&L) 15.32 (11.66–18.90) 66.21 (63.43–68.91) 1.18 (1.14–1.23) 0 ∞
Bessman (RDW) − 15.83 (− 21.04 to − 10.61) 34.38 (31.67–37.17) 0.20 (0.13–0.30) 1.20 (1.14–1.26) 0.17 (0.11–0.26)
Ricerca 3.70 (0.04–7.52) 60.95 (58.09–63.75) 1.04 (1.01–1.07) 0.46 (0.27–0.78) 2.28 (1.31–3.97)
Green and King (G&K) 62.21 (55.29–68.47) 81.15 (78.80–83.35) 4.25 (3.52–5.13) 0.23 (0.20–0.27) 18.42 (13.68–24.81)
Das Gupta 32.87 (26.06–39.44) 71.48 (68.80–74.04) 1.56 (1.44–1.69) 0.21 (0.16–0.27) 7.52 (5.46–10.36)
Jayabose (RDWI) 57.28 (50.30–63.70) 80.64 (78.27–82.86) 2.83 (2.47–3.25) 0.17 (0.13–0.21) 17.01 (12.57–23.02)
Telmissani—MCHD 2.78 (− 0.68 to 6.40) 60.61 (57.76–63.41) 1.03 (1–1.06) 0.52 (0.30–0.90) 1.99 (1.11–3.57)
Telmissani—MDHL 40.70 (33.65–47.24) 66.81 (64.04–69.50) 4.36 (3.38–5.61) 0.54 (0.49–0.58) 8.11 (5.93–11.10)
Huber —Herklotz 6.02 (− 15.26–11.98) 46.10 (43.22–48.99) 1.47 (1.11–1.95) 0.93 (0.89–0.98) 1.58 (1.14–2.20)
Kerman I 60.66 (54.29–66.44) 83.28 (81.02–85.36) 2.77 (2.44–3.14) 0.08 (0.06–0.11) 35.83 (24.36–52.68)
Kerman II 72.96 (66.78–78.37) 86.93 (84.87–88.80) 5.63 (4.56–6.96) 0.13 (0.11–0.16) 42.01 (29.89–59.03)
Sirdah 70.86 (64.73–76.21) 84.38 (82.18–86.41) 8.57 (6.45–11.38) 0.22 (0.19–0.25) 39.28 (27.37–56.37)
Ehsani 73.38 (67.24–78.75) 87.18 (85.14–89.04) 5.67 (4.58–6.99) 0.13 (0.10–0.16) 43.85 (31.12–61.79)
Bordbar 55.05 (49.02–60.56) 81.58 (79.25–83.75) 2.29 (2.06–2.55) 0.04 (0.03–0.07) 54.87 (32.80–91.82)
Matos and Carvalho 57.27 (50.15–63.77) 77.93 (75.45–80.27) 4.20 (3.45–5.13) 0.30 (0.27–0.35) 13.89 (10.38–18.58)
Janel (11 T) 67.62 (61.41–73.08) 82.26 (79.95–84.40) 8.95 (6.63–12.07) 0.26 (0.23–0.30) 34.29 (23.75–49.50)
CRUISE Index 41.87 (34.09–49.23) 72.24 (69.59–74.78) 2.18 (1.92–2.48) 0.35 (0.30–0.41) 6.21 (4.80–8.05)
Index26 71.07 (64.87–76.50) 84.81 (82.63–86.81) 7.55 (5.81–9.81) 0.20 (0.17–0.24) 37.23 (26.29–52.72)
Hisham 51.70 (44.32–58.58) 77.25 (74.75–79.62) 2.66 (2.32–3.06) 0.25 (0.21–0.29) 10.66 (8.09–14.05)
Hameed 11.68 (5.73–17.35) 48.81 (45.92–51.71) 2.25 (1.64–3.08) 0.87 (0.83–0.91) 2.58 (1.80–3.69)
Ravanbakhsh-F1 54.11 (46.87–60.80) 78.69 (76.24–80.99) 2.74 (2.39–3.15) 0.22 (0.18–0.26) 12.74 (9.59–16.94)
Ravanbakhsh-F2 32.29 (24.46–39.83) 68.68 (65.94–71.32) 1.69 (1.53–1.88) 0.39 (0.34–0.47) 4.26 (3.30–5.50)
Ravanbakhsh-F3 50.98 (43.74–57.73) 77.76 (75.27–80.10) 2.43 (2.14–2.75) 0.21 (0.17–0.25) 11.74 (8.81–15.65)
Ravanbakhsh-F4 46.34 (39.79–52.48) 77.50 (75.01–79.86) 1.96 (1.78–2.16) 0.10 (0.08–0.14) 18.87 (12.99–27.42)
Zaghloul1 4.35 (− 3.32 to 11.86) 47.96 (45.08–50.86) 1.16 (0.97–1.39) 0.94 (0.87–1.01) 1.23 (0.95–1.59)
Zaghloul2 3.27 (− 4.43 to 10.85) 47.54 (44.65–50.44) 1.12 (0.93–1.34) 0.96 (0.89–1.03) 1.17 (0.91–1.51)
Kandhrol1 − 4.91 (− 1.31 to 3.40) 48.89 (46.01–51.79) 0.92 (0.83–1.01) 1.12 (0.98–1.28) 0.82 (0.65–1.04)
Kandhrol2 30.29 (22.67–37.66) 68.59 (65.85–71.24) 1.58 (1.44–1.74) 0.37 (0.31–0.45) 4.28 (3.29–5.57)
Alparslan 38.71 (31.29–45.79) 72.67 (70.02–75.19) 1.82 (1.65–2.02) 0.27 (0.22–0.33) 6.77 (5.13–8.94)
Merdin1 58.60 (51.48–65.09) 79.20 (76.77–81.49) 3.89 (3.25–4.69) 0.27 (0.23–0.31) 14.68 (11.01–19.59)
Merdin2 46.40 (39.10–53.16) 70.97 (68.28–73.55) 3.95 (3.18–4.90) 44.93 (40.56–49.76) 8.79 (6.57–11.75)
Roth 14.89 (11.28–18.44) 66.04 (63.26–68.75) 1.18 (1.13–1.22) 0 ∞
Sargolzaie 29.79 (22.21–36.99) 61.63 (58.78–64.42) 2.57 (2.10–3.15) 0.63 (0.58–0.69) 4.07 (3.09–5.35)
Keikhaei 59.29 (52.21–65.76) 80.31 (77.92–82.54) 3.51 (2.97–4.14) 0.22 (0.19–0.26) 15.69 (11.75–20.95)
Nishad 63.96 (57.17–70.09) 82.94 (80.66–85.04) 3.81 (3.22–4.51) 0.17 (0.14–0.21) 22.16 (16.32–30.09)
Wongprachum 55.33 (48.04–62.05) 78.35 (75.89–80.67) 3.15 (2.69–3.69) 0.26 (0.22–0.30) 12.36 (9.34–16.34)
Sehgal 64.70 (58.66–70.10) 85.23 (83.07–87.21) 3.027 (2.65–3.46) 0.05 (0.03–0.07) 60.80 (38.73–95.44)
Pornprasert (MCHC) − 32.50 (− 40 to − 24.65) 31.32 (28.68–34.06) 0.40 (0.34–0.47) 1.71 (1.54–1.90) 0.23 (0.18–0.30)
Sirachainan 9.45 (2.23–16.46) 49.58 (46.68–52.47) 1.48 (1.19–1.83) 0.88 (0.83–0.94) 1.68 (1.27–2.21)
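The columns of Table 1 all derive from a 2×2 table of true and predicted classes. As a minimal sketch of those definitions, the function below (the name `accuracy_measures` is hypothetical) takes the four cell counts and returns the measures named in the Accuracy measures section. The example counts are an assumption: they are not reported in the article but were back-calculated here from the CRUISE row of Table 1 and the group sizes (708 βTT, 470 IDA), treating βTT as the positive class. The F-measure is taken as the harmonic mean of PPV and sensitivity, which is also an assumption, since the article does not state its definition.

```python
def _ratio(numerator, denominator):
    """Division that returns infinity instead of raising when the denominator is zero."""
    return float("inf") if denominator == 0 else numerator / denominator


def accuracy_measures(tp, fn, tn, fp):
    """Diagnostic accuracy measures from the four cells of a 2x2 table."""
    sens = tp / (tp + fn)   # true positive rate (TPR)
    spec = tn / (tn + fp)   # true negative rate (TNR)
    ppv = tp / (tp + fp)    # positive predictive value
    npv = tn / (tn + fn)    # negative predictive value
    return {
        "sensitivity": sens,
        "specificity": spec,
        "fpr": 1 - spec,
        "fnr": 1 - sens,
        "ppv": ppv,
        "npv": npv,
        "youden": sens + spec - 1,
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "plr": _ratio(sens, 1 - spec),
        "nlr": _ratio(1 - sens, spec),
        "dor": _ratio(tp * tn, fp * fn),
        "f_measure": 2 * ppv * sens / (ppv + sens),  # assumed F1 definition
    }


# Assumed counts, back-calculated from the CRUISE row of Table 1 (not reported by the authors).
cruise = accuracy_measures(tp=685, fn=23, tn=429, fp=41)
for name in ("youden", "accuracy", "plr", "nlr", "dor"):
    print(name, round(cruise[name], 4))
```

With these assumed counts the function gives values that agree, after rounding, with the point estimates in the CRUISE row. A method with no false negatives, such as the Shine and Lal index, gets NLR = 0 and an infinite DOR, as shown in the table.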
IDA and βTT. Table 2 indicated that the AUC of all indices except Ricerca, Telmissani–MCHD, Huber–Herklotz, Zaghloul1, Zaghloul2 and Kandhrol1 was significantly greater than 0.5, and that the AUC of discrimination indices such as RDW and MCHC was significantly less than 0.5 (P < 0.001).
The comparison between the AUC values of the classification tree algorithms and that of the hematological discrimination index with the best diagnostic performance (Ehsani index) showed a statistically significant difference (P < 0.05). In this regard, the classification tree algorithms had significantly higher AUC than the Ehsani index. Also, the CRUISE and C5.0 tree algorithms had significantly higher AUC than the other classification tree algorithms, but there was no significant difference between the AUC values of the Ctree and CART algorithms (P > 0.05).
Overall, the results showed that the CRUISE and C5.0 tree algorithms had a better performance for discrimination between IDA and βTT in comparison to all indices and other classification tree methods. The CRUISE tree algorithm extracted six homogeneous subgroups of patients (Fig. 1). According to the tree structure of the CRUISE tree algorithm, it can be concluded that patients with MCV > 67.65, or 67.65 < MCV ≤ 71.25 and Hb ≤ 11.15, or MCV ≤ 67.65 and Hb ≤ 8.85 and MCHC ≤ 30.32 were classified as βTT. Also, patients with 67.65 < MCV ≤ 71.25 and Hb > 11.15, or MCV ≤ 67.65 and Hb > 8.85, or MCV ≤ 67.65 and MCHC > 30.32 were classified as IDA.
In addition, the multidimensional scaling method extracted three subgroups of methods. The diagram of this analysis is shown in Fig. 3. One group included hematological discrimination indices such as Pornprasert, RDW, Kandhrol1, Huber–Herklotz, Sirachainan, Hameed, Zaghloul1, and Zaghloul2, while another group included Shine and Lal, Roth, Ricerca, and Telmissani–MCHD. The third group in turn included classification tree algorithms, SVM, and some of the hematological discrimination indices. Like multidimensional scaling, cluster analysis extracted three homogeneous groups of discrimination methods. The diagram of this analysis is shown in Fig. 4.
Discussion
The two common types of microcytic anemia disorders are IDA and βTT, which have similar clinical and experimental conditions [3, 11, 12]. The discrimination between these two disorders is clinically important and requires time-consuming and expensive tests like HbA2, serum iron, serum ferritin and TIBC [4, 11, 13–16]. Several hematological indices have been proposed for rapid and low-cost discrimination between IDA and βTT, but they are not fully sensitive and specific for differential diagnosis [17–41].
This study used classification tree algorithms to discriminate between IDA and βTT. These are efficient and low-cost detection methods to extract homogeneous subgroups of patients [53–56]. Thus, the diagnostic performance of hematological indices was compared with tree-based methods to differentiate IDA and βTT using various accuracy measures.
Additionally, multidimensional scaling was used to extract homogeneous subgroups of methods with a similar performance based on the mentioned criteria.
The findings showed that none of the mentioned discrimination methods are fully sensitive and specific in discrimination between IDA and βTT. Also, tree-based methods exhibited high performance for differential diagnosis in comparison with the hematological indices. The CRUISE tree algorithm indicated better performance than other discrimination methods based on accuracy measures such as Youden’s index, accuracy, PLR, NLR, DOR, F-measure and AUC. These criteria include both sensitivity and specificity and indicate the diagnostic performance of a discrimination method more accurately than other criteria. So, this algorithm can help physicians make better clinical decisions.
Although the sensitivity of hematological discrimination methods such as Ricerca, Telmissani–MCHD, Bordbar, Roth, and Shine and Lal (S&L) was higher than that of the CRUISE tree algorithm, these hematological indices had a high false positive rate as compared to the CRUISE tree algorithm. Moreover, with respect to the other measures, these indices had poor performance in discriminating between IDA and βTT.
Consistent with the findings of this study, other studies demonstrated that the Ehsani index had good performance in discrimination between these two disorders in
Table 2 F-measure and AUC of each hematological index and classification tree algorithm for differentiation between iron deficiency
anemia (IDA) and β‐thalassemia trait (βTT) with their 95% confidence interval
Discriminant method F-measure (%) AUC Standard Error 95% CI P–value
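Table 2 reports an AUC for every method, including indices that give only a dichotomous call (βTT or IDA). For a dichotomous classifier the empirical ROC curve has a single operating point, and the area under it equals (sensitivity + specificity)/2, that is, (Youden’s index + 1)/2. The sketch below checks this identity against a direct pairwise (Mann–Whitney style) computation of the AUC. The function name `auc_pairwise` is hypothetical, and the counts are an assumption: they were back-calculated from the CRUISE row of Table 1 and the group sizes, with βTT as the positive class. The article does not state exactly how the ROC analysis was applied to each method, so this is an illustration rather than a reproduction of the authors' computation.

```python
def auc_pairwise(pos_scores, neg_scores):
    """AUC as the probability that a positive case scores higher than a negative one,
    counting ties as one half (the Mann-Whitney interpretation of the AUC)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Assumed counts for a dichotomous classifier (back-calculated, not reported by the authors).
tp, fn, tn, fp = 685, 23, 429, 41

# Score 1 = classified as positive (bTT), score 0 = classified as negative (IDA).
pos_scores = [1] * tp + [0] * fn   # true bTT cases
neg_scores = [1] * fp + [0] * tn   # true IDA cases

auc = auc_pairwise(pos_scores, neg_scores)
sens = tp / (tp + fn)
spec = tn / (tn + fp)
auc_from_rates = (sens + spec) / 2

print(round(auc, 3), round(auc_from_rates, 3))
```

Under these assumptions both routes give an AUC of about 0.940, in line with the value quoted for the CRUISE tree algorithm in the Abstract. The same identity explains why an index with a negative Youden’s index in Table 1 has an AUC below 0.5.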
classification algorithms like J48 decision tree, support vector machines (SVM), k-nearest neighbours (K-NN), multilayer perceptron (MLP) and naïve Bayes (NB) to discriminate between patients with IDA and βTT or both [50]. In another study, Setsirichok evaluated the classification of blood characteristics by a C4.5 decision tree, an NB classifier and an MLP for classifying eighteen classes of thalassemia abnormality [43]. Likewise, Jahangiri et al. (2017) used classification tree algorithms for constructing a differential scheme and investigating the performance of several tree algorithms for the differential diagnosis of IDA from βTT. In agreement with this study, Jahangiri et al. (2017) reported that the CRUISE tree algorithm had the highest AUC, that MCV was an important predictor variable in the discrimination of observations into IDA and βTT, and that the first split of all algorithms was based on MCV [47]. Moreover, Chakraborty et al. (2017) utilized the Ada-boost algorithm to generate multiple decision trees by using the C4.5 decision tree for classification of erythrocytes or anemia detection. Their proposed approach showed accuracy, specificity and sensitivity of 97.81%, 99.7% and 97.33%, respectively, in detecting abnormal erythrocytes [51]. Comparing the diagnostic performance of several algorithms such as J48, K-NN, artificial neural networks and NB for identifying β-thalassemia carriers, AlAgha concluded that naïve Bayes had the superior performance in differentiating between normal individuals and β-thalassemia carriers [52].
Overall, the CRUISE and C5.0 tree algorithms, which had the best performance in this study, showed better performance than the tree algorithms in the previous studies [43, 87].
Using advanced methods such as tree-based methods in addition to the differential indices can be a good idea for discriminating between these two hematologic disorders. Whereas each index includes only one or a few specific blood parameters, machine learning methods can consider the effects of all blood parameters simultaneously for data prediction and exploratory modeling. Besides, using decision trees for discrimination between IDA and βTT can avoid expensive, time-consuming, and complicated laboratory procedures, as well as hematological indices that perform unsatisfactorily in discriminating between these two hematologic disorders.
The application of methods like multidimensional
proposed to determine the hematological discrimination indices with similar performance for future hematological studies.
Application in practice for medical studies
In medical diagnostic processes, decision making with high diagnostic performance is very important. Tree-based methods can be considered appropriate methods for decision making, because they generate differential diagnoses with high accuracy measures (sensitivity, specificity, PPV, NPV, PLR, NLR, DOR, accuracy, and AUC) in comparison to the discrimination indices. In addition, tree algorithms display results graphically, making the results understandable with no statistical expertise. These algorithms can thus be useful for the diagnostic classification scheme of patients in medical studies. This study considered the discrimination between IDA and βTT to prevent iron overload and its complications caused by misdiagnosis and inaccurate treatment, and also to determine the prenatal causes of hemoglobin chain disorders.
Conclusions
Given their diagnostic performance, the CRUISE and C5.0 tree algorithms are considered appropriate methods for differential diagnosis of patients in comparison to other methods. Moreover, tree-based methods are useful along with other parameters for discriminating between IDA and βTT. In conclusion, considering the advantages of tree algorithms, they can help physicians make better clinical decisions. The results showed that the multidimensional scaling method and cluster analysis are appropriate techniques to determine the discrimination indices with similar performance for future studies. In addition, the tree-based methods were identified as good methods for extracting homogeneous subgroups of observations in medical studies.
Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s12911-021-01678-5.
Additional file 1. Table S1. Discrimination indices for differentiation between iron deficiency anemia (IDA) and β-thalassemia trait (βTT). Table S2. Descriptive statistics of blood parameters of the study groups and normalized importance (%) of hematological parameters based on
scaling and cluster analysis are deemed to be useful to the CRUISE tree algorithm (SD: standard deviation and IQR: interquartile
range). Table S3. Sensitivity (TPR), specificity (TNR), false positive rate
determine different classification methods with similar
(FPR), false negative rate (FNR), positive predictive values (PPV) and nega-
diagnostic functions. In previous hematological studies, tive predictive values (NPV) of each hematological index and classification
such indices were compared subjectively based on the tree algorithm for differentiation between iron deficiency anemia (IDA)
and β-thalassemia trait (βTT) with their 95% confidence interval.
accuracy measures. Therefore, the application of mul-
tidimensional scaling method and cluster analysis are
Rahim et al. BMC Medical Informatics and Decision Making (2021) 21:313 Page 11 of 13
Acknowledgements
Research reported in this publication was supported by Elite Researcher Grant Committee under award number [987862] from the National Institute for Medical Research Development (NIMAD), Tehran, Iran.

Authors’ contributions
Conceptualization, project administration and supervision: AK, Data curation: FR, Formal analysis and Methodology: MJ and KG, Software: MJ and ASM, Writing- original draft: AK and MJ. Writing-review and editing: All authors. All authors read and approved the final manuscript.

Funding
Research reported in this study was supported by Elite Researcher Grant Committee from the National Institutes for Medical Research Development (NIMAD), Tehran, Iran. The funding source had no role in the study design, data collection, analysis or interpretation of data, manuscript preparation or decision for submission.

Availability of data and materials
The datasets used and/or analyzed during the study are available from the corresponding author on reasonable request.

Declarations

Ethics approval and consent to participate
This study was approved by the ethical code IR.NIMAD.REC.1398.389 from the National Institute for Medical Research Development, Tehran, Iran. A written informed consent was obtained before the enrollment. All methods were performed in accordance with the relevant guidelines and the institutional regulations.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Author details
1 Research Center of Thalassemia and Hemoglobinopathy, Health Research Institute, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran.
2 Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran.
3 Department of Biostatistics and Epidemiology, Faculty of Health, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran.

Received: 3 June 2021 Accepted: 3 November 2021

References
1. Kara B, Çal S, Aydogan A, Sarper N. The prevalence of anemia in adolescents: a study from Turkey. J Pediatr Hematol Oncol. 2006;28(5):316–21.
2. Brittenham G. Disorders of iron metabolism: iron deficiency and overload. Hematol Basic Principles Pract. 2000.
3. Hallberg L. Iron requirements. Biol Trace Elem Res. 1992;35(1):25–45.
4. Oliveri N. The beta-thalassemias. N Engl J Med. 1999;341(2):99–109.
5. Rathod DA, Kaur A, Patel V, Patel K, Kabrawala R, Patel V, et al. Usefulness of cell counter-based parameters and formulas in detection of β-thalassemia trait in areas of high prevalence. Am J Clin Pathol. 2007;128(4):585–9.
6. Angastiniotis M, Modell B. Global epidemiology of hemoglobin disorders. Ann N Y Acad Sci. 1998;850(1):251–69.
7. Weatherall D, Clegg JB. Inherited haemoglobin disorders: an increasing global health problem. Bull World Health Organ. 2001;79:704–12.
8. Urrechaga E, Borque L, Escanero JF. The role of automated measurement of RBC subpopulations in differential diagnosis of microcytic anemia and β-thalassemia screening. Am J Clin Pathol. 2011;135(3):374–9.
9. Galanello R, Origa R. Beta-thalassemia. Orphanet J Rare Dis. 2010;5(1):11.
10. Camaschella C. New insights into iron deficiency and iron deficiency anemia. Blood Rev. 2017;31(4):225–33.
11. Lafferty JD, Crowther MA, Ali MA, Levine M. The evaluation of various mathematical RBC indices and their efficacy in discriminating between thalassemic and non-thalassemic microcytosis. Am J Clin Pathol. 1996;106(2):201–5.
12. Bessman JD, Gilmer PR, Gardner FH. Improved classification of anemias by MCV and RDW. Am J Clin Pathol. 1983;80(3):322–6.
13. Thomas C, Thomas L. Biochemical markers and hematologic indices in the diagnosis of functional iron deficiency. Clin Chem. 2002;48(7):1066–76.
14. Goddard AF, James MW, McIntyre AS, Scott BB. Guidelines for the management of iron deficiency anaemia. Gut. 2011;60:1309–16.
15. Mosca A, Paleari R, Ivaldi G, Galanello R, Giordano P. The role of haemoglobin A2 testing in the diagnosis of thalassaemias and related haemoglobinopathies. J Clin Pathol. 2009;62(1):13–7.
16. Demir A, Yarali N, Fisgin T, Duru F, Kara A. Most reliable indices in differentiation between thalassemia trait and iron deficiency anemia. Pediatr Int. 2002;44(6):612–6.
17. England J, Fraser P. Differentiation of iron deficiency from thalassaemia trait by routine blood-count. Lancet. 1973;301(7801):449–52.
18. Klee GG, Fairbanks VF, Pierre RV, O’sullivan MB. Routine erythrocyte measurements in diagnosis of iron-deficiency anemia and thalassemia minor. Am J Clin Pathol. 1976;66(5):870–7.
19. Mentzer W. Differentiation of iron deficiency from thalassaemia trait. Lancet. 1973;301(7808):882.
20. Srivastava P, Bevington J. Iron deficiency and/or Thalassaemia trait. Lancet. 1973;301(7807):832.
21. Shine I, Lal S. A strategy to detect β-thalassaemia minor. Lancet. 1977;309(8013):692–4.
22. Bessman JD, Feinstein D. Quantitative anisocytosis as a discriminant between iron deficiency and thalassemia minor. Blood. 1979;53(2):288–93.
23. Ricerca B, Storti S, d’Onofrio G, Mancini S, Vittori M, Campisi S, et al. Differentiation of iron deficiency from thalassaemia trait: a new approach. Haematologica. 1986;72(5):409–13.
24. Green R, King R. A new red cell discriminant incorporating volume dispersion for differentiating iron deficiency anemia from thalassemia minor. Blood Cells. 1989;15(3):481–95.
25. Gupta AD, Hegde C, Mistri R. Red cell distribution width as a measure of severity of iron deficiency in iron deficiency anemia. Indian J Med Res. 1994;100:177–83.
26. Jayabose S, Giamelli J, LevondogluTugal O, Sandoval C, Ozkaynak F, Visintainer P. # 262 Differentiating iron deficiency anemia from thalassemia minor by using an RDW-based index. J Pediatr Hematol Oncol. 1999;21(4):314.
27. Telmissani OA, Khalil S, Roberts GT. Mean density of hemoglobin per liter of blood: a new hematologic parameter with an inherent discriminant function. Lab Hematol. 1999;5:149–52.
28. Huber AR, Ottiger C, Risch L, Regenass S, Hergersberg M, Herklotz R, editors. Thalassemie-syndrome: klinik und diagnose. Schweiz Med Forum; 2004.
29. Kohan N, Ramzi M. Evaluation of sensitivity and specificity of Kerman index I and II in screening beta thalassemia minor. 2008.
30. Sirdah M, Tarazi I, Al Najjar E, Al HR. Evaluation of the diagnostic reliability of different RBC indices and formulas in the differentiation of the β-thalassaemia minor from iron deficiency in Palestinian population. Int J Lab Hematol. 2008;30(4):324–30.
31. Ehsani M, Shahgholi E, Rahiminejad M, Seighali F, Rashidi A. A new index for discrimination between iron deficiency anemia and beta-thalassemia minor: results in 284 patients. Pakist J Biol Sci. 2009;12(5):473–5.
32. Keikhaei B. A new valid formula in differentiating iron deficiency anemia from β-thalassemia trait. Pakist J Med Sci. 2010;26:368–73.
33. Nishad AAN, Pathmeswaran A, Wickremasinghe A, Premawardhena A. The Thal-index with the BTT prediction.exe to discriminate β-thalassaemia traits from other microcytic anaemias. 2012.
34. Wongprachum K, Sanchaisuriya K, Sanchaisuriya P, Siridamrongvattana S, Manpeun S, Schlep FP. Proxy indicators for identifying iron deficiency
among anemic vegetarians in an area prevalent for thalassemia and hemoglobinopathies. Acta Haematol. 2012;127(4):250–5.
35. Dharmani P, Sehgal K, Dadu T, Mankeshwar R, Shaikh A, Khodaiji S. Developing a new index and its comparison with other CBC-based indices for screening of beta thalassemia trait in a tertiary care hospital. Int J Lab Hematol. 2013;35:118.
36. Pornprasert S, Panya A, Punyamung M, Yanola J, Kongpan C. Red cell indices and formulas used in differentiation of β-thalassemia trait from iron deficiency in Thai school children. Hemoglobin. 2014;38(4):258–61.
37. Sirachainan N, Iamsirirak P, Charoenkwan P, Kadegasem P, Wongwerawattanakoon P, Sasanakul W, et al. New mathematical formula for differentiating thalassemia trait and iron deficiency anemia in thalassemia prevalent area: a study in healthy school-age children. Southeast Asian J Trop Med Public Health. 2014;45(1):174.
38. Bordbar E, Taghipour M, Zucconi BE. Reliability of different RBC indices and formulas in discriminating between β-thalassemia minor and other microcytic hypochromic cases. Mediterranean journal of hematology and infectious diseases. 2015;7(1).
39. Janel A, Roszyk L, Rapatel C, Mareynat G, Berger MG, Serre-Sapin AF. Proposal of a score combining red blood cell indices for early differentiation of beta-thalassemia minor from iron deficiency anemia. Hematology. 2011;16(2):123–7.
40. Jahangiri M, Rahim F, Malehi AS. Diagnostic performance of hematological discrimination indices to discriminate between βeta thalassemia trait and iron deficiency anemia and using cluster analysis: Introducing two new indices tested in Iranian population. Sci Rep. 2019;9(1):1–13.
41. Matos JF, Dusse L, Borges KB, de Castro RL, Coura-Vital W, Carvalho MDG. A new index to discriminate between iron deficiency anemia and thalassemia trait. Rev Bras Hematol Hemoter. 2016;38(3):214–9.
42. Sharma S. Applied multivariate techniques. New York: Wiley; 1995.
43. Setsirichok D, Piroonratana T, Wongseree W, Usavanarong T, Paulkhaolarn N, Kanjanakorn C, et al. Classification of complete blood count and haemoglobin typing data by a C4.5 decision tree, a naïve Bayes classifier and a multilayer perceptron for thalassaemia screening. Biomed Signal Process Control. 2012;7(2):202–12.
44. Dogan S, Turkoglu I. Iron-deficiency anemia detection from hematology parameters by using decision trees. Int J Sci Technol. 2008;3(1):85–92.
45. Urrechaga E, Aguirre U, Izquierdo S. Multivariable discriminant analysis for the differential diagnosis of microcytic anemia. Anemia. 2013;2013.
46. Wongseree W, Chaiyaratana N, Vichittumaros K, Winichagoon P, Fucharoen S. Thalassaemia classification by neural networks and genetic programming. Inf Sci. 2007;177(3):771–86.
47. Jahangiri M, Khodadi E, Rahim F, Saki N, Saki Malehi A. Decision‐tree‐based methods for differential diagnosis of β‐thalassemia trait from iron deficiency anemia. Expert Syst. 2017;34(3).
48. Barnhart-Magen G, Gotlib V, Marilus R, Einav Y. Differential diagnostics of thalassemia minor by artificial neural networks model. J Clin Lab Anal. 2013;27(6):481–6.
49. Amendolia SR, Cossu G, Ganadu M, Golosio B, Masala G, Mura GM. A comparative study of k-nearest neighbour, support vector machine and multi-layer perceptron for thalassemia screening. Chemom Intell Lab Syst. 2003;69(1):13–20.
50. Bellinger C, Amid A, Japkowicz N, Victor H, editors. Multi-label classification of anemia patients. In IEEE 14th international conference on machine learning and applications (ICMLA); 2015. IEEE.
51. Maity M, Mungle T, Dhane D, Maiti AK, Chakraborty C. An ensemble rule learning approach for automated morphological classification of erythrocytes. J Med Syst. 2017;41(4):56.
52. AlAgha AS, Faris H, Hammo BH, Alam A-Z. Identifying β-thalassemia carriers using a data mining approach: the case of the Gaza Strip, Palestine. Artif Intell Med. 2018;88:70–83.
53. Lipkovich I, Dmitrienko A, Denne J, Enas G. Subgroup identification based on differential effect search—a recursive partitioning method for establishing response to treatment in patient subpopulations. Stat Med. 2011;30(21):2601–21.
54. Loh WY, He X, Man M. A regression tree approach to identifying subgroups with differential treatment effects. Stat Med. 2015;34(11):1818–33.
55. Su X, Tsai C-L, Wang H, Nickerson DM, Li B. Subgroup analysis via recursive partitioning. J Mach Learn Res. 2009;10:141–58.
56. Li C, Glüer C-C, Eastell R, Felsenberg D, Reid DM, Roux C, et al. Tree-structured subgroup analysis of receiver operating characteristic curves for diagnostic tests. Acad Radiol. 2012;19(12):1529–36.
57. De’ath G, Fabricius KE. Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology. 2000;81(11):3178–92.
58. Lemon SC, Roy J, Clark MA, Friedmann PD, Rakowski W. Classification and regression tree analysis in public health: methodological review and comparison with logistic regression. Ann Behav Med. 2003;26(3):172–81.
59. Speybroeck N, Berkvens D, Mfoukou-Ntsakala A, Aerts M, Hens N, Van Huylenbroeck G, et al. Classification trees versus multinomial models in the analysis of urban farming systems in Central Africa. Agric Syst. 2004;80(2):133–49.
60. Malehi AS, Jahangiri M. Classic and bayesian tree-based methods. Enhanced expert systems. IntechOpen; 2019.
61. Feldesman MR. Classification trees as an alternative to linear discriminant analysis. Am J Phys Anthropol Off Publ Am Assoc Phys Anthropol. 2002;119(3):257–75.
62. Chan K-Y, Loh W-Y. LOTUS: an algorithm for building accurate and comprehensible logistic regression trees. J Comput Graph Stat. 2004;13(4):826–52.
63. Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. Boca Raton: CRC Press; 1984.
64. Grubinger T, Zeileis A, Pfeiffer K-P. Evtree: Evolutionary learning of globally optimal classification and regression trees in R. Working papers in economics and statistics; 2011.
65. Loh WY. Tree-structured classifiers. Wiley Interdiscip Rev Comput Stat. 2010;2(3):364–9.
66. Loh WY. Classification and regression trees. Wiley Interdiscip Rev Data Min Knowl Discov. 2011;1(1):14–23.
67. Loh W-Y, Shih Y-S. Split selection methods for classification trees. Stat Sin. 1997:815–40.
68. Kim H, Loh W-Y. Classification trees with unbiased multiway splits. J Am Stat Assoc. 2001;96(454):589–604.
69. Loh W-Y. Improving the precision of classification trees. Ann Appl Stat. 2009:1710–37.
70. Hothorn T, Hornik K, Zeileis A. Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat. 2006;15(3):651–74.
71. Organization WH. Serum ferritin concentrations for the assessment of iron status and iron deficiency in populations. World Health Organization; 2011.
72. Chinudomwong P, Binyasing A, Trongsakul R, Paisooksantivatana K. Diagnostic performance of reticulocyte hemoglobin equivalent in assessing the iron status. J Clin Lab Anal. 2020:e23225.
73. Quinlan JR. C4.5: programs for machine learning. Amsterdam: Elsevier; 2014.
74. Kuhn M, Weston S, Culp M, Coulter N, Quinlan R. Package ‘C50’. CRAN, UTC; 2015.
75. Karatzoglou A, Meyer D, Hornik K. Support vector machines in R. J Stat Softw. 2006;15(1):1–28.
76. Šimundić A-M. Measures of diagnostic accuracy: basic definitions. Med Biol Sci. 2008;22(4):61–5.
77. Ferri C, Hernández-Orallo J, Modroiu R. An experimental comparison of performance measures for classification. Pattern Recogn Lett. 2009;30(1):27–38.
78. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988:837–45.
79. Kruskal JB. Multidimensional scaling. London: Sage; 1978.
80. Charrad M, Ghazzali N, Boiteau V, Niknafs A. NbClust: an R package for determining the relevant number of clusters in a data set. J Stat Softw. 2014;61(1):1–36.
81. Wang K, Phillips CA, Saxton AM, Langston MA. EntropyExplorer: an R package for computing and comparing differential Shannon entropy, differential coefficient of variation and differential expression. BMC Res Notes. 2015;8(1):1–5.
82. Available from: https://stats.stackexchange.com/questions/239973/a-general-measure-of-data-set-imbalance/239982.
83. Ehsani M, Sotoudeh K, Shahgholi E, Rahiminezhad M, Seyghali F, Aslani A. Discrimination of iron deficiency anemia and beta thalassemia minor based on a new index. 2007.
84. Vehapoglu A, Ozgurhan G, Demir AD, Uzuner S, Nursoy MA, Turkmen S, et al. Hematological indices for differential diagnosis of beta thalassemia trait and iron deficiency anemia. Anemia. 2014;2014.
85. Hoffmann JJ, Urrechaga E, Aguirre U. Discriminant indices for distinguishing thalassemia and iron deficiency in patients with microcytic anemia: a meta-analysis. Clin Chem Lab Med. 2015;53(12):1883–94.
86. Jahangiri M, Rahim F, Saki Malehi A, Pezeshki SMS, Ebrahimi M. Differential diagnosis of microcytic anemia, thalassemia or iron deficiency anemia: a diagnostic test accuracy meta-analysis. Mod Med Lab J. 2019;3(1):1–14.
87. Bellinger C, Amid A, Japkowicz N, Victor H, editors. Multi-label classification of anemia patients. In: 2015 IEEE 14th international conference on machine learning and applications (ICMLA). IEEE; 2015.

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.