Journal of Biomedical Informatics 117 (2021) 103764

Classification of breast cancer using microarray gene expression data: A survey

Muhammed Abd-Elnaby a,*, Marco Alfonse a, Mohamed Roushdy b

a Faculty of Computers and Information Science, Ain Shams University, Cairo, Egypt
b Faculty of Computers and Information Technology, Future University, New Cairo, Egypt

ARTICLE INFO

Keywords: Feature selection, Machine learning, Cancer classification, Microarray data

ABSTRACT

Cancer, in particular breast cancer, is considered one of the most common causes of death worldwide according to the World Health Organization. For this reason, extensive research efforts have been made in the area of accurate and early diagnosis of cancer in order to increase the likelihood of cure. Among the available tools for diagnosing cancer, microarray technology has proven to be effective. Microarray technology analyzes the expression levels of thousands of genes simultaneously. Although the huge number of features, or genes, in microarray data may seem advantageous, many of these features are irrelevant or redundant, which degrades classification accuracy. To overcome this challenge, feature selection is a mandatory preprocessing step before classification. In this paper, the main feature selection and classification techniques introduced in the literature for cancer (particularly breast cancer) are reviewed with the aim of improving microarray-based classification.

1. Introduction

All cells have a nucleus that contains deoxyribonucleic acid (DNA). DNA carries the genetic information an organism needs to develop, function, grow, and reproduce. The coding segments of DNA are called genes, which are responsible for making proteins. Proteins do the essential work in every organism and are synthesized in two steps: first, DNA is transcribed into mRNA; then mRNA is translated into proteins. Genetic technologies such as DNA microarrays measure the expression of many genes simultaneously, offering a global view of the cell that helps in differentiating between normal and diseased states. Cancer can be described as a group of diseases associated with uncontrollable cell growth that invades and metastasizes to other tissues. It is considered the second leading cause of death globally, responsible for about 9.6 million deaths in 2018, or about 1 in 6 deaths. The most common types in men are lung, prostate, colorectal, stomach, and liver cancer, while breast, colorectal, lung, cervical, and thyroid cancer are the most common among women [1]. Breast cancer is a heterogeneous disease with different histological and biological properties and various treatment responses [2]. It can be traced back to genetic, epigenetic, or transcriptomic changes. It appears as a lump, nipple discharge, or a change of skin texture around the nipple region. Early treatment of cancer increases the possibility of cure and reduces the fatality rate and the probability of recurrence [3]. Recurrence may happen months or years after the initial treatment; it can be local, where cancer affects the same place, or distant, where cancer returns in different areas of the body [4]. Breast cancer is detected using traditional methods, e.g., physical examination, blood tests, and X-ray scans, but these are time-consuming and subject to human error [5]. Medical errors are considered the third leading cause of death in the US [6]. Therefore, an effective tool for the diagnosis of breast cancer is necessary, and microarray technology is extensively used for this purpose. Gene expression data from DNA microarrays represent the state of a cell at the molecular level [7] and have great potential for medical diagnosis. They are analyzed to determine whether the patient has cancer or not (two-class problems), to distinguish between different types of cancer (multi-class problems) [8], to predict the response to a drug based on the gene signature, or to identify tumors [9] by finding groups of similarly expressed genes. These data are effectively analyzed by machine learning (ML). ML is an automatic and intelligent learning technique that gives machines the ability to learn without being explicitly programmed. ML techniques are widely employed in solving many complex real-world problems and have proven to be efficient in

* Corresponding author.
E-mail addresses: [email protected] (M. Abd-Elnaby), [email protected] (M. Alfonse), [email protected]
(M. Roushdy).

https://doi.org/10.1016/j.jbi.2021.103764
Received 8 July 2020; Received in revised form 9 March 2021; Accepted 26 March 2021
Available online 6 April 2021
1532-0464/© 2021 Elsevier Inc. This article is made available under the Elsevier license (http://www.elsevier.com/open-access/userlicense/1.0/).

Fig. 1. The pipeline of microarray analysis.
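
The two-step pipeline of Fig. 1 (feature selection followed by classifier training) can be sketched in a few lines of Python. This is an illustrative sketch only and is not taken from any of the surveyed papers: the synthetic data, the mean-difference filter score, the nearest-centroid classifier, and all function names are assumptions made for the example.

```python
import math
import random

def filter_scores(X, y):
    """Score each gene by |difference of class means| / pooled standard deviation."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        var_a = sum((v - mean_a) ** 2 for v in a) / len(a)
        var_b = sum((v - mean_b) ** 2 for v in b) / len(b)
        pooled = math.sqrt((var_a + var_b) / 2) or 1e-12  # guard against zero spread
        scores.append(abs(mean_a - mean_b) / pooled)
    return scores

def select_top_k(X, y, k):
    """Step 1: keep the indices of the k highest-scoring genes."""
    scores = filter_scores(X, y)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]

def project(row, genes):
    """Restrict one sample to the selected genes."""
    return [row[g] for g in genes]

def fit_centroids(X, y):
    """Step 2: train a nearest-centroid classifier (one mean profile per class)."""
    centroids = {}
    for label in set(y):
        rows = [row for row, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def make_toy_data(n_samples, n_genes, rng):
    """Synthetic expression matrix: only genes 0 and 1 differ between the classes."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        row[0] += 5.0 * label
        row[1] += 5.0 * label
        X.append(row)
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_toy_data(40, 50, rng)
X_test, y_test = make_toy_data(20, 50, rng)

selected = select_top_k(X_train, y_train, k=2)  # selection uses training data only
model = fit_centroids([project(r, selected) for r in X_train], y_train)
preds = [predict(model, project(r, selected)) for r in X_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(sorted(selected), accuracy)
```

Selecting genes on the training split only, as done here, keeps the held-out samples from influencing which features are kept.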

Fig. 2. The taxonomy of the feature selection techniques.
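
As one illustration of the filter branch of the taxonomy in Fig. 2, the following sketch scores discretized genes by information gain, computed as the entropy of the class labels minus their entropy conditioned on the gene, and keeps the genes whose score exceeds a threshold. The gene names, the toy values, and the 0.5 threshold are hypothetical and chosen only to make the behavior visible.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG = H(class) - H(class | feature) for a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

# Hypothetical discretized expression levels for eight samples (4 per class).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = {
    "gene_a": ["low", "low", "low", "low", "high", "high", "high", "high"],  # tracks the class
    "gene_b": ["low", "high", "low", "high", "low", "high", "low", "high"],  # unrelated to the class
    "gene_c": ["low", "low", "low", "high", "high", "high", "high", "high"],  # partially informative
}

gains = {name: information_gain(values, labels) for name, values in genes.items()}
threshold = 0.5  # assumed cut-off; in practice it is tuned per dataset
selected = [name for name, gain in gains.items() if gain > threshold]
print(gains, selected)
```

Because each gene is scored on its own, this is a univariate filter: it is cheap, but it cannot detect that two selected genes are redundant with each other.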

analyzing gene expression data. Applying ML improves the accuracy of predicting cancer vulnerability, mortality, and recurrence by about 15–25% [10]. Although the additional information provided by a larger number of features should improve discriminative capacity, accuracy deteriorates when the number of features exceeds a certain limit, especially with a small number of samples; this is known as the “curse of dimensionality”. This deterioration occurs because not all features are informative; many are irrelevant, redundant, or noisy. To overcome this problem, feature selection techniques are used to select the most informative genes. Another issue is that microarray data are often imbalanced: the number of available samples in each class is not equal, which biases the classification toward the majority class. Ranking of features is also considered a challenge [11]. To address the imbalance, oversampling techniques can be used. The paper is organized as follows: Section 2 defines the methodology of classifying breast cancer, Section 3 presents the state of the art of breast cancer classification, Section 4 presents the discussion, and Section 5 offers the conclusion and possible future work.

2. Methodology of cancer classification

Analysis of DNA microarray data proceeds in the following steps. First, the data are preprocessed using feature selection techniques to remove noisy and redundant features and keep only informative ones. Then the resulting subset is used to train the learning model to diagnose cancer subtypes, as illustrated in Fig. 1.

2.1. Feature selection

Feature selection is the process of automatically or manually selecting the features that have an impact on the prediction, in order to:

• Reduce overfitting: overfitting means the model does not generalize well from the training data to unseen data because of noise and redundancy in the data. The model generalizes better when such data are removed.
• Improve accuracy: training the model with less misleading data improves accuracy.
• Reduce training time: the smaller the number of features, the less computation time is required for training.
• Offer biologists insight into the relationship between gene signatures and diseases [12].

Table 1
The pros and cons of feature selection techniques.

Filter
Pros:
• Not dependent on any particular algorithm.
• Fast and computationally simple.
Cons:
• Does not consider the interaction with the classifier.
• Lower performance.

Wrapper
Pros:
• Typically selects a near-optimal subset.
• Lower error rate compared to other methods.
Cons:
• Higher risk of overfitting than filter techniques.
• Computationally very intensive compared to other methods.
• Tied to the particular learning machine on which it has been tested.

Embedded
Pros:
• Computationally less intensive than wrapper methods.
• Includes the interaction with the classification model.
• Makes better use of the available data by not needing to split the training data into a training and validation set.
• Reaches a solution faster by avoiding retraining a predictor from scratch for every variable subset investigated.
Cons:
• Specific to a learning machine.
• More prone to overfitting than filters.

Hybrid
Pros:
• Combines the advantages of various approaches.
Cons:
• Time complexity may increase.

Feature selection can be classified, based on the integration between the selection algorithm and the implemented model, into four main categories, as shown in Fig. 2. The pros and cons of feature selection


Fig. 3. The taxonomy of the machine learning techniques.
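
The two branches in Fig. 3 differ in whether class labels are available during learning. The sketch below is a minimal illustration under assumed toy data (two-dimensional expression profiles with hypothetical "normal" and "tumor" labels): a k-nearest-neighbor classifier uses the labels to classify a new sample, whereas k-means groups the same points without ever seeing them. The function names and the choice of initial centroids are assumptions for the example.

```python
import math

def knn_classify(train_X, train_y, x, k=3):
    """Supervised: majority vote among the k labeled samples closest to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def kmeans(points, centroids, n_iter=10):
    """Unsupervised: Lloyd's algorithm, alternating assignment and centroid update."""
    for _ in range(n_iter):
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        updated = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        centroids = updated
    return assignment, centroids

# Hypothetical two-gene expression profiles for six samples.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
y = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

predicted = knn_classify(X, y, (9.0, 9.0))       # uses the labels
clusters, centers = kmeans(X, [X[0], X[-1]])     # never sees the labels
print(predicted, clusters)
```

The cluster indices returned by k-means carry no class meaning by themselves; relating them to "normal" or "tumor" requires labels or domain knowledge.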

techniques are presented in Table 1.

2.2. Filter approach

The filter approach evaluates features based on intrinsic properties of the data, such as distance, correlation, and consistency, independently of the classifier. Although filter methods are computationally fast and generalize well [5], their classification performance is lower. Filter approaches are divided into univariate and multivariate. Univariate feature selection examines each feature individually to measure the strength of the association between that feature and the outcome variable. Common types are mutual information (MI) and information gain (IG).

• MI measures the dependence between two variables; in other words, it measures how much information one variable (X) provides about another (Y). In gene selection, it measures the dependence between a gene and the classification category. The larger the value of MI, the more informative the gene is [13].
• IG is a statistical property that measures how informative a feature is. Features highly related to the class carry the most information, while unrelated features give no information. IG is computed from entropy, which measures the impurity of the given samples. A threshold is then set, and features whose IG exceeds the threshold are selected [14].

Multivariate feature selection evaluates features in the context of others. The most commonly used techniques are:

• Minimum Redundancy and Maximum Relevance (mRMR) tends to select features that are highly correlated with the class and weakly correlated with each other. Selecting two features that are both highly relevant and highly correlated with each other may not be appropriate: they would add little information because of the high correlation, but they would increase model complexity and make the model susceptible to overfitting [15].
• Correlation-based Feature Selection (CFS) ranks feature subsets using a correlation-based heuristic evaluation function. Features are evaluated according to the hypothesis “Good feature subset contains features that are highly correlated with the classification and yet uncorrelated to each other” [13]; that is, a low correlation with the class indicates irrelevant features, while informative features are strongly correlated with it [15].
• Fast Correlation Based Filter (FCBF) is a multivariate algorithm based on symmetrical uncertainty (SU) that selects features highly correlated with the class. It then applies heuristics to remove redundant features and retain those relevant to the class [6].

2.2.1. Wrapper approach

Wrapper methods use a learning algorithm to select the optimal subset of features. They have better accuracy than filter methods, but they are tied to a particular learning algorithm, tend to overfit, and are very computationally intensive, because the model has to be trained on each candidate subset. This approach can be categorized into sequential selection algorithms and heuristic search algorithms. Sequential selection algorithms remove or add one feature at a time based on the classifier performance until a subset of the desired k features that gives maximum accuracy is reached. Common techniques are sequential forward selection and sequential backward selection. Among heuristic search algorithms, the most widely used are:

• Genetic Algorithm (GA) is a heuristic search algorithm inspired by natural evolution. The main principle of GA is to randomly generate a population and evolve it through three operations. First, the selection operation chooses individuals with better fitness values. Then, in the crossover operation, pairs of individuals are combined at a random crossover point to generate new offspring. Finally, the mutation operation introduces diversity into the population [16].
• Artificial Bee Colony (ABC) is a swarm-based algorithm that simulates how honeybees search for food. The colony consists of employed bees, onlookers, and scouts. The number of employed bees equals the number of food sources. Each employed bee goes to a food source and evaluates it. Based on information shared by the employed bees, onlookers choose a food source. An employed bee becomes a scout when its food source is depleted and begins to search randomly for a new food source [17].
• Particle Swarm Optimization (PSO) is a swarm-based algorithm that mimics how members of groups such as birds or fish interact to share information. In PSO, a candidate solution is represented by a particle that has a fitness value and a velocity that directs its flight. By updating the position of each particle according to its own experience and that of other particles, an optimum solution can be reached [18].
• Bat Algorithm (BA) is a swarm-based algorithm based on echolocation, the mechanism bats use to locate their prey. Bats fly randomly and automatically adjust the frequency and rate of their emitted pulses based on the distance to the target. The solution is chosen from among the best solutions [19].

2.2.2. Embedded approach

Embedded methods learn which features contribute most to the accuracy of the model while the model is being built. The most common embedded feature selection methods are regularization methods.

2.2.3. Hybrid approach

A hybrid approach can be any combination of the same or different feature selection methods, intended to combine the advantages of the constituent approaches and to overcome the drawbacks of each one individually. The combination is usually a filter-wrapper approach that benefits from the fast computation of the filter approach to remove redundant features and from the high performance of the wrapper approach. It is also less prone to overfitting than the wrapper approach, but it is classifier specific.

2.3. Machine learning

Machine learning techniques are mainly classified into two categories, as shown in Fig. 3.

2.3.1. Supervised approach

Supervised machine learning algorithms need labeled data. The classifiers commonly used in microarray analysis are:


• K-nearest neighbor (KNN) is a lazy learner that builds no model. It is used for classification and regression tasks. For classification, it classifies data based on the classes of its neighbors (“birds of a feather flock together”): an object is assigned to the majority class among its k nearest neighbors. For regression, the output is the average of the values of the k nearest neighbors [20,21].
• Naïve Bayes (NB) is a probabilistic machine learning algorithm based on Bayes’ theorem and widely used in classification tasks. “Naïve” means the features are assumed to be independent of each other, so that changing the value of one feature does not directly change the value of any other feature. NB classifies data by calculating the posterior probability of each class using the probability of the features belonging to the class. Despite this simple assumption, NB is fast and effective on real problems. Bayesian belief networks are used to deal with dependencies among features [20].
• Support Vector Machine (SVM) is commonly used for classifying gene expression data because of the sparseness of its solution and its ability to handle large feature spaces [22]. First, data items are plotted in n-dimensional space. Then SVM finds the hyperplane that best separates the classes.

2.3.2. Unsupervised approach

Unsupervised learning requires no labeled data. One common technique is k-means, in which data with similar features are grouped in the same cluster. In microarray analysis, k-means can be used to remove redundant genes by grouping similar data [11].

3. Different methods for classifying cancer

3.1. Filter approach

Purbolaksono et al. [5] introduced a three-stage system for classifying microarray data. The first stage was discretization, which used k-means to transform continuous data into discrete values and divide the data into clusters. The second stage was feature selection, in which mutual information was used for dimensionality reduction and to obtain informative genes. Finally, Bayes’ theorem was applied to five datasets. The best result was obtained with k = 10, and the results showed that Bayesian Network methods perform better than Naïve Bayes in classifying microarray data.

Cilia et al. [8] compared the performance of various feature selection and classification techniques on six datasets. For feature selection, the authors focused on feature ranking techniques, which evaluate each feature individually. The datasets were evaluated using decision tree (DT), Random Forest (RF), KNN, and multilayer perceptron classifiers with 10-fold cross-validation (CV). The results of the ranking techniques were compared with those of Sequential Forward Floating Search (SFFS), the Fast Correlation-Based Filter (FCBF), and Minimum Redundancy Maximum Relevance (mRMR). The ranking techniques obtained the best results on three datasets, while FCBF and SFFS obtained the best results on the other three. Despite the high accuracy obtained, further reduction was needed for the Ovarian, Lymphoma, and Lung datasets.

Aydadenta and Adiwijaya [11] utilized k-means and Relief for feature selection. Initially, k-means was used to group similar features into clusters so that redundant ones could be removed. Then the Relief algorithm was used to rank the elements of each cluster, and the top-ranking features of each cluster were combined to train RF. The proposed model was evaluated on three datasets and achieved better results than the model using RF alone without clustering [23].

V. Bolón et al. [12] reviewed state-of-the-art techniques applied in the domain of microarray classification. A practical evaluation was then carried out to compare the performance of the different techniques. Different feature selection techniques, e.g., ReliefF, SVM-RFE, mRMR, IG, and FCBF, were used for gene selection. Then C4.5, NB, and SVM were tested to obtain the accuracy of the model. The results showed the efficiency of SVM in obtaining high accuracy.

Al-Batah et al. [24] used the filter method CFS to remove redundant genes and retain the informative ones; Decision Table, JRip, and OneR were then used for classification. The proposed approach achieved high accuracy and fast computation with only a small number of genes.

Gao et al. [25] proposed PA-SVM, which combines PSO with ABC (named PA) to optimize SVM classification. FCBF was first used to obtain informative genes. PA-SVM was then evaluated on 9 datasets, and the results were compared with those of other classifiers. PA-SVM achieved good results with only a small number of genes.

Baliarsingh et al. [26] proposed a Jaya-optimized extreme learning machine (JELM) for breast cancer classification. Jaya is used to select the optimal input weights and hidden biases for the ELM. The authors used the Wilcoxon rank sum test to select relevant genes. The performance of JELM was compared with that of SVM, KNN, NB, and C4.5, and JELM achieved the highest accuracy, about 90.91%. Although the proposed model achieved high accuracy, it selected a large subset of about 505 genes, so the gene subset needs further reduction.

Su et al. [27] introduced a gene selection method based on the Kolmogorov-Smirnov (K-S) test and CFS. First, the K-S test removed redundant and noisy genes by comparing the distributions of the two sample types. Then the filtered subset was evaluated by CFS, so that only genes that are highly correlated with the class and have low redundancy remained. Finally, the proposed method was evaluated using an SVM classifier with 10-fold CV, and its results were compared with those of other FS techniques. K-S test-CFS had superior performance, but its running time needed optimization.

Ahmad, F. K. [28] used different filter feature selection techniques, namely SNR, FC, IG, and t-test, to select informative genes. The gene selection techniques were applied to three datasets, and SVM was used to evaluate them. IG was effective in selecting a minimal set of attributes, and SVM achieved high accuracy with the IG and SNR techniques.

3.2. Hybrid approach

Wu et al. [23] proposed a hybrid improved binary quantum particle swarm optimization algorithm (HI-BQPSO) for feature selection, combining the advantages of filtering and random heuristic search. First, the maximum information coefficient (MIC) was used to calculate the correlation between features and class to obtain an initial feature subset. Then the improved BQPSO was used to obtain the optimized feature subset. The proposed model was evaluated on 9 gene datasets with an SVM classifier. Although HI-BQPSO had good overall performance and strong search ability, it still needs improvement, especially for the CNS dataset.

Medjahed et al. [29] proposed an approach to cancer diagnosis. In the first phase, Support Vector Machines based on Recursive Feature Elimination (SVM-RFE) was used to eliminate 40 percent of the features. The remaining subset was processed by Binary Dragonfly (BDF) to retain only informative genes. The proposed method was evaluated on 6 microarray cancer datasets. Although the model achieved comparable results overall, the result for breast cancer was not satisfactory: it achieved high accuracy but with a very large number of features.

Jain et al. [30] proposed a hybrid feature selection method that combined CFS and iBPSO. iBPSO addresses the premature convergence of BPSO to local optima. The proposed method was applied to 11 microarray datasets and evaluated by NB with stratified 10-fold CV. The model was compared with 7 classifiers and outperformed them in terms of accuracy and number of selected genes in most cases.

Shahbeig et al. [31] proposed a hybrid TLBO-PSO that combined the teaching learning-based optimization (TLBO) algorithm and a mutated fuzzy adaptive particle swarm optimization (PSO) algorithm. The mutated PSO is used to overcome the tendency of PSO to become trapped in local optima. A constant or even linearly changed value of


inertia weight may prevent the PSO algorithm from reaching the optimum result. Fuzzy tuning of the inertia weight based on the proposed total normalized function value can enhance the convergence speed of PSO and avoid trapping at local optima. The proposed method was evaluated using SVM and achieved 91.88% accuracy with 195 features.

Lu et al. [32] proposed MIMAGA, a hybrid feature selection algorithm combining mutual information maximization (MIM) and the adaptive genetic algorithm (AGA). Initially, MIM was applied as a preprocessing step to obtain a subset containing only 300 genes. Then the wrapper technique, AGA, was applied. Finally, an extreme learning machine was applied as a classifier on the dataset. MIMAGA was compared with other FS techniques. The results showed that, although MIMAGA takes a long time, it was efficient and had the best result.

Alomari et al. [33] proposed a hybrid filter-wrapper gene selection method using the filter approach Minimum Redundancy Maximum Relevancy (MRMR) and the wrapper approach flower pollination algorithm (FPA). Initially, MRMR was employed to obtain from the gene expression data the important genes, i.e., those with minimum redundancy among input genes and maximum relevancy to the target class. These genes were then used by FPA to obtain the most informative ones. The proposed model was evaluated on three datasets and compared with MRMR-GA. The proposed method showed comparable results in terms of accuracy and a low number of features.

Turgut et al. [34] used the Recursive Feature Elimination (RFE) and Randomized Logistic Regression (RLR) feature elimination methods to select informative genes. The proposed method selected the top 50 features. The performance of the proposed methods was evaluated using 8 classifiers (SVM, KNN, Multi-Layer Perceptron, DT, RF, LR, AdaBoost, and Gradient Boosting Machines) with k-fold CV on two different breast cancer datasets. The best result was achieved with SVM as the classifier for both datasets.

Mufassirin and Ragel [35] proposed a novel filter-wrapper based feature selection approach. Initially, the gain ratio filter method was used to determine the importance of genes by measuring the gain ratio with respect to the relevant class, in order to eliminate irrelevant and redundant genes. In the second phase, a wrapper subset evaluator was used to evaluate the subset produced by the gain ratio. Finally, the proposed approach was evaluated using J48, DT, NB, and Sequential Minimal Optimization on five datasets. The proposed model was time efficient and gave high accuracy.

Sreepada et al. [36] proposed a hybrid filter-wrapper approach for gene selection to combine the fast computation of the filter approach and the accuracy of the wrapper approach. Filter techniques are computationally faster, and the wrapper approach is more effective in terms of classification accuracy. First, F-Score and IG were each used separately to produce a subset, and the two subsets were then combined. Then the wrapper methods Sequential Backward Elimination (SBE) and Sequential Forward Selection (SFS) with SVM were used to obtain the informative genes. The proposed method was evaluated on three datasets and achieved good results of more than 97% for two datasets.

Hameed et al. [37] proposed a hybrid approach to select informative genes. First, the Pearson correlation coefficient (PCC) was run 10 times to select the top 100 ranked features. Then either binary PSO (BPSO) or GA was used for further reduction. Different classifiers were employed to test the accuracy on eleven datasets based on 10-fold CV. The results showed that SVM had higher accuracy and that BPSO ran faster and achieved better results than GA with a smaller number of selected genes.

Salem et al. [38] proposed a hybrid approach named IG-SGA. Initially, IG was used with various thresholds to reduce the feature set. Then the reduced subset was passed to GA to obtain the most informative genes. Finally, genetic programming was used to classify seven datasets. The performance was assessed using 10-fold CV. Although the proposed model achieved better results, further improvement was needed, specifically for the Lung-Ontario dataset, and there was a limitation in terms of time complexity.

Utami and Rustam [39] proposed hybrid methods, PSO–SVM and ABC–SVM, to obtain accurate results in diagnosing breast cancer. Initially, PSO and ABC were used as feature selection techniques. Then SVM was used as a classifier. The results showed that ABC–SVM was accurate and effective in dealing with high-dimensional data such as microarray data.

Zhongxin et al. [40] proposed a Feature Selection Algorithm based on Mutual Information and Lasso (FSMIL). In the first stage, MI was used to filter out irrelevant features. In the next stage, an improved version of lasso was trained on the candidate subset to produce the most informative genes. The proposed methods were applied to five datasets, and an SVM classifier was used to test their accuracy. The proposed method achieved high accuracy, especially for the Lung and Lymphoma datasets.

Sardana et al. [41] proposed a hybrid approach, Cluster Quantum Genetic Algorithm (ClusterQGA), to classify cancer accurately. Initially, clustering was used to remove irrelevant and redundant data; then the computational power of quantum and genetic algorithms was used to select only relevant features. The proposed method was applied to four datasets and evaluated using SVM and KNN classifiers. Although ClusterQGA successfully reduced the number of genes, the classification accuracy needs further improvement.

Singh and Sivabalakrishnan [42] presented a hybrid selection technique that combined mRMR with the Adaptive Genetic Algorithm (AGA). In the first phase, mRMR was used to reduce the dimensionality and the redundancy of the data. The resulting subset was then further processed by AGA to obtain the most relevant genes. The mRMR-AGA approach was evaluated by four classifiers on four benchmark datasets and achieved comparable results.

Nagpal and Singh [43] proposed qualitative mutual information (QMI) for feature selection. Initially, RF was used to obtain the importance of each gene, which was used to calculate the preference score (PS). PS reduces the redundancy in the subset. Then MI was used to obtain the informative genes. The proposed method was evaluated on four datasets, and NB, C4.5, and IB1 were used for classification with 10-fold CV. The results showed that the proposed method combined with NB obtained an accuracy of more than 98% for two datasets.

Loey et al. [44] presented an intelligent decision support system for diagnosing cancer from microarray data. Initially, IG was used to select relevant genes. Then the selected subset was reduced via Grey Wolf Optimization (GWO). Finally, SVM was used for breast and colon cancer classification. Although the IG-GWO approach achieved high accuracy, it required a large number of features (about 240) for the breast cancer dataset.

Hamim et al. [45] combined the filter approach Fisher score (F) with C5.0 to select relevant breast cancer genes. Initially, the Fisher score removed redundant genes and reduced the subset to only 10% of the genes. Then C5.0 selected only 5 relevant genes. The proposed FC5 was assessed by C5.0, ANN, SVM, and LR with stratified 10-fold CV. C5.0 achieved the highest accuracy of these, about 93.28%.

3.3. Other approaches

Jinthanasatian et al. [46] used a neuro-fuzzy method with the firefly algorithm to classify microarray data. A neuro-fuzzy algorithm was used to select informative genes and to generate a rule set that serves as the classifier. The firefly algorithm was used to optimize the parameters. The proposed method was evaluated on seven datasets, and the accuracy was assessed with 10-fold cross-validation. The proposed algorithm provided results comparable with other techniques, but further improvement is needed, especially for the colon dataset, on which it achieved only 76.94%.

Li et al. [47] proposed random value-based oversampling (RVOS) and an improved version of SVM-RFE to analyze microarray data effectively. First, RVOS was used to balance the distribution of the two classes of samples. Then an improved version of linear SVM (LLSVM) with the improved RFE strategy was used to obtain the informative genes. Finally, the proposed model was evaluated using four classifiers with stratified


Table 2
Different methods for classifying breast cancer.

| Ref | Feature selection | Classifier | Dataset [ref] | Classification accuracy | No. genes |
| --- | --- | --- | --- | --- | --- |
| Purbolaksono et al. [5] | MI | BN | Colon [48] | 86.7% | NA |
| | | BN | Ovarian [49] | 98.4% | |
| | | BN | Leukemia [50] | 88.2% | |
| | | BN | Breast [51] | 84% | |
| | | NB | Lung [52] | 98.66% | |
| Cilia et al. [8] | GR | KNN | Breast [53] | 91.96% | 50 |
| | GR | KNN | Colon [48] | 91.94% | 10 |
| | FCBF | NN | Leukemia [50] | 99.44% | 51 |
| | FCBF | NN | Lymphoma | 100% | 128 |
| | SFFS | NN | Lung [52] | 97.92% | 308 |
| | GR | NN | Ovarian [49] | 96.85% | 500 |
| Bolón-Canedo et al. [12] | ReliefF | NB | Breast [51] | 89% | 50 |
| | ReliefF | SVM | Prostate [54] | 97% | 50 |
| | IG | SVM | Colon [48] | 85% | 50 |
| Al-Batah et al. [24] | CFS | JRip | Breast [51] | 88.7% | 138 |
| | | Decision Table, JRip | Colon [48] | 96.8% | 26 |
| | | Decision Table | CNS | 90.0% | 9 |
| | | Decision Table | Leukemia [50] | 98.6% | 79 |
| | | JRip | Lung | 97.5% | 548 |
| Gao et al. [25] | FCBF | PA-SVM | Breast | 88.66% | 92 |
| | | | Lung | 79.49% | 6 |
| | | | NervSys | 91.67% | 28 |
| | | | Prostate | 100% | 27 |
| | | | Colon | 93.55% | 14 |
| | | | Leukemia | 100% | 97 |
| | | | Ovarian | 100% | 30 |
| | | | DLBCL1 | 100% | 73 |
| | | | DLBCL2 | 86.21% | 27 |
| Baliarsingh et al. [26] | Wilcoxon rank sum test | JELM | Breast [51] | 90.91% | 505 |
| Su et al. [27] | K-S test + CFS | SVM | Breast [51] | 87.4% | 11.7 |
| | | | Lung [52] | 91.6% | 23 |
| | | | Colon [48] | 90.1% | 10.7 |
| | | | Ovarian [49] | 98.5% | 33.2 |
| | | | Leukemia [50] | 79.6% | 25.2 |
| Ahmad, F. K. [28] | IG | SVM | Breast | 80% | 200 |
| | IG | | Colon | 87% | 100 |
| | SNR | | Lung | 91% | 100 |
| Medjahed et al. [29] | SVM-RFE + BDF | SVM | Breast [51] | 89.47% | 7237 |
| Jain et al. [30] | CFS-iBPSO | NB | Breast [51] | 92.75% | 32.7 |
| Shahbeig et al. [31] | TLBO-PSO | SVM | Breast [51] | 91.88% | 195 |
| Lu et al. [32] | MIMAGA | ELM | Leukemia | 95.95% | 60 |
| | | | Colon [48] | 85.45% | 77 |
| | | | Prostate | 97.12% | 60 |
| | | | Lung | 93.75% | 74 |
| | | | Breast | 87.12% | 59 |
| | | | SRBCT | 90.11% | 78 |
| Alomari et al. [33] | MRMR-GA | | Colon | 88.01% | 6.73 |
| | MRMR-FPA | | Ovarian | 100% | 4 |
| | MRMR-FPA | | Breast | 85.88% | 16.8 |
| Turgut et al. [34] | RFE + RLR | SVM | Breast [51] | 87.87% | 50 |
| Mufassirin and Ragel [35] | GR + Wrapper Subset Evaluator | NB | Breast [51] | 89.69% | NA |
| | | Bayes Net | Colon [48] | 95.16% | |
| | | NB | Lung | 97.04% | |
| | | NB | Leukemia [50] | 100% | |
| | | NB | Ovarian [49] | 100% | |
| Hameed et al. [37] | PCC-BPSO | SVM | Brain | 97.62% | 13 |
| | PCC-BPSO | SVM | Breast [51] | 90.72% | 41 |
| | PCC-BPSO | SVM | CNS | 98.33% | 39 |
| | PCC-BPSO | Bayes net | Colon [48] | 93.55% | 23 |
| | PCC-BPSO | KNN / Bayes net | Leukemia | 100% | 17 |
| | PCC-BPSO | NB | Lung [52] | 98.03% | 39 |
| | PCC-BPSO | NB | Lymphoma | 100% | 19 |
| | PCC-GA | SVM | MLL | 100% | 22 |
| | PCC-BPSO | KNN | Ovarian | 100% | 15 |
| | PCC-BPSO | SVM | Prostate | 97.06% | 33 |
| | PCC-BPSO | RF / SVM | SRBCT | 100% | 19 |
| Utami and Rustam [39] | ABC | SVM | Breast | 88% | NA |
| Sardana et al. [41] | ClusterQGA | SVM | Breast [51] | 86.6% | 21 |
| | | KNN | Melanoma | 94.74% | 6 |
| | | KNN | Colon [48] | 100% | 11 |
| | | SVM | Prostate | 100% | 14 |
| Singh and Sivabalakrishnan [42] | mRMRAGA | ELM | Breast [51] | 86.73% | 140 |
| Nagpal and Singh [43] | QMI | IB1 | Colon | 87.09% | 68 |
| | | | Breast [51] | 90.72% | 98 |
| | | | Leukemia | 98.61% | 93 |
| Loey et al. [44] | IG + GWO | SVM | Colon [48] | 95.9% | 16 |
| | | | Breast [51] | 94.87% | 240 |
| Hamim et al. [45] | FC5 | C5.0 | Breast [51] | 93.28% | 5 |
| Jinthanasatian et al. [46] | Neuro-fuzzy algorithm | Rule set generation | Lung [52] | 93.42% | 4 |
| | | | Ovarian [49] | 96.13% | 12 |
| | | | Prostate [54] | 87.43% | 5 |
| | | | Leukemia [50] | 82.27% | 7 |
| | | | Breast [51] | 82.37% | 7 |
| | | | Colon | 76.94% | 11 |
| | | | DLBCL | 83.81% | 13 |
| Li et al. [47] | SVM-RFE | | Prostate [54] | 92.20% | NA |
| | SVM-RFE | | Breast [51] | 86.09% | |
| | SVM-VSSRFE | | CNS [55] | 88.39% | |
| | SVM-RFE | | Colon [48] | 93.75% | |
| | SVM-VSSRFE | | Ovarian [49] | 100% | |
| | SVM-VSSRFE | | Leukemia [50] | 100% | |
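
The trade-off in Table 2 between accuracy and the number of selected genes can be made explicit with a small dominance check. The following is a minimal sketch, not part of any surveyed method: it uses only the Table 2 rows reported on the Van’t Veer breast dataset [51] that list a gene count, and the helper name `pareto_front` is illustrative.

```python
# Rows copied from Table 2: methods evaluated on the Van't Veer breast
# dataset [51] that report a gene count.
# Each row is (reference, method, accuracy in %, number of genes).
ROWS = [
    ("Bolón-Canedo et al. [12]", "ReliefF + NB", 89.0, 50),
    ("Al-Batah et al. [24]", "CFS + JRip", 88.7, 138),
    ("Baliarsingh et al. [26]", "Wilcoxon rank sum test + JELM", 90.91, 505),
    ("Su et al. [27]", "K-S test + CFS + SVM", 87.4, 11.7),
    ("Medjahed et al. [29]", "SVM-RFE + BDF + SVM", 89.47, 7237),
    ("Jain et al. [30]", "CFS-iBPSO + NB", 92.75, 32.7),
    ("Shahbeig et al. [31]", "TLBO-PSO + SVM", 91.88, 195),
    ("Turgut et al. [34]", "RFE + RLR + SVM", 87.87, 50),
    ("Hameed et al. [37]", "PCC-BPSO + SVM", 90.72, 41),
    ("Sardana et al. [41]", "ClusterQGA + SVM", 86.6, 21),
    ("Singh and Sivabalakrishnan [42]", "mRMRAGA + ELM", 86.73, 140),
    ("Nagpal and Singh [43]", "QMI + IB1", 90.72, 98),
    ("Loey et al. [44]", "IG + GWO + SVM", 94.87, 240),
    ("Hamim et al. [45]", "FC5 + C5.0", 93.28, 5),
]


def pareto_front(rows):
    """Keep rows that no other row beats on both accuracy and subset size."""
    front = []
    for ref, method, acc, genes in rows:
        dominated = any(
            o_acc >= acc and o_genes <= genes and (o_acc > acc or o_genes < genes)
            for _, _, o_acc, o_genes in rows
        )
        if not dominated:
            front.append((ref, method, acc, genes))
    # Smallest gene subset first.
    return sorted(front, key=lambda row: row[3])


for ref, method, acc, genes in pareto_front(ROWS):
    print(f"{ref}: {method}, {acc}% with {genes} genes")
```

With these rows, only FC5 with C5.0 [45] and IG + GWO with SVM [44] remain: every other method is matched or beaten on both criteria by one of them. The studies differ in validation protocol, so this is a reading aid for the table rather than a ranking.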

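Table 3
Different methods for classifying other cancer types.

| Ref | Feature selection | Classifier | Dataset [ref] | Classification accuracy | No. genes |
| --- | --- | --- | --- | --- | --- |
| Aydadenta and Adiwijaya [11] | K-means-Relief | RF | Colon [48] | 85.87% | NA |
| | | | Lung [52] | 98.90% | |
| | | | Prostate [54] | 88.97% | |
| Wu et al. [23] | MIC-BQPSO | SVM | Leukemia | 97.81% | NA |
| | | | Colon | 88.36% | |
| | | | Prostate | 91.60% | |
| | | | DLBCL | 96.8% | |
| | | | CNS | 74.64% | |
| Sreepada et al. [36] | (F-Score, IG) and (SBE and SFS) | SVM | Colon [48] | 87.5% | 12 |
| | | | DLBCL | 100% | 14 |
| | | | Leukemia | 97.37% | 16 |
| Salem et al. [38] | IG-SGA | GP | Leukemia [50] | 97.06% | 3 |
| | | | Colon [48] | 85.48% | 60 |
| | | | CNS [55] | 86.67% | 38 |
| | | | Lung-Ontario | 74.4% | 11 |
| | | | Lung-Michigan | 100% | 9 |
| | | | DLBCL | 94.80% | 110 |
| | | | Prostate cancer | 100% | 26 |
| Zhongxin et al. [40] | FSMIL | SVM | Colon | 90.86% | NA |
| | | | Prostate | 96.52% | |
| | | | Lymphoma | 98.68% | |
| | | | Leukemia | 98.63% | |
| | | | Lung | 100% | |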
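
Several entries in Tables 2 and 3 share one skeleton: a filter ranks the genes, a classifier is trained on the top-ranked subset, and accuracy is estimated with stratified k-fold CV. The sketch below illustrates that skeleton under stated assumptions. The data are synthetic, a nearest-centroid rule stands in for the SVM or NB classifiers of the surveyed papers, and every function name is hypothetical. Ranking genes by PCC follows the filter step described for [37]. Repeating the selection inside each fold is a design choice made here so that held-out samples never influence which genes are kept; it is not something the surveyed papers are claimed to do.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def top_k_genes(X, y, k):
    """Filter step: rank genes by |PCC| with the class label, keep the best k."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def stratified_folds(y, n_folds, rng):
    """Assign samples to folds so that every fold keeps the class proportions."""
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def nearest_centroid_predict(X_train, y_train, x, genes):
    """Stand-in classifier: predict the class whose mean profile is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, v in zip(X_train, y_train) if v == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in genes]
        dist = sum((x[j] - c) ** 2 for j, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_validated_accuracy(X, y, k_genes=5, n_folds=10, seed=0):
    """Stratified k-fold CV with gene selection redone inside every fold."""
    rng = random.Random(seed)
    correct = 0
    for test_idx in stratified_folds(y, n_folds, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        genes = top_k_genes(X_tr, y_tr, k_genes)  # uses training samples only
        correct += sum(
            nearest_centroid_predict(X_tr, y_tr, X[i], genes) == y[i]
            for i in test_idx
        )
    return correct / len(y)


def make_synthetic(n_samples=60, n_genes=200, n_informative=5, seed=1):
    """Synthetic expression matrix: only the first n_informative genes carry signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(3.0 * label if j < n_informative else 0.0, 1.0)
          for j in range(n_genes)] for label in y]
    return X, y


X, y = make_synthetic()
accuracy = cross_validated_accuracy(X, y)
print(f"10-fold CV accuracy: {accuracy:.3f}")
```

The synthetic matrix has far more genes than samples, which mirrors the setting of the surveyed datasets at a much smaller scale. Real microarray data would replace `make_synthetic`, and a wrapper search such as BPSO or GA could refine the genes returned by `top_k_genes`.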

5-fold CV. Results showed that SVM-VSSRFE had better results for three datasets, and also demonstrated the efficiency of LLSVM-VSSRFE in reducing time consumption, especially with high-dimensional datasets.
Different methods for classifying breast cancer and other cancer types are presented in Tables 2 and 3, respectively. The accuracy of the state-of-the-art methods is presented in Fig. 4.

4. Discussion

Although microarray data have proven to be effective for diagnosing cancer, the huge number of features relative to the small sample size causes the so-called curse of dimensionality. For example, the breast datasets of Van’t Veer [51] and Wang [53] have 24,482 and 18,000 features with only 97 samples. To avoid this problem, hybrid and filter selection techniques are commonly used. The filter approach is fast and not computationally expensive, so it is used in [5,8,12,24–28] and recommended as the initial step of the hybrid approach in [29–35,37,39,41–46]. Applying the filter approach to the Van’t Veer dataset, K-S test-CFS in [27] generated a small subset in comparison with the subsets generated by other filter techniques, but with a lower accuracy of about 87.4%, while in [26] the Wilcoxon rank-sum test achieved a high accuracy of about 90.91% but with a large subset of about 505 genes. On the other hand, applying GR to the Wang dataset achieved higher accuracy with only 50 genes. While the filter approach may achieve a high result, it selects a larger number of features; in contrast, the ability of evolutionary wrapper feature selection techniques to find an optimal or near-optimal subset helps the hybrid approach achieve higher accuracy with just a small subset. In [45], FC5 generated the smallest subset, about 5 genes, while the highest result was achieved in [44] using IG-GWO but with a large subset of about 240 genes. Although the hybrid approach can achieve better performance than a filter, SVM-RFE with BDF in [29] had the worst performance in terms of the number of selected genes: it selected about 7237 genes, but with acceptable accuracy. For other cancer types, while GR in [8] generated a small subset for colon [48], it produced a large subset for ovarian [49] and lung [52]. PCC-BPSO in [37] produced a small subset for ovarian and lung that led to the best accuracy using KNN and NB, respectively. Applying ClusterQGA in [41] with a KNN classifier led to the best performance for colon, about 100% accuracy. SVM has high accuracy for breast, KNN has better accuracy for colon and ovarian, and NB for lung. Another issue is the small number of samples available for accurate validation, so 10-fold CV is commonly used.

5. Conclusion and future work

Microarray data analysis deepens our understanding of cancer pathogenesis and also has diagnostic value. It accurately diagnoses


Fig. 4. The performance result of breast cancer dataset.

cancer. However, the accuracy is influenced by the large number of features and the limited number of samples. Dimensionality reduction techniques, mainly feature selection approaches, are utilized to overcome this deterioration in accuracy. The survey reviewed the state of the art of feature selection and classification techniques. The review showed that SVM is the most applied classification algorithm and achieved a high result of about 94.87% with hybrid feature selection (IG-GWO). As future work, a hybrid feature selection technique based on a heuristic search algorithm will be examined to obtain a more accurate result.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] F. Bray, J. Ferlay, I. Soerjomataram, R.L. Siegel, L.A. Torre, A. Jemal, Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA: a cancer journal for clinicians 68 (2018) 394–424.
[2] N. Eliyatkın, E. Yalçın, B. Zengel, S. Aktaş, E. Vardar, Molecular classification of breast carcinoma: from traditional, old-fashioned way to a new age, and a new way, The journal of breast health 11 (2015) 59.
[3] Lindsey A. Torre, Freddie Bray, Rebecca L. Siegel, Jacques Ferlay, Joannie Lortet-Tieulent, Ahmedin Jemal, Global cancer statistics, 2012: Global Cancer Statistics, 2012, CA: A Cancer Journal for Clinicians 65 (2) (2015) 87–108, https://doi.org/10.3322/caac.21262.
[4] R. Priya, P.S. Vadivu, A Review on Data Mining Techniques for Prediction of Breast Cancer Recurrence, International Journal of Engineering and Management Research (IJEMR) 9 (2019) 142–146.
[5] M.D. Purbolaksono, K.C. Widiastuti, M.S. Mubarok, F.A. Ma’ruf, Implementation of mutual information and bayes theorem for classification microarray data, Journal of Physics: Conference Series, IOP Publishing (2018), 012011.
[6] M.A. Makary, M. Daniel, Medical error—the third leading cause of death in the US, Bmj 353 (2016).
[7] H.J. Hong, W.S. Koom, W.-G. Koh, Cell microarray technologies for high-throughput cell-based biosensors, Sensors 17 (2017) 1293.
[8] N.D. Cilia, C. De Stefano, F. Fontanella, S. Raimondo, A. Scotto di Freca, An experimental comparison of feature-selection and classification methods for microarray datasets, Information 10 (2019) 109.
[9] Z. Yu, H. Chen, J. You, H.-S. Wong, J. Liu, L. Li, G. Han, Double selection based semi-supervised clustering ensemble for tumor clustering from gene expression profiles, IEEE/ACM transactions on computational biology and bioinformatics 11 (2014) 727–740.
[10] K. Kourou, T.P. Exarchos, K.P. Exarchos, M.V. Karamouzis, D.I. Fotiadis, Machine learning applications in cancer prognosis and prediction, Computational and structural biotechnology journal 13 (2015) 8–17.
[11] H. Aydadenta, A Clustering Approach for Feature Selection in Microarray Data Classification Using Random forest, Journal of Information Processing Systems 14 (2018).
[12] V. Bolón-Canedo, N. Sánchez-Marono, A. Alonso-Betanzos, J.M. Benítez, F. Herrera, A review of microarray datasets and applied feature selection methods, Information Sciences 282 (2014) 111–135.
[13] J.R. Vergara, P.A. Estévez, A review of feature selection methods based on mutual information, Neural computing and applications 24 (2014) 175–186.
[14] B. Azhagusundari, A.S. Thanamani, Feature selection based on information gain, International Journal of Innovative Technology and Exploring Engineering (IJITEE) 2 (2013) 18–21.
[15] M.A. Hall, L.A. Smith, Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper, FLAIRS conference (1999) 235–239.
[16] N. Almugren, H. Alshamlan, A survey on hybrid feature selection methods in microarray gene expression data for cancer classification, IEEE Access 7 (2019) 78533–78548.
[17] M.S. Hossain, A. El-Shafie, Application of artificial bee colony (ABC) algorithm in search of optimal release of Aswan High Dam, Journal of Physics: Conference Series, IOP Publishing (2013), 012001.
[18] D. Wang, D. Tan, L. Liu, Particle swarm optimization algorithm: an overview, Soft Computing 22 (2018) 387–408.
[19] X.-S. Yang, A new metaheuristic bat-inspired algorithm, Nature inspired cooperative strategies for optimization (NICSO 2010), Springer (2010) 65–74.
[20] S.A. Abdulrahman, W. Khalifa, M. Roushdy, A.M. Salem, Comparative study for 8 computational intelligence algorithms for human identification, Comput. Sci. Rev. 36 (2020), 100237.
[21] I.F. Widiawati, H. Nugrahapraja, R. Fajriyah, K-Nearest Neighbor (KNN) Analysis on Genes Expression Datasets of Maize Nested Association Mapping (NAM) Showed Confident Classification on Organ-specific Expression, 2018 1st International Conference on Bioinformatics, Biotechnology, and Biomedical Engineering - Bioinformatics and Biomedical Engineering, 1 (2018) 1–3.
[22] B. Sahu, S. Dehuri, A.K. Jagadev, Feature selection model based on clustering and ranking in pipeline for microarray data, Informatics in Medicine Unlocked 9 (2017) 107–122.
[23] Q. Wu, Z. Ma, J. Fan, G. Xu, Y. Shen, A feature selection method based on hybrid improved binary quantum particle swarm optimization, IEEE Access 7 (2019) 80588–80601.
[24] M.S. Al-Batah, B.M. Zaqaibeh, S.A. Alomari, M.S. Alzboon, Gene Microarray Cancer Classification using Correlation Based Feature Selection Algorithm and Rules Classifiers, International Journal of Online and Biomedical Engineering (iJOE) 15 (2019) 62–73.


[25] L. Gao, M. Ye, C. Wu, Cancer classification based on support vector machine optimized by particle swarm optimization and artificial bee colony, Molecules 22 (2017) 2086.
[26] S.K. Baliarsingh, C. Dora, S. Vipsita, Jaya Optimized Extreme Learning Machine for Breast Cancer Data Classification, Springer Singapore, Singapore, 2021, pp. 459–467.
[27] Q. Su, Y. Wang, X. Jiang, F. Chen, W.-C. Lu, A cancer gene selection algorithm based on the KS test and CFS, BioMed research international 2017 (2017).
[28] F.K. Ahmad, A comparative study on gene selection methods for tissues classification on large scale gene expression data, Jurnal Teknologi 78 (2016) 116–125.
[29] S.A. Medjahed, T.A. Saadi, A. Benyettou, M. Ouali, Kernel-based learning and feature selection analysis for cancer diagnosis, Applied Soft Computing 51 (2017) 39–48.
[30] I. Jain, V.K. Jain, R. Jain, Correlation feature selection based improved-Binary Particle Swarm Optimization for gene selection and cancer classification, Appl Soft Comput. 62 (2018) 203–215.
[31] S. Shahbeig, M.S. Helfroush, A. Rahideh, A fuzzy multi-objective hybrid TLBO-PSO approach to select the associated genes with breast cancer, Signal Process. 131 (2017) 58–65.
[32] H. Lu, J. Chen, K. Yan, Q. Jin, Y. Xue, Z. Gao, A hybrid feature selection algorithm for gene expression data classification, Neurocomputing 256 (2017) 56–62.
[33] O.A. Alomari, A.T. Khader, M.A. Al-Betar, Z.A.A. Alyasseri, A hybrid filter-wrapper gene selection method for cancer classification, 2018 2nd International Conference on BioSignal Analysis, Processing and Systems (ICBAPS), IEEE, 2018, pp. 113–118.
[34] S. Turgut, M. Dağtekin, T. Ensari, Microarray breast cancer data classification using machine learning methods, 2018 Electric Electronics, Computer Science, Biomedical Engineerings’ Meeting (EBBT), IEEE, 2018, pp. 1–3.
[35] M.M. Mufassirin, R.G. Ragel, A novel filter-wrapper based feature selection approach for cancer data classification, 2018 IEEE International Conference on Information and Automation for Sustainability (ICIAfS), IEEE, 2018, pp. 1–6.
[36] R.S. Sreepada, S. Vipsita, P. Mohapatra, An efficient approach for microarray data classification using filter wrapper hybrid approach, 2015 IEEE International Advance Computing Conference (IACC), IEEE, 2015, pp. 263–267.
[37] S.S. Hameed, F.F. Muhammad, R. Hassan, F. Saeed, Gene Selection and Classification in Microarray Datasets using a Hybrid Approach of PCC-BPSO/GA with Multi Classifiers, JCS 14 (2018) 868–880.
[38] H. Salem, G. Attiya, N. El-Fishawy, Classification of human cancer diseases by gene expression profiles, Applied Soft Computing 50 (2017) 124–134.
[39] D. Utami, Z. Rustam, Gene selection in cancer classification using hybrid method based on Particle Swarm Optimization (PSO), Artificial Bee Colony (ABC) feature selection and support vector machine, AIP Conference Proceedings, AIP Publishing LLC (2019), 020047.
[40] W. Zhongxin, S. Gang, Z. Jing, Z. Jia, Feature selection algorithm based on mutual information and Lasso for microarray data, The Open Biotechnology Journal 10 (2016).
[41] M. Sardana, R. Agrawal, B. Kaur, A hybrid of clustering and quantum genetic algorithm for relevant genes selection for cancer microarray data, International Journal of Knowledge-based and Intelligent Engineering Systems 20 (2016) 161–173.
[42] R.K. Singh, M. Sivabalakrishnan, Microarray Gene Expression Data Classification using a Hybrid Algorithm: MRMRAGA, International Journal of Innovative Technology and Exploring Engineering (IJITEE) August (8) (2019).
[43] A. Nagpal, V. Singh, A feature selection algorithm based on qualitative mutual information for cancer microarray data, Procedia computer science 132 (2018) 244–252.
[44] M. Loey, M.W. Jasim, H.M. El-Bakry, M.H.N. Taha, N.E.M. Khalifa, Breast and Colon Cancer Classification from Gene Expression Profiles Using Data Mining Techniques, Symmetry 12 (2020) 408.
[45] M. Hamim, I. El Moudden, H. Moutachaouik, M. Hain, Decision Tree Model Based Gene Selection and Classification for Breast Cancer Risk Prediction, Springer International Publishing, Cham, 2020, pp. 165–177.
[46] P. Jinthanasatian, S. Auephanwiriyakul, N. Theera-Umpon, Microarray data classification using neuro-fuzzy classifier with firefly algorithm, 2017 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, 2017, pp. 1–6.
[47] Z. Li, W. Xie, T. Liu, Efficient feature selection and classification for microarray data, PloS one 13 (2018).
[48] U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, A.J. Levine, Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proceedings of the National Academy of Sciences 96 (1999) 6745–6750.
[49] E.F. Petricoin III, A.M. Ardekani, B.A. Hitt, P.J. Levine, V.A. Fusaro, S.M. Steinberg, G.B. Mills, C. Simone, D.A. Fishman, E.C. Kohn, Use of proteomic patterns in serum to identify ovarian cancer, The lancet 359 (2002) 572–577.
[50] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science 286 (1999) 531–537.
[51] L.J. Van’t Veer, H. Dai, M.J. Van De Vijver, Y.D. He, A.A. Hart, M. Mao, H.L. Peterse, K. Van Der Kooy, M.J. Marton, A.T. Witteveen, Gene expression profiling predicts clinical outcome of breast cancer, Nature 415 (2002) 530–536.
[52] G.J. Gordon, R.V. Jensen, L.-L. Hsiao, S.R. Gullans, J.E. Blumenstock, S. Ramaswamy, W.G. Richards, D.J. Sugarbaker, R. Bueno, Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma, Cancer research 62 (2002) 4963–4967.
[53] Y. Wang, J.G. Klijn, Y. Zhang, A.M. Sieuwerts, M.P. Look, F. Yang, D. Talantov, M. Timmermans, M.E. Meijer-van Gelder, J. Yu, Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer, The Lancet 365 (2005) 671–679.
[54] D. Singh, P. Febbo, K. Ross, D. Jackson, J. Manola, C. Ladd, P. Tamayo, A. Renshaw, A. D’Amico, J. Richie, E. Lander, M. Loda, P. Kantoff, T. Golub, W. Sellers, Gene Expression Correlates of Clinical Prostate Cancer Behavior, Cancer cell 1 (2002) 203–209.
[55] S.L. Pomeroy, P. Tamayo, M. Gaasenbeek, L.M. Sturla, M. Angelo, M.E. McLaughlin, J.Y. Kim, L.C. Goumnerova, P.M. Black, C. Lau, Prediction of central nervous system embryonal tumour outcome based on gene expression, Nature 415 (2002) 436–442.
