2012 IJCSE Gene Expression
Abstract— Despite an increased global effort to end breast cancer, it remains the most common cause of
cancer death in women. This is a reminder that new therapeutic approaches are urgently needed to improve
patient survival rates. Doing so requires proper diagnosis of the disease and classification of the tumor
type based on genomic information, so that appropriate treatment can be provided to the patient.
A number of classification techniques exist for classifying tumor types. In this paper we focus on
three of them: the back-propagation network (BPN), FLANN and PSO-FLANN. We find that the integrated
approach of Functional Link Artificial Neural Network (FLANN) and Particle Swarm Optimization
(PSO) predicts the disease better than the other two methods.
I. INTRODUCTION
Breast cancer is a heterogeneous disease with respect to molecular alteration, cellular composition, and
clinical outcome. This diversity makes it challenging to develop tumor classifications that are clinically useful
for prognosis or prediction. There are two types of breast cancer: cancers originating from the ducts are known
as ductal carcinomas, and cancers originating from the lobules are known as lobular carcinomas. Over the past
decades, the high incidence of breast cancer has emerged as a public health problem in the country. While there
is no single reason for the escalating incidence of breast cancer, obesity, dietary habits, physical inactivity
and the over-use of hormonal pills are the main causes. Prognosis and survival rates for breast cancer vary
greatly depending on the cancer type, stage, treatment and geographic location of the patient. Breast cancer is
the most commonly diagnosed form of cancer in women, accounting for about 30% of all cases. Although some studies
were based on low-capability and poorly calibrated equipment, infrared imaging has been shown to be well suited
to the task of detecting breast cancer, in particular when the tumor is in its early stages or in dense
tissue [1].
Microarrays are a powerful tool for biologists as they enable the simultaneous measurement of the
expression levels of thousands of genes per tissue sample [10]. Microarray analysis is a widely used technology
for studying gene expression on a global scale. Gene expression profiling by DNA microarray has become an
important tool for studying the transcriptome of cancer cells and has been successfully used in many studies of
tumor classification and identification of marker genes associated with cancer. With an increasing amount of
microarray data becoming available, the comparative study of normal tissue versus tumor tissue has gained
high importance.
Microarray breast cancer event prediction, however, has proven to be difficult, as few classification rules are
able to obtain a balanced accuracy rate of over 70%, when properly validated. These performance indicators are
also often associated with wide confidence intervals. Signature composition strongly depends on the subset of
patient samples used for feature selection. In recent years many different signatures have been proposed, mostly
derived using different patient populations and/or array technologies. Although the overall performance of these
signatures is comparable, there is often a high level of inconsistency between class assignments obtained using
different signatures. One of the challenging aspects of microarray data is that they are subject to various sources
of technical variation, arising from the many experimental laboratory steps needed to get from a tissue sample to
an array scan. This noise can be removed from microarray data using preprocessing methods.
The goal of this study is to investigate the benefit of performing supervised classification analyses on
microarray data [6]. Methods of supervised classification analysis make it possible to automatically build
classifiers that distinguish among specimens on the basis of predefined class label information (phenotypes).
In many cancer research studies, the application of these methods has shown promising results in improving
tumor diagnosis and prognosis. In this paper, an integrated approach of functional link artificial neural network
(FLANN) [2] and particle swarm optimization (PSO) [3] is used to build a more reliable classifier. This
approach incorporates gene features and FLANN parameters into one common solution code. The problem is
solved by simultaneously selecting gene features and optimizing the parameters of the FLANN classifier.
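The idea of a common solution code can be illustrated with a minimal sketch. The layout below is an assumption made for illustration only, not necessarily the encoding used in this paper: the first n_genes entries of a particle position act as a gene-selection mask, and the remaining entries are FLANN weights. The trigonometric expansion, the 0.5 selection threshold and the names decode_particle, expand and flann_output are likewise hypothetical.

```python
import math


def decode_particle(position, n_genes):
    """Split one particle position into a gene mask and FLANN weights.

    Assumed layout (hypothetical): the first n_genes entries select genes
    (a value above 0.5 means "keep"); the remaining entries are weights.
    """
    mask = [value > 0.5 for value in position[:n_genes]]
    weights = list(position[n_genes:])
    return mask, weights


def expand(features):
    """Trigonometric functional expansion (an assumed, common FLANN choice)."""
    expanded = [1.0]  # bias term
    for x in features:
        expanded.extend([x, math.sin(math.pi * x), math.cos(math.pi * x)])
    return expanded


def flann_output(position, sample, n_genes):
    """Score one sample with the gene subset and weights held by a particle."""
    mask, weights = decode_particle(position, n_genes)
    selected = [x for x, keep in zip(sample, mask) if keep]
    expanded = expand(selected)
    # zip stops at the shorter sequence, so unused trailing weights are ignored.
    net = sum(w * z for w, z in zip(weights, expanded))
    return math.tanh(net)


# Toy particle: 3 mask entries followed by 10 weights (1 bias + 3 per gene).
particle = [0.9, 0.1, 0.7] + [0.1] * 10
print(flann_output(particle, [0.5, 0.2, 0.0], n_genes=3))
```

In such a scheme, PSO would move each particle through this combined space and score it by the classification error of the decoded FLANN, so that gene selection and weight tuning happen in a single search.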
II. CLASSIFICATION WITH BPN MODEL
Neural networks trained with the back-propagation technique can be used for classification tasks such as character
recognition, voice recognition, medical applications, etc. [7, 8, 9]. Each neuron in a neural network has weighted
inputs, a threshold value, an activation function f and an output, where output = f(∑ (inputs × weights)).
Threshold values play an important role in deciding the output. The back-propagation technique can be applied to
multi-layered neural networks. Neural networks learn by example: during training, input-output pairs from the
training set are applied to the network, and training continues until the network is well trained.
The performance of the network can then be checked by applying the patterns belonging to the testing set.
Increasing the number of hidden layers increases the computational complexity of the network, so the time taken
to converge and to minimize the error may become very high.
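The training procedure described above can be sketched as follows. This is a minimal illustration, not the network used in this paper: the single hidden layer of three neurons, the sigmoid activation, the learning rate, the epoch count and the toy logical-AND data are all assumptions chosen to keep the example short.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Each weight row ends with a bias, which plays the role of the threshold."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output


def train(samples, n_hidden=3, epochs=2000, rate=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, target in samples:
            hidden, output = forward(x, w_hidden, w_out)
            # Error term at the output, then propagated back to the hidden layer.
            delta_out = (target - output) * output * (1.0 - output)
            delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            for j, v in enumerate(hidden + [1.0]):
                w_out[j] += rate * delta_out * v
            for j, row in enumerate(w_hidden):
                for i, v in enumerate(x + [1.0]):
                    row[i] += rate * delta_hidden[j] * v
    return w_hidden, w_out


# Toy training set: logical AND as input-output pairs.
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
w_hidden, w_out = train(samples)
for x, target in samples:
    print(x, target, forward(x, w_hidden, w_out)[1])
```

Each additional hidden layer would add another delta computation and another nested weight-update loop per training pattern, which is the source of the growth in training cost noted above.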
than the BPN and FLANN-based gene classification techniques. Higher accuracy and a lower error rate lead to
more effective classification of the given microarray gene data into its actual class. From the above training
and testing phases of BPN, FLANN and PSO-FLANN, we observed that FLANN with PSO provides better
results than BPN and FLANN alone.
TABLE I. CONFUSION MATRIX FOR BREAST CANCER DATA SET USING BPN
                    Predicted Class 1    Predicted Class 2
Actual Class 1             62                   15
Actual Class 2             46                   16
TABLE II. CONFUSION MATRIX FOR BREAST CANCER DATA SET USING FLANN
                    Predicted Class 1    Predicted Class 2
Actual Class 1             56                   21
Actual Class 2             30                   32
TABLE III. CONFUSION MATRIX FOR BREAST CANCER DATA SET USING PSO-FLANN
                    Predicted Class 1    Predicted Class 2
Actual Class 1             70                    7
Actual Class 2              4                   58
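To make the comparison between the three confusion matrices concrete, the short sketch below derives overall accuracy and balanced accuracy (the mean of the per-class recalls) from the counts in Tables I to III. The function names are illustrative, and the rows are assumed to be actual classes and the columns predicted classes, as laid out in the tables.

```python
def accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def balanced_accuracy(matrix):
    """Mean of the per-class recalls (rows are actual classes)."""
    recalls = [row[i] / sum(row) for i, row in enumerate(matrix)]
    return sum(recalls) / len(recalls)


# Counts copied from Tables I, II and III.
tables = {
    "BPN": [[62, 15], [46, 16]],
    "FLANN": [[56, 21], [30, 32]],
    "PSO-FLANN": [[70, 7], [4, 58]],
}

for name, matrix in tables.items():
    print(f"{name}: accuracy={accuracy(matrix):.4f}, "
          f"balanced accuracy={balanced_accuracy(matrix):.4f}")
```

On these counts the overall accuracy works out to 78/139 ≈ 0.561 for BPN, 88/139 ≈ 0.633 for FLANN and 128/139 ≈ 0.921 for PSO-FLANN, with balanced accuracies of about 0.532, 0.622 and 0.922 respectively.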