Deep Learning for Biomedical Data Analysis: Techniques, Approaches, and Applications
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Part I
Deep Learning for Biomedical Data Analysis
1-Dimensional Convolution Neural Network Classification Technique for Gene Expression Data
1 Introduction
In the biomedical field, a vast amount of data is generated from different sources, including laboratory experiments, medical records, etc.
Different types of biological data include nucleotide sequences, gene expression
data, macromolecular 3D structures, metabolic pathways, protein sequences, protein
patterns or motifs, and medical images [1]. Unlike a genome, which provides only
static sequence information, microarray experiments produce gene expression
patterns that provide dynamic information about cell functions. Understanding the
intercellular and intracellular biological processes underlying many diseases is
essential for improving sample classification for diagnostic and prognostic purposes
and for patient treatment.
Biomedical specialists are attempting to find relationships between genes and
diseases or developmental stages, as well as relationships among genes themselves.
For example, one application of microarrays is the discovery of novel biomarkers for
cancer, which can provide more exact diagnosis and monitoring tools for early
recognition of a specific subtype of disease, or for assessment of the effectiveness of a
particular treatment protocol. Different technologies are used to interpret these
biological data. For example, microarray technology is useful for measuring the
expression levels of a large number of genes under different environmental
conditions, and Next Generation Sequencing (NGS) technology for massively parallel
DNA sequencing. Such experiments on large amounts of biological data
lead to an absolute requirement for collection, storage, and computational analysis
[2].
In the last decade, biological data analytics has improved with the development
of associated techniques such as Machine Learning (ML), Evolutionary Algorithms
(EA), and Deep Learning (DL). These techniques are capable of handling more
complex relationships in biological data. For example, the prediction of cancer
from microarray data can be carried out using different ML algorithms
(classification and clustering). However, microarray datasets have high
dimensionality and are usually complex and noisy, which makes the classification
task difficult [3, 4]. Table 1 presents the gene expression data format.
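As a toy illustration of this layout (assuming pandas and NumPy; the gene names and values are invented for illustration only), gene expression data is typically organized as a samples-by-genes matrix with one class label per sample:

```python
import pandas as pd

# Invented toy values: rows are samples (patients), columns are genes, plus a label.
data = pd.DataFrame(
    {"gene_1": [2.31, 1.05, 3.40],
     "gene_2": [0.12, 0.98, 0.55],
     "gene_3": [5.60, 4.20, 6.10],
     "label":  ["tumor", "normal", "tumor"]})

X = data.drop(columns="label").to_numpy()              # expression matrix (samples x genes)
y = (data["label"] == "tumor").astype(int).to_numpy()  # binary class vector
```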
Due to this high dimensionality and redundancy, the usual classification methods
become difficult to apply efficiently to gene expression data. To reduce the
problem of high dimensionality, improve learning accuracy, and remove
irrelevant data from gene expression data, many filter [48] and wrapper [49]
approaches have been applied. The filter method selects feature subsets independently of
any learning algorithm and relies on various measures of the general characteristics
of the training data. The wrapper method uses the predictive accuracy of a
predetermined learning algorithm to evaluate the quality of candidate feature subsets.
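To make the distinction concrete, here is a minimal sketch contrasting the two families (assuming scikit-learn; the classifier, scoring function, and feature counts are illustrative choices, not those of [48] or [49]):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a small, high-dimensional microarray dataset
X, y = make_classification(n_samples=60, n_features=500, n_informative=10,
                           random_state=0)

# Filter: rank features by a statistic of the training data (ANOVA F-score),
# independently of any learning algorithm.
filt = SelectKBest(f_classif, k=20).fit(X, y)

# Wrapper-style selection: recursively eliminate features guided by the
# predictive behavior of a predetermined learning algorithm.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=20).fit(X, y)

print(np.flatnonzero(filt.get_support()))  # genes kept by the filter
print(np.flatnonzero(wrap.get_support()))  # genes kept by the wrapper
```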
2 Related Works
3 Preliminaries
In this section, we present the details of three filter approaches that can be applied
to microarray gene expression data, followed by deep learning and CNNs.
Within the MRMR framework [20], the relevance of a candidate variable X_i to the output Y is measured by the mutual information

v_i = I(X_i; Y) \qquad (1)

and its redundancy with the already-selected subset X_S by

z_i = \frac{1}{|X_S|} \sum_{X_j \in X_S} I(X_i; X_j) \qquad (2)
At every step, this method chooses the variable having the best trade-off between
relevance and redundancy. This selection approach is fast and efficient: at step d of
the forward search, the algorithm computes n − d evaluations, where each evaluation
requires the estimation of d + 1 bivariate densities (one for each already-selected
variable and one for the output). Therefore, MRMR avoids the estimation of
multivariate densities by using multiple bivariate densities.
In [20], the authors justify MRMR through the decomposition

I(X_1, X_2, \ldots, X_n; Y) = R(X_1; X_2; \ldots; X_n; Y) - R(X_1; X_2; \ldots; X_n)

with
R(X_1; X_2; \ldots; X_n) = \sum_{i=1}^{n} H(X_i) - H(X) \qquad (5)
and
R(X_1; X_2; \ldots; X_n; Y) = \sum_{i=1}^{n} H(X_i) + H(Y) - H(X, Y) \qquad (6)
Hence, maximizing the mutual information I(X_1, \ldots, X_n; Y) between the selected
variables and the output amounts to maximizing the 1st term while minimizing the
2nd term, where:
• The minimum of the 2nd term R(X_1; X_2; \ldots; X_n) is reached for independent
variables since, in that case, H(X) = \sum_i H(X_i) and R(X_1; X_2; \ldots; X_n) =
\sum_i H(X_i) - H(X) = 0. Hence, if X_S is already selected, a candidate variable X_i
should have a minimal redundancy I(X_i; X_S) with the subset. According to the
authors, I(X_i; X_S) is approximated by \frac{1}{|S|} \sum_{j \in S} I(X_i; X_j).
• The maximum of the 1st term R(X_1; X_2; \ldots; X_n; Y) is attained for maximally
dependent variables.
Qualitatively, in a sequential setting where a selected subset X_S is given,
independence between the variables in X is achieved by minimizing
\frac{1}{|X_S|} \sum_{X_j \in X_S} I(X_i; X_j) \approx I(X_i; X_S), while the
dependency between the candidate variable and the output is maximized through the
relevance term I(X_i; Y) of Eq. (1).
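To make the procedure concrete, the following is a minimal sketch of the greedy MRMR selection described above (assuming scikit-learn and NumPy; the equal-width binning and the mutual-information estimators are illustrative choices, not the implementation of [20]):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def discretize(col, bins=5):
    """Equal-width binning of one gene's expression values (an assumed estimator)."""
    edges = np.linspace(col.min(), col.max(), bins + 1)[1:-1]
    return np.digitize(col, edges)

def mrmr(X, y, n_selected=20, bins=5):
    """Greedy forward search: pick the gene maximizing relevance minus redundancy."""
    n_genes = X.shape[1]
    Xd = np.column_stack([discretize(X[:, j], bins) for j in range(n_genes)])
    relevance = mutual_info_classif(X, y)        # v_i = I(X_i; Y), Eq. (1)
    selected, remaining = [], list(range(n_genes))
    while len(selected) < n_selected and remaining:
        def score(i):
            # z_i: mean pairwise MI with the already-selected genes, Eq. (2)
            z = (np.mean([mutual_info_score(Xd[:, i], Xd[:, j]) for j in selected])
                 if selected else 0.0)
            return relevance[i] - z
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```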
In recent studies, the focus has been on the Fisher ranking measure [23], a metric
that has proven robust against data scarcity [24] when eliminating weak
features; in this work, we therefore used the Fisher criterion to rank the genes. The
Fisher criterion for gene j is calculated as in Eq. (8).
F C(j) = \frac{(\mu_{j1} - \mu_{j2})^2}{\sigma_{j1}^2 + \sigma_{j2}^2} \qquad (8)

where \mu_{jc} is the sample mean of gene j in class c and \sigma_{jc}^2 is the variance
of gene j in class c. The top N genes possessing the highest Fisher scores are
selected for the next step.
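A minimal sketch of this ranking step (assuming NumPy, a samples-by-genes matrix X, binary labels y, and helper names of our own choosing):

```python
import numpy as np

def fisher_criterion(X, y):
    """Fisher score per gene for a two-class problem, Eq. (8)."""
    c1, c2 = X[y == 0], X[y == 1]
    num = (c1.mean(axis=0) - c2.mean(axis=0)) ** 2
    den = c1.var(axis=0) + c2.var(axis=0)
    return num / den

def top_n_genes(X, y, n=100):
    """Indices of the N genes with the highest Fisher scores."""
    return np.argsort(fisher_criterion(X, y))[::-1][:n]
```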
In the k-Means and Signal-to-Noise Ratio (KM-SNR) filter method [25],
all the genes are initially grouped into different clusters by the k-means (KM) algorithm.
The genes in each cluster are then ranked separately using the Signal-to-Noise Ratio
(SNR) ranking method. These two steps are applied to overcome the redundancy
issue of the gene selection process and to decrease the complexity of the search space.
The best-ranked genes from each cluster are then sent to the next stage.
The SNR score of a gene is

S_v = \frac{\mu_1 - \mu_2}{\sigma_1 + \sigma_2}

where S_v represents the SNR value, \mu_1 and \mu_2 are the means of class 1 and class 2,
respectively, and \sigma_1 and \sigma_2 are the standard deviations of class 1 and class 2,
respectively.
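A minimal sketch of KM-SNR (assuming scikit-learn's KMeans; the cluster count and per-cluster quota are illustrative parameters, not those of [25]):

```python
import numpy as np
from sklearn.cluster import KMeans

def snr(X, y):
    """SNR score per gene: (mu1 - mu2) / (sigma1 + sigma2)."""
    c1, c2 = X[y == 0], X[y == 1]
    return (c1.mean(axis=0) - c2.mean(axis=0)) / (c1.std(axis=0) + c2.std(axis=0))

def km_snr(X, y, n_clusters=10, genes_per_cluster=5):
    """Cluster the genes with k-means, then keep the top SNR-ranked genes per cluster."""
    # Each gene (column) becomes one k-means sample, so cluster the transpose.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X.T)
    scores = np.abs(snr(X, y))
    selected = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        ranked = members[np.argsort(scores[members])[::-1]]
        selected.extend(ranked[:genes_per_cluster])
    return sorted(selected)
```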
DL has begun to permeate our daily lives, delivering capabilities that could only be
envisioned in science-fiction movies just a decade earlier. Indeed, even before the
presentation of AlexNet, one can argue that the field was set in motion by the pivotal
article published in the journal Science in 2006 by Hinton and Salakhutdinov [36],
which described the importance of "the depth" of an ANN in ML. It fundamentally
calls attention to the fact that ANNs with a few hidden layers can have an amazing
learning capacity, which further improves with increasing depth, or equivalently with
the number of hidden layers. Thus came the term "Deep" learning, a specific ML
branch that can handle intricate patterns and objects in enormous datasets.
DL is an approach to ML that has drawn on our knowledge of the human brain,
statistics, and applied mathematics as it developed over the past several
decades. In recent years, it has seen enormous growth in its popularity and
usefulness, due in large part to more powerful computers, more substantial datasets,
and techniques to train deeper ANNs. Over time, DL has solved increasingly
complicated applications with increasing accuracy.
Because of its capacity to learn multilayered representations, DL excels at
drawing conclusions from complex problems. In this sense, DL is the most
advanced way to collect and process abstract information
across several layers. These attributes make DL an appropriate approach for
dissecting and studying gene expression data. The ability to
learn multilayered representations makes DL a flexible procedure that produces
progressively accurate outcomes in a shorter time. The multi-layered representation is
the component that structures the general architecture of DL [37].
ML and DL differ in performance depending on the amount of data: DL works
inefficiently on small, low-dimensional datasets, as it requires large, high-dimensional
data for its learning to be carried out effectively [38].
During the most recent decade, Convolutional Neural Networks (CNNs) [51] have
become the de facto standard for various Computer Vision and ML tasks.
CNNs are feed-forward Artificial Neural Networks (ANNs) [52] with alternating
convolutional and subsampling layers. Deep 2D-CNNs with many hidden layers
and many parameters can learn intricate patterns, provided that they are trained on
a gigantic visual database with ground-truth labels. Figure 1 visualizes the
pipeline of a typical CNN architecture.
Fig. 1 The pipeline of a typical CNN architecture: convolution and max-pooling stages map an input image to output class scores (dog, person, cat, bird, fish, fox)

1D-CNNs have been proposed in several applications, for example, personalized
biomedical data classification and early diagnosis, structural health monitoring, and
anomaly detection and identification in motor-fault detection. Moreover, real-time
and minimum-cost hardware implementations are plausible because of the reduced
and straightforward configuration of 1D-CNNs, which perform only 1D convolutions.
The following subsections present a comprehensive review of the general architecture
and principles of 2D-CNNs and 1D-CNNs [40]. A CNN has three main
components: the input layer, the output layer, and the latent (hidden) layers. These
latent layers may be categorized as fully-connected, pooling, or convolutional layers.
The definitions and details are as follows [39, 41]:
Max pooling layer: This layer downsamples the output of a convolution layer by
splitting it into small grids (windows). An operator is applied to every window to
keep its maximum value (or its average, in the case of average pooling), as in the
sketch below.
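As a minimal numerical sketch (assuming NumPy; the function name and window size are our own), non-overlapping max pooling keeps one maximum per window:

```python
import numpy as np

def max_pool2d(fmap, size):
    """Non-overlapping max pooling: keep the maximum of each size-by-size window."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % size, :w - w % size]   # drop ragged edges
    th, tw = trimmed.shape
    windows = trimmed.reshape(th // size, size, tw // size, size)
    return windows.max(axis=(1, 3))                # one maximum per window

fmap = np.arange(16).reshape(4, 4)   # a toy 4x4 feature map
print(max_pool2d(fmap, 2))           # [[ 5  7]
                                     #  [13 15]]
```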
Although it has been nearly thirty years since the first CNN was proposed, modern
CNN architectures still share the underlying properties of the very first one, for
example, convolutional and pooling layers. The ubiquity and the broad scope of
application areas of deep CNNs can be ascribed to the following advantages:
Fig. 2 A sample CNN with two convolution layers and one fully-connected layer (input 24 × 24; kernel sizes Kx = Ky = 4; pooling factors sx = sy = 3 and sx = sy = 4; output 1 × 1) [40]
1. CNNs fuse the feature extraction and classification procedures into a single
learning body. They can learn to optimize the features during the training stage
directly from the raw input.
2. Since CNN neurons are sparsely connected with tied weights, CNNs can
process large inputs with extraordinary computational efficiency compared
with conventional fully-connected Multi-Layer Perceptron (MLP) networks.
3. CNNs are resistant to small changes in the input, including translation,
scaling, skewing, and distortion.
4. CNNs can adapt to inputs of various sizes.
In a conventional MLP, each hidden neuron has scalar weights, inputs, and
outputs. In contrast, because of the 2D nature of images, every neuron in a CNN
has 2D planes for its weights, known as kernels, and for its inputs and outputs, which
are known as feature maps. The classification of a 24 × 24 pixel grayscale image
into two categories by a conventional CNN is shown in Fig. 2 [40]. This sample CNN
consists of two convolution layers and two pooling layers. The output of the second
pooling layer is handled by a fully-connected layer, followed by the output layer
that produces the classification result.
The convolutional layers are fed through interconnections assigned the weighting
filters (w) with a kernel size of (Kx, Ky). As the convolution is performed only
inside the boundary limits of the image, the feature-map dimension is decreased
by (Kx − 1, Ky − 1) pixels in width and height, respectively. The values
(Sx, Sy) initialized in the pooling layers are the subsampling factors. In the example
of Fig. 2, the kernel sizes of the two convolution layers are set to Kx = Ky = 4, while
the subsampling factors are set to Sx = Sy = 3 for the first pooling layer and
Sx = Sy = 4 for the subsequent one. Note that these values are purposely chosen so
that the outputs of the last pooling layer (i.e. the input of the fully-connected layer)
are scalars (1 × 1). The output layer comprises two fully-connected neurons,
corresponding to the number of classes into which the image is categorized. The
following steps show the complete forward-propagation process of the given example
CNN (a code sketch of this architecture follows the steps):
1. A grayscale 24 × 24-pixel image is fed to the CNN as the input layer.
2. Every neuron of the 1st convolution layer performs a linear convolution between
the image and its associated filter to create the input feature map of that neuron.
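As a minimal sketch of the Fig. 2 network (assuming PyTorch; the number of feature maps per layer and the tanh activation are our assumptions, since the chapter does not specify them), the dimensions reduce exactly as described above:

```python
import torch
import torch.nn as nn

class Fig2CNN(nn.Module):
    """Fig. 2 example: 24x24 input -> conv(K=4) -> pool(3) -> conv(K=4) -> pool(4) -> 2 outputs."""
    def __init__(self, maps1=4, maps2=8):  # feature-map counts are illustrative assumptions
        super().__init__()
        self.conv1 = nn.Conv2d(1, maps1, kernel_size=4)      # 24x24 -> 21x21
        self.pool1 = nn.MaxPool2d(3)                         # 21x21 -> 7x7
        self.conv2 = nn.Conv2d(maps1, maps2, kernel_size=4)  # 7x7  -> 4x4
        self.pool2 = nn.MaxPool2d(4)                         # 4x4  -> 1x1 scalars
        self.fc = nn.Linear(maps2, 2)                        # two output neurons

    def forward(self, x):
        x = self.pool1(torch.tanh(self.conv1(x)))
        x = self.pool2(torch.tanh(self.conv2(x)))
        return self.fc(x.flatten(1))

x = torch.randn(1, 1, 24, 24)      # one grayscale 24x24 image
print(Fig2CNN()(x).shape)          # torch.Size([1, 2])
```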