Deep Learning Techniques for Biomedical and Health Informatics
Studies in Big Data
Volume 68
Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Big Data” (SBD) publishes new developments and advances
in the various areas of Big Data, quickly and with high quality. The intent is to
cover the theory, research, development, and applications of Big Data, as embedded
in the fields of engineering, computer science, physics, economics and life sciences.
The books of the series refer to the analysis and understanding of large, complex,
and/or distributed data sets generated from recent digital sources coming from
sensors or other physical instruments as well as simulations, crowd sourcing, social
networks or other internet transactions, such as emails or video click streams and
others. The series contains monographs, lecture notes and edited volumes in Big
Data spanning the areas of computational intelligence including neural networks,
evolutionary computation, soft computing, fuzzy systems, as well as artificial
intelligence, data mining, modern statistics and Operations research, as well as
self-organizing systems. Of particular value to both the contributors and the
readership are the short publication timeframe and the world-wide distribution,
which enable both wide and rapid dissemination of research output.
** Indexing: The books of this series are submitted to ISI Web of Science, DBLP,
Ulrichs, MathSciNet, Current Mathematical Publications, Mathematical Reviews,
Zentralblatt Math: MetaPress and Springerlink.
Editors
Sujata Dash
Department of Computer Science
North Orissa University
Takatpur, Odisha, India

Biswa Ranjan Acharya
School of Computer Science and Engineering
KIIT Deemed to University
Bhubaneswar, Odisha, India

Mamta Mittal
Computer Science and Engineering Department
G. B. Pant Government Engineering College
New Delhi, Delhi, India

Ajith Abraham
Scientific Network for Innovation and Research Excellence
Machine Intelligence Research Labs
Auburn, AL, USA

Arpad Kelemen
Department of Organizational Systems and Adult Health
University of Maryland
Baltimore, MD, USA
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Overview
research in the most important area of research, which has a direct impact on the betterment of human life and health. This book should be very useful because no book on the market provides a good collection of state-of-the-art deep learning based models for biomedical and health informatics, as deep learning has only recently emerged and is still an immature field of research in biomedicine and healthcare.
This book, Deep Learning Techniques for Biomedical and Health Informatics, aims to present discussions on various applications of deep learning to biomedical and health informatics problems and to suggest the latest research methodologies and emerging developments for the benefit of researchers and practitioners. In this volume, 49 researchers and practitioners of international repute present the latest research developments, current trends, state-of-the-art reports, case studies and suggestions for further development in the fields of biomedical and health informatics and deep learning.
Objective
The purpose of this book is to report the latest advances and developments in the
field of biomedical and health informatics, and deep learning. The book comprises
the following three parts:
• Deep Learning for Biomedical Engineering and Health Informatics
• Deep Learning and Electronics Health Records
• Deep Learning for Medical Image Processing
Organization
There are 16 chapters in Deep Learning Techniques for Biomedical and Health
Informatics. They are organized into three parts, as follows:
• Part One: Deep Learning for Biomedical Engineering and Health Informatics. This part focuses on deep learning paradigms and their application in biomedical and health informatics, clinical decision support systems, disease diagnosis and monitoring systems, and recommender systems for health informatics. There are six chapters in this part. The first chapter looks into the application of deep learning to healthcare data in tasks such as information and relation extraction. The second and third contributions focus on the discovery of biomedical named entities in biomedical text mining tasks by applying deep learning techniques. The fourth chapter introduces deep learning and developments in neural networks and then discusses its applications in healthcare
Target Audiences
The editors would like to acknowledge the help of all the people involved in this
project and, more specifically, the reviewers who took part in the review process.
Without their support, this book would not have become a reality.
First, the editors would like to thank each one of the authors for their time,
contribution, and understanding during the preparation of the book.
Second, the editors wish to acknowledge the valuable contributions of the
reviewers regarding the improvement of quality, coherence, and content presenta-
tion of chapters.
Last but not least, the editors wish to acknowledge the love, understanding, and
support of their family members during the preparation of the book.
Abbreviations
AD Alzheimer’s disease
AdaGrad Adaptive gradient algorithm
ADC Analog-to-digital converter
ADHD Attention-deficit hyperactivity disorder
ADNI Alzheimer’s disease neuroimaging initiative
AE Auto-encoders
AES Advanced Encryption Standard
AFLC Adaptive fuzzy leader clustering algorithm
AHE Adaptive histogram equalization
AI Artificial intelligence
AMBE Absolute mean brightness error
ANFIS Adaptive neuro-fuzzy inference system
ANN Artificial neural network
ANS Autonomic nervous system
Anti-CCP Anti-cyclic citrullinated peptide
ApEn Approximate entropy
AR Autoregressive
AUC Area under curve
AV Atrioventricular
BBHE Bi-histogram equalization
BCHC Birmingham Community Healthcare
BERT Bidirectional Encoder Representations from Transformers
BETA Blackbox Explanations Using Transparent Approximations
Bi-LSTM Bidirectional long short-term memory
BiLM Bidirectional language model
Bio-NER Biomedical named entity recognition
BMESO B-Begin M-Middle E-End S-Single O-Outside
BMI Body mass index
BOE Bag of events
BoW Bag of words
TE Tone entropy
TG Triglycerides
TN True negative
TP True positive
t-SNE T-distributed stochastic neighbor embedding
TT Trapping time
UCI University of California, Irvine
UMLS Unified medical language system
UQI Universal quality index
USF University of South Florida
UTI Urinary tract infection
VAE Variational auto-encoders
VEGF Vascular endothelial growth factor
VGG Visual geometry group
VHL Von Hippel–Lindau disease
VIA Visual and image analysis
VP Verb phrases
WBAN Wireless body area network
WBCD Wisconsin Breast Cancer Dataset
WE Word embedding
WHO World Health Organization
WM White matter
XML Extensible Markup Language
Deep Learning for Biomedical
Engineering and Health Informatics
MedNLU: Natural Language
Understander for Medical Texts
H. B. Barathi Ganesh et al.
Abstract Natural Language Understanding is one of the essential tasks for building clinical text-based applications. Understanding of clinical texts can be achieved through Vector Space Models and Sequential Modelling tasks. This paper focuses on sequential modelling, i.e. Named Entity Recognition and Part of Speech Tagging, and attains state-of-the-art performance with an F1 score of 93.8% on the i2b2 clinical corpus and an F1 score of 97.29% on the GENIA corpus. The paper also reports the performance of feature fusion, which integrates word embedding, feature embedding and character embedding for sequential modelling tasks. We also propose a framework based on a sequential modelling architecture, named MedNLU, which is capable of performing Part of Speech Tagging, Chunking, and Entity Recognition on clinical texts. The sequence modeler in MedNLU is an integrated framework of a Convolutional Neural Network, Conditional Random Fields and a Bi-directional Long Short-Term Memory network.
1 Introduction
social media platforms. No one individual can acquire and maintain the knowledge needed to comprehend the entirety of these data. This creates the need for Natural Language Processing (NLP), one of the subfields of Artificial Intelligence. MedNLU is a framework that addresses the challenges involved in understanding the information hidden in digital health data. This framework will act as a fundamental component in many health care applications that require natural language processing and understanding.
MedNLU draws on subfields of Artificial Intelligence such as NLP, conventional Machine Learning and Deep Learning. It takes health-care texts or documents as input and outputs the tokenized text, chunked text, parsed text, entities and Part of Speech (POS) tags associated with the medical text. By utilizing these entities and POS tags, a knowledge base can be built, which can be used for database management and conversational systems. The components of the MedNLU framework are given in Fig. 1.
Most health documents are produced using Electronic Health Records (EHRs), which include records of a patient's family history, reason for initial complaint, diagnosis and treatment, prescription medication, lab tests and results, record of visits, administrative and billing data, patient demographics, progress notes, vital signs, medical histories, immunization dates, allergies, radiology images and so on. Almost all the details of a patient are thus readily available to clinicians and physicians at any point of time. With the help of NLP in health care, end systems that required human assistance to re-check the documents have been replaced by NLP-based systems. Hospitals around the world have already started using NLP on a daily basis.
Extracting information from these records helps to develop applications such as decision support systems, adverse drug reaction identification, pharmacovigilance, effective management of pharmacokinetics, patient cohort identification, and effective EMR development and maintenance. The required information is mostly extracted through NLU tasks such as Named Entity Recognition (NER), Part of Speech (POS) tagging and Chunking [1–4].
So far, the medical domain has mostly used NLU based on rule-based methodology [5, 6]. A rule-based system is a set of hand-coded rules for extracting valuable information from medical data. Depending on the knowledge resources to be extracted from each set of documents, rules were framed to suit the structure of each document. In short, document-specific rules were used for knowledge extraction. Only later were algorithm-driven models used, which reduced the workload of manually encoding the data [7, 8]. Recently, researchers have moved to applying Deep Learning to health care data in tasks such as NER and relation extraction [9–11].
Following these observations, we report here the performance of feature fusion, which integrates word embedding, feature embedding and character embedding for sequential modelling tasks. The sequence modeler in MedNLU is an integrated framework of a Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network and Conditional Random Fields (CRF). The proposed framework, named MedNLU, is capable of performing Named Entity Recognition, Part of Speech Tagging, Parsing and Chunking on clinical texts. This experiment also shows that, without a domain-specific word embedding model, the sequence model architecture attains state-of-the-art performance using word embeddings developed from general English text.
2 Related Works
Effective computation of dense word matrices and the addition of a downstream model [12] on word2vec with different architectures were published in [13], which had a large impact on the research community and encouraged it to make use of the Big Data available in the healthcare domain. As a result, text classification tasks such as sentiment analysis, as well as text summarization, Information Extraction (IE) [14] and Information Retrieval (IR), which are some of the common NLP problems [15], have started using word embedding. Because of its acclaim, fields in health care and bioinformatics use it as well. Healthcare problems such as Relation Extraction (RE), Named Entity Recognition (NER) [1], drug-disease interaction, medical synonym extraction [2], and chemical-disease relation are getting special attention. Most of the time, the embedding models have been trained on a small closed-set corpus or on a large general corpus such as Google News and Wikipedia [16]. These models cannot be used directly, since clinical texts include more clinical words than general words and do not follow general grammar patterns.
After computing the word vectors, different methodologies have been used to evaluate the word embedding models. Context-predicting and context-counting semantic vectors are among them: the relations in the data and the correlation with different parameters are measured on lexical semantic tasks to evaluate the word embedding model [17]. The context-predicting model is preferred over the count-based model because it gives better results. Latent Semantic Analysis was used by Landauer [18] for indirect knowledge acquisition from text, with similarities in the space analysed through local co-occurrence. Unsupervised vectors were used for classification problems in analogy tasks by Turney [19], and many others have tried to modify this unsupervised way of learning for text applications [20]. In the bioinformatics domain, word embeddings were assessed by Pakhomov [12].
Due to restrictions on the use of clinical texts (HIPAA), far less work is available on clinical POS tagging. POS annotation of 390,000 pediatric sequences from text at Cincinnati Children's Medical Centre was reported by Pestian et al. [3]. With the addition of a special lexicon to the tagger word list, a tagger comparable to dTagger achieved an accuracy of 91.5% after training. However, neither the tagger nor the corpus was made available. To reduce the clinical text annotation effort required when co-training a POS tagger with the WSJ corpus, Liu et al. [4, 21] developed sampling methods. When one of these sampling methods was evaluated on tagging pathology reports, the training data was reduced by 84% while giving an accuracy of 92.7%.
Due to domain constraints, neither the annotated corpus nor the trained tagger was available to the research community. The Mayo Clinic in Rochester, Minnesota developed the MED corpus [4], which has 100,650 POS-tagged tokens from 273 clinical notes. An accuracy of 93.6% on the clinical notes was achieved when the annotations were pooled with GENIA and other POS-tagged corpora [22]. Even though clinical text corpora are unavailable, the Mayo Clinic released the biomedical NLP package cTAKES [6], which provides a fully established tagger as a pre-trained, reusable model.
The classic methods for NER were dictionary-based and rule-based approaches [5], which required domain expertise to devise proper rules. Earlier, most researchers who focused on named entity recognition proposed conventional machine learning approaches or a combination of conventional machine learning and rule-based approaches. In [23], different supervised and semi-supervised machine learning algorithms were used for NER problems, concentrating on domain-dependent attributes and specialized text features. Hybrid models that combine Conditional Random Fields (CRF) and Support Vector Machines (SVM) with different pattern matching rules gave better output, as shown in [7]. In [8], combining pre-processing techniques such as annotation and truecasing with CRF-based NER was reported to improve concept extraction performance. The top-performing models in the i2b2 challenge employed CRF and semi-Markov Hidden Markov Models (HMM), with an F-score of 0.85 in the shared task.
The best-performing system in the 2010 i2b2/VA challenge [24] used the Brown clustering method to derive unsupervised feature representations from unlabelled corpora, combined with a semi-supervised HMM algorithm. However, the one-hot unsupervised word feature representation from Brown clustering does not capture multiple-aspect relations between words. Jonnalagadda [25] therefore proposed improving clinical entity recognition by including distributional word representations from a random indexing model. Word embeddings obtained from the English Wikipedia corpus have also been applied to different NER tasks [25], which turned out to be a successful approach. A CRF-based concept extraction system [26] achieved enhanced performance through binarized word embeddings obtained from domain-specific corpora.
With the advent of deep learning, a subset of machine learning, unparalleled results have been obtained in vision, NER and speech. Neural networks learn features automatically, which reduces the manual effort that was earlier needed for machine learning and gives them an advantage over conventional machine learning algorithms. Researchers have now started applying Deep Learning algorithms to health care data in tasks such as NER and relation extraction [9–11].
3 Methodology
The neural network architecture used in our implementation has multiple components. The architecture of the entire process is depicted in Fig. 2.
Text representation is the first and foremost step in any natural language understanding task. It sets the stage for the performance of the subsequent machine learning or deep learning algorithm. In our problem statement, we transform the input sentence into a vector representation that combines three different attributes: word embeddings, character embeddings and a feature vector. The character embeddings are computed through a CNN using the methodology described in [27]. In this experiment we used a domain-specific word embedding model developed from the Journal of Medical Case Reports (Health Embedding), and we also experimented with word embeddings from Google (Google Embedding). These three vectors are fused and fed to a network with a Bi-LSTM followed by a CRF or SoftMax layer, which makes the final prediction. The developed health embeddings are evaluated through both qualitative and quantitative methods.
Word embedding captures the contextual meaning of words in terms of a low-dimensional vector. A word vector should clearly represent the distribution of adjacent words around the current word. This approach to representing words has helped achieve state-of-the-art performance on many challenging natural language processing tasks. The two major models for learning word embeddings are the Continuous Bag-of-Words (CBoW) model, which learns the representation of the current word from its adjacent words (the context), and the Continuous Skip-Gram model, which learns by predicting the adjacent words given the current word [13, 28].
In our experimentation, we employed the CBoW model for word embeddings. The input layer consists of context words with a word window of size s and vocabulary V. This input is passed to the hidden layer h, which is an N-dimensional vector. Finally, the output y is a one-hot encoded word from the training examples. The input layer is connected to the hidden layer via a V × N weight matrix W, and the hidden layer is connected to the output layer via an N × V weight matrix Wt. The forward pass first computes the output of the hidden layer h as follows:
$$h = \frac{1}{s} W^{\top} \sum_{i=1}^{s} x_i \qquad (1)$$
where u_j is the input to the j-th node in the output layer. This forward pass is followed by backward propagation, in which the model learns its parameters, the weight matrices W and Wt. The weight matrices are initialized with random values. The cost function (E), which is based on the conditional probability of the output word given the input words, is computed using the training examples fed to the model. Our objective is to maximize this conditional probability, which is equivalent to minimizing the negative log probability. The final objective function is therefore the negative log probability of the output word given the input words.
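The forward pass and cost described above can be illustrated with a minimal pure-Python sketch. It assumes the standard CBoW formulation with a softmax output layer; the softmax, the function name `cbow_forward`, the name `W_out` for the hidden-to-output matrix and the toy sizes are assumptions made for illustration, not details taken from the chapter.

```python
import math
import random

def cbow_forward(context_ids, target_id, W, W_out):
    """One CBoW forward pass (illustrative sketch).

    W     : V x N input-to-hidden weights, one row per vocabulary word
    W_out : N x V hidden-to-output weights
    Returns the hidden vector h, the output probabilities y and the cost E.
    """
    vocab_size, n_hidden = len(W), len(W[0])
    s = len(context_ids)
    # Eq. (1): multiplying one-hot inputs by W selects rows of W, which are averaged.
    h = [sum(W[i][k] for i in context_ids) / s for k in range(n_hidden)]
    # u[j] is the input to output node j.
    u = [sum(h[k] * W_out[k][j] for k in range(n_hidden)) for j in range(vocab_size)]
    # Softmax over the vocabulary (assumed), shifted by max(u) for numerical stability.
    m = max(u)
    exp_u = [math.exp(v - m) for v in u]
    total = sum(exp_u)
    y = [e / total for e in exp_u]
    # Cost: negative log probability of the target word given its context.
    cost = -math.log(y[target_id])
    return h, y, cost

random.seed(0)
V, N = 6, 4  # toy vocabulary size and hidden dimension
W = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(N)]
h, y, cost = cbow_forward([0, 2, 3, 5], target_id=1, W=W, W_out=W_out)
```

Training would then adjust both matrices by back-propagating this cost, which the sketch leaves out.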
The feature vector is a one-hot encoded vector that encodes seven categorical attributes of a word: start case, uppercase, lowercase, all numeric, partially numeric, contains digit, and others.
The final input vector is the concatenation of the character embedding (vector for character representation), the word embedding (vector for word representation), and the feature vector (categorical attributes). We call this concatenation Feature Fusion. Word embeddings from the pre-trained Google News vectors were used in one setup, while in the other we trained our own embeddings on healthcare data. These healthcare embeddings seem to work better than the pre-trained embeddings. In our experimentation, we observed that the implementation using feature fusion yields better results than word embeddings alone.
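A small sketch of feature fusion is given below. The seven category names follow the text, but the rules that assign a token to a category (including the threshold separating "partially numeric" from "contains digit") are assumptions, as are the helper names. The 30- and 300-dimensional sizes are the character and word embedding sizes reported in the experimental setup.

```python
CATEGORIES = ["start_case", "uppercase", "lowercase", "all_numeric",
              "partially_numeric", "contains_digit", "others"]

def orthographic_category(token):
    """Map a token to one of the seven categories (the rule order is an assumption)."""
    if token.isdigit():
        return "all_numeric"
    digits = sum(ch.isdigit() for ch in token)
    if digits:
        # Assumption: "partially numeric" means at least half of the characters are digits.
        return "partially_numeric" if digits * 2 >= len(token) else "contains_digit"
    if token.isupper():
        return "uppercase"
    if token.islower():
        return "lowercase"
    if token[:1].isupper() and token[1:].islower():
        return "start_case"
    return "others"

def one_hot(category):
    """Seven-dimensional one-hot feature vector."""
    return [1.0 if c == category else 0.0 for c in CATEGORIES]

def fuse(char_vec, word_vec, token):
    """Feature fusion: character embedding + word embedding + feature vector."""
    return list(char_vec) + list(word_vec) + one_hot(orthographic_category(token))

# Placeholder zero vectors stand in for the real character and word embeddings.
fused = fuse([0.0] * 30, [0.0] * 300, "Aspirin")
```

With these sizes the fused input has 30 + 300 + 7 = 337 dimensions per word.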
Textual data is a string of words put together according to language-specific rules. A network architecture that inherently works well with sequential data is the Long Short-Term Memory (LSTM) network. LSTM architectures differ in terms of directionality: they can be unidirectional or bi-directional. The Bi-LSTM has access to information from the past as well as the future [29, 30].
The LSTM network consists of a set of memory blocks. Each LSTM cell has a self-connected memory cell and three gates, namely the input, output and forget gates. These input, output and forget gates correspond to write, read and reset operations for a single LSTM cell. The memory blocks help LSTM cells retain information for a longer duration and also help solve long-range dependency issues. In sequential NLP tasks, it is always better to have both past and future contexts. However, an LSTM cell retains information from past values, not future values. An elegant solution is to use a Bi-directional LSTM cell [29]. The idea is to replicate the LSTM cell and place the two copies side by side. The first cell reads the input as-is and the second reads a reversed copy of the same input. In practice, this has proven to work better for sequential tasks.
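The gate operations and the forward/reverse reading can be sketched in plain Python as follows. This is a toy illustration under simplifying assumptions (random weights, no bias terms, tiny dimensions); the function names are hypothetical and it is not the network used in MedNLU.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_cell(n_in, n_hid, rng):
    """Random weights for the input (i), forget (f), output (o) gates and the candidate (g)."""
    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hid)] for _ in range(n_hid)]
    return {name: matrix() for name in ("i", "f", "o", "g")}

def lstm_step(cell, x, h, c):
    xh = x + h  # concatenate the current input with the previous hidden state
    def linear(name):
        return [sum(w * v for w, v in zip(row, xh)) for row in cell[name]]
    i = [sigmoid(v) for v in linear("i")]    # input gate: write
    f = [sigmoid(v) for v in linear("f")]    # forget gate: reset
    o = [sigmoid(v) for v in linear("o")]    # output gate: read
    g = [math.tanh(v) for v in linear("g")]  # candidate memory content
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def run_lstm(cell, xs, n_hid):
    h, c, outputs = [0.0] * n_hid, [0.0] * n_hid, []
    for x in xs:
        h, c = lstm_step(cell, x, h, c)
        outputs.append(h)
    return outputs

def run_bilstm(fwd_cell, bwd_cell, xs, n_hid):
    forward = run_lstm(fwd_cell, xs, n_hid)
    # The second LSTM reads a reversed copy; its outputs are flipped back to align positions.
    backward = run_lstm(bwd_cell, xs[::-1], n_hid)[::-1]
    return [f + b for f, b in zip(forward, backward)]

rng = random.Random(0)
n_in, n_hid = 5, 3
sentence = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(4)]
states = run_bilstm(make_cell(n_in, n_hid, rng), make_cell(n_in, n_hid, rng), sentence, n_hid)
```

Each position ends up with a vector of size 2 × `n_hid`: the first half summarizes the words read so far, the second half the words still to come.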
where y′ and y are the label pair, and W and b are the weight vector and bias corresponding to the label pair. The CRF is trained using maximum likelihood estimation. For a training set of pairs (z_i, y_i), the log-likelihood is given as:
$$L(W, b) = \sum_{i} \log p(y_i \mid z_i; W, b) \qquad (5)$$
The objective is to choose the parameters such that the log-likelihood is maximized. To retrieve the sequence of labels with the highest probability, we use:
$$y^{*} = \arg\max_{y} \; p(y \mid z; W, b)$$
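For a linear-chain CRF, the highest-probability label sequence is usually found with the Viterbi algorithm. The chapter does not name the decoding algorithm, so the sketch below is an assumption about how this step could be done; the label set, emission scores and transition scores are made-up toy values.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence and its score.

    emissions[t][k]   : score of label k at position t
    transitions[j][k] : score of moving from label j to label k
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        best_prev, new_score = [], []
        for k in range(n_labels):
            # Best previous label for ending in label k at this position.
            j = max(range(n_labels), key=lambda p: score[p] + transitions[p][k])
            best_prev.append(j)
            new_score.append(score[j] + transitions[j][k] + emit[k])
        backpointers.append(best_prev)
        score = new_score
    last = max(range(n_labels), key=lambda k: score[k])
    path = [last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    return path[::-1], score[last]

LABELS = ["O", "B-problem", "I-problem"]  # toy label set
NEG = -1e4                                # large penalty for a forbidden transition
transitions = [[0.0, 0.0, NEG],           # O -> I-problem is not valid in IOB
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 0.5]]
emissions = [[2.0, 0.1, 0.0],
             [0.2, 1.0, 1.2],
             [0.1, 0.3, 1.5],
             [1.0, 0.2, 0.4]]
path, best_score = viterbi(emissions, transitions)
decoded = [LABELS[k] for k in path]  # ['O', 'B-problem', 'I-problem', 'O']
```

Note that the second token scores higher as I-problem than as B-problem on emissions alone; the transition scores are what make the decoder start the entity with B-problem, which is the benefit of decoding the sequence jointly rather than per token.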
4 Corpora Statistics
and etc. which are not encoded by the UTF-8 encoding scheme. We have also removed the classes with negligible counts: predeterminers (PDT), interjections (UH) and ?/= . Statistics about the corpora used in creating MedNLU are shown in Table 1. The GENIA [32] corpus is used for creating the part-of-speech tagging model in MedNLU. The i2b2 clinical [33] corpus is annotated with three types of clinical tags: problem, test and treatment. A tag can also span several successive words. This corpus consists of 16,107 sentences from patient discharge summaries. The i2b2 clinical corpus follows the Inside-Outside-Beginning (IOB) format.
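To make the IOB format concrete, the sketch below groups IOB-tagged tokens into entity spans. The sentence is invented for illustration and the exact tag strings (`B-test`, `I-problem`, and so on) are assumptions about how the three tag types would be written; they are not copied from the corpus.

```python
def iob_to_spans(tokens, tags):
    """Group IOB-tagged tokens into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        # A B- tag, or an I- tag of a different type, starts a new entity.
        starts_new = prefix == "B" or (prefix == "I" and etype != current_type)
        if prefix == "O" or starts_new:
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if prefix in ("B", "I"):
            current_type = etype
            current_tokens.append(token)
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

# Invented example sentence, not taken from the i2b2 corpus.
tokens = ["Chest", "x-ray", "showed", "left", "lower", "lobe", "pneumonia", "."]
tags = ["B-test", "I-test", "O", "B-problem", "I-problem", "I-problem", "I-problem", "O"]
spans = iob_to_spans(tokens, tags)
# [('test', 'Chest x-ray'), ('problem', 'left lower lobe pneumonia')]
```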
The sequential modeler for MedNLU has been constructed by integrating CNN, BLSTM and CRF. The schematic diagram is given in Fig. 2. The experiment was performed on a system with the following configuration: 32 GB RAM, NVIDIA GeForce GTX 1080, i7 processor, Python 3 and Ubuntu 16.04 LTS.
For every word, a character-level representation (a 30 × 1 vector) is computed using the CNN, as given in Fig. 2. We fine-tune these initial embeddings by modifying them during the weight updates of the neural network model by back-propagation. The character embeddings are concatenated with the corresponding word embedding (300 × 1) and feature vector (7 × 1). The concatenated vector is fed to the BLSTM, followed by the CRF layer. To compare performance, we also combined the BLSTM with a typical SoftMax layer.
Dropout is applied at multiple levels: to the character embeddings before they are input to the CNN, and to the input and output vectors of the BLSTM. The dropout rate is fixed at 0.25 for all dropout layers throughout the experiments. This is shown in Fig. 1. The parameters are optimized with the mini-batch Adam optimizer, with a batch size of 32 and early stopping set to 5.
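The two regularization settings can be sketched as follows. The sketch assumes that the early stopping value of 5 is a patience (the number of epochs without improvement before training stops) and that validation loss is the monitored quantity; neither is stated in the text. The dropout shown is the common "inverted" variant, also an assumption, and both function names are hypothetical.

```python
import random

def dropout(vector, rate=0.25, rng=random):
    """Inverted dropout: zero each entry with probability `rate`, rescale the survivors."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vector]

def early_stopping_epoch(validation_losses, patience=5):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch  # no improvement for `patience` epochs
    return len(validation_losses) - 1, best_epoch

# Made-up loss curve: the best value is reached at epoch 2, training stops at epoch 7.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.6, 0.63, 0.64, 0.5]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=5)
```

In this toy curve the later improvement at epoch 8 is never seen, which is the usual trade-off of a small patience value.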
We used pre-trained word embeddings generated from general news text (Google embedding), as well as an embedding model developed from clinical texts (Health embedding). The Python Gensim library is used to develop the health embeddings. The word2vec continuous bag-of-words model is used to compute the health embeddings with the following parameters: minimum word frequency of 1, embedding dimension of 300 and window size of 4. The corpus used for creating the health embedding model is described under corpus statistics. The schematic diagram is given in Fig. 4.
The created word embeddings are evaluated through qualitative and quantitative analysis. For the qualitative evaluation, we used cosine distance to infer the similarity among words and phrases. The five most similar words with respect to each target word were taken for further analysis. The results of the qualitative analysis are given in Table 2.
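The nearest-neighbour lookup behind this qualitative evaluation can be sketched with plain Python. Cosine similarity (one minus cosine distance) is used for ranking; the three-dimensional toy vectors and the function names are made up for illustration and are not taken from the health embeddings.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(target, embeddings, k=5):
    """Rank all other words by cosine similarity to `target` and keep the top k."""
    scores = [(word, cosine_similarity(embeddings[target], vec))
              for word, vec in embeddings.items() if word != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors; real embeddings would have 300 dimensions.
toy = {"fever": [0.9, 0.1, 0.0], "chills": [0.8, 0.2, 0.1],
       "aspirin": [0.1, 0.9, 0.2], "ibuprofen": [0.2, 0.8, 0.1]}
neighbours = most_similar("fever", toy, k=2)
```

With `k=5` this mirrors the top-five lookup described in the text.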
In the quantitative analysis, the health embedding (vectors computed for the clinical text) is validated using data from two sequential modelling tasks: POS tagging and NER. The qualitative evaluation is based on three categories: disorder, symptoms and drug name.
t-distributed Stochastic Neighbour Embedding (t-SNE) is primarily used for visualizing high-dimensional data. Techniques such as multidimensional scaling, Sammon mapping and graph-based techniques were developed before t-SNE. Here, D-dimensional data is mapped to two or three dimensions for visualization.
In t-SNE, the Euclidean distances between vectors are converted into a probability distribution such that similar vectors have a high probability. The t-SNE map is generated for target words in the different categories and is shown in Fig. 5. t-SNE maps the vectors from the high-dimensional space into a 2-dimensional space. In this paper, the t-SNE map is used to visualize vectors from the health embeddings that are close in the vector space.
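The conversion of Euclidean distances into neighbour probabilities can be sketched as below. This covers only the first step of t-SNE, and it uses one fixed Gaussian bandwidth `sigma` for simplicity; the full algorithm tunes a bandwidth per point from a perplexity setting and then optimizes the low-dimensional map, which is not shown. The function name and sample points are hypothetical.

```python
import math

def neighbour_probabilities(points, i, sigma=1.0):
    """Conditional probabilities p(j|i) that point i picks point j as its neighbour."""
    weights = []
    for j, p in enumerate(points):
        if j == i:
            weights.append(0.0)  # a point is never its own neighbour
            continue
        sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], p))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    return [w / total for w in weights]

# Two nearby vectors and one distant vector.
vectors = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]
probs = neighbour_probabilities(vectors, 0)
```

Almost all of the probability mass goes to the nearby vector, which is why similar words end up close together in the map.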
In Fig. 5, the data points in orange, blue and green represent the categories disorder, drug, and symptoms, respectively. From Fig. 5, we can clearly observe that the computed health embedding maps the different categories (disorder, drug, and symptoms) into different clusters.
For the quantitative analysis, we modelled the evaluation as a classification task. As described in Ghanny et al. [34], we evaluated the word embedding on a POS-tagged representation of the GENIA corpus, as given in Fig. 4, to ensure the quality of the representation.
In the POS tagging task we have 26 classes in total, which were mapped to 12 meta tags. In the entity recognition task we have 7 classes. The statistics about the classes are given in Tables 3 and 4. The quantitative results are given in Fig. 6. The results were obtained using 10 × 10-fold cross-validation with an LSTM as the classifier.
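One common reading of "10 × 10-fold" is ten repetitions of 10-fold cross-validation, each with a fresh shuffle. The sketch below generates the index splits under that assumption; the function name, the seed and the sample count of 50 are illustrative choices, not details from the chapter.

```python
import random

def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (train_indices, test_indices) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # new shuffle for each repetition
        for fold in range(n_folds):
            test = order[fold::n_folds]  # every n_folds-th shuffled index
            held_out = set(test)
            train = [idx for idx in order if idx not in held_out]
            yield train, test

splits = list(repeated_kfold(n_samples=50))  # 10 repetitions x 10 folds = 100 splits
```

The classifier would be trained and scored once per split, and the 100 scores averaged.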
Finally, we obtained performance results for four architectures, i.e. ([Google Embedding or Health Embedding] + BLSTM + [CRF or SoftMax]). The results observed on the POS task for these combinations are given in Fig. 6. The performance of the sequence modeler on the NER corpus is shown in Fig. 7a, b, and the performance on the POS corpus is shown in Fig. 8a, b.
Fig. 5 t-SNE map of health embeddings computed through word2vec CBOW model
Chunking and parsing are performed through a regular expression parser. A set of rules is defined for extracting Clauses (S), Prepositional Phrases (PP), Verb Phrases (VP) and Noun Phrases (NP). These commonly occurring grammar rules are extracted from the POS-tagged corpus based on their frequency of occurrence. The parse tree resulting from the chunking is also a part of MedNLU.
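Regular-expression chunking over POS tags can be sketched with the standard `re` module. The two rules below (for NP and VP only) are hypothetical patterns over Penn Treebank style tags, written for illustration; they are not the frequency-derived rules used in MedNLU, and the example sentence is invented.

```python
import re

# Hypothetical chunk rules over POS tags; not the rules used in MedNLU.
CHUNK_RULES = [
    ("NP", r"(?:<DT>)?(?:<JJ>)*(?:<NN[SP]?>)+"),  # optional determiner, adjectives, nouns
    ("VP", r"(?:<MD>)?(?:<VB[DGNPZ]?>)+"),        # optional modal, one or more verbs
]

def chunk(tagged_tokens):
    """Return (label, words) chunks found by matching the rules against the tag string."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    # Character offset in tag_string at which each token's tag starts.
    starts, pos = [], 0
    for _, tag in tagged_tokens:
        starts.append(pos)
        pos += len(tag) + 2  # tag plus the two angle brackets
    chunks = []
    for label, pattern in CHUNK_RULES:
        for match in re.finditer(pattern, tag_string):
            words = [w for (w, _), s in zip(tagged_tokens, starts)
                     if match.start() <= s < match.end()]
            chunks.append((match.start(), label, words))
    # Sort by position so chunks come out in sentence order.
    return [(label, words) for _, label, words in sorted(chunks)]

sentence = [("The", "DT"), ("patient", "NN"), ("was", "VBD"), ("given", "VBN"),
            ("oral", "JJ"), ("antibiotics", "NNS")]
chunks = chunk(sentence)
# [('NP', ['The', 'patient']), ('VP', ['was', 'given']), ('NP', ['oral', 'antibiotics'])]
```

Rules for PP and S could be added in the same way, although nested phrases would need the chunks to be combined into a tree rather than a flat list.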
It can be observed that CRF performs better than SoftMax in both the NER and POS tagging tasks. This confirms the need for a sequence modeler at the output layer rather than the typical SoftMax layer. The time taken to build the CRF-based and SoftMax-based models is almost the same. For this reason, we have not given details about the time consumed in building the proposed sequence modeler.
Google embedding attains better results than the health embeddings in both tasks. This indicates that the sequence modeler does not depend on domain knowledge. It can also be inferred that the character embeddings capture information about medical words that are not present in the Google embeddings.
We also compared our results with those obtained by other models on the corpora used in our experiments. The sequence modeler achieves state-of-the-art performance on the i2b2 clinical corpus. The statistics are given in Table 5. MedNLU achieves an improvement of nearly 8%. Due to the non-availability of standard separate train and test files, we have not compared the results obtained for the GENIA corpus.
6 Conclusion
Fig. 7 a Performance of Google embedding: sequence modeler with CRF and SoftMax on NER. b Performance of health embedding: sequence modeler with CRF and SoftMax on NER
Fig. 8 a Performance of Google embedding: sequence modeler with CRF and SoftMax on POS tagging. b Performance of health embedding: sequence modeler with CRF and SoftMax on POS tagging
Named Entity Recognition, attaining state-of-the-art performance with an F1 score of 93.8% on the i2b2 clinical corpus and an F1 score of 97.29% on the GENIA corpus. From the observed results it is clear that the character embedding provides additional sub-word information about clinical words. Character embedding, together with word embedding computed from general text, removes the need for a clinical text-based word embedding model. The sub-word features extracted from the clinical words also contribute towards the objective. The proposed MedNLU is capable of performing Named Entity Recognition, Part of Speech Tagging, Parsing and Chunking on clinical texts.
These successful results are good enough to extend this framework further towards building relation extraction and dependency parsing modules. It is also clear that the existing annotated corpora are not sufficient to drive deep learning algorithms. Hence, future work will also focus on creating large annotated clinical text corpora. The framework will be extended further by including features that support the detection of adverse drug reactions and of disability.
References
1. Wang, Y., Wang, L., Rastegar-Mojarad, M., Moon, S., Shen, F., Afzal, N., Liu, S., Zeng, Y.,
Mehrabi, S., Sohn, S., et al.: Clinical information extraction applications: a literature review. J.
Biomed. Inform. (2017)
2. Yogatama, D., Liu, F., Smith, N.A.: Extractive summarization by maximizing semantic volume.
In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing,
pp. 1961–1966 (2015)
3. Pestian, J.P., Itert, L., Duch, W.: Development of a pediatric text-corpus for part-of-speech
tagging. In: Proceedings of the International IIS: IIPWM’04 Conference held in Zakopane,
Poland. Springer, pp. 219–26 (2004)
4. Pakhomov, S.V., Coden, A., Chute, C.G.: Developing a corpus of clinical notes manually
annotated for part-of-speech. Int. J. Med. Inform. 75(6), 418–429 (2006)
5. Hirschman, L., Morgan, A.A., Yeh, A.S.: The MITRE Corporation. Rutabaga by any other
name: extracting biological names. J. Biomed. Inform. 35(4), 247–259 (2002)
6. Savova, G.K., Masanz, J.J., Ogren, P.V., Zheng, J., Sohn, S., Kipper-Schuler, K.C., Chute, C.G.:
Mayo clinical text analysis and knowledge extraction system (ctakes): architecture, component
evaluation and applications. J. Am. Med. Inform. Assoc. 17(5), 507–513 (2010)
7. Boag, W., Wacome, K., Naumann, T., Rumshisky, A.: Cliner: a lightweight tool for clinical
named entity recognition. AMIA Joint Summits on Clinical Research Informatics (poster)
(2015)
8. Fu, X., Ananiadou, S.: Improving the extraction of clinical concepts from clinical records. In:
Proceedings of BioTxtM14 (2014)
9. Lv, X., Guan, Y., Yang, J., Wu, J.: Clinical relation extraction with deep learning. International
Journal of Hybrid Information Technology, pp. 237–248 (2016)
10. Wu, Y., Jiang, M., Lei, J., Xu, H.: Named entity recognition in Chinese clinical text using deep
neural networks. Studies in Health Technology and Informatics, p. 624 (2015)
11. Dong, X., Qian, L., Guan, Y., Huang, L., Yu, Q., Yang, J.: A multiclass classification method
based on deep learning for named entity recognition in electronic medical records. In: Scientific
Data Summit (NYSDS), IEEE, pp. 1–10 (2016)
12. Pakhomov, S.V., Finley, G., McEwan, R., Wang, Y., Melton, G.B.: Corpus domain effects on
distributional semantic modeling of medical terms. Bioinformatics 32(23), 3635–3644 (2016)
13. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of
words and phrases and their compositionality. In: Advances in Neural Information Processing
Systems, pp. 3111–3119 (2013)
14. Ganguly, D., Roy, D., Mitra, M., Jones, G.J.: Word embedding based generalized language
model for information retrieval. In: Proceedings of the 38th International ACM SIGIR Con-
ference on Research and Development in Information Retrieval, ACM, pp. 795–798 (2015)
15. Ganesh, H.B., Kumar, M.A., Soman, K.P.: From vector space models to vector space models of
semantics. In: Forum for Information Retrieval Evaluation, Springer, Cham, pp. 50–60 (2018)
16. Tang, B., Cao, H., Wang, X., Chen, Q., Xu, H.: Evaluating word representation features in
biomedical named entity recognition tasks. BioMed research International, 2014 (2014)
17. Jagannatha, A., Chen, J., Yu, H.: Mining and ranking biomedical synonym candidates from
wikipedia. In: Proceedings of the Sixth International Workshop on Health Text Mining and
Information Analysis, pp. 142–151 (2015)
18. Gurulingappa, H., Toldo, L., Schepers, C., Bauer, A., Megaro, G.: Semi-supervised information
retrieval system for clinical decision support. In TREC (2016)
19. Turney, P.D.: A uniform approach to analogies, synonyms, antonyms, and associations. In:
Proceedings of the 22nd International Conference on Computational Linguistics, vol. 1.
Association for Computational Linguistics, pp. 905–912 (2008)
20. Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic analysis theory
of acquisition, induction, and representation of knowledge. Psychol. Rev. 104(2), 211 (1997)
21. Liu, K., Chapman, W., Hwa, R., Crowley, R.S.: Heuristic sample selection to minimize ref-
erence standard training set for a part-of-speech tagger. J. Am. Med. Inform. Assoc. 14(5),
641–650 (2007)
22. Fan, J.W., Prasad, R., Yabut, R.M., Loomis, R.M., Zisook, D.S., Mattison, J.E., Huang, Y.:
Part-of-speech tagging for clinical text: wall or bridge between institutions? In: AMIA Annual
Symposium Proceedings, vol. 2011. American Medical Informatics Association, pp. 382–391
(2011)
23. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for
segmenting and labeling sequence data. ICML. pp. 282–289 (2001)
24. de Bruijn, B., Cherry, C., Kiritchenko, S., Martin, J., Zhu, X.: Machine-learned
solutions for three stages of clinical information extraction: the state of the art at i2b2
2010. J. Am. Med. Inform. Assoc. 18(5), 557–562 (2011)
25. Jonnalagadda, S., Cohen, T., Wu, S., Gonzalez, G.: Enhancing clinical concept extraction with
distributional semantics. J. Biomed. Inform. 45(1), 129–140 (2012)
26. Wu, Y., Xu, J., Jiang, M., Zhang, Y., Xu, H.: A study of neural word embeddings for named entity
recognition in clinical text. In: AMIA Annual Symposium Proceedings, vol. 2015, p. 1326.
American Medical Informatics Association (2015)
27. Chiu, J.P.C., Nichols, E.: Named entity recognition with bidirectional LSTM-CNNs. arXiv preprint
arXiv:1511.08308 (2015)
28. Ganesh, H.B., Kumar, M.A., Soman, K.P.: Distributional semantic representation in health
care text classification. In: International Conference on Forum of Information Retrieval and
Evaluation, pp. 201–204 (2016)
29. Dyer, C., Ballesteros, M., Ling, W., Matthews, A., Smith, N.A.: Transition based dependency
parsing with stack long short-term memory. In: Proceedings of ACL-2015 (Volume 1: Long
Papers), pp. 334–343 (2015)
30. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and
other neural network architectures. Neural Networks 18(5–6), 602–610 (2005)
31. Settles, B.: Biomedical named entity recognition using conditional random fields and rich
feature sets. In: Proceedings of the COLING 2004 NLPBA, pp. 104–108 (2004)
32. Verspoor, K., Cohen, K.B., Lanfranchi, A., Warner, C., Johnson, H.L., Roeder, C., Choi, J.D.,
Funk, C., Malenkiy, Y., Eckert, M., et al.: A corpus of full-text journal articles is a robust eval-
uation tool for revealing differences in performance of biomedical natural language processing
tools. BMC Bioinformatics 13(1), 207 (2012)
33. Uzuner, O., South, B.R., Shen, S., DuVall, S.L.: 2010 i2b2/VA challenge on concepts,
assertions, and relations in clinical text. J. Am. Med. Inform. Assoc. 18(5), 552–556 (2011)
34. Ghannay, S., Favre, B., Esteve, Y., Camelin, N.: Word embedding evaluation and combination.
In: Proceedings of the Tenth International Conference on Language Resources and Evaluation
(LREC 2016), pp. 300–305 (2016)
35. Baroni, M., Dinu, G., Kruszewski, G.: Don’t count, predict! a systematic comparison of context-
counting vs. context-predicting semantic vectors. In: Proceedings of the 52nd Annual Meeting
of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 238–247
(2014)
H. B. Barathi Ganesh Current Chief Technology Officer at Arnekt Solutions Pvt Ltd., a pioneer-
ing Artificial Intelligence technologist with 5+ years of experience in implementing AI-enabled
technologies and enterprise systems that facilitate business processes and strategic objectives.
Continuous practitioner in blending of technology and business requirements for defining pow-
erful future business strategies, which were evidenced by cost-effective, high-performance ser-
vices and products. Has broader AI expertise in the domains like Automotive, BFSI, Education,
E-Commerce, Logistics, Manufacturing, and Retail.
K. P. Soman Currently serves as Head and Professor at Amrita Center for Computational Engi-
neering and Networking (CEN), Coimbatore Campus. He has 300+ publications in national &
international journals and conference proceedings. He has organized a series of workshops and
summer schools in Advanced signal processing using wavelets, Kernel Methods for pattern clas-
sification, Deep learning, Big-data Analytics etc. for industry and academia. Authored books on
“Insight into Wavelets”, “Insight into Data mining”, “Support Vector Machines and Other Kernel
Methods” and “Signal and Image processing-the sparse way”, published by Prentice Hall, New
Delhi, and Elsevier.
M. Anand Kumar Received his Ph.D. in Machine Translation from Amrita Center for Compu-
tational Engineering and Networking (CEN), Coimbatore Campus. Currently serving as an assis-
tant professor at the Department of Information technology, National Institute of Technology, Kar-
nataka. He has 100+ publications in national and international journals and conference proceed-
ings. His research interests include Natural Language Processing, Text Mining, Deep Learning
and Transfer Learning.
Deep Learning Based Biomedical Named
Entity Recognition Systems
P. Mishra
Gandhi Institute for Technology, Bhubaneswar, India
e-mail: [email protected]
S. Biswas · S. Dash (B)
North Orissa University, Baripada, India
e-mail: [email protected]
S. Biswas
e-mail: [email protected]
systems, though with various differences from the structural and functional
properties of biological brains. For experiment and analysis, we used the GENIA corpus,
which was created by a group of researchers to develop and evaluate information extraction
and text mining systems in the biological sciences. It consists of 1,999 MEDLINE abstracts.
The GENIA corpus has been widely used by the natural language processing
community for the development of semantic search systems and for organizing bio-NLP
tasks. In this work, we propose a multi-task
learning framework for Bio-NER based on neural network models to save human
effort. A deep neural architecture has several layers, and each layer abstracts features
based on those generated by the lower layers. We compared our results with those
of earlier systems: Saha et al. (Pattern Recogn. Lett 3:1591–1597,
2010) with a Precision of 68.12, Recall 67.66 and F-Score 67.89; Liao et al.
(Biomedical Named Entity Recognition Based on Skip-Chain Crfs. pp. 1495–1498,
2012) with a Precision of 72.8, Recall 73.6 and F-Score 73.2; ABNER (A Biomedical
Named Entity Recognizer, pp. 46–51, 2013) with a Precision of 69.1, Recall
72.0 and F-Score 70.5; Sasaki et al. (How to Make the Most of Ne Dictionaries
in Statistical Ner. pp. 63–70, 2008) with a Precision of 68.58, Recall 79.85 and
F-Score 73.78; and Sun et al. (Comput. Biol. Med 37:1327–1333, 2007) with a Precision
of 70.2, Recall 72.3 and F-Score 71.2. Our system achieved a Precision of
66.54, Recall of 76.13 and F-score of 71.01% on the GENIA standard test corpus,
which is close to state-of-the-art performance using only the part-of-speech feature and
shows that deep learning can be applied effectively to biomedical Named
Entity Recognition. This chapter is organized into the following sections: Introduction,
Literature review, Architecture, Experiment, Results and analysis, Conclusion
and future work, and References.
1 Introduction
In this chapter, we deal with a crucial problem known as biomedical
Named Entity Recognition. Named entity recognition is a vital
task in natural language processing, touching on computational linguistics, information
retrieval and information extraction. Natural language processing is a subfield of computer
science and information engineering that deals with the interaction between computers
and human language. It concerns how to process and analyse natural language data.
It is a computational activity in which computers are made to understand, manipulate and
analyse human language, which includes the automation of activities and means of
communication. Named Entity Recognition (NER) is one of the crucial components of natural
language processing (NLP); it is used to find and categorise expressions with a specific
meaning in texts written in natural language. The various kinds of named entities include
person names, organization names, place names, numbers, etc. In this chapter we
deal exclusively with biomedical named entity recognition
(Bio-NER), which is a primary task in handling biomedical text terminology such as
RNA, cell-type, cell-line, protein and DNA. Biomedical NER is the most basic
and important task in biomedical information extraction from text. Recognizing
biomedical named entities appears to be more difficult than recognizing ordinary
named entities. Biomedical named entity recognition faces five challenges:
• New medical terms are constantly emerging. It is therefore hard to build a
dictionary that includes the newest terms.
• The same word can be assigned to different entity types depending on context.
• An entity can be quite long, and may include special characters such
as hyphens.
• Abbreviations are frequently used in the biomedical area and are often
ambiguous.
• In biomedical terminology, ordinary terms and functional terms are often combined,
which makes the resulting terms very long. It is challenging for Bio-NER to segment a
sentence containing such named entities.
Recently, deep learning approaches have been applied to biomedical named entity
recognition (Bio-NER) and have shown promising results.
However, deep learning approaches require a large quantity of training data, and the
scarcity of such data can hamper their performance. Deep learning is also known
as deep structured learning or hierarchical learning. It is part of a broader family
of machine learning methods based on learning data representations, as opposed
to task-specific algorithms. Deep learning models are loosely inspired by
communication patterns and information processing in biological nervous systems, yet
differ in various ways from the structural and functional properties of the biological
(human) brain. Deep learning methods such as deep belief networks,
deep neural networks and recurrent neural networks have been applied to areas such as audio
recognition, computer vision, social network filtering, bioinformatics and natural
language processing, where they have shown results comparable to, and in certain
cases superior to, those of human experts. This type of learning can be supervised,
semi-supervised or unsupervised. Supervised learning is the machine learning task of
learning a function that maps an input to an output based on example input–output pairs.
Semi-supervised learning is a class of machine learning tasks and techniques that also
make use of unlabelled data for training, typically a small quantity of labelled data
together with a large amount of unlabelled data. Unsupervised learning is a term used for
Hebbian learning, associated with learning without a teacher; it is also known as
self-organisation and is a method of modelling the probability density of inputs.
In this work, we draw on a method based on a Convolutional Neural Network (CNN).
Named Entity Recognition (NER) is the automated process of finding and
labelling entities in a given text. In the biomedical domain, typical entity
types include disease, chemical, gene and protein. Biomedical NER
(BioNER) is an essential building block of many downstream text mining
applications, such as the extraction of drug-drug interactions [1] and disease-treatment
relations. BioNER is also used when building a sophisticated biomedical
entity search tool [2] that enables users to pose complex queries to search for
bio-entities. NER in biomedical text mining has focused mainly on dictionary-,
rule- and machine learning-based approaches [3–5]. Dictionary-based systems
have a simple and intuitive structure, but they cannot handle unseen
entities or polysemous words, resulting in lower recall [3, 4]. In addition, building and
maintaining a comprehensive and up-to-date dictionary involves a considerable amount of
manual work. The rule-based approach is more scalable; however,
it needs manually crafted feature sets to fit a model to a dataset [5]. These
dictionary- and rule-based approaches can achieve high precision [2] but
can produce incorrect predictions when a new word that is not in the
training data appears in a sentence (the out-of-vocabulary problem).
Habibi et al. [6, 7] used character-level word embeddings to capture characteristics,
such as orthographic features, of biomedical entities and achieved state-of-the-art
performance, demonstrating the effectiveness of character-level word embeddings in
BioNER. Even though these models have shown promising results, NER remains a
very difficult task in the biomedical domain for the following reasons.
First, a limited amount of training data is available for BioNER tasks. For
instance, the JNLPBA corpus [8] contains annotations of only genes and
proteins. Hence, the data for each entity type comprises
only a small portion of the total amount of annotated data. Multi-task learning
(MTL) is a technique for training one model for several tasks at the same
time. MTL can leverage different datasets that are collected for
different but related tasks [9]. Although the extraction of genes is different
from the extraction of chemicals, both tasks require learning some common features
that can help in understanding the linguistic expressions of biomedical texts.
Since MTL-based models are trained on several types of
entities and on larger training data, they have broad coverage of various biomedical
entities, which as expected results in higher recall. On the other hand,
because the MTL models are trained on combinations of different entity
types, they tend to have difficulty in differentiating among entity
types, resulting in low precision.
Another reason NER is difficult in the biomedical domain is that
an entity may be labelled as different entity types depending on its
textual context. For example, BiLSTM-CRF based models for disease entities
erroneously labelled the gene name “BRCA1” as a disease entity because there
are disease names such as “BRCA1 abnormalities” or “Brca1-deficient” in the training
sets. In addition, a training set that annotates “VHL” (Von Hippel-Lindau disease)
as a disease entity confuses the model because VHL is also used as a
gene name, since the mutation of this gene causes VHL disease. Therefore, each
model is an expert in its own domain and helps improve
accuracy by leveraging the multi-domain information from the other models. Motivated
by the work of Collobert [10], we put forward a neural
network model for the biomedical NER task. Our work shows that deep
learning can be applied efficiently to biomedical NER.
2 Literature Review
In the biomedical field, gigabytes or even terabytes of data are produced each
day. The development of biomedical research has in many ways been
driven by this enormous quantity of data. Biomedical
Named Entity Recognition is an important first step for the biomedical
sciences. Biomedical Named Entity Recognition is much harder than general
Named Entity Recognition because of complexities such as class members that change
daily, hard-to-distinguish boundaries and irregular expressions [11–15]. Tasks studied
in this area include the recognition of gene mentions, the extraction of a list of unique
identifiers for human genes, and the extraction of physical protein-protein interaction
annotation-relevant information. A balanced precision and recall was observed for the
submissions to the gene mention and gene normalization tasks. In the case of the
protein-protein interaction task, different results were obtained depending on the
annotation extraction process. A final observation was that the combination
of system outputs gave better results than any single system, which led
to the development of the first text mining meta-server in this context [12]. Numerous
supervised techniques have been used to learn biomedical named
entity recognition, such as the following. The MEMM (Maximum Entropy Markov Model) [16],
or conditional Markov model, is a graphical model that combines HMM and maximum
entropy models for sequence labelling. MEMMs find applications in natural language
processing, specifically in part-of-speech tagging and information extraction.
The HMM (Hidden Markov Model)
[17] is a statistical Markov model in which the system being modelled is assumed
to be a process with unobserved states. The CRF (Conditional Random Field) [18] is
a class of statistical modelling method that is applied in pattern recognition
and machine learning and used for structured prediction. A CRF is capable
of taking context into account. For instance, the linear-chain CRF predicts
a sequence of labels for a sequence of input samples. It is popular in natural
language processing. HMM, MEMM and CRF are three popular
statistical modelling methods, often applied to pattern recognition and machine
learning problems. In the Hidden Markov Model (HMM), the word “hidden” reflects the
fact that only the symbols emitted by the system can be observed. The advantages of the
HMM are its strong foundation and efficient learning algorithms, whereas its
disadvantage is that each observation depends only on its
corresponding state. Sequence labelling, however, relates not only to
individual words but also to aspects such as sequence length and word context.
The Maximum Entropy Markov Model takes into account the dependencies
between neighbouring states and the entire observed sequence, which gives it better
expressive ability. The Conditional Random Field model addresses the label bias issue.
Compared with the HMM, the CRF does not rely on strict independence
assumptions and can accommodate arbitrary context information, so its feature
design is flexible. Compared with the MEMM, the CRF
computes the conditional probability of the globally optimal output labels, and thus
overcomes the drawback of label bias. The CRF was also applied to biomedical entity
recognition by Settles [2], achieving an F-score of 69.9%
on the GENIA corpus in combination with various types of features. The HMM,
when applied to the GENIA corpus, attained a precision of 65.5% and a recall of 66.9%
[2]. Li presented a two-phase biomedical named entity recognition model on the GENIA
corpus, which is split into two components [10, 19]. Named entity detection (NED) is
the first phase, used to distinguish named entities from non-named entities
(NNE) without identifying their type. Named entity classification (NEC) is
the second phase, in which a multi-agent strategy is used, achieving an
F-score of 76.06%.
BioNER is also used when building a sophisticated biomedical entity search
tool [20] that enables the user to pose complex queries to search for biomedical
entities. NER in biomedical text mining concentrates mainly on dictionary-,
rule- and machine learning-based approaches [3–5, 21–23]. Dictionary-based
systems have a simple and intuitive structure, but they cannot
handle unseen entities or ambiguous words, resulting in low recall [3, 4]. These
rule- and dictionary-based approaches can achieve high precision [3] but
can produce incorrect predictions when a new word that is not in
the training data appears in a sentence (the out-of-vocabulary problem).
The out-of-vocabulary problem occurs often, especially in the
biomedical domain, because new biomedical terms, such as new drug names, appear
frequently. Habibi et al. [24] used character-level word
embeddings to capture characteristics, such as orthographic features, of biomedical
entities and achieved state-of-the-art performance, demonstrating the effectiveness of
character-level word embeddings in BioNER. Although these models have shown promising
results, NER continues to be a very difficult task in the biomedical domain
for the following reasons. First, a limited amount of training data is
available for BioNER tasks. The gold-standard datasets contain annotations of one or two
types of entity. For instance, the NCBI corpus [8] includes annotations of diseases
but not of other types of entities such as proteins and genes. On the other
hand, the JNLPBA corpus [9] contains annotations of only proteins and genes. Hence,
the data for each entity type comprises only a small
fraction of the total amount of annotated data. Multi-task learning (MTL)
is a method of training one model for several tasks at the same time. MTL can
leverage different datasets that are collected for different but
related tasks [25].
Although the extraction of genes is different from the extraction of chemicals, both
tasks need to learn some common features that can help in understanding the linguistic
expressions of biomedical text. The MTL models of [26, 27] achieved performance
comparable to that of state-of-the-art single-task NER models. In contrast to the
conventional MTL methods that use only one static model, CollaboNet consists of several
models trained on different datasets for different tasks. On the other hand, because the
MTL models are trained with mixtures of different entity types, they tend to have
difficulty in differentiating among entity types, resulting in low precision. A further
reason NER is difficult in the biomedical domain is that an entity
can be labelled as different entity types depending on its textual context.
3 Architecture
In this chapter we use a method based on a convolutional neural network
(CNN) that has been applied to several natural language processing (NLP)
tasks [28–30]. This neural architecture was proposed by Bengio [30]
for the probabilistic language model. Neural networks were subsequently introduced
for many NLP tasks. We adopt this approach
for the biomedical named entity recognition task. The architecture is given in Fig. 1.
Compared with earlier over-engineered systems, the deep learning approach
implemented here reduces the dependence on linguistic knowledge. In Fig. 1 the token
“beta”, located at the centre of the window, is processed at time “t”.
The words within the sliding window, represented as real-valued
vectors, are the inputs to this neural network. The score for every
label of the word “beta” is produced after the transformation by the linear layers and
sigmoid layers. Finally, the score lattice for the sentence is given as output at the end
of the procedure. The Viterbi algorithm is then applied to obtain the best label
sequence. The length of the input to the CNN is fixed and adapted to
text data. First, a word dictionary S is created from a large body of
biomedical papers. The words in S are transformed into vectors as input
to the CNN. Each word in the dictionary has a vector of fixed dimension.
Therefore, the word vectors are stored in the matrix
$$ M \in \mathbb{R}^{D \times |S|} \tag{1} $$
In this chapter, we optimize the word representation for the
Bio-NER task. Here, we use the second method. After
comparing various language models [34–37], we chose the
skip-gram neural network language model. This model is not the best model overall;
however, it is more suitable for the training of rare words.
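The skip-gram idea mentioned above can be illustrated with a minimal sketch of how training pairs are formed: each word is paired with the words that surround it, and the model learns to predict the context from the centre word. The function name `skipgram_pairs`, the window size and the example sentence are assumptions made for illustration; they are not taken from the chapter.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbours.

    `window` is the number of positions considered on each side
    (an illustrative parameter, not a value from the chapter).
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Illustrative sentence; any tokenised biomedical text could be used.
sentence = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B"]
pairs = skipgram_pairs(sentence, window=1)
for center, context in pairs[:4]:
    print(center, "->", context)
```

A rare word still contributes one training pair per neighbour each time it occurs, which is one informal way to see why this objective is considered suitable for rare words.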
In biomedical NER tasks, a correct label has to be given to every word
in a sentence to indicate whether or not it is part of a biomedical named entity.
Sentences are taken as input and the appropriate label sequences
are given as output for every sentence. The lengths of sentences are not
fixed, but the input of the neural network is fixed. This is the
reason we chose the window approach. The window
size is set to ‘k’ at the start, and different accuracies may be obtained
by the system depending on it. The dependency between the label of each word
and its neighbouring words is taken into account by the window approach. Hence, the
words surrounding the labelled word in the window pass through the layers together. When
we consider the word at position C, it is passed, along with the neighbouring words at
positions in the range [(C − (k − 1)/2), (C + (k − 1)/2)], to the mapping layer.
Since every word is transformed into a D-dimensional vector by this layer, the
input size for linear layer 1 is kept fixed.
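The window approach can be sketched as follows. This is a minimal illustration under stated assumptions: the padding token `<PAD>`, the helper names `window_at` and `window_features`, the toy vocabulary, and the choice to map unknown words to the padding index are all hypothetical; the chapter does not specify how sentence boundaries or unknown words are handled.

```python
import numpy as np

PAD = "<PAD>"  # hypothetical padding token for positions outside the sentence


def window_at(tokens, c, k):
    """Return the k words centred on position c, padded at the sentence edges."""
    if k % 2 == 0:
        raise ValueError("window size k must be odd")
    half = (k - 1) // 2
    padded = [PAD] * half + list(tokens) + [PAD] * half
    # Position c in `tokens` is position c + half in `padded`.
    return padded[c:c + k]


def window_features(tokens, c, k, vocab, M):
    """Look up each window word in M (shape D x |S|) and concatenate the vectors."""
    ids = [vocab.get(w, vocab[PAD]) for w in window_at(tokens, c, k)]
    return M[:, ids].T.reshape(-1)  # fixed length k * D


tokens = ["IL-2", "gene", "expression"]
vocab = {PAD: 0, "IL-2": 1, "gene": 2, "expression": 3}
D, k = 4, 3
M = np.random.default_rng(0).normal(size=(D, len(vocab)))  # random stand-in vectors

x = window_features(tokens, 0, k, vocab, M)
print(window_at(tokens, 0, k), x.shape)
```

Whatever the sentence length, every position yields a vector of length k × D, which is why the input size of the first linear layer stays fixed.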
A deep neural network is a structure with many layers. The layers
abstract features based on the features produced by the lower layers. Depending on
the design of the neural network, each layer may be either a linear function or
another transformation.
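A function f_θ(·) describes the three layers in our architecture as

$$ f_\theta(x) = M_2\, g(M_1 x + b_1) + b_2 \tag{2} $$

$$ P(l \mid x, \theta) = \frac{e^{f(x, l, \theta)}}{\sum_{j} e^{f(x, j, \theta)}} \tag{3} $$

$$ W(x_{[1:T]}, l_{[1:T]}, \tilde{\theta}) = \sum_{t=1}^{T} \left( A_{l_{[t-1]}\, l_{[t]}} + f(x_{[1:T]}, l_{[t]}, t, \theta) \right) \tag{5} $$

The log conditional probability of the true label path y_{[1:T]} is

$$ \log P(y_{[1:T]} \mid x_{[1:T]}, \tilde{\theta}) = W(x_{[1:T]}, y_{[1:T]}, \tilde{\theta}) - \operatorname*{logadd}_{\forall\, l_{[1:T]}} W(x_{[1:T]}, l_{[1:T]}, \tilde{\theta}) \tag{6} $$

All the parameters \tilde{\theta} are trained over (x_{[1:T]}, y_{[1:T]}). In the inference procedure, the
Viterbi algorithm is used to find

$$ \operatorname*{arg\,max}_{l_{[1:T]}} W(x_{[1:T]}, l_{[1:T]}, \tilde{\theta}) \tag{7} $$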
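The scoring and decoding steps in Eqs. (2) and (5)–(7) can be sketched in NumPy as below. This is an illustrative sketch, not the authors' implementation: the function names, the random parameters, the toy dimensions, the use of a sigmoid for g, and the vector `a0` of scores for starting in each label (standing in for the transition into the first position) are all assumptions. A brute-force enumeration of every label path is included only to check the dynamic programs on a tiny example.

```python
import itertools
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def network_scores(X, M1, b1, M2, b2):
    """Eq. (2) at every position: X is (T, input_dim), result is (T, L)."""
    return sigmoid(X @ M1.T + b1) @ M2.T + b2


def path_score(f, A, a0, labels):
    """Eq. (5): transition scores plus network scores along one label path."""
    score = a0[labels[0]] + f[0, labels[0]]
    for t in range(1, len(labels)):
        score += A[labels[t - 1], labels[t]] + f[t, labels[t]]
    return score


def log_partition(f, A, a0):
    """The logadd term of Eq. (6), computed with a forward recursion."""
    delta = a0 + f[0]
    for t in range(1, f.shape[0]):
        scores = delta[:, None] + A          # scores[i, j]: previous i, current j
        m = scores.max(axis=0)
        delta = m + np.log(np.exp(scores - m).sum(axis=0)) + f[t]
    m = delta.max()
    return m + np.log(np.exp(delta - m).sum())


def viterbi(f, A, a0):
    """Eq. (7): the highest-scoring label path and its score."""
    T, L = f.shape
    delta = a0 + f[0]
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + A
        back[t] = scores.argmax(axis=0)      # best previous label for each current one
        delta = scores.max(axis=0) + f[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(delta.max())


# Toy sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
T, in_dim, hidden, L = 4, 6, 5, 3
X = rng.normal(size=(T, in_dim))
M1, b1 = rng.normal(size=(hidden, in_dim)), np.zeros(hidden)
M2, b2 = rng.normal(size=(L, hidden)), np.zeros(L)
A, a0 = rng.normal(size=(L, L)), np.zeros(L)

f = network_scores(X, M1, b1, M2, b2)
best_path, best_score = viterbi(f, A, a0)
gold = [0, 1, 1, 2]                          # an arbitrary "true" path
log_lik = path_score(f, A, a0, gold) - log_partition(f, A, a0)   # Eq. (6)

# Brute-force check over all L**T paths (feasible only for tiny examples).
all_scores = [path_score(f, A, a0, p) for p in itertools.product(range(L), repeat=T)]
print(best_path, best_score, log_lik)
```

The forward recursion and the Viterbi recursion share the same structure; the only difference is that one combines incoming scores with log-sum-exp and the other with max, which keeps both linear in the sentence length rather than exponential.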
4 Experiment
The task of Bio-NER is to recognize entities such as diseases, viruses, proteins
and genes and to label them in plain biomedical text. Figure 2
shows that each word in a given sentence is taken as a token and associated
with a designated label. The labels O, B-C and I-C indicate not only the category
but also the position of the token within the named entity, where
C is the category, and B and I denote the beginning and the inside of an
entity, respectively. There are five categories: DNA, RNA, Protein,
Cell_type and Cell_line. O indicates a token that is not part of a named
entity. The test file uses the BIO notation of the GENIA corpus. With this BIO
notation, 11 labels are used in Fig. 2, and each token is assigned
one of these 11 labels in the result.
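As a small illustration of the BIO scheme described above, the sketch below builds the 11 labels from the five categories and recovers entity spans from a label sequence. The function name `bio_to_spans`, the lower-case spelling of the category names, the lenient treatment of a stray I- label as the start of a new entity, and the example tokens and labels are assumptions made for illustration.

```python
CLASSES = ["DNA", "RNA", "protein", "cell_type", "cell_line"]

# One B- and one I- label per category, plus O: 5 * 2 + 1 = 11 labels.
LABELS = ["O"] + [f"{prefix}-{c}" for c in CLASSES for prefix in ("B", "I")]


def bio_to_spans(labels):
    """Return (start, end_exclusive, category) for each entity in a BIO sequence."""
    spans, start, cls = [], None, None
    for i, lab in enumerate(list(labels) + ["O"]):   # sentinel closes the last entity
        continues = lab.startswith("I-") and cls == lab[2:]
        if continues:
            continue
        if cls is not None:
            spans.append((start, i, cls))
        if lab.startswith(("B-", "I-")):             # stray I- starts a new entity
            start, cls = i, lab[2:]
        else:
            start, cls = None, None
    return spans


# Illustrative tokens and labels, not taken from the GENIA files.
tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B"]
labels = ["B-DNA", "I-DNA", "O", "O", "B-protein", "I-protein"]
spans = bio_to_spans(labels)
for start, end, cls in spans:
    print(cls, " ".join(tokens[start:end]))
```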
According to different systems, categories such as protein and DNA have
the highest F-score. The number of each entity type in the training data is
shown in Table 1. Comparing Table 1 and Fig. 3, the category ‘cell-type’ has the
smallest training data set but has the highest precision and the second highest
F-score. In Fig. 4, we can see that the number of
‘B-DNA’ words wrongly labelled as ‘B-Protein’ is larger than the corresponding count
for the other biomedical categories.
We found two major reasons when studying the training data. First, biomedical named entities are composed of many nested named entities. For example, words like 'Viruses', 'Epstein-Barr', 'protein', 'cell' and 'EBV' appear in both entities but belong to different classes in Fig. 4.
These words may appear at different positions depending on the category. The BMESO notation is applied to exploit this information, since the BIO notation cannot represent it (Fig. 5).
BMESO notation is similar to BIO notation but gives a more detailed description of the position of each word within an entity. Here B indicates the beginning of an entity and E its end. Words between B and E are denoted M. If the entity consists of a single word, it is denoted S. The second reason is the lack of training examples for some labels, together with entities that do not appear in the training set at all. The final results on the GENIA corpus are listed in Table 2.
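The relation between the two notations can be made concrete with a small converter. This is a sketch under the assumption that the input is a well-formed BIO sequence; the function name `bio_to_bmeso` is hypothetical.

```python
def bio_to_bmeso(tags):
    """Convert a well-formed BIO tag sequence to BMESO.

    B = beginning, M = middle, E = end, S = single-word entity, O = outside.
    """
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("O")
            continue
        prefix, cls = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == "I-" + cls      # does the same entity go on?
        if prefix == "B":
            out.append(("B-" if continues else "S-") + cls)
        else:
            out.append(("M-" if continues else "E-") + cls)
    return out


converted = bio_to_bmeso(["B-DNA", "I-DNA", "I-DNA", "O", "B-Protein", "O"])
```

The conversion only adds information that is implicit in the BIO sequence (where an entity ends and whether it is a single word), which is why it can be derived without re-annotating the corpus.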
We compared our results with those of several other systems: Saha et al. [39], with a precision of 68.12, recall of 67.66 and F-score of 67.89; Liao et al. [40], with a precision of 72.8, recall of 73.6 and F-score of 73.2; ABNER [41], with a precision of 69.1, recall of 72.0 and F-score of 70.5; Sasaki et al. [42], with a precision of 68.58, recall of 79.85 and F-score of 73.78; and Sun et al. [43], with a precision of 70.2, recall of 72.3 and F-score of 71.2. Our system achieved a precision of 66.54, recall of 76.13 and F-score of 71.01% on the GENIA standard test corpus, which is close to state-of-the-art performance. However, the biomedical vocabulary changes every day and differs across tasks and corpora.
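As a quick consistency check, the reported F-score can be recomputed from the reported precision and recall, assuming the F-score is the usual harmonic mean of the two:

```latex
F = \frac{2PR}{P + R}
  = \frac{2 \times 66.54 \times 76.13}{66.54 + 76.13}
  = \frac{10131.38}{142.67}
  \approx 71.01
```

The same formula reproduces the F-scores quoted for the other systems, for example 2 × 72.8 × 73.6 / (72.8 + 73.6) ≈ 73.2 for Liao et al. [40].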
In this book chapter, we have implemented a multi-layer neural network for a biomedical Named Entity Recognition system. The results achieved are close to state-of-the-art performance. There is scope for further improvement of the performance of the neural network. The prediction of the left boundary word is crucial: if it is wrong, the word itself or the subsequent words are tagged incorrectly too. Combining reverse recognition with forward recognition can be explored in future for better accuracy of the system.
References
1. Lim, S., Lee, K., Kang, J.: Drug drug interaction extraction from the literature using a recursive
neural network. PLoS ONE 13(1), e0190926 (2018)
2. Lee, K., Hwang, Y., Kim, S., Rim, H.: Biomedical named entity recognition using two-phase
model based on Svms. J. Biomed. Inform. 37(6), 436–447 (2004)
3. Hettne, K.M., Stierum, R.H., Schuemie, M.J., Hendriksen, P.J., Schijvenaars, B.J., Mulligen,
E.M.V et al.: A dictionary to identify small molecules and drugs in free text. Bioinformatics.
25(22), 2983–2991 (2009)
4. Song, M., Yu, H., Han, W.S.: Developing a hybrid dictionary-based bio-entity recognition
technique. BMC Med. Inform. Decis. Mak. 15(1), S9 (2015)
5. Fukuda, K.I., Tsunoda, T., Tamura, A., Takagi, T. et al.: Toward information extraction: identifying protein names from biological papers. In: Pac. Symp. Biocomput. vol. 707, p. 707–718 (1998)
6. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for
named entity recognition. In: HLT-NAACL. The Association for Computational Linguistics.
p. 260–270 (2016)
7. Pyysalo, S., Ginter, F., Moen, H., Salakoski, T., Ananiadou, S.: Distributional semantics
resources for biomedical text processing. In: Proceedings of the 5th International Symposium
on Languages in Biology and Medicine, Tokyo, Japan. p. 39–43 (2013)
8. Kim, J.D., Ohta, T., Tsuruoka, Y., Tateisi, Y., Collier, N.: Introduction to the bio-entity recogni-
tion task at JNLPBA. In: Proceedings of the international joint workshop on natural language
processing in biomedicine and its applications. Association for Computational Linguistics.
p. 70–75 (2004)
9. Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997). Available from: https://
doi.org/10.1023/A:1007379606734
10. Collobert, R.: Deep learning for efficient discriminative parsing. In: International Conference
on Artificial Intelligence and Statistics (2011)
11. Dai, H., Chang, Y.C., Tsai, R.T.Z.H., Hsu, W.: New challenges for biological text- mining in
the next decade. J. Comput. Sci. Technol. 25(1), 169–179 (2010)
12. Krallinger, M., Morgan, A., Smith, L., Leitner, F., Tanabe, L., Wilbur, J., Hirschman, L.,
Valencia, A.: Evaluation of text-mining systems for biology: overview of the second biocreative
community challenge. Genome Biol. 9(2) (2008)
13. Dai, H., Huang, C., Lin, R., Tsai, R., Hsu, W.: Biosmile web search: a web application for
annotating biomedical entities and relations. Nucleic Acids Res. 36, 390–397 (2008)
14. Rebholz-Schuhmann, D., Arregui, M., Gaudan, S., Kirsch, H., Jimeno, A.: Text processing
through web services: calling Whatizit. Bioinformatics. 24(2) 296–300 (2008)
15. Si, L., Kanungo, T., Huang, X.: Boosting performance of bio-entity recognition by combining
results from multiple systems. In: Proceedings of the 5th International Workshop on Bioinfor-
matics ACM (2005), pp. 76–83
16. Tsuruoka, Y., Tateishi, Y., Kim, J.-D., Ohta, T., McNaught, J., Ananiadou, S., Tsujii, J.I.:
Developing a robust part-of-speech tagger for biomedical text. In: Advances in Informatics.
Springer (2005), pp. 382–392
17. Vlachos, A.: Evaluating and combining biomedical named entity recognition systems.
In: BioNLP 2007: Biological, Translational, and Clinical Language Processing (2007),
pp. 199–206
18. Li, L., Zhou, R., Huang, D.: Two-phase biomedical named entity recognition using crfs. Com-
put. Biol. Chem. 33(4), 334–338 (2009)
19. Li, L., Fan, W., Huang, D.: A two-phase bio-ner system based on integrated classifiers and
multi-agent strategy. IEEE/ACM Trans. Comput. Biol. Bioinf. 10(4), 897–904 (2013)
20. Lee, S., Kim, D., Lee, K., Choi, J., Kim, S., Jeon, M., et al.: BEST: next-generation biomedical
entity search tool for knowledge discovery from biomedical literature. PLoS ONE 11(10),
e0164680 (2016)
21. Proux, D., Rechenmann, F., Julliard, L., Pillet, V., Jacq, B.: Detecting gene symbols and names
in biological texts. Genome Inform. 9, 72–80 (1998)
22. Tsai, R.T.H., Sung, C.L., Dai, H.J., Hung, H.C., Sung, T.Y., Hsu, W.L.: NERBio: using selected
word conjunctions, term normalization, and global patterns to improve biomedical named entity
recognition. In: BMC bioinformatics. BioMed Central. 7, S11 (2006)
23. Ju, M., Miwa, M., Ananiadou, S.: A neural layered model for nested named entity recognition.
In: Proceedings of the 2018 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). vol. 1,
p. 1446–1459 (2018)
24. Doğan, R.I., Leaman, R., Lu, Z.: NCBI disease corpus: a resource for disease name recognition
and concept normalization. J. Biomed. Inform. 47, 1–10 (2014)
25. Crichton, G., Pyysalo, S., Chiu, B., Korhonen, A.: A neural network multi-task learning
approach to biomedical named entity recognition. BMC Bioinform. 18(1), 368 (2017)
26. Zheng, J.G., Howsmon, D., Zhang, B., Hahn, J., McGuinness, D., Hendler, J et al.: Entity
linking for biomedical literature. In: Proceedings of the ACM 8th International Workshop on
Data and Text Mining in Bioinformatics. ACM. p. 3–4 (2014)
27. Tsutsui, S., Ding, Y., Meng, G.: Machine reading approach to understand Alzheimers disease
literature. In: Proceedings of the Tenth International Workshop on Data and Text Mining in
Biomedical Informatics (DTMBIO) (2016)
28. Bengio, Y., Ducharme, R., Vincent, P.: A neural probabilistic language model. In: NIPS. vol. 13 (2001)
29. Collobert, R., Weston, J.: A unified architecture for natural language processing: deep neural networks with multitask learning. In: ICML (2008)
30. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. JMLR (2011)
31. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
32. Schwenk, H.: Continuous space language models. Comput. Speech Lang. 21(3), 492–518
(2007)
33. Mikolov, T., Karafiat, M., Burget, L., Cernocky, J., Khudanpur, S.: Recurrent neural network
based language model. In: Eleventh Annual Conference of the International Speech Commu-
nication Association (INTERSPEECH) (2010), pp. 1045–1048
34. Mnih, A., Teh, Y.W.: A fast and simple algorithm for training neural probabilistic language
models. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12)
(2012), pp. 1751–1758
35. Collobert, R.: Deep learning for efficient discriminative parsing. In: International Conference
on Artificial Intelligence and Statistics (AISTATS) (2011)
36. Turian, J., Ratinov, L., Bengio, Y.: Word representations: a simple and general method for
semi-supervised learning. In: Proceedings of the 48th Annual Meeting of the Association for
Computational Linguistics (2010), pp. 384–394
37. Yih, W.T., Mikolov, T., Zweig, G.: Linguistic regularities in continuous space word representa-
tions. In: Proceedings of the 2013 Conference of the North American Chapter of the Association
for Computational Linguistics: Human Language Technologies, (2013) pp. 746–751
38. Bottou, L.: Stochastic gradient learning in neural networks. In: Proceedings of Neuro-Nimes,
vol. 91 (1991)
39. Saha, S.N.S.K., Sarkar, S., Mitra, P.: A composite kernel for named entity recognition. Pattern
Recogn. Lett. 3, 1591–1597 (2010)
40. Liao, Z., Wu, H.: Biomedical named entity recognition based on skip-chain crfs. In: Indus-
trial Control and Electronics Engineering (ICICEE), 2012 International Conference on. IEEE
(2012), pp. 1495–1498
41. ABNER: A Biomedical Named Entity Recognizer (2013), pp. 46–51
42. Sasaki, Y., Tsuruoka, Y., McNaught, J., Ananiadou, S.: How to make the most of NE dictionaries in statistical NER. In: Proceedings Workshop Current Trends in Biomedical Natural Language Processing (2008), pp. 63–70
43. Sun, C., Guan, Y., Wang, X., Lin, L.: Rich features based conditional random fields for bio-
logical named entities recognition. Comput. Biol. Med. 37, 1327–1333 (2007)
Pragatika Mishra holds an M.Tech in Computer Science and Engineering from Biju Patnaik University of Technology. She has around 2 years of experience in teaching undergraduate students. Her research interests are Artificial Intelligence and Machine Learning.
Sitanath Biswas received his M.E. (CSE) from Utkal University and is currently pursuing a Ph.D. at North Orissa University, Baripada, Odisha. He is currently working as an Assistant Professor at Gandhi Institute for Technology, Bhubaneswar, Odisha. He has over 14 years of experience in teaching and research. He has published over 18 research papers in various international journals of repute. His areas of research are Artificial Intelligence and Natural Language Processing.
Sujata Dash received her Ph.D. degree in Computational Modelling from Berhampur University,
Orissa, India in 1995. She is an Associate Professor in P.G. Department of Computer Science and
Application, North Orissa University, at Baripada, India. She has published more than 150 tech-
nical papers in international journals, conferences, and book chapters of reputed publications. She
has guided many scholars for their Ph.D. degrees in computer science. She is associated with many
professional bodies like IEEE, CSI, ISTE, OITS, OMS, IACSIT, IMS and IAENG. She is in the
editorial board of several international journals and also reviewer of many international journals.
Her current research interests include Machine Learning, Distributed Data Mining, Bioinformat-
ics, Intelligent Agent, Web Data Mining, Recommender System and Image Processing.
Disambiguation Model for Bio-Medical
Named Entity Recognition
A. Kumar
A. Kumar (B)
Department of Computer Science and Engineering, National Institute of Technology Raipur,
Raipur, Chhattisgarh 492010, India
e-mail: [email protected]
Abbreviations
1 Introduction
As the internet grows day by day, the volume of biomedical text is also increasing, and a robust technique is required to extract meaningful and important information from it. This chapter uses Named Entity Recognition (NER), an information extraction (IE) technique and a part of text mining. NER is needed to label the substances mentioned in a text dataset: it automatically recognizes named entities in natural language in the domain of interest. In biomedical text, many kinds of entities need to be labelled, such as gene names, protein names, diseases, chemicals and medications. In the recent past, most researchers have focused on extracting protein and gene mentions, whereas some research addresses disease entity extraction. Biomedical NER is more difficult than general NER because biomedical entities have a variety of aliases and abbreviations, naming conventions vary, and the same name may refer to a protein or gene in one context and to a different biological entity in another. For example, p53 refers to a protein name in one context, while in another it refers to a protein with a molecular weight of 53 kDa.
To tackle this type of problem, different NER approaches have been applied to biomedical text, namely the rule-based approach, the dictionary-based approach and the machine learning-based approach. Thousands of biomedical articles are published in thousands of journals every day, which gives rise to new terms and to spelling variations of existing biomedical words. The rule-based and dictionary-based approaches are not well suited to this setting because of their limited predictive power. Here the ML-based approach comes into the picture. It is more reliable and robust for biomedical NER because it can handle high-dimensional feature vectors for text processing and can predict new terms or variations from learned patterns. Training requires a reliable, high-performance NER model that is capable of fully capturing words in their context. Biomedical NER systems have been developed using various features of a word: linguistic features such as lemmatization and stemming; morphological features such as prefixes, suffixes, word shape and character weight; orthographic features such as word formation, symbols and digits; and contextual features such as word windows and conjunctions. A binary encoding of the feature set is used as the input for training the machine learning algorithm of the NER model, together with the named entity annotations in the training dataset.
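The feature types listed above can be illustrated with a short sketch. The function names, the particular features chosen and the tiny vocabulary are assumptions made for this example; the chapter does not specify a concrete feature set.

```python
import re


def word_shape(word):
    """Map upper-case letters to 'A', lower-case letters to 'a', digits to '0'."""
    shape = re.sub(r"[A-Z]", "A", word)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"[0-9]", "0", shape)


def token_features(tokens, i, affix_len=3):
    """Orthographic, morphological and contextual features for tokens[i]."""
    w = tokens[i]
    return {
        "has_digit": any(ch.isdigit() for ch in w),        # orthographic
        "has_upper": any(ch.isupper() for ch in w),
        "has_symbol": any(not ch.isalnum() for ch in w),
        "prefix": w[:affix_len].lower(),                   # morphological
        "suffix": w[-affix_len:].lower(),
        "shape": word_shape(w),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>", # context window
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }


def binary_encode(features, vocabulary):
    """One binary indicator per 'name=value' entry of a fixed vocabulary."""
    active = {f"{name}={value}" for name, value in features.items()}
    return [1 if entry in active else 0 for entry in vocabulary]


feats = token_features(["The", "p53", "protein"], 1)
vector = binary_encode(feats, ["has_digit=True", "shape=Aa", "prev=the"])
```

In practice the vocabulary would be collected from the training data, so that every token is encoded as a sparse binary vector of the same length.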
In recent years, most researchers have worked on a single domain only, such as proteins, genes, diseases or chemical names, and none of the research described in the literature has addressed all four datasets together. This chapter considers multiple domains (protein, gene, disease and chemical name) together, so that the correct entity in a given text can be recognized and labelled automatically. The main goal of this research is to handle polysemous words, which are the main cause of lower precision. The model presented in this chapter handles all four types of domain dataset in a single combined model.
The rule-based approach deals with orthographic and morphological structures. Compared with the dictionary-based approach, the rule-based approach performs better. In this approach, character strings are used to identify terms, followed by rules and handcrafted patterns that concatenate the adjacent words of a named entity. The drawback of the rule-based approach is that it depends heavily on domain-specific named entities that share common morphological and orthographic characteristics. Because it relies on handcrafted features, it is unsuitable for a new domain or naming convention, which motivates switching to other approaches.
The dictionary-based approach finds all named entities in a given text by looking them up in a dictionary, and various terminologies have been applied to biomedical text mining. For instance, HUGO is a terminology that provides 21,000 human gene entities. The Swiss-Prot section of the UniProt database, which contains 180,000 protein records, has been used frequently. BioThesaurus compiles several million gene and protein names mapped to UniProt entries through cross-references in the iProClass database. The significant advantage of the dictionary-based approach over the machine learning-based approach is that each entry is built with an external identifier, which provides metadata for annotating the extracted names. However, this approach faces several challenges. False positives arise from ambiguity in names, and false negatives arise from spelling variations and synonyms that the dictionary does not cover. The approach also depends on creating and curating a lexicon for the particular domain, which may contain millions of entities. To address spelling variation, Tsuruoka et al. used a variant generator and string searching and achieved an improved F-score on the GENIA corpus compared with an exact matching algorithm [1, 2].
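The idea of combining dictionary lookup with generated spelling variants can be sketched as follows. This is a generic illustration, not the method of Tsuruoka et al.; the variant rules, the function names and the lexicon entries with their identifiers are all hypothetical.

```python
def variants(term):
    """Generate simple spelling variants: hyphen/space swaps and removals."""
    base = term.lower()
    return {
        base,
        base.replace("-", " "),
        base.replace("-", ""),
        base.replace(" ", ""),
    }


def build_index(lexicon):
    """Map every variant of every dictionary term to the term's identifier."""
    index = {}
    for term, identifier in lexicon.items():
        for form in variants(term):
            index[form] = identifier
    return index


def find_mentions(text, index, max_len=3):
    """Greedy longest-match lookup over token n-grams of up to max_len tokens."""
    tokens = text.split()
    mentions, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower().strip(".,;:()")
            if candidate in index:
                mentions.append((candidate, index[candidate]))
                i += n
                break
        else:
            i += 1          # no dictionary entry starts at this token
    return mentions


# Hypothetical lexicon; the identifiers are placeholders, not real database IDs.
lexicon = {"NF-kappa B": "ID:0001", "interleukin-2": "ID:0002"}
index = build_index(lexicon)
mentions = find_mentions("Activation of NF kappa B and Interleukin 2.", index)
```

Neither surface form in the example sentence appears verbatim in the lexicon, so exact matching would miss both; the generated variants recover them, and the attached identifier is the kind of metadata the paragraph above describes.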
The machine learning-based approach is one of the best and most frequently used approaches in text mining. The best performance on the BioCreative II gene and protein tasks was achieved with machine learning-based approaches. Different supervised learning methods, such as Support Vector Machines (SVM) [3], Hidden Markov Models (HMM) [2], CRFs [4], MEMMs [5] and case-based methods [6], have been used in NER. Supervised learning methods use only annotated text corpora. To resolve the data sparseness that arises when a large feature set is used with a small training set, a few semi-supervised learning methods that exploit large unannotated text corpora have recently been applied. A vital part of the ML approach is the selection of an appropriate feature set to represent the named entities. Commonly used features are morphological patterns, part-of-speech (POS) tags, orthographic word patterns, tokenization, lemmatization and conjunctions of contextual features.
Recently, various studies have demonstrated the importance of deep learning-based methods. Sahu and Anand [7] showed the ability of Recurrent Neural Networks (RNN) for biomedical NER. Their model combines a Conditional Random Field (CRF) with a Bi-directional Long Short-Term Memory (Bi-LSTM) network and uses character-level (CL) and word-level (WL) embeddings, but they did not describe the benefits of CL and WL embeddings with the Bi-LSTM-CRF model. Habibi et al. [8] combined the Bi-LSTM-CRF model of Lample et al. [9] with the word embeddings of Pyysalo et al. [10]. They used CL-based word embeddings to capture characteristics such as the orthographic features of biomedical entities, and illustrated the potential of character-level word embeddings for biomedical NER. Although these models showed promising results, biomedical NER remains a very challenging task. The first challenge is the small amount of training data available for the biomedical NER task. Gold-standard datasets contain annotations for only one or two types of entity. The NCBI corpus [11] contains only disease annotations and no other entity types such as genes and proteins, whereas the JNLPBA corpus [12] contains annotations of genes and proteins only. Therefore, only a small amount of annotated data is available for each entity type.
We now discuss the multi-task learning (MTL) model, which trains a single model for multiple tasks at the same time. MTL can leverage distinct datasets collected for different but related tasks [13]. Although extracting gene entities is entirely different from extracting chemical entities, both tasks require learning some common features that help capture the linguistic expressions of biomedical text. Crichton et al. [14] developed a multi-task learning model trained on various datasets that contain annotations of different entity types. The MTL model proposed by Wang et al. [15] performs better than other state-of-the-art single-task NER models.
This body of literature inspired the proposed model, which is a combination of multiple models. Unlike conventional multi-task learning, which uses a single model for all tasks, the proposed model trains on different datasets for different tasks: each model is trained on an annotated dataset for a particular entity type, so that it becomes an expert for its own entity type. The major drawback of multi-task learning methods is that they produce high recall and low precision. MTL-based models are trained on multiple types of entities and therefore have a larger training dataset. Their coverage of biomedical entities is broader, which results in higher recall. On the other hand, MTL-based models are trained on a mixture of entity types, which makes it difficult to differentiate among them and results in lower precision.
Another reason why NER is difficult in the biomedical domain is that the same mention may be labelled as a different entity type depending on the textual context. In this chapter, we observed that many false predictions are due to the polysemy problem. For example, a word can be used as both a disease name and a gene name. A model designed to label disease entities may mistakenly label a gene as a disease, and this confusion of entity types increases the false positive rate. For example, Bi-LSTM-CRF models for labelling disease entities incorrectly label the gene name "BRCA1" as a disease entity because disease names such as "BRCA1 abnormalities" or "brca1 deficient" exist in the training dataset. In addition, the training data annotate "VHL" (Von Hippel-Lindau disease) as a disease entity, which confuses the model because "VHL" is also the name of the gene whose mutation causes the disease. To reduce the false positives that arise from polysemous words, the proposed model lets the disease model use the outputs of the chemical and gene models. Once the gene model predicts "BRCA1" as a gene, it informs the disease model, so that the disease model does not predict it as a disease. In the proposed model, each model is first trained individually on its own entity type and is then trained further with the outputs of the other models.
The remainder of the chapter is organized as follows: Sect. 2 describes the basic concepts of Conditional Random Field (CRF), Long Short Term Memory (LSTM) and Bi-directional Long Short Term Memory (BiLSTM), which are used in deep learning; Sect. 3 presents the proposed methodology using biomedical datasets.
2 Background
This section introduces the deep learning techniques that have been applied to biomedical named entity recognition. Brief introductions to the three approaches follow.
Long Short Term Memory (LSTM) is a neural network based on the Recurrent Neural Network (RNN) that handles variable-length inputs efficiently. Research has shown that RNNs are useful in various NLP tasks such as speech recognition, language modelling and machine translation [16, 17], and RNN-based LSTM variants are the most widely used [18]. The proposed model uses the LSTM formulation of Graves et al. [16]. Given the output of the embedding layer, the hidden states are calculated by the following steps.
$$f_t = \sigma\left(W_{xf}\, x_t + W_{hf}\, h_{t-1} + b_f\right) \tag{2}$$

$$h_t = o_t \odot \tanh(c_t) \tag{5}$$

where $\tanh$ and $\sigma$ denote the hyperbolic tangent and the logistic sigmoid function, respectively, and $\odot$ denotes the element-wise product. The forward LSTM extracts a representation of the input in the forward direction, and the backward LSTM represents the input in the backward direction. The concatenation of the forward and backward LSTM outputs forms the hidden state, as proposed by Schuster and Paliwal [19]; it is frequently used in sequence encoding tasks.
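A single LSTM step and the bidirectional concatenation can be sketched in plain Python. The forget gate and the hidden-state update follow the forget-gate and hidden-state equations given in the text; the input gate, cell update and output gate below are the standard LSTM forms and are an assumption of this sketch, since those equations are not shown here. All function and parameter names are hypothetical.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(Wx, x, Wh, h, b):
    """Compute Wx @ x + Wh @ h + b using plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(Wx[r], x))
        + sum(w * hi for w, hi in zip(Wh[r], h))
        + b[r]
        for r in range(len(b))
    ]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step; p holds weights keyed Wx*/Wh*/b* for gates i, f, c, o."""
    i = sigmoid(affine(p["Wxi"], x, p["Whi"], h_prev, p["bi"]))   # input gate (assumed form)
    f = sigmoid(affine(p["Wxf"], x, p["Whf"], h_prev, p["bf"]))   # forget gate
    g = [math.tanh(a) for a in affine(p["Wxc"], x, p["Whc"], h_prev, p["bc"])]
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    o = sigmoid(affine(p["Wxo"], x, p["Who"], h_prev, p["bo"]))   # output gate (assumed form)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]              # hidden state
    return h, c


def zero_params(input_dim, hidden):
    """All-zero parameters, convenient for testing the wiring."""
    p = {}
    for gate in "ifco":
        p["Wx" + gate] = [[0.0] * input_dim for _ in range(hidden)]
        p["Wh" + gate] = [[0.0] * hidden for _ in range(hidden)]
        p["b" + gate] = [0.0] * hidden
    return p


def bilstm(xs, p_fwd, p_bwd, hidden):
    """Concatenate forward and backward hidden states at every position."""
    def run(seq, p):
        h, c, out = [0.0] * hidden, [0.0] * hidden, []
        for x in seq:
            h, c = lstm_step(x, h, c, p)
            out.append(h)
        return out

    fwd = run(xs, p_fwd)
    bwd = run(xs[::-1], p_bwd)[::-1]     # re-align backward states with positions
    return [hf + hb for hf, hb in zip(fwd, bwd)]
```

Each position of the output has twice the hidden size, because the forward state summarizes the tokens to the left and the backward state the tokens to the right.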
[Figure: network architecture, from the input layer (Word2Vec representation) to the output layer (CRF algorithm)]
model, where the CRF model tags the input token sequence according to the tagging scheme.
The probability of each label for the sequence $S = w_1, \ldots, w_n$ is calculated by the following equations.

$$z_t = W_y\, h_t^{bi} + b_y \tag{6}$$

$$\operatorname{softmax}(a_j) = \frac{\exp(a_j)}{\sum_{k} \exp(a_k)} \tag{7}$$

where $W_y$ and $b_y$ in Eq. (6) are the parameters of the fully connected layer for the BIO tagging scheme, and the $\operatorname{softmax}(\cdot)$ function is applied to calculate the probability of each tag. Based on the probability $p$ from Eq. (7), the training objective is to minimize the following losses.

$$L_{LSTM} = -\sum_{t=1}^{N} \log p\left(y_t \mid w_1, \ldots, w_n; \Theta\right) \tag{8}$$

$$L_{CRF} = -\sum_{t=1}^{T} \left( A_{y_{t-1},\, y_t} + z_{t,\, y_t} \right) \tag{9}$$

$$Loss = L_{LSTM} + L_{CRF} \tag{10}$$

where $L_{LSTM}$ is the cross-entropy loss for the label $y_t$ and $L_{CRF}$ is the negative sentence-level log likelihood. $A_{y_{t-1},\, y_t}$ and $z_{t,\, y_t}$ are the transition and emission scores, respectively, and their sum gives the tag score.
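The loss terms can be evaluated numerically with a small sketch. It follows Eqs. (7) to (10) as printed, so the CRF term here is only the negative tag score of the gold path; omitting a start-of-sentence transition at the first position is a simplifying assumption, and the function names are hypothetical.

```python
import math


def softmax(a):
    """Eq. (7), computed in a numerically stable way."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


def lstm_loss(z, y):
    """Eq. (8): summed negative log-probability of the gold tags y."""
    return -sum(math.log(softmax(z_t)[y_t]) for z_t, y_t in zip(z, y))


def crf_loss(z, y, A):
    """Eq. (9) as printed: negative sum of transition and emission scores."""
    score = z[0][y[0]]                       # no start transition (assumption)
    for t in range(1, len(y)):
        score += A[y[t - 1]][y[t]] + z[t][y[t]]
    return -score


def total_loss(z, y, A):
    """Eq. (10): sum of the two loss terms."""
    return lstm_loss(z, y) + crf_loss(z, y, A)


# Two positions, two tags, uniform emission scores (toy values).
z = [[0.0, 0.0], [0.0, 0.0]]
A = [[0.0, 1.0], [0.0, 0.0]]
y = [0, 1]
loss = total_loss(z, y, A)
```

With uniform emission scores each tag has probability 0.5, so the cross-entropy term is 2 log 2, and the gold path collects one transition score of 1, giving a CRF term of -1.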
3 Methodology
This section describes the architecture of the proposed model. A combination of multiple datasets, NCBI [11], BC5CDR [20], JNLPBA [21] and BC5CDR [22], is used as the input. Figure 2 shows the architecture of the proposed model.
The following steps describe the proposed model.
1. All the biomedical datasets are first combined and sent to the individual models.
2. Each model is trained on the dataset for its bio-entity type and sends its output to the max pooling function.
3. Max pooling progressively reduces the size of the representation, which reduces the number of parameters and the computation in the network. The pooling layer operates on each feature map independently.
4. The activation function introduces nonlinearity into the output of the neuron. The result is then sent to the target model.
5. The proposed model is then combined with a Conditional Random Field (CRF) to give the sequentially tagged output.
6. The target output gives the annotated, tagged dataset.
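Steps 3 and 4 above can be illustrated with a minimal sketch of max pooling and an activation function. The window size, the stride and the choice of ReLU as the activation are assumptions of this example; the chapter does not state which values or which activation the model uses.

```python
def max_pool_1d(feature_map, size=2, stride=2):
    """Keep the maximum of each window, shrinking the representation."""
    return [
        max(feature_map[i:i + size])
        for i in range(0, len(feature_map) - size + 1, stride)
    ]


def pool_all(feature_maps, size=2, stride=2):
    """Apply max pooling to every feature map independently."""
    return [max_pool_1d(fm, size, stride) for fm in feature_maps]


def relu(values):
    """A common nonlinearity, used here as a stand-in activation function."""
    return [max(0.0, v) for v in values]


pooled = pool_all([[1.0, 3.0, 2.0, 5.0], [-1.0, -2.0, -3.0, -4.0]])
activated = [relu(fm) for fm in pooled]
```

Each feature map of length 4 is reduced to length 2, which is the reduction in parameters and computation that step 3 refers to.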
This chapter applies deep learning. Deep learning-based methods require a very large amount of data, and the biomedical datasets are large enough to meet this requirement.
The advantage of using deep learning-based methods in biomedical named entity recognition is that they reduce the probability of error, as discussed in the evaluation section.
4 Evaluation
In this section, four biomedical datasets are considered for the experiments: NCBI [11], BC5CDR [20], JNLPBA [21] and BC5CDR [22]. All four datasets were collected by Crichton et al. [14]. They were constructed from MEDLINE abstracts [23], and each dataset concentrates on one of three biomedical entity types: gene or protein, disease, and chemical. The cell type entity tags from JNLPBA are not considered in this research. All datasets consist of sentences annotated with biomedical entities. JNLPBA contains training and test sets, while the remaining three contain development, training and test sets. For JNLPBA, a small part of the training set, approximately equal in size to the test set, is used as a development set. The JNLPBA dataset from Crichton et al. [14] contains split sentences; this chapter therefore uses the original dataset developed by Kim et al. [20], which has more accurate sentence separation. The datasets are described in Table 1.
$$\text{Precision}\,(P) = \frac{TP}{TP + FP} \tag{11}$$

$$\text{Recall}\,(R) = \frac{TP}{TP + FN} \tag{12}$$

$$\text{F-measure}\,(F_1) = \frac{2 \times P \times R}{P + R} \tag{13}$$

where:
• TP (True Positive) = total number of correctly predicted entities in the sequence.
• TP + FP, where FP stands for False Positive = total number of predicted entities in the sequence.
• TP + FN, where FN stands for False Negative = total number of ground truth entities in the sequence.
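Eqs. (11) to (13) can be computed at the entity level with a short sketch. It assumes that an entity counts as correct only when its span and its type both match exactly, which is a common convention but is not stated explicitly in the chapter; the function name `prf` and the example spans are hypothetical.

```python
def prf(gold, predicted):
    """Entity-level precision, recall and F1 for (start, end, type) tuples."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)          # exact span and type match
    fp = len(predicted - gold)          # predicted but not in the ground truth
    fn = len(gold - predicted)          # in the ground truth but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = [(0, 2, "Gene"), (5, 6, "Disease"), (8, 9, "Chemical")]
predicted = [(0, 2, "Gene"), (5, 6, "Gene")]
p, r, f1 = prf(gold, predicted)
```

In this example one of two predictions is correct (P = 0.5) and one of three gold entities is found (R = 1/3), giving F1 = 0.4. The second prediction has the right span but the wrong type, so it counts as both a false positive and a false negative.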
to compare his model with the other models. The iterated results are denoted by an asterisk in the table. The proposed model is run ten times with ten different initializations, and the arithmetic mean is taken on each of the four datasets to evaluate the performance of each model.
As shown in Table 2, the proposed model achieves higher precision and a higher F1 score than the MTM on all datasets. It improves both precision and recall. It also performs better than the Multi-Task Model (MTM) of Wang et al. [15] on the four datasets. The proposed model contains an expert model trained for each entity type, which further enhances biomedical named entity recognition performance.
Compared with the baseline models, the proposed model achieves higher precision on average. Even though the increase in recall is slight, an increase in precision is more valuable than an increase in recall for the practical use of a bio-NER model. In a large corpus, important information is very likely to be repeated, so a missed mention does not necessarily hurt the named entity system, because it can be recovered elsewhere. False information and error propagation, however, can affect the entire system.
Recognizing a biomedical entity as a different bio-entity type is called a bio-entity error. For instance, recognizing the gene 'VHL' as a disease in a sentence is a bio-entity error. Interestingly, bio-entity errors generally occur when the bio-entities are confusing or when entities contain multiple words (e.g. BRCA1). The MTM makes 4334 errors, whereas the proposed model makes 3966 errors on the four datasets (BC5CDR-disease, BC4CHEMD, JNLPBA, NCBI), which is 368 fewer than the MTM. The proposed model shows the best performance in the error analysis.
Error analysis of the STM, a single LSTM-CRF model, shows many errors in classifying bio-entities in JNLPBA: bio-entity errors make up 49.3% of the total errors on JNLPBA. For the MTM, bio-entity errors account for 1333 of the 4334 errors, about 30.8%. This share is large even next to span errors, the most common error type, which account for 38% of the errors. While most span errors stem from subjective annotations or can be easily fixed by non-experts, bio-entity errors are difficult to detect, even for biomedical researchers. Also, for biomedical text mining methods such as drug-drug interaction extraction, span errors cause only minor errors, whereas bio-entity errors could lead to entirely different results.
Disambiguation Model for Bio-Medical Named Entity Recognition 53
In the proposed model, every expert model is trained on a dataset of a single entity
type, and its output on the training data is concatenated with the word embedding. The
other expert models share their knowledge with the target model, as shown in Fig. 2,
which reduces the bio-entity type error problem. As Table 2 shows, 736 errors of the
proposed model are bio-entity errors, which is 18.6% of all its errors.
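
The figures quoted in this error analysis can be checked with a few lines of
arithmetic. The sketch below uses only the counts given in the text; the variable
names are illustrative. Note that the MTM bio-entity share computed from the quoted
counts (1333 of 4334) comes out at roughly 30.8%, which is lower than the 38% stated
above, so one of those figures may be a misprint.

```python
# Error counts quoted in the text (summed over the four datasets).
mtm_total_errors = 4334        # multi-task model (MTM) baseline
proposed_total_errors = 3966   # proposed model
mtm_bio_entity_errors = 1333
proposed_bio_entity_errors = 736


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


error_reduction = mtm_total_errors - proposed_total_errors
mtm_share = share(mtm_bio_entity_errors, mtm_total_errors)
proposed_share = share(proposed_bio_entity_errors, proposed_total_errors)

print(error_reduction)   # 368
print(mtm_share)         # 30.8
print(proposed_share)    # 18.6
```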
6 Conclusions
This paper concludes with a summary of the proposed model, which combines multiple
bidirectional LSTM-CRF (BILSTM-CRF) models for the recognition of biomedical
entities. Most state-of-the-art methods can handle only a single type of entity. The
proposed model can handle multiple datasets while achieving higher F1 scores. Unlike
multi-task models, the proposed model uses several single-task NER models, which
relay information to one another to achieve the highest precision. To improve on
multi-task models, the proposed model disambiguates biomedical entities that are
polysemous or that share the same orthographic features. As the results show, the
proposed model achieves excellent results compared with related work on four BioNER
datasets in terms of precision, recall, and F1 score. Although the proposed model
incurs some computational overhead, this cost is acceptable given the accuracy of its
results. In future, the proposed model will be applied to a geospatial dataset.
Acknowledgements The authors would like to thank the National Institute of Technology Raipur
for providing necessary infrastructure and facility for doing research.
References
1. Zhong, H., Hu, X.: Disease named entity recognition by machine learning using semantic type
of metathesaurus. Int. J. Mach. Learn. Comput. 3(6), 494–498 (2014)
2. Collier, N., Nobata, C., Tsujii, J.: Extracting the names of genes and gene products with a hidden
Markov model, vol. 1. In: Proceedings of the 18th Conference on Computational Linguistics,
pp. 201–207 (2000)
3. Zhou, G.D.: Recognizing names in biomedical texts using mutual information independence
model and SVM plus sigmoid. Int. J. Med. Inf. 75(6), 456–467 (2006)
4. Lafferty, J., Mccallum, A., Pereira, F.C.N., Pereira, F.: Conditional Random Fields, pp. 282–289
(2001)
5. McCallum, A., Freitag, D., Pereira, F.C.N.: Maximum entropy markov models for information
extraction and segmentation. In: Proceedings of the Seventeenth International Conference on
Machine Learning, Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp. 591–598
(2000)
6. Neves, M.L., Carazo, J.-M., Pascual-Montano, A.: Moara: A Java library for extracting and
normalizing gene and protein mentions. BMC Bioinf. 11(1), 157 (2010)
7. Sahu, S.K., Anand, A.: Recurrent neural network models for disease name recognition using
domain invariant features. ArXiv E-Prints. arXiv:1606.09371 (2016)
54 A. Kumar
8. Habibi, M., Weber, L., Neves, M., Wiegandt, D.L., Leser, U.: Deep learning with word
embeddings improves biomedical named entity recognition. Bioinformatics (Oxford, England)
33(14), i37–i48 (2017)
9. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures
for named entity recognition. ArXiv E-Prints. arXiv:1603.01360 (2016)
10. Pyysalo, S., Ginter, F., Moen, H., Salakoski, T., Ananiadou, S.: Distributional semantics
resources for biomedical text processing. In: Proceedings of the 5th Languages in Biology
and Medicine Conference (LBM’13), pp. 39–44 (2013)
11. Doğan, R.I., Leaman, R., Lu, Z.: NCBI disease corpus: a resource for disease name recognition
and concept normalization. J. Biomed. Inform. 47, 1–10 (2014)
12. Goulart, R.R.V., Strube de Lima, V.L., Xavier, C.C.: A systematic review of named entity
recognition in biomedical texts. J. Braz. Comput. Soc. 17(2), 103–116 (2011)
13. Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
14. Crichton, G., Pyysalo, S., Chiu, B., Korhonen, A.: A neural network multi-task learning
approach to biomedical named entity recognition. BMC Bioinf. 18(1), 368 (2017)
15. Wang, X., Zhang, Y., Ren, X., Zhang, Y., Zitnik, M., Shang, J., Langlotz, C., Han, J.: Cross-
type biomedical named entity recognition with deep multi-task learning. ArXiv E-Prints. arXiv:
1801.09851 (2018)
16. Graves, A., Mohamed, A., Hinton, G.: Speech recognition with deep recurrent neural networks.
In: IEEE International Conference, Department of Computer Science, University of Toronto,
no. 3, pp. 6645–6649
17. Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. ArXiv
E-Prints. arXiv:1508.06615 (2015)
18. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780
(1997)
19. Song, M., Yu, H., Han, W.-S.: Developing a hybrid dictionary-based bio-entity recognition
technique. BMC Med. Inform. Decis. Mak. 15(1), S9 (2015)
20. Kim, J.-D., Ohta, T., Tsuruoka, Y., Tateisi, Y., Collier, N.: Introduction to the bio-entity recogni-
tion task at JNLPBA. In: Proceedings of the International Joint Workshop on Natural Language
Processing in Biomedicine and Its Applications, Association for Computational Linguistics,
Stroudsburg, PA, USA, pp. 70–75 (2004)
21. Kim, J., Ohta, T., Tsuruoka, Y., Tateisi, Y., Collier, N.: Introduction to the bio-entity recognition
task at JNLPBA, 70–75 (n.d.)
22. Krallinger, M., Rabal, O., Leitner, F., Vazquez, M., Salgado, D., Lu, Z., Leaman, R., Lu, Y.,
Ji, D., Lowe, D.M., Valencia, A.: The CHEMDNER corpus of chemicals and drugs and its
annotation principles. J. Cheminf. 7(Suppl 1 Text mining for chemistry and the CHEMDNER
track), S2–S2 (2015)
23. Ratinov, L., Roth, D.: Design challenges and misconceptions in named entity recognition.
In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning,
Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 147–155 (2009)
24. Campos, D., Matos, S., Oliveira, J.L.: Biomedical named entity recognition: a survey of
machine-learning tools. In: Sakurai, S. (ed.) Theory and Applications for Advanced Text Min-
ing (2012)
25. Fukuda, K., Tamura, A., Tsunoda, T., Takagi, T.: Toward information extraction: identifying
protein names from biological papers. In: Pacific Symposium on Biocomputing, pp. 707–718
(1998)
26. Kim, S., Chen, J.Y., Cutello, V., Lee, D.: DTMBIO 2016: The Tenth International Workshop on
Data and Text Mining in Biomedical Informatics. In: Proceedings of the 25th ACM International
on Conference on Information and Knowledge Management, pp. 2511–2512 (2016)
Ashutosh Kumar completed his Bachelor of Engineering (B.E.) in Computer Science and
Engineering in 2014 at Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal. He received
his Master of Technology (M.Tech) degree in Computer Science and Engineering in 2017
from the Central University of Rajasthan. He is currently a Ph.D. research scholar at
the National Institute of Technology Raipur. His research interests include “Text
mining for biomedical literature” and “Named Entity Recognition.”
Applications of Deep Learning
in Healthcare and Biomedicine
and computational biology research in the public health domain. Finally, the future
scope of deep learning algorithms is discussed from a modern healthcare
perspective.
1 Introduction
In the last 10–15 years, data acquisition technologies in the life sciences have
advanced drastically. Together with improvements in computational biology and digital
storage techniques, this has transformed modern biology from a data-poor science into
a data-rich one. Owing to this development, research today is data-driven, and a
biological problem may have multiple potential solutions, whereas earlier one question
yielded one answer. Bioinformatics assists in handling these large datasets, be it
storing, extracting or analyzing data. Data extraction is in turn handled by
computational biology techniques that use programming and algorithms. The set of
methods used to discover meaningful relationships, patterns and functions in
biological data is called ‘data mining’.
In the early 1990s, ‘soft computing’ became a popular area of study within
computational science. The term covers all methodologies that provide flexible
information processing capability for handling ambiguous real-life situations. While
hard computing aims for precision, soft computing works with partial truth, ambiguity,
inaccuracy and approximation to obtain the solution to a problem [1, 2]. Before this,
only simple systems could be precisely analyzed and modelled by computational
approaches, while more complex systems in medicine, biology, management studies, the
humanities and similar fields remained difficult to handle with conventional
analytical and mathematical methods. Soft computing techniques complement each other,
and they resemble biological processes more closely than traditional techniques, which
mostly rely on formal logic such as predicate logic and sentential logic. The main
constituents of soft computing are genetic algorithms, rough sets, fuzzy logic, neural
networks, and signal processing tools such as wavelets. Of these, neural networks have
a wide scope for classifying and representing biological data computationally. Neural
networks are robust and exhibit good learning and generalization abilities in
data-rich environments [3]. The algorithms used to train neural networks belong to the
family of machine learning algorithms.
improving with experience, without any specific programming. To do this, the system
must be made familiar with a dataset, which is called ‘training data’. Algorithms are
mainly trained by one of two machine learning methods: supervised and unsupervised
learning. In supervised learning, a set of examples with known outputs is provided at
training time, and from them the algorithm generates a function that reproduces the
output [2]. The task is called ‘regression’ when the output takes continuous values
and ‘classification’ when it takes categorical values [6]. Unsupervised learning, in
contrast, builds a function that captures the hidden structure of unlabeled input
data. During the training phase, the training data set is pre-processed and important
features are extracted. Preprocessing involves noise reduction, feature extraction,
image rectification and similar operations. Because feature extraction is a
challenging task, especially for data of medical importance, features have to be
designed anew for every new application. In the deep learning literature, this process
is frequently called “hand-crafting” of features. Given the feature vector
$x \in \mathbb{R}^n$, the classifier must predict the correct class $y$, which is
typically estimated by a function $\hat{y} = f(x)$ that gives the classification
result $\hat{y}$ directly. The parameter vector $\theta$ of the classifier is
determined during the training phase and later evaluated on a separate test data set.
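
The train-then-test workflow described above can be sketched in a few lines. The
example below is a minimal illustration, not a method from this chapter: it assumes a
perceptron as the classifier f, hypothetical two-dimensional feature vectors, and a
tiny hand-made test set. The parameter vector theta is fitted on the training samples
only and then checked on samples it has not seen.

```python
def predict(theta, x):
    """y_hat = f(x): threshold the weighted sum theta . [x, 1]."""
    score = sum(w * xi for w, xi in zip(theta, x + [1.0]))
    return 1 if score >= 0 else 0


def train(samples, labels, epochs=20, lr=0.1):
    """Fit theta with the perceptron rule (one illustrative choice of learner)."""
    theta = [0.0] * (len(samples[0]) + 1)   # weights plus a bias term
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(theta, x)
            theta = [w + lr * error * xi for w, xi in zip(theta, x + [1.0])]
    return theta


def accuracy(theta, samples, labels):
    """Fraction of samples whose predicted class matches the label."""
    correct = sum(predict(theta, x) == y for x, y in zip(samples, labels))
    return correct / len(samples)


# Hypothetical feature vectors x in R^2; class 1 when both features are large.
train_x = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.4], [1.0, 1.0], [0.9, 0.7], [0.8, 1.0]]
train_y = [0, 0, 0, 1, 1, 1]
# Separate test set, never used while fitting theta.
test_x = [[0.1, 0.1], [0.95, 0.9]]
test_y = [0, 1]

theta = train(train_x, train_y)
test_accuracy = accuracy(theta, test_x, test_y)
print(theta, test_accuracy)
```

Keeping the test samples out of `train` is the point of the sketch: the accuracy
reported at the end measures generalization rather than memorization.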
The Artificial Neural Network (ANN) is a well-known classification and regression
algorithm in machine learning. It is organized as units arranged in several layers and
imitates the architecture and signal transmission of neurons and their synapses in the
human brain. The ANN consists of interconnected artificial neurons, each of which
implements a simple classifier that outputs a decision signal based on a weighted sum
of evidence. A large number of these basic computational elements are combined to form
the ANN [7]. The parameters of the network are trained by an algorithm such as
‘back-propagation’, in which input signals and the expected decision outputs are
presented in pairs, mirroring the way the brain relies on external sensory stimuli to
learn to perform specific tasks (Fig. 1).
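
A single artificial neuron of the kind described above can be written as a weighted
sum followed by an activation function. The sketch below assumes a sigmoid activation
and a 0.5 decision threshold; the text does not specify either, and the inputs and
weights are hypothetical values chosen for illustration.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical evidence and weights: the two contributions cancel, so z = 0.
activation = neuron(inputs=[1.0, 0.5], weights=[2.0, -4.0], bias=0.0)
decision = 1 if activation >= 0.5 else 0   # decision signal
print(activation, decision)                # 0.5 1
```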
The input features used in machine learning can take numerical or nominal values.
Defining logical and powerful features is fundamental to machine learning studies.
ANNs have shown extraordinary performance in numerous areas, but they also have
drawbacks, such as getting trapped in local minima during optimization and overfitting
(over-training) to the training data. Over the last few years, predictive techniques
based on artificial neural networks have shown remarkable capabilities in solving
non-linear modelling problems [8] in various applications, but most of these methods
use shallow architectures because deep networks are difficult to train. Owing to
recently proposed fast learning algorithms, deep architectures have attracted
considerable attention, especially since deep ANNs have been shown to outperform
conventional methods in pattern recognition, classification and other machine learning
domains. A DNN is composed of a series of stacked layers. The prediction is computed
from the first layer, i.e. the input, and the output of the last layer gives the
predicted class or value. The layers between the input and output layers are called
hidden layers because their state does not correspond to observable data.
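
The stacking of layers can be illustrated with a forward pass through a small network.
This is a sketch under stated assumptions: fully connected layers, sigmoid activations,
and an arbitrary 2-3-1 layout with made-up weights. The function names `dense` and
`forward` are illustrative and do not come from the chapter.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: a sigmoid neuron per row of `weights`."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


def forward(x, layers):
    """Pass the input through each (weights, biases) layer in order."""
    activation = x
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation


# Hypothetical 2-3-1 network; the weight values are arbitrary.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[0.7, -0.5, 0.2]], [0.05]),                                # output layer
]
output = forward([1.0, 2.0], layers)
print(output)
```

Only the input `[1.0, 2.0]` and the final `output` are visible from outside; the three
hidden activations exist only inside `forward`, which is the sense in which the layer
is hidden.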
Fig. 1 A conceptual analogy between real neurons (on the left) and artificial neurons (on the right)
The multi-layered construction of neural networks allows them to make more complicated
decisions. To train a model, each edge needs a weight that is optimized. These weights
are applied in weighted sums over a large number of features; they are initialized at
random and then adjusted by an optimization algorithm such as ‘gradient descent’.
After a training sample is applied to the network, a loss function between the target
class and the prediction is evaluated. All weights are then updated slightly in the
direction that reduces the loss. Numerous classes of deep learning are built on these
networks, each with a different approach. Compared with the traditional ANN, DNNs
increase the number of layers and show better performance in recognition and
prediction tasks as the layers become more complex.
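
The update rule described above, in which each weight is moved slightly in the
direction that reduces the loss, can be shown on the smallest possible model. The
sketch assumes a one-parameter model y_hat = w * x, a mean squared error loss, toy
data, and a fixed starting weight so that the run is repeatable; the learning rate and
step count are arbitrary choices.

```python
def loss(w, data):
    """Mean squared error of the one-parameter model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def gradient(w, data):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # toy samples drawn from y = 2x
w = 0.0                  # weights are normally initialized at random
learning_rate = 0.05
initial_loss = loss(w, data)
for _ in range(200):
    w -= learning_rate * gradient(w, data)    # small step against the gradient
final_loss = loss(w, data)
print(w, initial_loss, final_loss)
```

In a multi-layer network the same step is applied to every weight at once, with
back-propagation supplying the gradients.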
Unlike shallow learning, which is typically supervised, deep learning can be
unsupervised, and with little guidance it learns complex patterns from
high-dimensional raw data [12]. This optimization is referred to as the tradeoff
between breadth and depth. Deep learning has demonstrated its usefulness in language
and image recognition, video games, the replication of painting styles and even the
composition of classical music.
These tasks require representation learning, in which patterns are detected or
classified from unprocessed raw data, especially when the data is hierarchical in
structure. For example, image recognition starts by learning a hierarchy of
sub-images, from pixels to edges and then motifs, until the final output is a full
object. When trained without supervision, deep neural networks act as feature
detectors at each layer, gradually extracting more sophisticated and invariant
features from the original input signals [12, 13]. Machines can now accurately
identify millions of images, a task that seems impossible by human standards. Using
deep learning, machines can learn to differentiate between similar objects or
sentences with high accuracy. These results have also motivated the machine learning
community to pursue the automation of tasks such as image recognition, prediction,
classification and annotation in biology, where the complexity and volume of data now
exceed human analytical capabilities [14, 15].
Deep learning has given rise to a wide range of ongoing and potential applications
across the world, in both biological and non-biological domains.
The application of deep learning has progressed steadily since the advent of the
Convolutional Neural Network (CNN) in the early 2000s. CNNs have since been used with
wide success in numerous applications, such as image segmentation and face
recognition. However, they did not gain much attention in research and industry until
2012, when the open ImageNet competition provided millions of images for training and
150,000 images exclusively for validation and testing [16]. This competition created a
new field and had an enormous effect, allowing researchers to collaborate and compete
without having to collect a large-scale labelled dataset themselves [5]. ‘Dropout’, a
new regularization technique, and a novel image augmentation technique were used to
improve results in this competition. Furthermore, giants of the IT and AI world such
as Microsoft, Google and Facebook began to treat image recognition with deep learning
algorithms as an important area of research. Deep learning techniques showed a 16%
error rate in 2012, which fell to 3% and below in 2016, surpassing human performance
in object classification. Innovations in object classification have since been carried
over to semantic segmentation and object localization. An RNN-based language model and
a CNN-based image recognition framework were integrated to build visual question
answering and image captioning systems.
Another important area is speech recognition, which combines knowledge from computer
science and electrical engineering with research in linguistics and health care
(including radiology). Many researchers have developed technologies that allow
computational equipment, including robots and smart devices, to recognize speech and
translate it to text. Recently, advances in deep learning and big data have brought
tremendous progress in speech recognition [17]. This is evident from the speech
recognition systems available from multiple international firms, such as Facebook and
Google, and from the numerous scientific papers published on the topic.
The expression profile of a gene can be considered a snapshot of the activities taking
place inside a given cell or tissue, much as a picture represents the objects in an
environment. Patterns of gene expression reflect a cell’s physical state in the same
way that a pixel pattern represents the objects in a picture. This is the basis for
comparing biological data with the kinds of data on which deep learning has been most
successful, particularly audio and image data.
In the same way that deep learning algorithms must distinguish two very similar yet
distinct images regardless of background, two very similar yet distinct disease
pathologies may need to be told apart, which is why discriminating subtle differences
is essential. Invariance and selectivity are needed for both gene expression analysis
and image recognition, and they are also two defining properties of CNNs [18].
Similar analogies can be made with other deep learning applications. For example,
language prediction requires sequential learning with RNNs, which resembles signaling
in biology: one event can be predicted from previous events in the same way that a
word in a sentence can be predicted from the preceding words. Another example is the
structural prediction of biological targets such as proteins.
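
The sequential learning mentioned above rests on a hidden state that is updated one
element at a time, so that earlier elements influence how later ones are interpreted.
The sketch below shows a single recurrent layer of the simple (Elman-style) kind with
a tanh activation. The weights are hypothetical and untrained, and no prediction layer
is attached; it only illustrates that the final state depends on the order of the
sequence.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h_new = []
    for i in range(len(h_prev)):
        z = b_h[i]
        z += sum(w_xh[i][j] * x[j] for j in range(len(x)))
        z += sum(w_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_new.append(math.tanh(z))
    return h_new


def run_rnn(sequence, w_xh, w_hh, b_h):
    """Fold a sequence into a hidden state, one element at a time."""
    h = [0.0] * len(b_h)
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
    return h


# Hypothetical, untrained weights: 2 input features, 2 hidden units.
w_xh = [[0.5, -0.3], [0.2, 0.8]]
w_hh = [[0.1, 0.4], [-0.6, 0.3]]
b_h = [0.0, 0.0]

# The same two events presented in opposite order.
state_ab = run_rnn([[1.0, 0.0], [0.0, 1.0]], w_xh, w_hh, b_h)
state_ba = run_rnn([[0.0, 1.0], [1.0, 0.0]], w_xh, w_hh, b_h)
print(state_ab, state_ba)
```

The two final states differ even though the inputs are the same set of events, which is
what lets a trained network predict the next word or the next signaling event from
what came before.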
While these parallels are illustrative, DNNs also have several advantages that
strengthen their case for biological applications. First and foremost, deep networks
require large datasets for successful analysis, which the life sciences provide in
abundance. DNNs are also well suited to widely spread, noisy, high-dimensional data
with non-linear relationships, which are common in data extracted in biology [18].
Furthermore, DNNs have the ability to generalize: once trained on one dataset, a
network can be applied to various other datasets as well, which, as it turns out, is
already required
64 S. Mittal and Y. Hasija
Fig. 2 Deep Neural Network Assembly. a Input data—it consists of data from Electronic Health
Records, clinical data, and also molecular data from microarray, MRI, etc. b Data preprocessing—in
this step the source data is preprocessed before analysis by a deep neural network. Techniques of
standardization, normalization, noise reduction and others are being used. c Deep Neural Network—
pre-processed data is used in several hidden layers all the while extracting important features and
resulting in output layer with trained neurons. d Output—this result helps in various biomedical
and healthcare applications such as—diagnosis of disease, genotype-phenotype correlation, disease
prediction, studying pharmacogenomics and drug response, among many others
This section reviews the current and potential applications of deep learning in
biomedicine.
3.1 Biomarkers
important when it comes to assessing the outcomes of clinical trials and to
identifying and monitoring diseases, particularly diseases like cancer. Identifying
specific biomarkers with high sensitivity is a major challenge for modern
translational medicine [10, 2]. Computational biology is an essential tool for
biomarker development and can use virtually any source of data, from proteomics to
genomics.
One of the most successful applications of DNNs has been image analysis. Deep learning
architectures have proven better at recognizing objects in pictures than both human
observers and traditional image recognition. As a result, advanced software systems
around the world widely use deep learning for image analysis involving object
recognition, retrieval, and categorization [2, 14]. In medicine, this has been of
great value to researchers and technicians in identifying disease from pictures of
symptoms, especially in dermatological disorders, but also from images showing gene
expression and from internal body imaging [2, 10]. Convolutional Neural Networks have
proven to be the most useful in this area of image analysis.
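
The core operation that makes CNNs effective on images is a small filter slid across
the picture to produce a feature map. The sketch below is a plain-Python illustration
with a toy 4x4 image and a hand-picked vertical-edge kernel; in a trained CNN the
kernel values would be learned, and practical systems would rely on an optimized
library rather than nested loops.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 "image" with a dark left half and a bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked vertical-edge kernel; a trained CNN would learn such filters.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)   # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The feature map responds only in the middle column, where the dark and bright halves
meet, regardless of the row. This is a small instance of the selectivity and
translation invariance that suit CNNs to medical images.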
3.5 Splicing
Another area of biomedicine where deep learning is widely used is splicing, a process
indicative of biological activity in eukaryotic organisms. Current techniques are
insufficient for characterizing splicing regulation, be it the structure of the splice
site, its state, or splicing silencers and enhancers. The most evident problem is that
of ‘raw reads’ at splice code locations, which are much shorter than actual genes and
show a very high level of duplication [2, 10]. Deep learning handles the study of the
splicing mechanism and of splice codes with high efficiency, outperforming Bayesian
methods in splicing prediction.
Deep learning represents data hierarchically and extracts and learns from complex
interactions, which is beneficial for protein network analysis. For example, a bimodal
deep belief network was built from phosphorylation data to predict the response of
human cells to a stimulus from the response of rat cells to the same stimulus. The
algorithm showed much higher accuracy than the traditional approach [2, 10]. Also, the
analytical approaches (algorithms) of proteomics do not require large training data,
unlike other ML algorithms. It is also true that proteomics is still a young research
field compared with transcriptomics and has far less data available for analysis.
Protein modelling, including folding and protein dynamics, belongs to the study of
structural biology and chemistry. Accurate structure determination is important for
good prediction of functions such as enzyme activity, RNA binding, and substrate and
antigen binding. Diseases such as Alzheimer’s and Parkinson’s result from the
accumulation of abnormal proteins, which are identified through structural biology
studies. Comparative modelling is a technique for predicting the secondary structure
of a protein based on the homology of the compound, but it is difficult because the
number of well-annotated compounds is limited [2, 10]. Applying deep learning to
sequence data has greatly improved protein structure prediction. Certain proteins are
very important even though they lack a unique structure. These are called
intrinsically disordered proteins (IDPs), and domains lacking a fixed structure are
called intrinsically disordered regions (IDRs). Deep learning algorithms have been
used to separate IDPs and IDRs from structured proteins. In 2013, Eickholt and Cheng
published ‘DNdisorder’, a sequence-based deep learning predictor that was highly
successful at predicting disordered proteins compared with other advanced predictors.
In 2015, an even better predictor, ‘DeepCNF’, was developed; it predicts IDPs and
proteins with IDRs by obtaining and analyzing data from experiments, and it proved
better than the algorithms used in ab initio predictors.
The human genome project has yielded a huge amount of previously unexplored biological
data on genes and proteins, as well as knowledge of the processes by which genes
interact with the external environment to produce proteins. In addition, developments
in life sciences and biotechnology have drastically reduced the cost of gene
sequencing and have enabled targeted disease treatment through genome and proteome
analysis [20]. Translational medicine applies research performed in basic biological
laboratories at the clinical level, making use of inputs from clinical observations.
Bioinformatics entails the use of computational techniques and algorithms to store,
represent and analyze biological data, including metabolites within cells, RNA
expression, DNA sequences and proteins [21]. Translational bioinformatics integrates
these two fields: it involves the development of databases and algorithms to study
basic cellular and molecular data, with the enhancement of clinical care as the
ultimate goal. Simply put, research in translational bioinformatics unites molecular
information (small molecules, lipids, proteins, RNA and DNA) with knowledge about
clinical entities (patients, symptoms, diseases, pathology reports, laboratory tests,
clinical images and drugs) to improve our biological understanding and ultimately
patient care. This has given rise to bioinformatics research in personalized medicine,
where treatment is designed specifically for the individual rather than for the
population at large.
Machine learning in the field of traditional bioinformatics comprises three research
areas: process prediction, disease prevention and personalized treatment. These are
governed by three major areas of the life sciences: genomics, epigenomics, and
pharmacogenomics. Genomics is the study of DNA structure, genes, the creation of
proteins and phenotypic expression, with the aim of creating targeted therapies.
Pharmacogenomics aims to create more effective drugs with minimal side effects while
providing specialized treatment for the individual. Epigenomics is the study of the
effect of environmental factors on the formation of proteins and the interactions
between them [21].
Genetic variants among populations and species arise as a result of alternative
splicing, which is therefore one of the popular areas of study involving machine
learning. Understanding them could be a stepping stone to detecting diseases early.
Another application of deep learning and machine learning algorithms in computational
biology is the study of protein-protein interactions using QSAR (Quantitative
Structure-Activity Relationship) and CPI (Compound-Protein Interaction) [21]. These
also help in modelling proteins binding to RNA. In addition, for reasons such as
transcriptional or translational errors, chromosomal instability, cancer progression
or cell differentiation, DNA methylation affects the expression of DNA, an area that
requires more study using deep neural networks.
One of the most applied fields of deep learning, and one with great potential for
growth, is biosensing in smart devices used in healthcare. The algorithms may be used
in devices to monitor calorie intake, assist people with partial vision, detect
irregularities in biomedical devices and more. Some of these applications are
discussed in the subsections below.
Dieticians say that an optimum diet with a limited number of calories should be
consumed in order to stay healthy and fit. Today, however, obesity is so widespread
that it has become an epidemic and is one of the causes of dangerous chronic diseases,
such as heart disease and type II diabetes. This can be addressed by keeping track of
the kind and quantity of food consumed and of the duration and type of exercise or
physical activity performed, all of which contribute to good health. Doing so requires
competent technology that can select features which generalize across the many foods
and activities [21]. This is achieved using wearable devices and smartphones that
monitor and manage food intake and energy expenditure.
Recently, a calorie measurement system was created that acts as an assistant: it
estimates the number of calories in a food item from a picture, and this information
helps the consumer control or prevent diet-related health problems by regulating the
intake of that food item [21]. The system runs on smartphones and uses Convolutional
Neural Networks (CNNs), together with more advanced techniques such as mobile cloud
computing, size calibration and distance estimation, which help in recognizing the
food type and estimating the calories. CNNs are also used for classifying human
activities, such as a baby crawling or someone falling (abnormal activities raise an
alarm and inform family members) [21]. A comparison of CNN-based methods on different
human activity recognition datasets found that the deep learning method generalizes
better, as it has better classification accuracy. However, low-powered smart wearable
devices cannot handle the computational complexity that deep learning requires. In
such situations, preprocessing with standardizing techniques is recommended, as it
reduces the differences in the input data caused by sensor properties such as
orientation and position.
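
One common standardizing technique is the z-score, which rescales each sensor channel
to zero mean and unit variance. The text does not name a specific method, so the
choice of z-scores here is an assumption, and the accelerometer readings are
hypothetical. The sketch shows how an offset and gain difference between two sensor
placements disappears after standardization.

```python
import statistics


def standardize(values):
    """Rescale one sensor channel to zero mean and unit variance (z-scores)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]   # constant channel: nothing to scale
    return [(v - mean) / std for v in values]


# Hypothetical readings of the same motion from two sensor placements;
# the second channel has a different offset and gain (pocket = 2 * wrist + 8).
wrist = [0.9, 1.1, 1.0, 1.2, 0.8]
pocket = [9.8, 10.2, 10.0, 10.4, 9.6]

wrist_z = standardize(wrist)
pocket_z = standardize(pocket)
print(wrist_z)
print(pocket_z)
```

After standardization the two channels agree up to floating-point rounding, so a
downstream classifier sees the same input for the same activity. Z-scoring removes
only linear offset and scale effects; differences caused by sensor orientation would
generally need additional handling.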
Individuals suffering from prolonged illness and those whose state is critical need to
be closely monitored and it is hence important to analyze discrepancies in their vital
70 S. Mittal and Y. Hasija
signs. Abnormalities, however, vary from patient to patient and are affected by
equipment and noise. Machine learning techniques contribute greatly to detecting
such irregularities [21]. Electroencephalography (EEG) records the electrical
activity of the brain; in 2010 Wulsin et al. [22] proposed an approach to detect
anomalies in EEG waveforms using a Deep Belief Network (DBN). Trained on large
datasets, the DBN proved to be an effective method, even outperforming SVM [21]. In
2015 Wang et al. [23] created a DBN that compressed the signal, resulting in a 50%
energy saving while keeping the same neural decoding accuracy, which is a
breakthrough in developing low-power implantable and wearable sensors.
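The general idea of flagging irregular signal windows can be illustrated with a
much simpler stand-in than a DBN. The sketch below is not the method of [22] or
[23]: it swaps in a linear autoencoder (principal component analysis) fitted on
synthetic "normal" windows and flags a window when its reconstruction error exceeds
a threshold. The signal model, the number of components and the 99th-percentile
threshold are assumptions made only for this example.

```python
import numpy as np

def fit_linear_autoencoder(normal_windows, n_components):
    """Fit PCA on normal windows; returns the mean and the principal directions."""
    mean = normal_windows.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(normal_windows - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(windows, mean, components):
    """Distance between each window and its projection onto the learned subspace."""
    centered = windows - mean
    reconstructed = centered @ components.T @ components
    return np.linalg.norm(centered - reconstructed, axis=1)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 64)

def make_normal(n):
    """Synthetic 'normal' windows: a 5 Hz rhythm with random phase and amplitude."""
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n, 1))
    amplitude = rng.uniform(0.8, 1.2, size=(n, 1))
    return amplitude * np.sin(2.0 * np.pi * 5.0 * t + phase) + 0.05 * rng.normal(size=(n, 64))

train = make_normal(200)
mean, components = fit_linear_autoencoder(train, n_components=2)
# Assumed rule: anything beyond the 99th percentile of training error is abnormal.
threshold = np.percentile(reconstruction_error(train, mean, components), 99)

normal_test = make_normal(20)
spike = make_normal(1)
spike[0, 30] += 5.0  # inject an artificial spike into one sample

is_anomaly = reconstruction_error(spike, mean, components) > threshold
normal_flags = reconstruction_error(normal_test, mean, components) > threshold
```

A deep model replaces the linear projection with a learned non-linear one, but the
decision rule (large reconstruction error means abnormal) stays the same in this
family of approaches.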
Assistive devices are used to understand object shape and volume and to classify
objects by operating in three-dimensional space. They can be used by patients with
visual, hearing or speech impairments, with feedback in the form of gestures,
tactile feedback or audio feedback. Deep learning greatly helps in enhancing such
devices; for example, in 2016 a CNN-based wearable device was proposed by Poggi et
al. [24] to help people with impaired vision detect obstacles. Similarly,
gesture-based assistive devices have been proposed for patients with hearing
impairment and for highly sensitive environments, such as surgery, where touch-free
human-computer interaction is preferable [21]. In fact, in 2015 Huang et al. [25,
26] proposed a DNN-based method for recognizing sign language that used real-time
data. However, many such applications, like gesture recognition, are quite
challenging because of the great number of possible variations in hand posture and
the resulting algorithmic complexity.
This field aims to study large amounts of aggregated medical data in order to
strengthen clinical decision support systems and to improve the assessment of
healthcare data, assuring good quality and easy access to medical services.
Electronic Health Records (EHRs) are very data-intensive sources of patient
information, including drug prescriptions, recommended treatments, diagnosed
diseases, vaccination records, results of laboratory tests from equipment such as
EEG, and clinical images, both internal and external. Mining this extensive dataset
would provide a greater understanding of disease and eventually improve its
management [21]. However, there are several difficulties. For example, information
is compiled irregularly, which makes the data complex. Similarly, erratic delays
between the recognition of a disease and its diagnosis increase the complexity of
learning.
Deep learning learns data representations in both supervised and unsupervised
settings, and its accomplishments are largely attributed to its capacity to learn unique
Applications of Deep Learning in Healthcare and Biomedicine 71
By analyzing the spread of disease and its interaction with the environment, public
health aims to improve healthcare facilities, prevent diseases and prolong life.
The domain of public health involves epidemic and pandemic studies, and its
applications include air quality checks, drug safety assurance, epidemic
surveillance, and studies of the effect of environmental factors on lifestyle
diseases such as obesity. Computational methodologies help in creating models for
such studies; however, they are currently limited because they cannot include
real-time data in the analysis. Deep learning methods, if incorporated, promise a
better and stronger ability to generalize. This is because they are data-driven
and can keep optimizing the cost function as new datasets become available [10].
One such optimization algorithm is stochastic gradient descent, which is widely
used in DNNs. Therefore, deep learning methods, along with network analysis and
recommendation systems, are most advised for analyzing public health data.
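To make the remark about stochastic gradient descent concrete, the sketch below
updates a model one observation at a time, which is what allows the cost function
to keep being optimized as new data arrive. It is a minimal illustration on a
linear model with a squared-error cost; the function name sgd_step, the learning
rate and the synthetic data stream are assumptions made for this example, not part
of any cited system.

```python
import numpy as np

def sgd_step(weights, x, y, learning_rate=0.05):
    """One stochastic gradient step on the cost 0.5 * (weights . x - y) ** 2."""
    error = weights @ x - y
    # The gradient of the cost with respect to the weights is error * x.
    return weights - learning_rate * error * x

rng = np.random.default_rng(0)
true_weights = np.array([2.0, -1.0, 0.5])  # hypothetical relationship to recover
weights = np.zeros(3)

# Each loop iteration stands for one new observation arriving from a data stream.
for _ in range(2000):
    x = rng.normal(size=3)
    y = true_weights @ x + 0.01 * rng.normal()
    weights = sgd_step(weights, x, y)
```

Because every step uses a single sample, the model never needs the full dataset in
memory and can be refined whenever a new batch of measurements is collected.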
Assessing and predicting air pollutant concentration is one such application of
deep learning. A system designed by Ong et al. in 2015 [31] collects data from
sensors in more than 52 cities of Japan and, based on this, forecasts the air
pollution level in the country [21]. The DNN used is trained in an online manner
and comprises stacked autoencoders. However, it was also found that deep learning
techniques are affected by the incomplete data of the real world. Tracking disease
outbreaks through epidemiological studies and assessing lifestyle diseases through
social media is another very interesting application of deep learning in the health
sector. Examples of diseases tracked in this way are Ebola and influenza. In 2015,
Zhao et al. [32] used Twitter to track the health of the public continuously and
quite accurately [1]. Here, DNNs are used to check for characteristics describing
an epidemic and their changes with the environment in order to track the
development of the disease. Messages on Twitter have also been used to study
antibiotics and have given good forecasts of intestinal diseases. A DBN was used to
classify antibiotic-related classes, while in 2016 Zou et al. [33] used deep
learning to identify three types of intestinal diseases. Furthermore, in 2016,
Garimella et al. [34] used geotagged pictures from Instagram to track drinking,
obesity, smoking and other lifestyle diseases, and compared the classification by
users with deep learning annotations. The study reported that deep learning-based
algorithmic annotations were more successful in predicting and categorizing
behaviours such as drug abuse and drinking.
Data from mobile phones, such as texts or the location of phone calls, can be used
to characterize human behaviour. This technique uses CNNs; it is gaining popularity
and has been found highly accurate for predicting the gender and age of
individuals. Therefore, metadata of individuals, mobile networks, social media data
and EHRs help in forming public health policy. They could also help in maintaining
large-scale disease surveillance and in creating alert mechanisms at the onset of a
disease or when symptoms appear. However, the collection of such personal data also
poses the risk of intrusion into one’s privacy, be it through social media
platforms like Facebook, Twitter and Instagram or through databases that contain
sensitive data with low security and are prone to easy exploitation. Hence, the
current situation requires that individuals be able to control access to their
private health information, while at the same time creating mechanisms to gain more
information for large-scale studies using deep learning algorithms (Table 1).
and requires skilled individuals such as GPU programmers. Scientists also find deep
learning unable to answer some important questions and provide solutions. For one,
many high-level visualizations obtained using deep learning are not easy to
interpret. In addition, there is sometimes no provision for making changes when a
classification issue arises. Moreover, deep learning is not suitable for all kinds
of diseases, particularly rare diseases. Evidence also suggests that DNNs can
easily be tricked into misclassifying an input by minute changes to it.
6 Conclusion
References
1. Angermueller, C., Pärnamaa, T., Parts, L., Stegle, O.: Deep learning for computational biology.
Mol. Syst. Biol. 12(7), 878 (2016)
2. Cao, C., et al.: Deep learning and its applications in biomedicine. Genom. Proteom. Bioinf.
16(1), 17–32 (2018)
3. Rajeswari, K., Vivekanandan, N., Amitaraj, P., Fulambarkar, A.: A study on redesigning modern
healthcare using internet of things, pp. 59–69 (2017)
4. Jiang, F., et al.: Artificial intelligence in healthcare: past, present and future. Stroke Vasc.
Neurol. 2(4), 230–243 (2017)
5. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
6. Nelson, D., Wang, J.: Introduction to artificial neural systems. Neurocomputing 4(6), 328–330
(2003)
7. Jain, A.K., Mao, J., Mohiuddin, K.M.: Artificial neural networks: a tutorial. Computer 29(3),
31–44 (1996)
8. Pour, M.P., Seker, H., Shao, L.: Automated lesion segmentation and dermoscopic feature seg-
mentation for skin cancer analysis. In: Proceedings of the Annual International Conference of
the IEEE Engineering in Medicine and Biology Society, EMBS, pp. 640–643 (2017)
9. Norgeot, B., Glicksberg, B.S., Butte, A.J.: A call for deep-learning healthcare. Nat. Med. 25(1),
14–15 (2019)
10. Mamoshina, P., Vieira, A., Putin, E., Zhavoronkov, A.: Applications of deep learning in
biomedicine. Mol. Pharm. 13(5), 1445–1454 (2016)
11. Ching, T., et al.: Opportunities and obstacles for deep learning in biology and medicine. J. R.
Soc. Interface 15(141) (2018)
12. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Brief. Bioinf. 18(5), 851–869
(2017)
13. Erickson, B.J., Korfiatis, P., Akkus, Z., Kline, T., Philbrick, K.: Toolkits and libraries for deep
learning. J. Digit. Imaging 30(4), 400–405 (2017)
14. Akkus, Z., Galimzianova, A., Hoogi, A., Rubin, D.L., Erickson, B.J.: Deep learning for brain
MRI segmentation: state of the art and future directions. J. Digit. Imaging 30(4), 449–459
(2017)
15. Miotto, R., Wang, F., Wang, S., Jiang, X., Dudley, J.T.: Deep learning for healthcare: review,
opportunities and challenges. Brief. Bioinf. 19(6), 1236–1246 (2017)
16. Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural
Comput. 18(7), 1527–1554 (2006)
17. Neapolitan, R.E., Neapolitan, R.E.: Neural networks and deep learning. In: Artificial Intelli-
gence, pp. 389–411 (2018)
18. Esteva, A., et al.: A guide to deep learning in healthcare. Nat. Med. 25(1), 24–29 (2019)
19. Faust, O., Hagiwara, Y., Hong, T.J., Lih, O.S., Acharya, U.R.: Deep learning for healthcare
applications based on physiological signals: a review. Comput. Methods Programs Biomed.
161, 1–13 (2018)
20. Kim, K.G.: Book review: deep learning. Healthc. Inform. Res. 22(4), 351 (2016)
21. Ravi, D., et al.: Deep learning for health informatics. IEEE J. Biomed. Health Inf. 21(1), 4–21
(2017)
22. Wulsin, D., Blanco, J., Mani, R., Litt, B.: Semi-supervised anomaly detection for EEG wave-
forms using deep belief nets. In: Proceedings—9th International Conference on Machine Learn-
ing and Applications, ICMLA 2010, pp. 436–441 (2010)
23. Wang, A., Song, C., Xu, X., Lin, F., Jin, Z., Xu, W.: Selective and compressive sensing for
energy-efficient implantable neural decoding. In: IEEE Biomedical Circuits and Systems Con-
ference: Engineering for Healthy Minds and Able Bodies, BioCAS 2015—Proceedings (2015)
24. Poggi, M., Mattoccia, S.: A wearable mobility aid for the visually impaired based on embedded
3D vision and deep learning. In: Proceedings—IEEE Symposium on Computers and Commu-
nications, Aug 2016, pp. 208–213
25. Huang, J., Zhou, W., Li, H., Li, W.: Sign language recognition using real-sense. In: 2015 IEEE
China Summit and International Conference on Signal and Information Processing, ChinaSIP
2015—Proceedings, pp. 166–170 (2015)
26. Tang, A., Lu, K., Wang, Y., Huang, J., Li, H.: A real-time hand posture recognition system
using deep neural networks. ACM Trans. Intell. Syst. Technol. 6(2), 1–23 (2015)
27. Shin, H.-C., Lu, L., Kim, L., Seff, A., Yao, J., Summers, R.M.: Interleaved text/image deep
mining on a large-scale radiology database for automated image interpretation. J. Mach. Learn.
Res. 17(1–31), 2 (2015)
28. Liang, Z., Zhang, G., Huang, J.X., Hu, Q.V.: Deep learning for healthcare decision making
with EMRs. In: Proceedings—2014 IEEE International Conference on Bioinformatics and
Biomedicine, IEEE BIBM 2014, pp. 556–559 (2014)
29. Korzinkin, M., et al.: Deep biomarkers of human aging: application of deep neural networks
to biomarker development. Aging (Albany NY) 8(5), 1021–1033 (2016)
30. Nie, L., Wang, M., Zhang, L., Yan, S., Zhang, B., Chua, T.S.: Disease inference from health-
related questions via sparse deep learning. IEEE Trans. Knowl. Data Eng. 27(8), 2107–2119
(2015)
31. Ong, B.T., Sugiura, K., Zettsu, K.: Dynamically pre-trained deep recurrent neural networks
using environmental monitoring data for predicting PM2.5. Neural Comput. Appl. 27(6),
1553–1566 (2016)
32. Zhao, L., Chen, J., Chen, F., Wang, W., Lu, C.T., Ramakrishnan, N.: SimNest: social media
nested epidemic simulation via online semi-supervised deep learning. In: Proceedings—IEEE
International Conference on Data Mining, ICDM, Jan 2016, pp. 639–648
33. Zou, B., Lampos, V., Gorton, R., Cox, I.J.: On infectious intestinal disease surveillance using
social media content, pp. 157–161 (2016)
34. Garimella, K., Alfayad, A., Weber, I.: Social media image analysis for public health. In:
Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 5543–5547
(2016)
35. Fakoor, R., Ladhak, F., Nazi, A., Huber, M.: Using deep learning to enhance cancer diagnosis
and classification. In: Proceeding of the ICML Work. Role Mach. Learn. Transform. Healthc.
(2013)
36. Spencer, M., Eickholt, J., Cheng, J.: A deep learning network approach to ab initio protein
secondary structure prediction. IEEE/ACM Trans. Comput. Biol. Bioinf. (2014)
37. Quang, D., Chen, Y., Xie, X.: DANN: a deep learning approach for annotating the pathogenicity
of genetic variants. Bioinformatics (2015)
38. Zeng, T., Li, R., Mukkamala, R., Ye, J., Ji, S.: Deep convolutional neural networks for annotating
gene expression patterns in the mouse brain. BMC Bioinf. (2015)
39. Ditzler, G., Polikar, R., Rosen, G.: Multi-layer and recursive neural networks for metagenomic
classification. IEEE Trans. Nanobiosci. (2015)
40. Wang, C., Liu, J., Luo, F., Tan, Y., Deng, Z., Hu, Q.N.: Pairwise input neural network for
target-ligand interaction prediction. In: Proceedings—2014 IEEE International Conference on
Bioinformatics and Biomedicine, IEEE BIBM 2014 (2014)
41. Tian, K., Shao, M., Wang, Y., Guan, J., Zhou, S.: Boosting compound-protein interaction
prediction by deep learning. Methods (2016)
42. Angermueller, C., Lee, H.J., Reik, W., Stegle, O.: DeepCpG: accurate prediction of single-cell
DNA methylation states using deep learning. Genome Biol. (2017)
43. Witteveen, M.J.: Identification and elucidation of expression quantitative trait loci (eQTL) and
their regulating mechanisms using decodive deep learning (2014)
44. Zhou, J., Troyanskaya, O.G.: Predicting effects of noncoding variants with deep learning-based
sequence model. Nat. Methods (2015)
45. Zhang, S., et al.: A deep learning framework for modeling structural features of RNA-binding
protein targets. Nucleic Acids Res. (2015)
46. Mansoor, A., et al.: Deep learning guided partitioned shape model for anterior visual pathway
segmentation. IEEE Trans. Med. Imaging (2016)
47. Shan, J., Li, L.: A deep learning method for microaneurysm detection in fundus images. In:
Proceedings—2016 IEEE 1st International Conference on Connected Health: Applications,
Systems and Engineering Technologies, CHASE 2016 (2016)
48. Fritscher, K., Raudaschl, P., Zaffino, P., Spadea, M.F., Sharp, G.C., Schubert, R.: Deep neural
networks for fast segmentation of 3D medical images. In: Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformat-
ics) (2016)
49. Avendi, M.R., Kheradvar, A., Jafarkhani, H.: A combined deep-learning and deformable-model
approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med. Image Anal.
(2016)
50. Cheng, J.Z., et al.: Computer-aided diagnosis with deep learning architecture: applications to
breast lesions in US images and pulmonary nodules in CT scans. Sci. Rep. (2016)
51. Rose, D.C., Arel, I., Karnowski, T.P., Paquit, V.C.: Applying deep-layered clustering to mam-
mography image analytics. In: Proceedings of the 2010 Biomedical Science and Engineering
Conference, BSEC 2010: Biomedical Research and Analysis in Neuroscience, BRAiN (2010)
52. Wang, J., MacKenzie, J.D., Ramachandran, R., Chen, D.Z.: A deep learning approach for
semantic segmentation in histology tissue images. In: Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformat-
ics) (2016)
53. Xu, T., Zhang, H., Huang, X., Zhang, S., Metaxas, D.N.: Multimodal deep learning for cervical
dysplasia diagnosis. In: Lecture Notes in Computer Science (including subseries Lecture Notes
in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016)
54. Sun, L., Jia, K., Chan, T.H., Fang, Y., Wang, G., Yan, S.: DL-SFA: deeply-learned slow feature
analysis for action recognition. In: Proceedings of the IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (2014)
55. Ravi, D., Wong, C., Lo, B., Yang, G.Z.: Deep learning for human activity recognition: a
resource efficient implementation on low-power devices. In: BSN 2016—13th Annual Body
Sensor Networks Conference (2016)
56. Dolmans, D.H.J.M., Loyens, S.M.M., Marcq, H., Gijbels, D.: Deep and surface learning in
problem-based learning: a review of the literature. Adv. Health Sci. Educ. 21(5), 1087–1112
(2016)
Shubham Mittal is a dynamic individual currently pursuing his master’s in bioinformatics at the
Delhi Technological University. Having completed his bachelor’s in biotechnology he is highly
motivated towards computational research in life sciences and possesses an in-depth knowledge
of the field. In his free time Shubham likes to play basketball, listen to music and read fiction.
Dr. Yasha Hasija is an Associate Professor in the Delhi Technological University. She holds a
bachelor’s and master’s degree in biotechnology and Ph.D. in Bioinformatics. Besides having a
sound academic foundation Dr. Yasha is a vibrant individual and a very good orator. Specializing
in genome informatics and interaction study with human diseases, some of her research interests
are—genetic analysis of dermatological disorders, tuberculosis study and role of human genetic
variations in age-related disorders.
Deep Learning for Clinical Decision
Support Systems: A Review
from the Panorama of Smart Healthcare
Abstract Innovations in deep learning (DL) have been tremendous in recent years, and
applications of DL techniques are ever expanding, encompassing a wide range of
services across many fields. This is possible primarily for two reasons: the
availability of massive amounts of data for analytics, and advancements in hardware
in terms of storage and computational power. Healthcare is one such field that is
undergoing a major uplift due to the large-scale pervasion of DL. A wide variety of
DL algorithms are being used and further developed to solve different problems in
the healthcare ecosystem. Clinical healthcare is one of the foremost areas in which
learning algorithms have been tried to aid decision making. In this direction,
combining DL with existing areas such as image processing, natural language
processing and virtual reality has further paved the way for automating and
enormously improving the quality of clinical healthcare. Such intelligent decision
making in healthcare and clinical practice is also expected to result in holistic
treatment. In this chapter, we review and collect various existing DL techniques
and their applications for decision support in clinical systems. Three major
application streams of DL, namely image analysis, natural language processing, and
wearable technology, are discussed in detail. Towards the end of the chapter, a
section on directions for future research, such as handling class imbalance in
diagnostic data, DL for prognosis leading to preventive care, and data privacy and
security, is included. The chapter is intended for budding researchers and
engineers who aspire to a career in DL applied to healthcare.
1 Introduction
Deep learning has shown its potential in recent years for reducing large datasets
to a more abstract representation that is well suited to classification and
prediction, which are at the heart of many smart systems. The majority of DL
algorithms consist of a sequence of blocks, each embedding primitive linear or
non-linear operations that act on the data flowing from one block to the next. In
this way they learn a more condensed representation of the information contained in
the dataset, which aids the decision making process [1].
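The "sequence of blocks" view can be sketched as follows. This is a minimal forward
pass with randomly initialized, untrained weights; the layer widths, the choice of
ReLU as the non-linearity and the name dense_block are assumptions for illustration
only.

```python
import numpy as np

def dense_block(x, weights, bias):
    """One block: a linear map followed by a ReLU non-linearity."""
    return np.maximum(0.0, x @ weights + bias)

rng = np.random.default_rng(0)
layer_sizes = [64, 32, 8]  # hypothetical widths: 64 inputs condensed to 8 values

# One (weights, bias) pair per block; real models would learn these from data.
params = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

batch = rng.normal(size=(5, 64))  # five records with 64 measured attributes each
representation = batch
for weights, bias in params:
    # Data flows from one block to the next, shrinking at each step.
    representation = dense_block(representation, weights, bias)
```

Each record ends up described by 8 numbers instead of 64; training adjusts the
weights so that those 8 numbers retain what matters for the prediction task.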
A healthcare system comprises doctors, nurses, front-line managers, middle-level
managers, senior managers, and the board of directors. Decision making is a crucial
activity for this group, and the decisions they take can be classified as clinical
or non-clinical. Clinical decision support systems (CDSS) are technology-driven
arrangements that assist a physician or any other medical practitioner in making
better decisions. Clinical decisions concern diagnosis, therapy, treatment and
medical prescription, while non-clinical decisions concern resource allocation,
budgets, strategic planning, etc. Even though DL can assist in decisions of all
kinds, in this chapter we keep to the core objective of discussing the use of DL in
clinical decision making.
Clinical decision support is among the more complex and challenging decision
support problems, mainly because of the many measurable and non-measurable
attributes involved in decision making and the complex relations that exist between
those attributes. The attributes include patients’ beliefs, lifestyles,
experiences, education level, diagnostic reports, historical health records and so
on. DL algorithms can serve as effective tools for supporting the decision making
process; however, the attributes given as input to the algorithms must be
measurable and quantifiable.
To understand where DL fits in the bigger picture of CDSS, let us look at a general
block diagram of a CDSS shown in Fig. 1.
The system has three main blocks. The patient’s primary data [2] comprises observed
symptoms, diagnostic reports, medical records of the patient, etc. The secondary
data comprises data external to the healthcare system, including the patient’s food
and drinking habits, sensitivity of the body to certain allergens, patient’s
rights, etc., which are to be considered for more informed healthcare decisions.
The knowledge base refers to the historical medical records of other patients
stored in a database, which can serve as a reference when taking decisions about
the current patient. In [3, 4], the authors present a fuzzy rule-based system,
referred to as a virtual clinic, whose objective is to assign doctors to patients
automatically and then assist the doctors in giving prescriptions by using the
historical knowledge base. However, as this database expands, it may contain
thousands of entries, making it almost impossible to search them thoroughly for
informative records. This is where DL comes in handy, with the tremendous
representation power of deep neural networks. Machine learning and/or deep learning
models have the potential to compress huge databases into abstract miniature
representations and
can almost replace the knowledge base, making it optional. These models can be used
directly as a decision making tool in the CDSS. As an example, in [5], the authors
use deep feed-forward neural networks to predict inpatient clinical order patterns.
The features considered, taken from the electronic health records, were
comorbidity, patient sex and race, International Classification of Diseases (ICD)
diagnosis codes and so on. They concluded that the deep neural network based model
outperformed standard-of-care, human-authored order sets in predicting actual
clinical practices. In [6], the authors apply convolutional neural networks to the
electronic health records of patients to extract high-level semantic information
about the diagnosis and generate a report. This result assists medical
practitioners in concluding on the patient’s health status. There are many such
applications of ML and DL in different domains of clinical healthcare, as well as
potential applications of DL that can be explored in the future, all of which will
be covered in the later parts of this chapter.
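Since attributes such as diagnosis codes must be made quantifiable before a neural
network can use them, a common first step is to turn a patient's set of codes into
a fixed-length binary vector. The sketch below shows one way to do this; it is not
taken from [5] or [6], and the function name multi_hot and the three-code
vocabulary are purely illustrative.

```python
def multi_hot(codes, vocabulary):
    """Encode a set of diagnosis codes as a 0/1 vector over a fixed vocabulary."""
    index = {code: position for position, code in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for code in codes:
        # Codes outside the vocabulary are ignored in this sketch.
        if code in index:
            vector[index[code]] = 1
    return vector

# Hypothetical vocabulary of three ICD-10 style codes, chosen for illustration.
vocabulary = ["E11", "I10", "J45"]
patient_codes = ["I10", "E11", "Z99"]  # "Z99" is not in the vocabulary

features = multi_hot(patient_codes, vocabulary)
```

The resulting vector can be concatenated with other numeric attributes, such as age
or laboratory values, to form the input of a feed-forward network.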
The rest of the chapter is organized as follows: in Sect. 2, we review the existing
works on the applications of DL with image processing (computer vision) for CDSS,
Sect. 3 deals with the applications of DL in Natural Language Processing (NLP) for
CDSS while highlighting certain existing challenges, Sect. 4 deals with DL and
wearable device technology based CDSS, Sect. 5 looks at the issues involved in
using DL for CDSS, Sect. 6 discusses future research perspectives on the use of DL
for CDSS and finally, Sect. 7 summarizes and concludes the chapter.
Image analysis is a well-explored area in smart healthcare. Ever since digital
imaging came into existence, images have been analyzed automatically using naive
rule-based architectures. In the early 1990s, the field shifted to the use of
82 E. Sandeep Kumar and P. Satya Jayadev
simple machine learning algorithms to extract useful patterns and information from
the images. However, this involved a lot of hand engineering: deciding which
features to extract, how to extract them, which algorithm to use for decision
making, and so on. The recent advances in DL come as a big relief, since DL
architectures can both extract the features and approximate a prediction function
directly from the given data. This ability has made DL algorithms a preferred tool
for image analysis and computer vision applications.
In all the existing research works, the blocks involved in image analysis are
similar. The blocks are summarized in Fig. 2. Image acquisition covers the various
methods by which images of an entity are captured. These images are passed through
preprocessing stages, where they are filtered or subjected to manual bounding box
carving, and the modified image is passed into a convolutional neural network (CNN)
block for training. The output is passed to an optional interpreter, which can be a
fully connected network, an autoencoder, and so on, that converts the result of the
previous stages into a form that a medical practitioner can understand. We shall
now look at the various applications of DL in the analysis of medical images
proposed by researchers in recent years.
Convolutional neural networks are often seen in works that use DL for image
analysis. There are several reasons for this extensive usage. A CNN learns the
relevant features in a way that resembles how the human brain extracts features
from an image. Another important characteristic of CNNs is weight sharing [7]: the
kernels are shared across an image, which allows local patterns to be learned
efficiently and increases model efficiency by reducing the number of parameters
involved. Transfer learning [7], which is widely used in image analysis, is also
easier with CNNs than with conventional dense neural networks.
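Weight sharing can be made concrete with a small example. The sketch below applies
one 3 x 3 kernel at every position of a 6 x 6 image (a "valid" cross-correlation,
which is what deep learning libraries usually call convolution) and compares the
number of weights with a dense layer producing an output of the same size. The
image, the kernel and the function name conv2d_valid are illustrative assumptions.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over the image; the same weights are reused everywhere."""
    kernel_h, kernel_w = kernel.shape
    out_h = image.shape[0] - kernel_h + 1
    out_w = image.shape[1] - kernel_w + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kernel_h, j:j + kernel_w]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy image whose intensity grows by 1 per column and by 6 per row.
image = np.arange(36, dtype=float).reshape(6, 6)
# Simple horizontal-gradient kernel: left column minus right column.
kernel = np.array([[1.0, 0.0, -1.0]] * 3)

feature_map = conv2d_valid(image, kernel)

conv_params = kernel.size                      # 9 shared weights
dense_params = image.size * feature_map.size   # 36 inputs x 16 outputs = 576 weights
```

The kernel responds identically wherever the same local pattern appears, and it
does so with 9 weights where a dense layer mapping the same input to the same
output size would need 576.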
Let us look at a few works that use CNNs for image analysis tasks. In [8], the
authors review various medical imaging applications of DL. They note that image
analysis has been carried out mainly on pathology, lung, brain, cardiac, abdomen,
breast, bone, retina, etc. Various imaging modalities are used, such as MRI, CT,
X-ray, PET, ultrasound and visible range, of which MRI and visible light
microscopic imaging are used most. In addition, the authors state that image
analysis techniques such as segmentation, classification (for medical examination
and inferences, and object detection), and registration are widely studied. Among
these, segmentation of a required region of interest (RoI) and detection of an
object in a given image are the most studied because of their practical
implications.
In [9], the authors proposed a methodology for segmenting regions of interest,
applied to identifying heart chambers. The methodology has three parts: the first
uses convolutional neural networks (CNNs) [10] to locate the area containing the
left ventricle (LV) in the image frame; the second consists of stacked autoencoders
that infer the shape of the LV from the image fed from the first part; and the
third comprises a dense NN that segments and delivers a binary mask of the LV. The
algorithm was trained and validated on publicly available LV datasets (MRI scans),
obtaining an accuracy of 96.69%.
In [11], the authors propose a scribble-based CNN for the image segmentation task.
As stated by the authors, a completely automated DL algorithm performs worse on
unseen/test images, and hence a bounding box is needed to narrow the search space
for the algorithm. This bounding box based training method provided better
segmentation accuracy. The work in [12] proposes a method of multimodal image
segmentation that uses MRI, PET and CT imaging. The images are passed through three
separate CNNs and the outputs are fused to get a more precise segmented image. In
[13], the authors propose a novel architecture in which deep CNNs (DCNNs) work
collaboratively towards the segmentation of brain tumors and skin lesions. The
DCNNs are paired, and whenever a DCNN misclassifies an input, a synergic error is
produced that updates the whole network together with the usual back-propagated
error. Similar works using CNNs are presented concisely in Table 1.
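The idea of fusing the outputs of several per-modality networks can be illustrated
with a simple late-fusion rule. The text does not say how [12] combines the three
outputs, so the averaging and thresholding used below are assumptions, as are the
function name fuse_probability_maps and the tiny 2 x 2 probability maps standing in
for CNN outputs.

```python
import numpy as np

def fuse_probability_maps(prob_maps, threshold=0.5):
    """Average per-pixel foreground probabilities and threshold them into a mask."""
    stacked = np.stack(prob_maps, axis=0)
    return stacked.mean(axis=0) >= threshold

# Hypothetical per-pixel foreground probabilities from three modality-specific CNNs.
mri = np.array([[0.9, 0.2], [0.6, 0.1]])
pet = np.array([[0.8, 0.4], [0.3, 0.2]])
ct = np.array([[0.7, 0.3], [0.7, 0.0]])

mask = fuse_probability_maps([mri, pet, ct])
```

In the bottom-left pixel the PET map alone would vote for background, but the
averaged evidence from all three modalities marks it as foreground, which is the
kind of correction that multimodal fusion aims for.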
Though CNNs have been shown to be very effective in object detection and
segmentation, they require datasets with a large number of samples and
correspondingly high memory and processing power. They also fail to detect
variations in pixel information at the boundaries. Therefore, the authors in [18]
propose a recurrent neural network (RNN) based architecture that learns level-set
based deformable models (LDMs, also known as geometric or implicit active contour
models) evolving under constant and mean curvature velocities. The specific tasks
considered in this work were the segmentation of the optic disc and cup in color
fundus images, cell nuclei in histopathology images and the left atrium in cardiac
MRI volumes. The block diagram remains the same as in Fig. 2, except that the CNN
block is replaced by RNNs. Similar works that aim at medical segmentation using
CNNs and RNNs can be seen in [19–21]. Table 2 shows the image datasets used in the
majority of research works related to medical imaging.
Summary In this section, various DL algorithms and their use in image analysis
tasks were reviewed. The majority of existing works in image analysis focus on
segmentation and detection of RoIs in images. DL architectures such as CNNs and
their variants are widely used. RNNs have also been applied to a few imaging tasks,
and combinations of deep learning with conventional machine learning techniques
such as support vector machines (SVMs) are also encountered in the literature. Even
though image segmentation using machine learning techniques has been studied for
many decades, extracting meaningful information from images (especially medical
images) was tedious because of the amount of hand feature engineering involved. DL
algorithms reduced this effort, giving clinical support systems that rely on
medical imaging inferences a tool for making decisions in a timely manner. It is
also observed that supervised deep learning techniques are employed more often than
unsupervised techniques in the existing literature on image analysis.
can access their electronic health record (EHR), which is a real-time patient data
record, they cannot interpret it. One of the prime reasons is that medical
practitioners lack the time to help patients understand the EHR data. With NLP, a
patient can understand the data and keep their health in check through suggested
medical prescriptions, a daily activity chart, and so on. Apart from that,
converting an image or a PDF into informative text and then parsing and analyzing
it to extract useful information is another application of NLP. One prominent
example is IBM Watson [41], which is trained to run on a patient’s data, extract
the risk features and thereby predict possible diseases that could affect the
patient.
Let us go through the existing works in the literature that use NLP for clinical
support systems. The work in [42] presents an approach that uses NLP to extract
potential medical conditions from free-text medical reports. The entire process
is composed of two main components: the background application and the problem
list management application. The background application extracts information about
possible medical conditions from the medical documents using rule-based NLP and
stores it in a central database. The problem list management application accesses
the data stored in the database and concludes on the medical problem of a patient.
The work focused on 80 different types of medical conditions such as arrhythmia or
ischemic heart disease, mitral stenosis or left bundle branch block, and wheeze or
pain. In [43], the authors propose an NLP-based method to analyze and compare the
health records of patients who are more likely to commit suicide and those who have
already attempted suicide. The work is based on the fact that many patients who are
at risk of committing suicide consult their physicians. This study used eNQUIRENet,
a database that links EHR data across multiple non-integrated primary care clinical
organizations representing more than 3 million patients and 1700 clinicians. Three
sources were used to confirm that a patient has a suicidal tendency: first, searching
the EHR for ICD-9 codes (International Classification of Diseases codes) indicating
suicide attempt or ideation, namely E950–959 (attempt) and V62.84 (ideation); second,
parsing the HPI (History of Present Illness) field to recognize entries relevant to
the symptoms of suicide, such as self-harm, hanging or cutting attempts; and third,
the PHQ-9 (Patient Health Questionnaire) examination, where the depression severity
is recorded. The extracted fields confirmed that suicide attempts are more likely to
be seen than ideation alone. A similar work is seen in [44], which infers the presence
of acute bacterial pneumonia from the chest X-ray reports of 292 patients using
rule-based NLP.
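The rule-based checks described for [43] can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the implementation used in that study: the function name `screen_record`, the flag names and the keyword list are hypothetical, the keyword match is a simple prefix match, and the PHQ-9 check is omitted because no score threshold is given in the text.

```python
import re

# ICD-9 codes named in the text: E950-E959 (attempt) and V62.84 (ideation).
ATTEMPT_CODE = re.compile(r"^E95[0-9](\.\d+)?$")
IDEATION_CODE = "V62.84"

# Illustrative keyword list only; the lexicon actually used in [43] is not given here.
HPI_KEYWORDS = ("self harm", "hang", "cut")


def screen_record(icd9_codes, hpi_text):
    """Return the set of rule-based flags raised by one patient record."""
    flags = set()
    for code in icd9_codes:
        code = code.strip().upper()
        if ATTEMPT_CODE.match(code):
            flags.add("icd9_attempt")
        elif code == IDEATION_CODE:
            flags.add("icd9_ideation")
    text = hpi_text.lower()
    # Prefix match at a word boundary, so "hang" also matches "hanging".
    if any(re.search(r"\b" + re.escape(k), text) for k in HPI_KEYWORDS):
        flags.add("hpi_keyword")
    return flags


record_flags = screen_record(["E950.3", "401.9"], "Pt reports self harm last week.")
```

Such hand-written rules are transparent and easy to audit, which is one reason rule-based NLP remains common in clinical settings, but they do not generalize beyond the patterns that were anticipated.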
However, none of the methods mentioned above uses DL concepts, even though
they are considered to be NLP systems for clinical support. In this context, we
propose a method that uses a CNN for text classification. The method has the
following stages: (i) extracting keywords from the data records, (ii) converting the
word sequences from the text/sentences and medical codes to vector form using
a look-up table/feature mapping process and (iii) classifying the text into disease
occurrence by feeding the obtained sequence of vectors to a sequence of convolution
and pooling layers. The block diagram shown in Fig. 3 explains this method. The
output of the classification layer can be used for any prediction or identification
purposes in CDSS.
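The stages listed above can be sketched in plain Python as follows. This is a minimal, untrained illustration and not the implementation behind Fig. 3: the vocabulary, the layer sizes and all names (`embed`, `conv_relu_maxpool`, `classify`) are assumptions, the weights are random rather than learned, and stage (i) is assumed to have already produced a list of keywords.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4     # length of each word vector (assumed)
KERNEL_WIDTH = 2  # number of consecutive words seen by one filter (assumed)
NUM_FILTERS = 3   # number of convolution filters (assumed)
NUM_CLASSES = 2   # e.g. disease present / absent (assumed)

# Stage (i) is assumed done: each record is already a list of keywords or codes.
VOCAB = ["fever", "cough", "wheeze", "pain", "<unk>"]

# Stage (ii): look-up table from token to vector (random here, learned in practice).
embedding = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in VOCAB}

# Stage (iii) parameters: untrained convolution filters and a dense output layer.
filters = [[[random.uniform(-1, 1) for _ in range(EMBED_DIM)]
            for _ in range(KERNEL_WIDTH)]
           for _ in range(NUM_FILTERS)]
dense = [[random.uniform(-1, 1) for _ in range(NUM_FILTERS)]
         for _ in range(NUM_CLASSES)]


def embed(tokens):
    """Map tokens to vectors, zero-padding sequences shorter than the kernel."""
    vectors = [embedding.get(t, embedding["<unk>"]) for t in tokens]
    while len(vectors) < KERNEL_WIDTH:
        vectors.append([0.0] * EMBED_DIM)
    return vectors


def conv_relu_maxpool(vectors, kernel):
    """Slide one filter over the sequence, apply ReLU, then max-pool over positions."""
    responses = []
    for start in range(len(vectors) - len(kernel) + 1):
        window = vectors[start:start + len(kernel)]
        total = sum(k * x
                    for k_row, x_row in zip(kernel, window)
                    for k, x in zip(k_row, x_row))
        responses.append(max(0.0, total))
    return max(responses)


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(tokens):
    """Return class probabilities for one tokenised record."""
    pooled = [conv_relu_maxpool(embed(tokens), f) for f in filters]
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in dense]
    return softmax(logits)


probs = classify(["fever", "cough", "wheeze"])
```

Max-pooling over positions makes the output independent of the record length, which is why records with different numbers of keywords can share one classification layer.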
In a similar way, we can use RNNs for learning from texts. Figure 4 shows a possible
RNN-based architecture that can be used for learning from text data. The
figure shows a series of RNN cells connected sequentially to form a network. The
words in the clinical text or the medical codes are fed as the input to this sequence
learner, and the output can be taken from all the RNN cells or just the last RNN
cell, depending on the requirement. For instance, the text or symptomatic information
extracted from the EHR can be fed to these networks to predict the most probable
disease affecting the patient.
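The difference between reading the output from every RNN cell and reading it only from the last cell can be made concrete with a small forward-pass sketch. It assumes the standard vanilla RNN update h_t = tanh(W_xh x_t + W_hh h_(t-1) + b); the dimensions, the random untrained weights and the names `rnn_step` and `run_rnn` are illustrative assumptions rather than details of Fig. 4.

```python
import math
import random

random.seed(1)

INPUT_DIM = 3   # size of each input vector, e.g. a word embedding (assumed)
HIDDEN_DIM = 4  # size of the hidden state (assumed)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_xh = rand_matrix(HIDDEN_DIM, INPUT_DIM)   # input-to-hidden weights
W_hh = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)  # hidden-to-hidden weights
b_h = [0.0] * HIDDEN_DIM


def rnn_step(x, h_prev):
    """One RNN cell: h_t = tanh(W_xh x_t + W_hh h_prev + b)."""
    return [
        math.tanh(
            sum(W_xh[i][j] * x[j] for j in range(INPUT_DIM))
            + sum(W_hh[i][k] * h_prev[k] for k in range(HIDDEN_DIM))
            + b_h[i]
        )
        for i in range(HIDDEN_DIM)
    ]


def run_rnn(sequence, return_all=False):
    """Run the cells over a sequence; return every hidden state or only the last."""
    h = [0.0] * HIDDEN_DIM
    states = []
    for x in sequence:
        h = rnn_step(x, h)
        states.append(h)
    return states if return_all else h


# Three one-hot "words" standing in for embedded clinical tokens.
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
all_states = run_rnn(sequence, return_all=True)  # one output per cell (e.g. tagging)
last_state = run_rnn(sequence)                   # single output (e.g. disease prediction)
```

Taking only the last state suits tasks that need one decision per record, while taking all states suits tasks that label every token.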
A similar network can be built by replacing the vanilla RNNs in the architecture
with long short-term memory (LSTM) cells. LSTMs have the advantage of
carrying information forward over a longer part of the sequence using a memory
cell and multiple gates. This helps the neural network to learn the changes in the
training data with fewer errors.
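The role of the memory cell and the gates can be seen in a single LSTM step. The sketch below assumes the commonly used formulation with forget, input and output gates plus a candidate update; the sizes, the random untrained weights and the helper names are illustrative assumptions, not taken from the chapter.

```python
import math
import random

random.seed(2)

INPUT_DIM = 2   # size of each input vector (assumed)
HIDDEN_DIM = 3  # size of the hidden state and of the memory cell (assumed)
GATES = ("forget", "input", "output", "candidate")


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# One weight matrix and bias per gate, acting on the concatenation [h_prev, x].
weights = {g: [[random.uniform(-0.5, 0.5) for _ in range(HIDDEN_DIM + INPUT_DIM)]
               for _ in range(HIDDEN_DIM)]
           for g in GATES}
biases = {g: [0.0] * HIDDEN_DIM for g in GATES}


def affine(gate, concat):
    return [sum(w * v for w, v in zip(row, concat)) + b
            for row, b in zip(weights[gate], biases[gate])]


def lstm_step(x, h_prev, c_prev):
    """One LSTM cell update; returns the new hidden state and memory cell."""
    concat = h_prev + x
    f = [sigmoid(z) for z in affine("forget", concat)]       # what to keep from c_prev
    i = [sigmoid(z) for z in affine("input", concat)]        # how much new content to write
    o = [sigmoid(z) for z in affine("output", concat)]       # how much of the cell to expose
    g = [math.tanh(z) for z in affine("candidate", concat)]  # proposed new content
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


h = [0.0] * HIDDEN_DIM
c = [0.0] * HIDDEN_DIM
for x in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h, c = lstm_step(x, h, c)
```

Because the memory cell is updated additively (the old content scaled by the forget gate plus the gated new content), information can persist across many steps when the forget gate stays close to one.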
The following are a few links to the datasets often used in NLP for clinical support
and healthcare applications.
Dataset                   Remarks
MIMIC [45]                Developed by MIT; anonymised health records of approx. 40,000 critical patients
i2b2 [46]                 Health records of nearly 1500 patients
HealthData [47]           Health data from the US Federal Government
BCHC data platform [48]   Health data from 26 cities, for 34 health indicators and across 6 demographic indicators
HMD [49]                  Human mortality database
MHealth dataset [50]      Database of body motion and physical activities
Medicare [51]             Data on services and procedures that physicians and other healthcare professionals provided to Medicare beneficiaries
LSDB [52]                 Data related to life sciences
HCUP-US [53]              Datasets containing encounter-level information on inpatient stays, emergency department visits, and ambulatory surgery in US hospitals
SEER [54]                 Data about cancer incidence segmented by demographic groups such as age, race, and gender, provided by the US government
BROAD [55]                Data categorized by project such as brain cancer, leukemia, melanoma, etc.
Third is relation extraction, such as which treatment affects what, which test is for
what, and so on.
In the above discussions, a few DL techniques like CNNs and RNNs are explained
in detail. However, there are other DL techniques that are widely used for NLP
applications in clinical decision support, such as Boltzmann machines and their
variants like deep belief networks [57], and autoencoders and their variants like
sparse, variational and denoising autoencoders. In that context, in [58] the authors
used deep belief networks (DBNs), which use restricted Boltzmann machines (RBMs) as
building blocks, for call routing in a call-center customer hotline that gives
technical assistance for a Fortune 500 company. RBMs have the advantage of extracting
useful features from the data using a visible and hidden node architecture. The
obtained features are fed to the layers of RBMs to form a DBN, which is trained using
Kullback–Leibler divergence. In addition, DBNs are used as feature extractors for
traditional machine learning algorithms like SVMs, maximum entropy and boosting. The
results of that work show that combining DBNs with SVMs provides better accuracy than
using those learning models individually for solving the call-routing problem. The
same method can be used to process speech in the medical domain as well. A few other
applications of RBMs are seen in [59–61].
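How an RBM turns a visible vector into hidden features can be sketched with the standard conditional p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij). The example below is an assumption-laden illustration: the layer sizes, the random untrained weights and the name `hidden_features` are hypothetical, and training (for example by contrastive divergence) is left out.

```python
import math
import random

random.seed(3)

NUM_VISIBLE = 6  # e.g. size of a binary bag-of-words vector (assumed)
NUM_HIDDEN = 3   # number of hidden feature detectors (assumed)

# Untrained weights between visible unit i and hidden unit j, and hidden biases.
W = [[random.gauss(0.0, 0.1) for _ in range(NUM_HIDDEN)] for _ in range(NUM_VISIBLE)]
hidden_bias = [0.0] * NUM_HIDDEN


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_features(visible):
    """Activation probability of each hidden unit given the visible vector."""
    return [
        sigmoid(hidden_bias[j] + sum(visible[i] * W[i][j] for i in range(NUM_VISIBLE)))
        for j in range(NUM_HIDDEN)
    ]


bag_of_words = [1, 0, 1, 1, 0, 0]
features = hidden_features(bag_of_words)
```

In a DBN these hidden probabilities would serve as the visible input of the next RBM, and the top-level features could be passed to a classifier such as an SVM, as described above.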
• Data heterogeneity: EHR data is available in different forms varying from hand-
written text to printed documents. DL algorithms must be able to parse and under-
stand this data. Specifically, clinical texts contain abbreviations, shorthand nota-
tions and vary from one clinician to another.
• Policy and data privacy issues: Training using DL algorithms requires large
datasets. Providing this data to DL researchers is always bound by the policies and
the privacy concerns of the patients.
• Deciding benchmarks: Since many researchers use their own private data, they
are hesitant to share the data with other researchers and hence, setting a common
benchmark for a task in clinical support is difficult.
• Inherent problems of DL: These problems come from the DL algorithms
themselves, such as the choice of the model for a task, data size, hyperparameter
tuning, high-performance hardware requirements, overfitting and underfitting issues,
generalization issues, flexibility (bias and variance tradeoffs) and multitasking
(learning multiple tasks together taking advantage of common knowledge) issues.
Summary Natural language processing (NLP) is one of the most sought-after areas
in the deep learning research community. The use of DL to understand and interpret
health records saves clinicians' time while providing timely medication to
patients. NLP applications involve a wide range of algorithms, from simple
rule-based data parsing techniques to convolutional and recurrent neural
networks. There are a few challenges and issues in using DL based NLP for CDSS and
privacy breach which arises due to transfer of data to an external site (cloud) can be
avoided.
In [65], the authors propose using a smartphone as the sensing device,
with DL programs running on the phone itself. The accelerometer, gyroscope and
magnetometer sensors available on smartphones are used to study human
activity. The work uses SIFT (Scale Invariant Feature Transform) for
feature extraction from the signals picked up by the smartphone sensors, and the
obtained features are passed on to a convolutional neural network that classifies
the signal into a human activity.
In [66], the authors propose a complete architecture for CDSS based on wearable
technology and basic machine learning algorithms. The architecture contains four
tiers: tier-1 does pervasive monitoring of the physiological signals like ECG, EEG,
respiratory signals, oxygen and heart rate, body temperature, ankle and foot motion.
The obtained signals are passed to tier-2 which provides preliminary decision support
to the physicians even though accurate laboratory measurements are not yet avail-
able at this stage. In tier-3, a more detailed analysis of the patient combined with
the laboratory measurements is carried out. Finally tier-4 provides post-diagnostic
suggestions, prescriptions and so on. All these tiers are internally connected to a
diagnosis engine that contains machine learning algorithms providing decisions to
every tier. All the laboratory test and diagnosis data are fed to that engine, which
provides adequate decisions at every point in time. The machine learning assistance
block contains a single learning algorithm or an ensemble of them. The authors have
explored the use of random forest, naive Bayes, k-nearest neighbor, SVM, best-first
decision tree and multilayer perceptron models for diagnosis inference. These models
open up ways of exploring the use of deep learning algorithms in place of traditional
ML algorithms.
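One simple way to combine several learners into an ensemble is majority voting, sketched below. This is not the ensemble scheme of [66], whose details are not given here: the three threshold rules are hypothetical stand-ins for trained models, and the feature names and cut-off values are arbitrary illustrative numbers with no clinical meaning.

```python
from collections import Counter


def threshold_rule(feature, cutoff):
    """Build a toy classifier that flags a sample when one feature exceeds a cutoff."""
    def predict(sample):
        return "flag" if sample[feature] > cutoff else "no_flag"
    return predict


# Hypothetical stand-ins for trained models (random forest, SVM, and so on).
ensemble = [
    threshold_rule("heart_rate", 100),
    threshold_rule("body_temp_c", 38.0),
    threshold_rule("resp_rate", 20),
]


def majority_vote(models, sample):
    """Return the label predicted by the largest number of models."""
    votes = Counter(model(sample) for model in models)
    return votes.most_common(1)[0][0]


sample = {"heart_rate": 112, "body_temp_c": 37.1, "resp_rate": 24}
decision = majority_vote(ensemble, sample)
```

With an odd number of binary classifiers a tie cannot occur; replacing any of the stand-ins with a deep model would leave the voting logic unchanged.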
An interesting work is presented in [67], where the authors propose a method to
monitor the symptoms of mental health using wearable technology. Locomotion
data is picked up by GPS, accelerometers and gyroscopes; speech by microphones
in smartphones or watches; facial expressions and eye-blink patterns by the phone
camera; electrodermal activity by a smartwatch; and social interaction patterns
from voice calls, Twitter and other social network data. Though not many details
are discussed as to how these signals can be utilized for monitoring mental health,
this opens up a new direction, where a CDSS based on learning from mental health
signals can be designed using the same methodology discussed in [65].
In [68], the use of wearable technology to remotely monitor elderly citizens is
proposed and referred to as the Smart Healthcare Monitoring System (SW-SHMS). The
architecture of SW-SHMS has three main parts. The first is the patient's environment,
where sensors attached to the body read temperature, blood oxygen level and heart
rate, and the sensed data is transmitted to the patient's smartphone or a gateway
device, via which the data reaches the cloud. The corresponding block diagram is
shown in Fig. 8.
The cloud performs various analytics on the data using machine learning and/or DL
algorithms to extract useful inferences, which are later sent to the monitoring
platform, where doctors can take clinical decisions and precautionary
measures.
According to the survey of existing works in [63], the DL algorithms that are
often seen in combination with wearable technology are: deep unsupervised
learning: restricted Boltzmann machines, deep belief networks, deep
Boltzmann machines, autoencoders and variational autoencoders, generative
adversarial networks and sequence learning; deep supervised learning: feed forward
neural networks, deep neural networks, spike neural networks, sequence to sequence
The following are a few problems that still hinder the use of DL for CDSS:
1. Regulations and policies: There have been no fixed rules and regulations for
using DL in clinical decision support systems. To overcome this difficulty, the US
FDA made the first set of regulations [69] for assessing AI systems in healthcare.
The FDA guidelines clarify the use of data and adaptive designs in clinical
trials. In this direction, Arterys' medical imaging platform
became the first FDA-approved DL platform for CDSS.
2. Data sharing: Training and validation of DL systems require huge amounts of
data and the sharing of it between hospitals and DL experts. Currently,
there are no incentives for people to share data, and they are also bound by IP
rights and privacy policies. However, data exchange is now slowly turning
towards a reward-based system; one example is insurance companies
collecting data from physicians for data analytics, and crowdsourcing of health
data is also slowly growing.
3. Data compatibility: The data produced by the machines and procedures
adopted in healthcare is often not useful for DL/ML systems due to a lack
of compatibility with the algorithms in use.
4. Privacy issues: As already mentioned, health data is personal information of an
individual, and family members, relatives and clinicians may often refuse to
provide the data, seeing it as a privacy breach. To solve this, DL experts came up
with the concept of distributed machine learning, where the training and testing of
the learning algorithm happen at the place where the data is generated, without
transferring it to a centralized cloud. However, the method might still take a
considerable amount of time to become acceptable to medical practitioners and
be regularly used by them.
5. Sociocultural issues: Most patients and clinicians do not trust the use of AI
in healthcare, and in many cases people are wary of staking their lives or
careers on AI. Also, people working in the medical domain fear job
insecurity as AI systems show a higher level of accuracy than human
experts. In addition, the concept of AI is not understood by the majority of common
people in our society, and there is fear due to unawareness and uncertainty.
6. Transparency: Many DL algorithms are black boxes that expose little of their
inner workings and cannot explain to clinicians why a certain prediction comes
from an algorithm. This makes clinicians reluctant to trust AI-based
systems.
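The distributed learning idea mentioned under privacy issues (item 4) can be sketched as follows. The example swaps in a federated-averaging style scheme as an assumption, since the text does not name a specific algorithm: each site trains a one-dimensional linear model on its own data, and only the model parameters, never the records, are sent for averaging. The site names, the synthetic data and the function names are hypothetical.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Gradient descent on mean squared error for y = w*x + b, using one site's data only."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_round(global_weights, sites):
    """Each site trains locally; only parameters are averaged, weighted by site size."""
    updates = [(local_update(global_weights, data), len(data)) for data in sites]
    total = sum(n for _, n in updates)
    w = sum(params[0] * n for params, n in updates) / total
    b = sum(params[1] * n for params, n in updates) / total
    return (w, b)


# Synthetic records following y = 2x + 1, split across two hypothetical hospitals.
site_a = [(0.0, 1.0), (0.5, 2.0)]
site_b = [(1.0, 3.0), (1.5, 4.0), (2.0, 5.0)]

weights = (0.0, 0.0)
for _ in range(200):
    weights = federated_round(weights, [site_a, site_b])
```

In this toy setting the averaged model approaches w = 2 and b = 1 even though neither site ever sees the other's records; real deployments additionally need secure aggregation and handling of sites whose data distributions differ.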
7 Conclusions
References
1. Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., Cui, C.,
Corrado, G., Thrun, S., Dean, J.: A guide to deep learning in healthcare. Nat. Med. 25, 24–29
(2019)
2. Safran, C., Bloomrosen, M., Hammond, W.E., Labkoff, S., Markel-Fox, S., Tang, P.C., Detmer,
D.E.: Toward a national framework for the secondary use of health data: an American medical
informatics association white paper. J. Am. Med. Inf. Assoc. 14(1), 1–9 (2007). https://doi.org/10.1197/jamia.m2273.
ISSN 1067-5027. PMC 2329823. PMID 17077452
3. Atta-ur-Rahman, M.I.B.A: Virtual clinic: a CDSS assisted telemedicine framework. In:
Telemedicine Technologies, chap. 15, 1st edn. Elsevier (2019)
4. Atta-ur-Rahman, S.M.H., Jamil, S.: Virtual clinic: a telemedicine proposal for remote areas
of Pakistan. In: 3rd World Congress on Information and Communication Technologies
(WICT’13), pp. 46–50, 15–18 Dec, Vietnam (2013)
5. Wang, J.X., Sullivan, D.K., Wells, A.J., Wells, A.C., Chen, J.H.: Neural networks for clinical
order decision support. AMIA Jt. Summits Trans. Sci. Proc. 2019, 315–324 (2019)
6. Yang, Z., Huang, Y., Jiang, Y., Sun, Y., Zhang, Y.-J., Luo, P.: Clinical assistant diagnosis for
electronic medical record based on convolutional neural network. Sci. Rep. 8(6329) (2018)
7. Yamashita, R., Nishio, M., Do, R.K.G., Togashi, K.: Convolutional neural networks: an
overview and application in radiology. Insights Imaging 9, 611–629 (2018).
https://doi.org/10.1007/s13244-018-0639-9. Springer Publications
8. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak,
J.A.W.M., van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image
analysis. Med. Image Anal. 42, 60–88 (2017)
9. Avendi, M., Kheradvar, A., Jafarkhani, H.: A combined deep-learning and deformable-model
approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med. Image
Anal. 30, 108–119 (2016)
10. Szegedy, C., Toshev, A., Erhan, D.: Deep Neural Networks for Object Detection. NIPS (2013)
11. Wang, G., Li, W., Zuluaga, M.A., Pratt, R., Patel, P.A., Aertsen, M., Doel, T., David, A.L.,
Deprest, J., Ourselin, S., Vercauteren, T.: Interactive medical image segmentation using deep
learning with image-specific fine tuning. IEEE Trans. Med. Imaging 37(7), 1562–1573 (2018)
12. Guo, Z., Li, X., Huang, H., Guo, N., Li, Q.: Medical image segmentation based on multimodal
convolutional neural network: study on image fusion schemes. In: IEEE 15th International
Symposium on Biomedical Imaging (ISBI 2018), 4–7 Apr 2018, Washington, D.C., USA,
pp. 903–907
13. Zhang, J., Xie, Y., Wu, Q., Xia, Y.: Medical image classification using synergic deep learning.
Med. Image Anal. 54, 10–19 (2019)
14. Koitka, S., Demircioglu, A., Kim, M.S., Friedrich, C.M., Nensa, F.: Ossification area localiza-
tion in pediatric hand radiographs using deep neural networks for object detection. PLoS One
13(11), e0207496 (2018). https://doi.org/10.1371/journal.pone.0207496
15. Deniz, C.M., Xiang, S., Hallyburton, R.S., Welbeck, A., Babb, J.S., Honig, S., Cho, K., Chang,
G.: Segmentation of the proximal femur from MR images using deep convolutional neural
networks. Sci. Rep. 8(16485) (2018)
16. Abd-Ellah, M.K., Awad, A.I., Khalaf, A.A.M., Hamed, H.F.A.: Two-phase multi-model auto-
matic brain tumour diagnosis system from magnetic resonance images using convolutional
neural networks. EURASIP J. Image Video Process. 2018, 97 (2018)
17. Kamnitsas, K., Ledig, C., Newcombe, V.F.J., Simpson, J.P., Kane, A.D., Menon, D.K., Rueck-
ert, S., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain
lesion segmentation. Med. Image Anal. 36, 61–78 (2017)
18. Chakravarty, A., Sivaswamy, J.: RACE-net: a recurrent neural network for biomedical image
segmentation. IEEE J. Biomed. Health Inf.
19. Wang, S., He, K., Nie, D., Zhou, S., Gao, Y., Shen, D.: CT Male pelvic organ segmentation
using fully convolutional networks with boundary sensitive representation. Med. Image Anal.
(2019)
20. Ambellan, F., Tack, A., Ehlke, M., Zachow, S.: Automated segmentation of knee bone and
cartilage combining statistical shape knowledge and convolutional neural networks Data from
the osteoarthritis initiative. Med. Image Anal. 52, 109–118 (2019)
21. Gao, Y., Phillips, J.M., Zheng, Y., Min, R., Fletcher, P.T., Gerig, G.: Fully convolutional
structured LSTM networks for joint 4D medical image segmentation. In: IEEE 15th interna-
tional symposium on biomedical imaging (ISBI 2018), Washington, DC, 2018, pp. 1104–1108.
https://doi.org/10.1109/isbi.2018.8363764
22. http://brainweb.bic.mni.mcgill.ca/brainweb/
23. http://braintumorsegmentation.org/
24. https://nihcc.app.box.com/v/ChestXray-NIHCC
25. https://www.cancerimagingarchive.net/
26. http://www.oasis-brains.org/#data
27. http://adni.loni.usc.edu/
28. https://fitbir.nih.gov/
29. http://cecas.clemson.edu/~ahoover/stare/
30. http://lbam.med.jhmi.edu/
31. https://www.insight-journal.org/midas/
32. http://archive.ics.uci.edu/ml/index.php
33. http://www.via.cornell.edu/databases/
34. http://www.eng.usf.edu/cvprg/
35. https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI
36. http://www.isi.uu.nl/Research/Databases/SCR/
37. http://www.via.cornell.edu/crpf.html
38. http://peipa.essex.ac.uk/info/mias.html
39. http://www2.it.lut.fi/project/imageret/diaretdb1/
40. https://oai.epi-ucsf.org/datarelease/
41. IBM Watson Clinical Decision support system.
https://www.ibm.com/watson-health/solutions/clinical-decision-support
42. Meystre, S., Haug, P.J.: Natural language processing to extract medical problems from elec-
tronic clinical documents: performance evaluation. J. Biomed. Inf. 39(6), 589–599 (2006).
ISSN 1532-0464
43. Anderson, H.D., Pace, W.D., Brandt, E., Nielsen, R.D., Allen, R.R., Libby, A.M., West, D.R.,
Valuck, R.J.: Monitoring suicidal patients in primary care using electronic health records. J.
Am. Board Fam. Med. 28(1), 65–71 (2015). https://doi.org/10.3122/jabfm.2015.01.140181
44. Fiszman, M., Chapman, W.W., Aronsky, D., Evans, R.S., Haug, P.J.: Automatic detection of
acute bacterial pneumonia from chest X Ray reports. J. Am. Med. Inform. Assoc. 7(6), 593–604
(2000)
45. https://mimic.physionet.org/
46. https://www.i2b2.org/NLP/DataSets/Main.php
47. https://healthdata.gov/search/type/dataset
48. https://bchi.bigcitieshealth.org/indicators/1827/searches/34444
49. https://www.mortality.org/
50. https://archive.ics.uci.edu/ml/datasets/MHEALTH+Dataset
51. https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Physician-and-Other-Supplier.html
52. https://dbarchive.biosciencedbc.jp/index-e.html
53. https://hcup-us.ahrq.gov/databases.jsp
54. https://seer.cancer.gov/faststats/index.html
55. https://gengo.ai/datasets/18-free-life-sciences-medical-datasets-for-machine-learning/?utm_campaign=c&utm_medium=quora&utm_source=rei
56. Shickel, B., Tighe, P.J., Bihorac, A., Rashidi, P.: Deep EHR: a survey of recent advances in
deep learning techniques for electronic health record (EHR) analysis. IEEE J. Biomed. Health
Inf. 22(5), 1589–1604 (2018). https://doi.org/10.1109/JBHI.2017.2767063
57. Sarikaya, R., Hinton, G.E., Deoras, A.: Application of deep belief networks for natural lan-
guage understanding. IEEE/ACM Trans. Audio, Speech, Lang. Process. 22(4), 778–784 (2014).
https://doi.org/10.1109/TASLP.2014.2303296
58. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10),
1345–1359 (2010). https://doi.org/10.1109/TKDE.2009.191
59. Jin, Y., Zhang, H., Du, D.: Improving deep belief networks via delta rule for sentiment
classification. In: IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI),
San Jose, CA, pp. 410–414 (2016). https://doi.org/10.1109/ictai.2016.0069
60. Jiang, X., Zhang, H., Duan, F., Quan, X.: Identify Huntington’s disease associated genes based
on restricted Boltzmann machine with RNA-seq data. BMC Bioinf. 18(1), 447 (2017).
https://doi.org/10.1186/s12859-017-1859-6
61. Tomczak, J.M.: Learning informative features from restricted Boltzmann machines. Neural
Process. Lett. 44(3), 735–750 (2016). https://doi.org/10.1007/s11063-015-9491-9. Springer
Publications
62. https://www.apple.com/in/watch/
63. Dargazany, A.R., Stegagno, P., Mankodiya, K.: Wearable DL: wearable internet-of-things and
deep learning for big data analytics—concept, literature, and future. Mob. Inf. Syst. (8125126),
20 (2018). https://doi.org/10.1155/2018/8125126
64. Xu, M., Qian, F., Zhu, M., Huang, F., Pushp, S., Liu, X.: DeepWear: adaptive local
offloading for on-wearable deep learning. IEEE Transactions on Mobile Computing.
https://doi.org/10.1109/tmc.2019.2893250
65. Ravi, D., Wong, C., Lo, B., Yang, G.: Deep learning for human activity recognition: a resource
efficient implementation on low-power devices. In: IEEE 13th international conference on
wearable and implantable body sensor networks (BSN), San Francisco, CA, pp. 71–76 (2016).
https://doi.org/10.1109/bsn.2016.7516235
66. Yin, H., Jha, N.K.: A health decision support system for disease diagnosis based on wearable
medical sensors and machine learning ensembles. IEEE Trans. Multi-Scale Comput. Syst. 3(4),
228–241 (2017). https://doi.org/10.1109/tmscs.2017.2710194
67. Abdullah, S., Choudhury, T.: Sensing technologies for monitoring serious mental illnesses.
IEEE Multimedia 25(1), 61–75 (2018). https://doi.org/10.1109/mmul.2018.011921236
68. Al-khafajiy, M., Baker, T., Chalmers, C., Asim, M., Kolivand, H., Fahim, M., Waraich, A.:
Remote health monitoring of elderly through wearable sensors. Multimed. Tools Appl. 78(17),
24681–24706 (2019). https://doi.org/10.1007/s11042-018-7134-7. Springer Publications
69. Jiang, F., Jiang, Y., Zhi, H., et al.: Artificial intelligence in healthcare: past, present and future.
Stroke Vasc. Neurol. 2 (2017). https://doi.org/10.1136/svn-2017-000101
Pappu Satya Jayadev earned his Bachelor's degree in Electrical and Electronics Engineering, with
distinction, from Gayatri Vidya Parishad College of Engineering, Visakhapatnam. Currently, he is a
graduate scholar (M.S. + Ph.D.) at IIT Madras, working with Dr. Ramkrishna Pasumarthy and Dr.
Nirav Bhatt. He is affiliated with the Robert Bosch Center for Data Science and AI, and Systems
and Control groups at IIT Madras. His research interests include analysis, optimization and con-
trol of systems, applying the tools of machine learning and deep learning. His works have been
published in multiple national and international conferences.
Review of Machine Learning and Deep
Learning Based Recommender Systems
for Health Informatics
Current studies have shown that keeping track of lifestyle-related information
such as daily steps, body weight and calories spent is very useful for developing user
awareness, which may ultimately lead to a healthy lifestyle, a crucial component in
treating many chronic diseases. In fact, these measurements over time may reveal
interesting insights concerning the user's efforts and the final outcome. In this context,
recommender systems could be readily utilized for health informatics, which may lead
to the improvement of chronic health conditions. Thus, this chapter presents an overview
of recommender systems. The state-of-the-art applications where these systems are
used and the machine learning, especially deep learning, techniques that are applied
are also detailed in this chapter.
Hence, this field explores the design of more effective drugs for personalized
treatment, thereby reducing side effects. Understanding the influence of environmental
factors on the formation of proteins and their interactions is another interesting
application where deep learning techniques are found to be very useful.
• Medical Imaging—Automated medical image analysis is a crucial requirement
today for modern medicine. In recent years, deep learning techniques, especially
convolutional neural networks are becoming increasingly popular in the medical
imaging research community. This is because deep learning techniques are found
to perform extremely well for computer vision applications, and the run-time
performance of such techniques can be improved when parallelized on GPUs.
• Pervasive Sensing—Ambient, wearable and even implantable sensors are used to
monitor body vitals, specifically for the elderly under free-living conditions.
Regular monitoring of a person's energy expenditure throughout the day, along
with food intake, helps to curb obesity and thus improve personal health.
Different wearable and ambient sensors are used to monitor daily activities and
provide assistance to elderly patients to improve their quality of life. Human
activity recognition can also be utilized for the rehabilitation of heart and stroke
patients and for post-trauma recovery. Such activity recognition can be performed
using wearable and implantable assistive devices. Continuous monitoring of body
vital signs is important for improving the treatment of patients in critical care,
as the physical conditions of such patients need to be carefully analyzed [16].
• Public Health—This has come up as an important discipline as it aims to prevent
disease proactively by analyzing the possible spreading patterns of a disease. It also
aims to investigate the influence of environmental factors on social behaviors. The
spread of a disease, or even of social habits induced by environmental factors, can be
localized to a small area or a state, or can extend across a country. Public health
applications mostly focus on different patterns of spread of epidemics and lifestyle
diseases and analyze the inherent factors influencing such behavior. As the data size
increases, scalability becomes a crucial issue that is hard to address with conventional
predictive models. Therefore, performance tuning of these systems is difficult and
can only be done by domain experts. Deep learning algorithm designs mostly
explore online machine learning: cost function optimization takes place
sequentially as new training data arrive as input to the system. So,
evidently, deep learning techniques play important roles in the health recommender
system for research studies in public health [17].
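The online learning idea mentioned under public health, where the cost function is optimized sequentially as new data arrive, can be sketched with a single-sample stochastic gradient update. The linear model, the learning rate, the synthetic stream and the function names are all illustrative assumptions and do not come from [17].

```python
def online_update(weights, x, y, lr=0.05):
    """One stochastic gradient step on the squared error of a single new observation."""
    w, b = weights
    error = (w * x + b) - y
    return (w - lr * 2 * error * x, b - lr * 2 * error)


def stream():
    """Synthetic stream following y = 3x - 1, standing in for data arriving over time."""
    for step in range(2000):
        x = (step % 10) / 10.0
        yield x, 3.0 * x - 1.0


weights = (0.0, 0.0)
for x, y in stream():
    # The model is refined as each observation arrives; no sample is stored or revisited.
    weights = online_update(weights, x, y)
```

Because each observation is used once and then discarded, memory use stays constant as the data volume grows, which is the scalability property the paragraph above refers to.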
A health recommender system has several phases following the basic architecture of
a health informatics system as described in Fig. 2. Publicly available health datasets
and quality metrics are two key concerns for the success of recommender systems
in health informatics.
Machine learning algorithms are very useful for recommendation systems
in the application domains stated in the previous section. They can provide better
recommendations than traditional approaches. They can reduce computational complexity
106 J. Saha et al.
and work with multi-source data. Existing Health Recommender Systems (HRS) can
be classified into two categories based on their application.
(a) Disease Diagnosis HRS
People with multiple health conditions may face specific challenges and co-morbidities.
The onset of a challenge could reveal an underlying medical condition.
In this way, medical conditions may be diagnosed early, so that early care can be provided
through recommendations, which would otherwise not be possible. Thus, medical conditions
leading to medical emergencies may also be prevented.
Healthcare recommender systems for diagnosis and monitoring of chronic diseases
play an important role in the continuous monitoring and support of people in
need by offering proper advice and predicting the risks associated with diagnosed
diseases. Such systems may act as management and control tools to assist
physicians and patients. However, providing accurate recommendations from medical
data in real time is challenging because such data are complex: they are often
unbalanced, large, multi-dimensional, noisy and/or incomplete.
Depression and mental disorders are increasingly becoming a major problem in
present society. Depression is usually accompanied by negative effects in the form of
an assortment of physical, emotional, and behavioral symptoms. Hence, an intelligent
health recommender system based on smartphones is proposed in [18] to monitor patients
with a mental disorder (mainly related to anxiety) and to provide treatment as
necessary.
Recommender systems are designed exploiting IoT enabled technologies for the m-Health
domain [3] to acquire patient data, based on which proper advice is rendered.
Such systems facilitate the task of caregivers by suggesting suitable advice that may
lead patients towards a better quality of life. To obtain a sufficiently large dataset, an
existing benchmark dataset has been used for the experiments with the proposed system.
Review of Machine Learning and Deep Learning … 107
Using heterogeneous sensors, various physiological signals are sensed to analyze the
patient's condition and prescribe personalized solutions. The cloud based architecture
of recommender systems helps in uploading and downloading health data under a
proper access control policy.
In [19], a recommender system for patients suffering from chronic diseases
such as diabetes is designed to improve quality of life by assisting both patients
and caregivers with accurate prediction of disease related risks and trustworthy
health recommendations. An accurate prediction model is built to diagnose risks
related to chronic diseases by applying multiple classifications using decision tree
algorithms, and more accurate medical advice is prescribed by applying unified
collaborative filtering based on patients’ medical history, external features, etc. The
challenges of existing recommender systems are: (i) missing or erroneous data due to
human error or sensor devices, the large size of medical databases, etc.; and (ii) two
dimensional data problems, where one dimension is based on historical recommendations
and the other is the relation between the patient’s external features and the
practitioner’s advice. The recommendation system presented in [19] is found to achieve
better recall and precision with the random forest algorithm than with other algorithms
such as J48, decision stump, REP tree, etc.
The development of intelligent and accurate recommender systems has attracted funding
because of its relevance to current socio-economic conditions and the support of
enabling technologies such as IoT, machine learning, big data, etc.
In [20], the authors proposed and developed a recommender system for personalized
care and support of people suffering from dementia, which causes memory
loss and affects an alarmingly increasing number of people worldwide. This work
is funded by an EU H2020 project and aims to build a software platform that considers
dementia patients and their caregivers as a dyad. The dependence of recommender
systems on user data creates what is termed the cold-start problem: dealing
with new users is problematic, as sufficient information may not be
present in the database for a new user. There should be a balance between generalized
solutions based on a general model and over-accuracy/overestimation.
This type of recommender system is intended for users to semantically explore and
detect their disease related conditions. Such systems often follow a layered architecture
as in [21]. This comprises (i) a user layer to keep a record of the interactions
of user agents and their preferences, and to manage semantic search, data source access
and ranking of preferences, and (ii) a data layer to store acquired data with access control.
The performance of this system can be improved further by combining this semantic
approach with a more structured, medical practitioner based method.
The huge popularity of health related videos on the Internet raises concerns about
video quality and content. To aid people referring to such videos, a content based
recommender system is designed in [22] to link health related videos to content
rich websites. The linking is done by applying NLP: metadata or keywords, such as
video name, title, topic, etc., are extracted from YouTube videos
and used to search for semantic web based content for reference. The correctness
and effectiveness of such linking are evaluated through several metrics
such as relevance, precision, etc.
Systems are also designed to search for and select trustworthy health related web
based content available on the Internet for recommendation with an individualistic
approach [23]. In this context, recommender systems could be categorized as
collaborative, content based, and knowledge based recommender
systems, etc. Profiles of users and items and social media information are generally fed
as input to the recommender systems.
Selecting the proper learning technique for analyzing health data is important to
mitigate several challenges of the health recommender system. Such techniques are
applied to build patterns that describe, analyze and predict data and define the current
health status of the users. Several works can be found in medical image processing
that use different machine learning techniques for the diagnosis and early detection
of diseases. The existing learning techniques are Supervised, Semi-supervised and
Unsupervised as shown in Fig. 3.
The decision tree (J48) algorithm can be used in classification and regression problems
and solves the problem by using a tree representation. It can represent the decision
explicitly and visually. Each tree contains internal nodes and leaf nodes. An internal
node corresponds to an attribute, and class labels are present in the leaf nodes. The
representation of the tree is understandable, as if-then rules are used here. Trees can
grow arbitrarily deep, so a minimum number of training samples per leaf node
or a maximum depth of the model should be specified. Pruning helps to improve
performance and reduce the complexity of this algorithm. It removes a few branches
of the tree that make use of features having low importance.
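The effect of the depth and leaf-size limits described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Gini impurity as the split criterion (J48 itself is based on C4.5 and information gain ratio), the two limits act as a simple form of pre-pruning rather than the post-pruning that J48 performs, and the function names and toy data (glucose, BMI) are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, max_depth=3, min_samples_leaf=1, depth=0):
    # Stop growing when the node is pure or the depth limit is reached.
    if depth >= max_depth or len(set(labels)) == 1:
        return {"leaf": majority(labels)}
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [i for i, r in enumerate(rows) if r[feature] <= threshold]
            right = [i for i, r in enumerate(rows) if r[feature] > threshold]
            # Reject splits that would create leaves with too few samples.
            if min(len(left), len(right)) < min_samples_leaf:
                continue
            impurity = sum(
                len(side) / len(rows) * gini([labels[i] for i in side])
                for side in (left, right)
            )
            if best is None or impurity < best[0]:
                best = (impurity, feature, threshold, left, right)
    if best is None:
        return {"leaf": majority(labels)}
    _, feature, threshold, left, right = best
    children = [
        build_tree([rows[i] for i in side], [labels[i] for i in side],
                   max_depth, min_samples_leaf, depth + 1)
        for side in (left, right)
    ]
    return {"feature": feature, "threshold": threshold,
            "left": children[0], "right": children[1]}

def predict(tree, row):
    # Follow the if-then rules from the root down to a leaf.
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]

# Assumed toy data: [glucose, bmi] -> risk label.
rows = [[85, 22], [90, 30], [150, 28], [160, 35], [95, 24], [170, 31]]
labels = ["low", "low", "high", "high", "low", "high"]
tree = build_tree(rows, labels, max_depth=2, min_samples_leaf=2)
```

Each internal node of the resulting dictionary holds an attribute test and each leaf holds a class label, which mirrors the if-then reading of the tree.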
The authors in [19] proposed a health recommender system for disease based on
decision trees and collaborative filtering. Disease related data are mostly huge and
collected from multiple sources. Most of the time the data are multi-dimensional, and
some values are missing or noise is present in the dataset. It becomes difficult to handle
such data using traditional approaches. Filtering techniques are used to remove the
noise and reduce ambiguous labels. A decision tree is applied here to build a
model for predicting and diagnosing diseases and their risk. An ensemble model,
Random Forest, is built using several decision trees. The unified collaborative filtering
method helps to achieve better recommendations on the basis of previous records and
other features.
Decision trees are either used alone or in combination with other supervised
classifiers for HRS. In [27], the authors considered smartphone based and wrist
worn motion (accelerometer, gyroscope and linear acceleration) sensors to identify
several complex activities like smoking, eating, drinking coffee, etc. Three different
classifiers, Naive Bayes, decision tree and k nearest neighbor (kNN), are used in the
work with different window sizes to recognize simple as well as complex activities.
GENEActiv, a wrist-worn triaxial accelerometer, is used in [28] to classify
walking, running and stationary activities, and achieves good accuracy. The authors
in [29] deployed both support vector machines (SVM) and decision trees in their
framework.
Depression prediction and monitoring is a crucial challenge for health recommender
systems. A large amount of data, such as user behavior, daily activities, mood details,
etc., is needed for analyzing and predicting the disease. The heterogeneous data make the
system complex. Hence the authors in [18] proposed an intelligent system to provide
useful recommendations. A combination of decision tree and SVM is used to build
this system. Various external factors related to depression are considered to build
this prediction model.
Logistic regression is a classification technique that applies the sigmoid function to
a linear combination of input features. It makes predictions based on real-valued
inputs that are combined linearly using weights or coefficient values. In general, the
outputs are binary values, 0 or 1. The output of logistic regression classification when
applied on a diabetic dataset with default parameters is shown in Fig. 4a.
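As a worked illustration of this description, the model can be written as follows. The symbols $x_i$ (input features), $w_i$ (weights) and $b$ (bias), the 0.5 decision threshold and the numeric values are assumptions chosen for the example; they are not taken from the diabetic dataset.

$$z = b + \sum_{i=1}^{n} w_i x_i, \qquad \hat{p} = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \hat{y} = \begin{cases} 1 & \text{if } \hat{p} \ge 0.5 \\ 0 & \text{otherwise} \end{cases}$$

For instance, with $b = -8$, $w = (0.04, 0.1)$ and $x = (150, 30)$, we get $z = -8 + 6 + 3 = 1$ and $\hat{p} = 1/(1 + e^{-1}) \approx 0.73$, so the predicted output is 1.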
In [30], the authors proposed device independent activity monitoring with a
minimal number of smartphone inertial sensors. The energy efficient ubiquitous
system is machine learning based and performs well with Logistic Regression using
inexpensive time domain features.
Sometimes, it could be hard to detect all individual class labels with appreciable
accuracy using one base classifier. An ensemble of classifiers can be applied instead.
The ensemble model combines the outcomes of different base learners. Every base
learner attempts to classify the test set instances based on the training set instances.
The ensemble model decides the class label of each test instance
by combining the outcomes of all the base learners. This adds generality to the
system. Bagging and boosting are the two ensembling methods that are heavily
used in the literature. In bagging, a number of bags (subsets) are sampled from the
training set and a base classifier is trained on each of these subsets, forming a set of
classification models. In boosting, by contrast, the same training set is used in successive
iterations, but each instance is assigned a different weight depending on the ease of
classifying the instance in the previous iteration.
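The bagging procedure described above can be sketched in a few lines. The sketch assumes that bags are bootstrap samples drawn with replacement (the usual formulation of bagging), that the base learner is a one-dimensional threshold classifier (a decision stump), and that the final label is chosen by majority vote; the helper names and toy data are hypothetical.

```python
import random
from collections import Counter

def train_stump(samples):
    """Base learner: the threshold with the fewest training errors.

    A sample (x, y) is predicted as class 1 when x > threshold.
    """
    best_threshold, best_errors = None, None
    for threshold, _ in samples:
        errors = sum((x > threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagging_fit(samples, n_models=11, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the training set, with replacement.
        bag = [rng.choice(samples) for _ in samples]
        models.append(train_stump(bag))
    return models

def bagging_predict(models, x):
    # Majority vote over the predictions of all base learners.
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]

# Assumed toy data: (feature value, class label).
samples = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
           (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
models = bagging_fit(samples)
```

Boosting would differ in the training loop: instead of resampling independently, each round would reweight the instances that the previous learner misclassified.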
An ensemble may also be a combination of different condition based classifiers.
For instance, in [32], a condition based ensemble classifier is formed to address the
effect on detailed HAR of using different smartphones (having various hardware
configurations) and of usage behavior, such as how the smartphone is carried by the
user (shirt pocket, right pants pocket, or right hand). It follows the principles of bagging.
Health care recommendation systems for consumers need to make relevant
suggestions on the basis of predicted probability values for different health conditions.
An ensemble model is used in [33] to build this kind of system. A Bayesian
network and Random Forest are used to build the ensemble model, and it provides
better recommendations.
In Multi-instance learning, each object contains a set of instances and is associated
with only a single label, as shown in Fig. 5a. Thus, the individual instances need not be
labeled; only the bag of instances is assigned a label.
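A minimal sketch of this bag-level labeling is given below. It assumes the standard multi-instance rule, under which a bag is positive when at least one of its instances is positive; the instance scores, the threshold and the function name `bag_prediction` are hypothetical.

```python
def bag_prediction(instance_scores, threshold=0.5):
    """Label a whole bag from its per-instance scores.

    Assumption: a bag is positive (1) when at least one instance
    scores at or above the threshold, otherwise negative (0).
    """
    return int(max(instance_scores) >= threshold)

# Each bag groups several instances (e.g. sensor windows) under ONE label.
training_bags = [
    {"instance_scores": [0.1, 0.2, 0.9], "label": 1},
    {"instance_scores": [0.2, 0.1, 0.3], "label": 0},
]
predictions = [bag_prediction(bag["instance_scores"]) for bag in training_bags]
```

Only the bag labels are needed for training and evaluation, which is why this setting suits sparsely annotated activity data.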
Semi-supervised learning is essential for sparsely labeled data. The authors in [34]
proposed a HAR framework to monitor daily user activity on a sparsely labeled
dataset. They applied Multi-Instance Learning (MIL) to handle different annotation
strategies. A few novel extensions of MIL are also found in the literature that reduce
the required level of traditional supervision. MI-SVM and citation kNN classifiers are
also designed to deal with multiple instances having a single label. Several types
of bags are used to represent the continuous dataset in MIL. In [34], three types
of labeling (single, multi-labeled and majority voting) for the bag of instances are
considered to represent the entire test and training dataset. The iterative multi-instance
Support Vector Machine (SVM) is found to perform better for single labeled bags,
whereas the standard multi-instance SVM is found to perform better for multi
labeled bags.
In Multi-label learning, the training dataset contains instances associated with a set
of labels. It can classify the label sets of unseen instances on the basis of training
instances with known label sets. In general, one instance is present in a multi-label
object and K number of class labels are associated with it as shown in Fig. 5b.
The authors in [35] proposed a HAR system based on Multi-label machine learn-
ing and Expectation-Maximization (EM) algorithm. The system can identify several
activities correctly when there is a time gap between the two actions. The pseudo
sequence data are used for the entire experiment. The multi-label data set is stochas-
tically labeled. EM algorithm is executed and the probability distribution of the data
labels is learned.
The graph based semi-supervised learning technique is also used for HRS based HAR
systems. The experimental dataset reported in [36] contains a small set of labeled data
together with a few unlabeled data. The HAR framework can record long
duration activity data using experience sampling, without detailed annotations, by
propagating the provided labels to the neighboring data.
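The idea of propagating labels to neighboring data can be sketched as follows. This is a simplified nearest-neighbor propagation, not the multi-graph method of [36]; the neighborhood size `k`, the function name and the toy points are assumptions made for the example.

```python
import math
from collections import Counter

def propagate_labels(points, labels, k=2, max_rounds=10):
    """Spread known labels to unlabeled points (label None) via k nearest neighbours."""
    labels = list(labels)
    for _ in range(max_rounds):
        updates = {}
        for i, point in enumerate(points):
            if labels[i] is not None:
                continue
            others = [j for j in range(len(points)) if j != i]
            nearest = sorted(others,
                             key=lambda j: math.dist(point, points[j]))[:k]
            votes = Counter(labels[j] for j in nearest
                            if labels[j] is not None)
            if votes:
                updates[i] = votes.most_common(1)[0][0]
        if not updates:
            break  # nothing left to propagate
        # Apply all updates together so each round uses the previous labels.
        for i, label in updates.items():
            labels[i] = label
    return labels

# Assumed toy data: two groups of feature vectors, one labeled point each.
points = [(0, 0), (1, 0), (2, 0), (3, 0),
          (50, 0), (51, 0), (52, 0), (53, 0)]
labels = ["sitting", None, None, None, "walking", None, None, None]
propagated = propagate_labels(points, labels)
```

With only one annotated sample per group, the remaining points receive their labels over a few rounds as the labeled region grows outward.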
In unsupervised learning, data sets need not have any labels. The data pattern is
unknown to us and we need to find the hidden patterns in the unlabeled data. It
is useful when the approximation of the data labels is poor. Clustering is an unsupervised
learning mechanism for grouping similar data into clusters. Representative
clustering mechanisms that could be applied in HRS are discussed below.
Fig. 6 DBSCAN clustering technique a original dataset before clustering, b clustered data [38]
extracted. Combining the mixture of Gaussian method with DBSCAN clustering makes this
system more efficient. With proper tuning of MinPts and eps, it achieves good accuracy for
daily living activities when the number of activities is unknown to the system.
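The role of the two DBSCAN parameters can be illustrated with a compact sketch of the algorithm. It assumes Euclidean distance and the usual convention that a point is a core point when at least `min_pts` points (including itself) lie within distance `eps`; the toy points are assumptions, and the mixture of Gaussian step mentioned above is not included.

```python
import math

NOISE = -1

def region_query(points, index, eps):
    """Indices of all points within eps of points[index] (itself included)."""
    return [j for j, q in enumerate(points)
            if math.dist(points[index], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # core point: keep expanding
    return labels

# Assumed toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
cluster_labels = dbscan(points, eps=1.5, min_pts=3)
```

The number of clusters is never passed in: it follows from `eps` and `min_pts`, which is why the method suits settings where the number of activities is not known in advance.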
Hierarchical clustering is another type of clustering algorithm. The data points are
grouped together to form a tree or hierarchy of clusters. The clusters are graphically
represented using a dendrogram. Initially, all data points are assigned a cluster. The
algorithm needs a terminating condition to stop. In general, two types of hierarchical
clustering are available: Agglomerative (bottom–up) and Divisive
(top–down). Agglomerative clustering starts with each cluster representing a single
data point. The most similar pair of clusters is merged in each step. On the other hand,
divisive clustering starts from the top level with a single cluster that includes all the
data points. It splits the top level cluster into child clusters in each step until the
individual child clusters contain only a single data point. The condition for cluster
build-up is known as the linkage, or dissimilarity of two objects.
Several types of linkage are used in Hierarchical clustering. The smallest distance
between two points of two different clusters is known as Single link or Min, whereas
the maximum distance between two points of two different clusters is known as
Complete link or Max. Alternatively, the distance between each pair of points drawn
from the two clusters is computed and then the average of these distances is taken.
This is known as Average link or Group Average.
Ward’s method can also be used; it merges the pair of clusters that gives the smallest
increase in the within-cluster sum of squared distances. A few state-of-the art works
are summarized in Table 2. These works are based on pervasive sensing applications
that also have implications in public health. The table shows how recent works heavily
use different types of machine learning techniques stated above.
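The single, complete and average linkage criteria described above can be sketched directly from their definitions. The sketch assumes Euclidean distance between points; the function names and the two toy clusters are hypothetical.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Distances between every point of one cluster and every point of the other."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_link(cluster_a, cluster_b):
    # Smallest distance between two points of different clusters (Min).
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_link(cluster_a, cluster_b):
    # Largest distance between two points of different clusters (Max).
    return max(pairwise_distances(cluster_a, cluster_b))

def average_link(cluster_a, cluster_b):
    # Mean of all pairwise distances (Group Average).
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Assumed toy clusters of one-dimensional points.
cluster_a = [(0,), (1,)]
cluster_b = [(4,), (6,)]
linkages = (single_link(cluster_a, cluster_b),
            complete_link(cluster_a, cluster_b),
            average_link(cluster_a, cluster_b))
```

For these two clusters the pairwise distances are 4, 6, 3 and 5, so the three criteria give 3, 6 and 4.5 respectively. In agglomerative clustering, the pair of clusters with the smallest linkage value is the one merged in each step.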
Specificity: Probability that a test result will be negative when the true label is
negative.
$$\text{Specificity} = \frac{TN}{FP + TN}$$
Accuracy: Overall classification performance for all classes is denoted by the
following equation in the state of the art literature.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
F-measure (F1): It measures a model’s accuracy by combining precision and recall.
If the output has low false positives and low false negatives, the classifier is correctly
identifying real objects. It is defined as follows.
$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Precision: Precision indicates how accurate the model’s positive predictions are, that
is, how many of the data points predicted as positive are actually positive.
$$\text{Precision} = \frac{TP}{FP + TP}$$
Recall: The completeness of classifiers can be measured using recall. A low recall
indicates many False Negatives. Recall indicates how many of the Actual Positives
are labeled as Positive by the model (True Positive).
$$\text{Recall} = \frac{TP}{TP + FN}$$
Most of the existing works reported here use one or more of the above mentioned
performance metrics.
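The metrics defined above can be computed together from the four confusion matrix counts. The following sketch assumes a binary classification task; the function name `classification_metrics` and the example counts are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics above from confusion matrix counts.

    tp, tn, fp, fn: true positives, true negatives,
    false positives and false negatives.
    """
    precision = tp / (fp + tp)
    recall = tp / (tp + fn)
    return {
        "specificity": tn / (fp + tn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Assumed example counts for 100 test instances.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts, specificity is 45/50 = 0.9, accuracy is 85/100 = 0.85, precision is 40/45 ≈ 0.89, recall is 40/50 = 0.8 and F1 is 80/95 ≈ 0.84. The sketch does not guard against empty denominators, which would need handling when a class is absent from the test set.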
node information and structural information are merged with a deep learning method
to achieve better performance.
Some of the deep learning techniques are discussed below.
reduce the feature dimensions. The softmax function is used here in the last layer for
the classification. It provides better results than state-of-the art classifiers.
Nowadays, deep learning plays a crucial role in HAR. The authors in [43] presented
a deep learning framework based on CNN, LSTM, and an ELM
classifier. Most existing HAR systems apply handcrafted features that require expert
knowledge, such as statistical features. Here, a CNN is used in the first stage to extract
features from accelerometer signals, which serve as a higher-level abstraction
of the raw data. It is difficult to recognize the sequence of activities from real-time sensor
data, as temporal dependencies are ignored in the basic structure of this deep
network. Several challenges may occur during human activity monitoring, like the
similarity between some activity classes (like normal walk and slow walk), variable changes
in accelerometer values over a period of time, etc. Hence the authors applied Long
Short-Term Memory (LSTM) along with the basic CNN to achieve better recognition accuracy
than the basic CNN alone. However, it remains difficult to achieve good classification
results in real time. So, the Extreme Learning Machine (ELM) is integrated with CNN-LSTM to
improve the real-time performance of the proposed framework. In the ELM, the parameters of
the hidden layers are chosen randomly and the output weights are calculated using the least
squares method. The baseline CNN achieves an F-score of 0.88,
whereas the proposed CNN-LSTM-ELM technique achieves better prediction results.
CNN is used in [44] to model intelligent health recommender systems. It works
on supplementary data to find the recommended hospital on the basis of previous
data analysis. The Convolution Restricted Boltzmann Machine (RBM) model is a
combination of RBM and CNN that works as a two layer model and uses the features
of both learning methods. It can work with big data and helps to build an effective
health recommender system. Two error measures, Root Mean Square Error (RMSE) and Mean
Absolute Error (MAE), are used to evaluate and minimize the system errors.
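The two error measures named above can be sketched from their standard definitions. The function names and the example ratings are assumptions made for illustration; they are not results from [44].

```python
import math

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

# Assumed example: true ratings versus ratings predicted by a recommender.
actual_ratings = [4, 3, 5, 2]
predicted_ratings = [3, 3, 4, 4]
errors = (rmse(actual_ratings, predicted_ratings),
          mae(actual_ratings, predicted_ratings))
```

Here the differences are 1, 0, 1 and -2, so the mean absolute error is 1.0 and the root mean square error is the square root of 1.5, about 1.22. RMSE penalizes the single large error more heavily than MAE does.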
In real life, it is difficult to represent all problems with fixed length inputs and
outputs. For example, in a time series such as an accelerometer trace of daily human
activity, the continuous data pattern produces new data samples in each time window.
This requires a system capable of storing the sequence of data and using the context
of the information.
The Recurrent Neural Network (RNN) is a robust neural network that can utilize the
sequential information of the data pattern. This helps to build context aware
recommendation systems. RNNs can capture information about previous computations
and use it as input to the next hidden layer. The network is unrolled along
the time direction in the training phase and processes the sequence quickly in the
identification phase. Sometimes, the output of an RNN model depends not only on the previous
elements in the sequence but also on future elements, as shown in Fig. 7a. This kind
of network is known as a Bidirectional RNN. RNNs work well with inputs of different
sizes and can take historical information into account. The generated
weights are shared across the network. However, computation is slower than in other
networks, and it is sometimes difficult to access information from the distant past.
Fig. 7 Deep learning technique of a recurrent neural network and b restricted Boltzmann machine
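The way an RNN carries context from one time step to the next can be sketched with a single recurrent unit. The sketch assumes a scalar hidden state with a tanh activation and processes the sequence in one direction only; the function name, the weight values and the input sequences are hypothetical.

```python
import math

def rnn_forward(inputs, w_in, w_rec, bias, h0=0.0):
    """Run one recurrent unit over a sequence of any length.

    The same weights (w_in, w_rec, bias) are reused at every time step,
    and each hidden state depends on the previous one.
    """
    hidden_states = []
    h = h0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)
        hidden_states.append(h)
    return hidden_states

# Assumed toy sequences of different lengths.
short_trace = rnn_forward([1.0, 0.0], w_in=1.0, w_rec=0.5, bias=0.0)
long_trace = rnn_forward([0.0, 0.0, 1.0, 0.0], w_in=1.0, w_rec=0.5, bias=0.0)
```

The second input of `short_trace` is zero, yet its final hidden state is non-zero because it still carries the effect of the first input. A bidirectional RNN would run a second pass of this kind over the reversed sequence and combine both hidden states.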
The authors in [45] proposed a novel patient monitoring framework using RNN
and a density based clustering method. It can monitor ECG signals and identify ECG
beats at different heart rates of the user. Here, features are extracted automatically,
based on morphology information including the current heartbeat and the T wave of
the former heartbeat. It computes a strong correlation between ECG signals and
considers ECG beats of various lengths. Here, Long Short Term Memory (LSTM),
a variation of RNN, is applied to maintain the details of the previous context.
The authors in [11] proposed a HAR system based on a Deep Belief Network (DBN) using
wearable sensors. Features are extracted from the raw data set automatically. Linear
Discriminant Analysis (LDA) and Kernel Principal Component Analysis (KPCA) are used
to reduce feature space dimensionality. Several hyper parameters, like mini batch
size, initial value of weights, learning rate, number of hidden layers and units, etc.,
need to be configured for the DBN.
5.2.2 Autoencoder
using encoding and decoding process of Autoencoder and the data is stored as a
matrix. Parameters are optimized to reduce reconstruction issues.
In [49], the authors proposed a deep learning based collaborative health recommender
system based on heterogeneous data from multiple sources. A variational
autoencoder neural network is designed to learn the details of primary care of doctors
and to extract the various features of patients to incorporate into the user profile. It is
found to perform appreciably well.
6 Conclusion
Learning from health data to detect and identify a disease or anomalies in the activities of
the user is an important challenge in building a robust health recommender system. Such
systems have various applications, like healthcare, early diagnosis,
elderly care, fitness tracking, and activity monitoring or fall detection. This chapter
provides an insight into the learning techniques used in health recommender systems.
It presents the recent trends and developments in machine learning techniques as well
as deep learning techniques. Deep learning techniques are found to perform better
and make the system more efficient and intelligent due to their automated feature
extraction. Here, we have also discussed several unsupervised learning
techniques, and how they are helpful when the data set is completely unknown to the
system.
References
1. Swan, M.: Sensor mania! The internet of things, wearable computing, objective metrics, and
the quantified self 2.0. J. Sens. Actuator Netw. 1(3), 217–253 (2012)
2. Calero Valdez, A., Ziefle, M., Verbert, K., Felfernig, A., Holzinger, A.: Recommender systems
for health informatics: state-of-the-art and future perspectives. In: Holzinger, A. (ed.) Machine
Learning for Health Informatics. Lecture Notes in Computer Science, vol. 9605. Springer,
Cham (2016)
3. Erdeniz, S.P., Maglogiannis, I., Menychtas, A., Felfernig, A., Tran, T.N.T.: Recommender
systems for IoT enabled m-health applications. In: Iliadis, L., Maglogiannis, I., Plagianakos,
V. (eds.) Artificial Intelligence Applications and Innovations. AIAI 2018. IFIP Advances in
Information and Communication Technology, vol. 520. Springer, Cham (2018)
4. Ramsundar, B., Kearnes, S., Riley, P., Webster, D., Konerding, D., Pande, V.: Massively Mul-
titask Networks for Drug Discovery (2015). arXiv:1502.02072
5. Zhang, S., Zhou, J., Hu, H., Gong, H., Chen, L., Cheng, C., Zeng, J.: A deep learning framework
for modeling structural features of RNA-binding protein targets. Nucleic Acids Res. 44 (2015).
https://doi.org/10.1093/nar/gkv1025
6. Tian, K., Shao, M., Wang, Y., Guan, J., Zhou, S.: Boosting compound-protein interaction
prediction by deep learning. Methods 110 (2016). https://doi.org/10.1016/j.ymeth.2016.06.024
7. Xu, T., Zhang, H., Huang, X., Zhang, S., Metaxas, D.N.: Multimodal deep learning for cervical
dysplasia diagnosis. In: Ourselin, S., Joskowicz, L., Sabuncu, M., Unal, G., Wells, W. (eds.)
Medical Image Computing and Computer-Assisted Intervention—MICCAI 2016. MICCAI
2016. Lecture Notes in Computer Science, vol. 9901. Springer, Cham (2016)
8. Brosch, T., Tam, R., The Alzheimer’s Disease Neuroimaging Initiative: Manifold learning of
brain MRIs by deep learning. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.)
Medical Image Computing and Computer-Assisted Intervention—MICCAI 2013. MICCAI
2013. Lecture Notes in Computer Science, vol. 8150. Springer, Berlin (2013)
9. Rose, D.C., Arel, I., Karnowski, T.P., Paquit, V.C.: Applying deep-layered clustering to mam-
mography image analytics. In: Biomedical Sciences and Engineering Conference, Oak Ridge,
TN, pp. 1–4 (2010)
10. Acharya, U.R., Fujita, H., Oh, S., Hagiwara, Y., Tan, J.H., Adam, M.: Application of deep
convolutional neural network for automated detection of myocardial infarction using ECG
signals. Inf. Sci. 415–416, 190–198 (2017)
11. Hassan, M.M., Huda, S., Uddin, M.Z., Almogren, A., Alrubaian, M.: Human activity recogni-
tion from body sensor data using deep learning. J. Med. Syst. 42, 99 (2018)
12. Poggi, M., Mattoccia, S.: A wearable mobility aid for the visually impaired based on embedded
3d vision and deep learning. In: Proceeding of IEEE Symposium of Computer and
Communication, pp. 208–213 (2016)
13. Huang, J., Zhou, W., Li, H., Li, W.: Sign language recognition using real-sense. In: Proceeding
of IEEE China, SIP, pp. 166–170 (2015)
14. Garimella, V.R.K., Alfayad, A., Weber, I.: Social media image analysis for public health. In:
Proceeding of CHI Conference Human Factors Computer System, pp. 5543–5547 (2016)
15. Zou, B., Lampos, V., Gorton, R., Cox, I.J.: On infectious intestinal disease surveillance using
social media content. In: Proceeding of 6th International Conference on Digital Health Con-
ference, pp. 157–161 (2016)
16. Saha, J., Chowdhury, C., Biswas, S.: Two phase ensemble classifier for smartphone based
human activity recognition independent of hardware configuration and usage behavior.
Microsyst. Technol. 24, 2737 (2018)
17. Huang, T., Lan, L., Fang, X., An, P., Min, J., Wang, F.: Promises and challenges of big data
computing in health sciences. Big Data Res. 2(1), 2–11 (2015)
18. Yang, S., Zhou, P., Duan, K., Hossain, M.S., Alhamid, M.F.: emHealth: towards emotion health
through depression prediction and intelligent health recommender system. Mob. Netw. Appl.
23, 216–226 (2018)
19. Hussein, A.S., Omar, W.M., Li, X., Ati, M.: Efficient chronic disease diagnosis prediction and
recommendation system. In: Proceeding of IEEE-EMBS Conference on Biomedical Engineer-
ing and Sciences, Langkawi, pp. 209–214 (2012)
20. Felipe, LO., Barrué, C., Cortés, A., Wolverson, E., Antomarini, M., Landrin, I., Votis, K.,
Paliokas, I., Cortés, U.: Health recommender system design in the context of CAREGIVER-
SPROMMD project. In: Proceeding of PETRA ’18: The 11th PErvasive Technologies Related
to Assistive Environments Conference, June, Corfu, Greece (2018)
21. Morrell, T.G., Kerschberg, I.: Personal health explorer: a semantic health recommendation
system. In: Proceeding of IEEE 28th International Conference on Data Engineering Workshops,
Arlington, VA, pp. 55–59 (2012)
22. Bocanegra, C.L.S., Ramos, J.L.S., Rizo, C., Civit, A., Fernandez-Luque, L.: HealthRecSys: a
semantic content-based recommender system to complement health videos. BMC Med. Inform.
Decis. Mak. 17, 63 (2017)
23. Sanchez-Bocanegra, C.L., Sanchez-Laguna, F., Sevillano, J.L.: Introduction on health recom-
mender systems. Methods Mol. Biol. 1246, 131–146 (2015)
24. Keogh, E.: Instance-based learning. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine
Learning. Springer, Boston (2011)
25. Ustev, Y.E., Incel, O.D., Ersoy, C.: User, device and orientation independent human activity
recognition on mobile phone challenges and a proposal. In: The ACM Conference on Pervasive
and Ubiquitous Computing Adjunct Publication, Zurich, pp. 1427–1435 (2013)
26. Park, H., Dong, S.Y., Lee, M., Youn, I.: The role of heart-rate variability parameters in activ-
ity recognition and energy-expenditure estimation using wearable sensors. Sensors (Basel)
2017(7), 1698 (2017)
27. Shoaib, M., Bosch, S., Incel, O.D., Scholten, H., Havinga, P.J.M.: Complex human activity
recognition using smartphone and wrist-worn motion sensors. In: Sensors, p. 426 (2016)
28. Zhang, S., Rowlands, A.V., Murray, P., Hurst, T.L.: Physical activity classification using the
GENEA wrist-worn accelerometer. Med. Sci. Sports Exerc. 44, 742–748 (2012)
29. Garcia-Ceja, E., Brena, R.F., Carrasco-Jimenez, J.C., Garrido, L.: Long-term activity recogni-
tion from wristwatch accelerometer data. Sensors 14, 22500–22524 (2014)
30. Saha, J., Chowdhury, C., Biswas, S.: Device independent activity monitoring using smart
handhelds. In: Proceeding of 7th International Conference on Cloud Computing, Data Science
and Engineering—Confluence, Noida, pp. 406–411 (2017)
31. Bayat, A., Pomplun, M., Tran, D.A.: A study on human activity recognition using accelerometer
data from smartphones. Procedia Comput. Sci. 34, 450–457 (2014)
32. Saha, J., Roy Chowdhury, I., Chowdhury, C., Biswas, S., Aslam, N.: An ensemble of condition
based classifiers for device independent detailed human activity recognition using smartphones.
Information 9(4), 94 (2018)
33. Jamshidi, S., Torkamani, M.A., Mellen, J., Jhaveri, M., Pan, P., Chung, J., Kardes, H.: A hybrid
health journey recommender system using electronic medical records. In: The Proceedings
of the 3rd International Workshop on Health Recommender Systems, HealthRecSys 2018,
co-located with the 12th ACM Conference on Recommender Systems (ACM RecSys 2018),
Vancouver, BC, Canada (2018)
34. Stikic, M., Schiele, B.: Activity recognition from sparsely labeled data using multi-instance
learning. In: Proceeding of Location and Context Awareness. LoCA 2009. Lecture Notes in
Computer Science, vol. 5561. Springer, Berlin (2009)
35. Toda, T., Inoue, S., Tanaka, S., Ueda, N.: Training human activity recognition for labels with
inaccurate time stamps. In: Proceeding of UbiComp ’14 Adjunct, pp. 863–872, 13–17 Sept
2014
36. Stikic, M., Larlus, D., Schiele, B.: Multi-graph based semisupervised learning for activity recog-
nition. In: Proceeding of International Symposium on Wearable Computers, Linz, pp. 85–92
(2009)
37. Ong, W.H.: An unsupervised approach for human activity detection and recognition. Int. J.
Simul. Syst. Sci. Technol. 14(5) (2013)
38. https://medium.com/odessa-ml-club/a-journey-to-clustering-introduction-to-dbscan-e724fa899b6f.
Last seen 20/5/2019
39. Kwon, Y., Kang, K., Bae, C.: Unsupervised learning for human activity recognition using
smartphone sensors. Expert Syst. Appl. 41(14), 6067–6074 (2014)
40. Lara, O.D., Labrador, M.A.: A survey of human activity recognition using wearable sensors.
In: IEEE Communication Surveys and Tutorials, vol. 15 (2013)
41. Yuan, W., Li, C., Guan, D., et al.: Socialized healthcare service recommendation using deep
learning. Neural Comput. Appl. 30, 2071–2082 (2018)
42. Eskofier, B.M., et al.: Recent machine learning advancements in sensor-based mobility analysis:
deep learning for Parkinson’s disease assessment. In: Proceeding of 38th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL,
pp. 655–658 (2016)
43. Sun, J., Fu, Y., Li, S., He, J., Xu, C., Tan, L.: Sequential human activity recognition based on
deep convolutional network and extreme learning machine using wearable sensors. Hindawi J.
Sens. (8580959), 10 (2018)
44. Sahoo, A.K., Pradhan, C., Barik, R.K., Dubey, H.: DeepReco: deep learning based health
recommender system using collaborative filtering. Computation 7(25) (2019)
45. Zhang, C., Wang, G., Zhao, J., Gao, P., Lin, J., Yang, H.: Patient-specific ECG classification
based on recurrent neural networks and clustering technique. In: Proceeding of 13th IASTED
International Conference on Biomedical Engineering (BioMed), Innsbruck, Austria, pp. 63–67
(2017)
46. Miotto, R., Li, L., Kidd, A.B., Dudley, J.T.: Deep patient: an unsupervised representation to
predict the future of patients from the electronic health records. Sci. Rep. 6, 1–10 (2016)
126 J. Saha et al.
47. Fakoor, R., Ladhak, F., Nazi, A., Huber, M.: Using deep learning to enhance cancer diagnosis
and classification. In: Proceedings of the 30th International Conference on Machine Learning,
JMLR: W&CP vol. 28, Atlanta, Georgia, USA (2013)
48. Sedhain, S., Menon, A.K., Xie, L., Sanner, S.: AutoRec: auto encoders meet collaborative
filtering. In: Proceeding of 24th International Conference World Wide Web, Florence, Italy
(2015)
49. Deng, X., Huangfu, F.: Collaborative variational deep learning for healthcare recommendation.
IEEE Access 7, 55679–55688 (2019)
Jayita Saha is currently pursuing the Ph.D. degree in Computer Science and Engineering at
Jadavpur University, India. She received her B. Tech. and M. Tech. degrees in Computer Science
and Engineering from Durgapur Institute of Advanced Technology and Management and Jadavpur
University, India, in 2008 and 2011, respectively. Her research interests include Human Activity
Recognition and machine learning.
Suparna Biswas has been an Associate Professor in the Department of Computer Science and
Engineering, Maulana Abul Kalam Azad University of Technology (formerly WBUT), India, since 2018.
She obtained her M.E. and Ph.D. from Jadavpur University. She was awarded a Post Doctoral
Fellowship from Erasmus Mundus in 2014 to carry out research work at Northumbria University, UK.
She has co-authored a number of research papers published in conferences and journals of
international repute, and has served as a reviewer for such conferences and journals.
Her research interests include Mobile Computing, Network Security, Wireless Body Area
Networks, and Healthcare Applications.
Deep Learning and Electronics
Health Records
Deep Learning and Explainable AI
in Healthcare Using EHR
Abstract Over time, Artificial Intelligence (AI) has proved to be of great assistance
in the medical field. Rapid advances have made available technology that can predict
the risk of many different diseases. A patient's Electronic Health Record (EHR)
contains many kinds of medical data for each patient and each medical visit. Many
predictive models, such as random forests and boosted trees, provide high accuracy
but not end-to-end interpretability, while models such as Naive Bayes, logistic
regression and single decision trees are intelligible but less accurate. These
interpretable models also fail to capture the temporal relationships among the
attributes present in EHR data, and their accuracy suffers as a result.
Interpretability of a model is essential in critical healthcare applications: it gives
medical personnel explanations that build trust in machine learning systems. This
chapter presents the design and implementation of an Explainable Deep Learning System
for Healthcare using EHR. It discusses the use of an attention mechanism and a
Recurrent Neural Network (RNN) on EHR data to predict heart failure in patients and
to provide insight into the key diagnoses that led to the prediction. The patient's
medical history is given as a sequential input to the RNN, which predicts the heart
failure risk and provides explainability along with it. This represents an ante-hoc
explainability model. A two-level neural attention model is trained to detect the
visits in a patient's history that are influential and significant for understanding
the reasons behind a prediction made on that patient's medical history. Considering
the last visit first proves to be beneficial. When a prediction is made, the
visit-level contribution is prioritized, i.e. which visit contributes the most to the
final prediction, where each visit consists of multiple codes. This model can help
medical personnel predict the heart failure risk of patients from the diseases they
have been diagnosed with, based on EHR. The model is then examined with local
interpretable model-agnostic explanations (LIME), which provide the features that
contribute positively and negatively to heart failure risk.
1 Introduction
Artificial Intelligence (AI) has a huge impact on the medical domain. Applications
such as managing medical records, assisting physicians during operations, predicting
diseases from patient history, drug creation, and health monitoring with devices like
FitBit can all be achieved using AI. However, the number of people using these
systems is still small. The main reason is that it is difficult for a human being to
trust a machine when it comes to their health. In healthcare it should be mandatory
to understand exactly what was responsible for the output a model gave, as suggested
in the new European General Data Protection Regulation.
Discussions with domain experts revealed that physicians prefer not to use Artificial
Intelligence (AI) systems as an aid to their diagnosis. According to them, these
systems do speed up the diagnosis process but do not provide the reasons behind
their decisions. As a result, physicians cannot trust the systems and have to
continue with their manual methods of diagnosis. Even though huge advances have been
made in research, the lack of trust keeps these systems from finding wide-ranging
business applications. This chapter presents the design of an Explainable AI system
based on EHR data [1, 2]. It discusses the use of an attention mechanism and a
Recurrent Neural Network (RNN) on EHR data to predict heart failure in patients, and
the use of LIME to provide insight into the key diagnoses that led to the prediction.
The chapter is organized as follows.
Section 2 discusses related work. Section 3 describes the proposed methodology.
Section 4 describes the experiments and evaluation. Section 5 presents the
conclusions and future work.
2 Related Work
research, and even in court. In the medical domain, the demand for interpretable
and explainable models is increasing. They must be able to re-enact the decision-making
and knowledge extraction process. Explainability is classified into two categories:
ante-hoc and post hoc. Ante-hoc systems incorporate explainability into the model
itself, whereas post hoc systems explain the predictions of a complex model using a
secondary, simpler model. Examples of ante-hoc models are decision trees, linear
regression, and fuzzy inference systems. An example of a post hoc method is BETA
(Blackbox Explanations using Transparent Approximations).
Zhao [4] used Electronic Health Record (EHR) datasets, which hold a wealth of medical
data of all kinds for each patient and each medical visit. Existing data analysis
methods struggle with EHR datasets because of their size, dimensionality, and
irregularity. Heart failure (HF) is difficult to predict as it is an overarching
condition rather than a distinct phenotype. Choi et al. [5] use CNNs in the context
of natural language processing to process this data. The MIMIC III dataset has been
used, consisting of 46,520 patients, 651,047 diagnosis events, 240,095 procedures,
and 4,156,450 prescriptions. For each patient, the ICD9 (International Classification
of Diseases) codes, procedure items, and drug names are extracted from the EHR
records and arranged in a sequence similar to a “sentence”. A word2vec model is then
used to convert these sentences to embeddings, which are then used to train the CNN.
Rectified linear units (ReLU) are used as the activation function in the
convolutional and fully connected layers.
Heart failure (HF) is a complex condition whose prediction has proved particularly
difficult due to the various conditions and events that lead to it. Heart failure may
occur due to kidney failure, coronary artery diseases, neural disorders, diabetes,
medications for other conditions, procedures performed, and previous instances of
heart attacks. This complex nature makes it very difficult to predict heart failure.
EHR datasets hold the key to solving this task; however, their size has made them
virtually impenetrable by traditional techniques. Hence, the authors of this paper
have taken the novel approach of using CNNs in the context of NLP (Natural Language
Processing) to process this data efficiently. The data is first concatenated into a
sequence drawn from diagnoses, procedures and medications. This sequence is then fed
to an embedding layer. Both random and word2vec embeddings have been used for
comparison. Multiple such embedding vectors are stacked and fed together into the
CNN. The CNN processes this input and produces a binary output (HF or not).
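The pipeline described above (code sequence, embedding layer, convolution, binary output) can be illustrated with a minimal sketch. It is not the cited authors' implementation: the example codes other than D_428 and D_427, the random embeddings, the filter sizes and the max-over-time pooling are all assumptions made for illustration, and the weights are untrained.

```python
import math
import random

def build_vocab(sequences):
    """Map each distinct code to an integer index."""
    vocab = {}
    for seq in sequences:
        for code in seq:
            vocab.setdefault(code, len(vocab))
    return vocab

def conv1d_max_pool(embedded, filt):
    """Slide one filter over the sequence, apply ReLU, keep the maximum."""
    width = len(filt)
    best = 0.0  # a ReLU output is never below zero
    for start in range(len(embedded) - width + 1):
        total = 0.0
        for offset in range(width):
            for e, w in zip(embedded[start + offset], filt[offset]):
                total += e * w
        best = max(best, max(0.0, total))
    return best

def predict_hf(seq, vocab, emb, filters, out_w, out_b):
    """Embed the codes, pool each filter response, squash to a probability."""
    embedded = [emb[vocab[c]] for c in seq]
    pooled = [conv1d_max_pool(embedded, f) for f in filters]
    logit = sum(p * w for p, w in zip(pooled, out_w)) + out_b
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical patient "sentence": diagnoses (D_), a procedure (P_), a drug (M_).
patient = ["D_428", "P_3961", "M_furosemide", "D_427", "D_5849"]
vocab = build_vocab([patient])

rng = random.Random(0)
EMB_DIM, N_FILTERS, WIDTH = 4, 3, 2  # assumed sizes, chosen only to stay small
emb = [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in vocab]
filters = [[[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(WIDTH)]
           for _ in range(N_FILTERS)]
out_w = [rng.uniform(-1, 1) for _ in range(N_FILTERS)]

prob = predict_hf(patient, vocab, emb, filters, out_w, 0.0)
print(f"untrained HF probability: {prob:.3f}")
```

In a trained model the embeddings would come from word2vec or be learned jointly, and the filter and output weights would be fitted to the HF labels.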
Guestrin et al. [6] observe that machine learning models are often black boxes and
that trust can be built by understanding the reasons behind predictions. Such
understanding provides insights into the model and can be used to assess model
performance and to build better, more accurate models.
Guestrin et al. [6] introduce an algorithm called LIME for explaining the predictions
of any model. LIME treats the given model as a ‘black box’ and explains a prediction
for an instance x by learning the behavior of the prediction function f(x) in the
surroundings of x. The instances surrounding x are obtained by randomly perturbing
the input. The random sampling is uniform, so as to ensure that samples are spread
evenly in the
surrounding of x. This allows obtaining a locally faithful explanation of the prediction
132 S. Khedkar et al.
3 Proposed Methodology
consists of 46,520 patients, 651,047 diagnosis events, and 240,095 procedures. For
each patient, the dataset provides the ICD9 (International Classification of
Diseases) codes for the diagnoses made and the procedures performed on the patient.
This dataset was used for the Attention-based RNN model.
The second dataset was the Cleveland dataset [9] from the UCI ML repository, which
has been used by many to build heart disease prediction models. This is a small
dataset consisting of 303 patient records. It was used to study the LIME algorithm.
Attention Mechanism:
Attention mechanisms in neural networks work much like the visual attention
mechanism found in humans. Seen from the human perspective, attention tells the
brain which parts of the input to focus on; in a model, it indicates which parts of
the input the model relied on. The attention mechanism allows the network to refer
back to the input sequence while calculating attention values, and does not force
the network to encode all input information into a vector of fixed length.
Only patients showing codes relevant to heart failure are considered for training.
Also, the patient must have made a minimum of three visits within twelve months.
Attention is given to the ICD9 codes in each visit, calculating their contributions to
the output.
• Result integration
The results from both the attention levels are integrated, and the final output of the
model is forwarded along with contribution scores for presentation and visualization.
Visualizations of the attention scores of each ICD9 code by visit are created to analyze
which diagnoses contributed the most to the output of the model.
Attention models work by “attending” to parts of the input while predicting the
output, instead of processing the input strictly sequentially. A high attention value
for a particular input feature implies that the feature strongly influenced the
output. This makes it possible to interpret the model and understand which part of
the input was considered while predicting whether heart failure is present.
Visualizing attention maps shows where the network looks when making a prediction.
The attention mechanism also gives the network access to its internal memory: the
network chooses what to retrieve, and what it retrieves is a weighted combination of
all memory locations.
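As a minimal sketch of this weighted retrieval, the snippet below scores each memory slot against a query, turns the scores into attention weights with a softmax, and returns the weighted combination. The dot-product scoring and the toy vectors are assumptions for illustration; the chapter does not specify the scoring function.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, memory):
    """Return attention weights and the weighted combination of memory slots."""
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    context = [sum(w * slot[d] for w, slot in zip(weights, memory))
               for d in range(dim)]
    return weights, context

memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three stored input positions
query = [2.0, 0.0]                             # what the network is looking for
weights, context = attend(query, memory)
print(weights, context)
```

The weights can be read directly as an explanation: slots with larger weights influenced the retrieved context more.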
Sequence matters in many everyday tasks, whether in language, where the order of
words is important, or in genome data, where every sequence has a different meaning.
In time series data, time defines the occurrence of events. A specific neural network
model, the recurrent neural network (RNN), is designed to work on such time-ordered
data. The medical history of a patient is vital for an accurate medical diagnosis.
RNNs use this medical history as sequential information to predict the patient's
heart failure risk and to provide the explanations behind the prediction.
Recurrent Neural Networks (RNNs) suffer from the vanishing gradient problem, which
causes information from the past to be washed out for long input sequences. To solve
this problem, techniques such as Long Short Term Memory (LSTM) units and Gated
Recurrent Units (GRUs) have been proposed. A GRU addresses the problem using vectors
called gates.
A GRU uses two gates, the reset gate and the update gate. These two vectors
determine what information (values) should be passed to the output (the next time
step of the RNN). GRUs can be trained to preserve information from long ago, without
forgetting it over time.
Let us look at the workings of a GRU and the mathematics behind it, shown in Fig. 2.
A GRU can be represented diagrammatically as shown.
The notations in the figure are as follows:
The sigmoid functions represent the gates mentioned earlier, one each for the
update and the reset gates.
1. Update Gate
The update gate determines how much past information from previous time steps is
passed along to the future.
The update gate value $z_t$ for time step $t$ is calculated using the following
formula:
$$z_t = \sigma\left(W^{(z)} x_t + U^{(z)} h_{t-1}\right)$$
2. Reset Gate
This gate is used to decide how much of the past information to forget.
It is calculated as follows:
$$r_t = \sigma\left(W^{(r)} x_t + U^{(r)} h_{t-1}\right)$$
Final memory content, i.e., the output forwarded to the next time step, is calculated
in two steps.
First, the relevant information from the past is stored using the reset gate, in a
variable called the current memory content, $h'_t$:
$$h'_t = \tanh\left(W x_t + r_t \odot U h_{t-1}\right)$$
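The gate equations can be put together in a short sketch of one GRU time step. The update gate, reset gate and current memory content follow the formulas given in the text. The final blending step is not spelled out in the text, so the form used here, h_t = z_t * h_{t-1} + (1 - z_t) * h'_t, is an assumption (one common convention). The parameter names and sizes are hypothetical, and bias terms are omitted as in the formulas.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def gru_step(x, h_prev, p):
    """One GRU step; p holds the weight matrices Wz, Uz, Wr, Ur, W, U."""
    # Update gate: how much of the past is carried forward.
    z = sigmoid([a + b for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h_prev))])
    # Reset gate: how much of the past is forgotten.
    r = sigmoid([a + b for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h_prev))])
    # Current memory content: tanh(W x + r * (U h_prev)), element-wise product.
    u_h = matvec(p["U"], h_prev)
    h_cand = [math.tanh(a + ri * u) for a, ri, u in zip(matvec(p["W"], x), r, u_h)]
    # Assumed final step: blend the old state and the current memory content.
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]

rng = random.Random(0)
IN_DIM, HID_DIM = 3, 2  # toy sizes
def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

params = {
    "Wz": rand_matrix(HID_DIM, IN_DIM), "Uz": rand_matrix(HID_DIM, HID_DIM),
    "Wr": rand_matrix(HID_DIM, IN_DIM), "Ur": rand_matrix(HID_DIM, HID_DIM),
    "W": rand_matrix(HID_DIM, IN_DIM), "U": rand_matrix(HID_DIM, HID_DIM),
}

h = [0.0] * HID_DIM
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    h = gru_step(x, h, params)
print(h)
```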
The attention-based model has two levels of attention: the first detects influential
past visits, and the second detects significant clinical variables within those
visits. The model tries to imitate a physician's behavior during an encounter. Like
a physician, it gives greater attention to recent clinical visits by considering the
most recent visits first and the earlier visits later, i.e., in reverse order.
Stationary models often pool all previous information together, discarding anything
that is time-dependent. This loses the temporal relationships present in the input
data, so that inputs with large temporal differences can receive similar
predictions. Considering the last visit first therefore proves beneficial: the model
learns which visit is more important and is trained on the visit-specific features
that contribute to the prediction.
When a prediction is made, the visit-level contribution is prioritised, i.e. which
visit contributes the most to the final prediction, where each visit consists of
multiple codes. The variable-level contribution, i.e. which variable contributes more
to the final prediction, must also be known. The model can be viewed in three parts.
Part 1 is a GRU that generates visit-level attention weights. Since each visit
consists of multiple variables, Part 2 is a GRU that generates variable-level
attention weights. Part 3 is a Multi-Layer Perceptron (MLP) that embeds each visit
into a lower-dimensional space in a way that preserves interpretability. Parts 1 and
2 form side loops that are later combined with the MLP embedding for prediction. As
there is no loop in the prediction process itself, the model is interpretable end to
end.
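How the three parts combine can be sketched as follows. This is a simplified illustration in the style of the reverse-time attention model of [5], not the chapter's code: the visit-level scores and variable-level gate vectors are taken as given (in the full model they would come from the two GRUs run over the visits in reverse order), a single sigmoid output is assumed, and all numbers are toy values.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine(visit_embs, alpha_scores, betas, w_out, b_out):
    """Combine visit embeddings with two levels of attention.

    visit_embs[i]   embedding v_i of visit i (Part 3)
    alpha_scores[i] scalar visit-level score (Part 1, before softmax)
    betas[i]        variable-level gate vector for visit i (Part 2)
    Returns the predicted probability, visit weights, and per-visit contributions.
    """
    alphas = softmax(alpha_scores)
    # The logit is a plain weighted sum, so it splits exactly into
    # one additive contribution per visit.
    contributions = [
        a * sum(w * b * e for w, b, e in zip(w_out, beta, v))
        for a, beta, v in zip(alphas, betas, visit_embs)
    ]
    logit = sum(contributions) + b_out
    return 1.0 / (1.0 + math.exp(-logit)), alphas, contributions

visit_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha_scores = [0.0, 0.0, math.log(2.0)]  # the last visit gets twice the weight
betas = [[1.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
prob, alphas, contributions = combine(visit_embs, alpha_scores, betas, [2.0, 1.0], 0.0)
print(alphas, contributions, prob)
```

Because the prediction is a sum of per-visit terms followed by one squashing function, each visit's share of the output can be reported directly, which is what makes the model interpretable end to end.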
There are two major advantages of the model:
1. Running the GRU in reverse time order offers computational advantages.
2. The prediction can improve substantially when timestamps are used. Timestamps
provide the duration of the time spent by the patient in the ICU. This parameter
adds to the accuracy of the model, as longer ICU stays can indicate increased risk.
Patients with heart failure and their qualifying ICD_9 diagnosis codes were
extracted from the MIMIC III dataset to train and test the attention-based
explanatory models. The prepared dataset contains 2349 patients with 9587 admission
records, 135,709 diagnosis records, and 2989 unique ICD_9 codes. The extracted
patients satisfy two conditions: they have at least 3 visits, and they have
diagnosis codes from a list of heart failure related diagnosis codes. This list has
been compiled from data provided by the creators of the MIMIC III dataset itself,
and from some experts in the field.
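The two selection conditions can be expressed as a small filter. The sketch below is illustrative only: the dictionary layout, the patient identifiers and the one-element code list are hypothetical stand-ins, since the actual compiled list of heart failure related codes is not reproduced in the chapter.

```python
# Hypothetical stand-in for the compiled list of heart failure related codes.
HF_CODES = {"428"}
MIN_VISITS = 3

def select_cohort(patients, hf_codes=HF_CODES, min_visits=MIN_VISITS):
    """Keep patients with enough visits and at least one HF-related code.

    patients maps a patient id to a list of visits; each visit is a list of
    ICD9 diagnosis codes.
    """
    cohort = {}
    for pid, visits in patients.items():
        has_hf_code = any(code in hf_codes for visit in visits for code in visit)
        if len(visits) >= min_visits and has_hf_code:
            cohort[pid] = visits
    return cohort

# Toy records, not taken from MIMIC III.
patients = {
    "p1": [["428", "585"], ["427"], ["428"]],  # 3 visits, HF code present
    "p2": [["428"], ["401"]],                  # HF code present, too few visits
    "p3": [["401"], ["272"], ["585"]],         # enough visits, no HF code
}
cohort = select_cohort(patients)
print(sorted(cohort))
```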
• Hyperparameter tuning:
It is easy to achieve very high accuracy on the training data using dense neural
networks, but such models might not generalize well to the validation and test sets.
Conversely, eschewing deep or complex architectures may lead to low accuracy. Hence,
a sweet spot has to be found that generalizes well and still has high accuracy. Some
models fail because saddle points and local minima drive the gradients to zero. In
such cases hyperparameters like the learning rate need to be tweaked, and the
optimizer changed to Adam or Adadelta, so that training does not get stuck and stop
learning.
Hyperparameters:
• Number of Layers: This must be chosen wisely, as a very high number may introduce
problems such as overfitting and vanishing or exploding gradients, while a low
number gives a model with high bias and low capacity. The model has two separate
GRU units, one for visit-level codes and one for variable-level codes, trained
with 128 hidden alpha layers and 128 hidden beta layers respectively. With
accuracy as the performance metric, changing the number of hidden layers to 256
produced no significant change in accuracy, so the hidden layer count is kept at
128 to keep the model simple. The size of the linear embedding applied to the
initial list of integers was changed from 128 to 256, as this increased the
accuracy of the model by 9% and substantially decreased the cost function.
• Activation Function: Popular choices are ReLU, Sigmoid, Tanh, and LeakyReLU. In
the GRU models, the sigmoid function is used for the update gate (which determines
how much past information is passed on) and the reset gate (which decides how much
past information to forget), and tanh is used for the current memory content.
• Optimizer: This is the algorithm the model uses to update the weights of every
layer after every iteration in order to minimize the cost function. AdaGrad was
used initially, but it has drawbacks such as a continually decaying learning rate
η and the need to select η manually. To resolve these concerns, the optimizer was
switched to Adadelta.
• Initialization: This does not play a very big role, as the defaults work well,
but one must avoid initializing all weights to zero or to any single constant
value (the same across all units). The weights of the linear embedding, the visit
level (alpha), and the variable level (beta) are initialized between −1 and 1.
The biases are initialized with zeros of a suitable data type and format.
• Batch Size: This is the number of patterns shown to the network before the weight
matrix is updated. If the batch size is small, each update sees fewer patterns, so
the weight updates are noisy and convergence becomes difficult. For this model,
the batch size is set to 100. This was appropriate, as modifying it further only
increased the execution time.
• Number of Epochs: The number of epochs is the number of times the whole training
dataset is passed through the model. Seventeen epochs are used here, as increasing
or decreasing this number further does not affect accuracy. The number of epochs
is an important hyperparameter: too many epochs may overfit the model, and too few
may yield poor results because the model does not reach its full potential.
Overfitting leads to poor generalization on unseen data.
• Dropout: The keep-probability of the Dropout layer can be thought of as a
hyperparameter that acts as a regularizer and helps find the optimum bias-variance
trade-off. Dropout is applied in two places: (1) to the input embedding, and (2)
to the context vector c_i. Both dropout rates are 0.4, a value that worked well
with the performance metrics of the model. Dropout values affect performance, so
it is recommended to tune them for the data.
• L1/L2 Regularization: Any machine learning model needs to learn from all the
features provided to it. L2 regularization is applied to W_emb (weight of the
linear embedding layer), w_alpha (weight of the visit-level GRU model), W_beta
(weight of the variable-level GRU model), and w_output (at the output layer, after
the alpha and beta weights are combined with the embedding of the input vector).
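The hyperparameters listed above can be gathered in one place, together with a sketch of the two regularizers. The dictionary keys are hypothetical names; the values are the ones reported in the text. The L2 coefficient is not stated in the chapter, so the value below is a placeholder assumption, and the dropout shown is the common "inverted" variant, which may differ from the one actually used.

```python
import random

# Values reported in the text; key names are hypothetical.
CONFIG = {
    "embedding_size": 256,
    "alpha_hidden": 128,
    "beta_hidden": 128,
    "batch_size": 100,
    "epochs": 17,
    "dropout_input": 0.4,
    "dropout_context": 0.4,
    "optimizer": "adadelta",
    "l2": 1e-4,  # assumption: the chapter does not give the L2 coefficient
}

def dropout(vec, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(vec)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]

def l2_penalty(weight_matrices, lam):
    """lam times the sum of squared weights over all given matrices."""
    return lam * sum(w * w for M in weight_matrices for row in M for w in row)

rng = random.Random(0)
embedded_visit = [0.5, -1.0, 0.25, 2.0]
print(dropout(embedded_visit, CONFIG["dropout_input"], rng))
print(l2_penalty([[[1.0, 2.0], [3.0, 4.0]]], CONFIG["l2"]))
```

At prediction time dropout is switched off (`training=False`), so the contribution scores are computed on the full embedding.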
The trained model is evaluated on the test dataset using the performance measure.
The cost function measures the difference between the predicted values and their
corresponding real values. To minimize this cost (train_cost), the Adadelta
optimization algorithm is used. Adadelta belongs to the family of Stochastic
Gradient Descent algorithms. It updates the weights according to the loss, so each
iteration tries new weight values. The model is first run with some initial weights,
and the algorithm updates them over thousands of iterations, trying to find the
combination that gives the minimum cost. It is important to note that Adadelta looks
for the minimum cost, not minimum weights: it only updates the weights and does not
minimize them.
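The Adadelta update can be sketched on a one-dimensional toy cost. The rule below is the standard published form of Adadelta (running averages of squared gradients and squared steps, with no manually chosen learning rate); the decay rate 0.95, the constant 1e-6 and the quadratic cost are assumptions for illustration, not values taken from the chapter.

```python
import math

def adadelta_minimize(grad, w, steps, rho=0.95, eps=1e-6):
    """Run `steps` Adadelta updates on a single weight w."""
    avg_sq_grad = 0.0  # running average of squared gradients
    avg_sq_step = 0.0  # running average of squared updates
    for _ in range(steps):
        g = grad(w)
        avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * g * g
        # The step size is the ratio of the two running RMS values.
        step = -math.sqrt(avg_sq_step + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1.0 - rho) * step * step
        w += step
    return w

def cost(w):
    return (w - 3.0) ** 2  # toy cost with its minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)

w_final = adadelta_minimize(grad, 0.0, 500)
print(w_final, cost(w_final))
```

The weight moves toward the value that minimizes the cost; its own magnitude is never the objective, which is the distinction drawn in the paragraph above.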
6. Fit a simple model to the permuted data, explaining the complex model outcome
with the features from the permuted data weighted by its similarity to the original
observation.
7. Extract the feature weights from the simple model and use these as explanations
for the complex model’s local behavior.
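The LIME procedure can be illustrated with a short sketch: perturb the instance, query the black-box model, weight each perturbed sample by its similarity to the original observation, fit a weighted linear model, and read off its coefficients. This is not the `lime` library and not the chapter's code; the black-box function, the uniform perturbation range, the exponential similarity kernel and its width are all assumptions made for illustration.

```python
import math
import random

def black_box(x):
    """Hypothetical complex model: a nonlinear risk score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 1.0 * x[1] + 0.5 * x[0] * x[1])))

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def lime_explain(f, x, n_samples=500, half_range=0.5, kernel_width=0.75, seed=0):
    """Return local feature weights of f around x from a weighted linear fit."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # Uniform perturbation of the instance.
        z = [xi + rng.uniform(-half_range, half_range) for xi in x]
        dist_sq = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        weights.append(math.exp(-dist_sq / kernel_width ** 2))  # similarity to x
        rows.append([1.0] + z)                                  # intercept + features
        targets.append(f(z))                                    # complex model outcome
    d = len(x) + 1
    # Weighted least squares via the normal equations.
    A = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(d)]
         for i in range(d)]
    b = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets)) for i in range(d)]
    return solve(A, b)[1:]  # drop the intercept, keep the feature weights

coef = lime_explain(black_box, [0.0, 0.0])
print(coef)
```

A positive coefficient marks a feature that pushes the local prediction up, a negative one a feature that pushes it down, which is how the positive and negative contributions to heart failure risk are read.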
Results for the LIME algorithm using Multilayer Perceptron, Random Forest and Naïve
Bayes are described below.
When the input embedding size was changed from 128 to 256, accuracy increased to
82.6% from an initial 73%, as shown in Figs. 6 and 7.
In Figs. 8 and 9, D_428 denotes Congestive Heart Failure, D_427 Tachycardia, and
D_996 Mechanical Complication due to a cardiac pacemaker.
On running the model on test data, a ‘.txt’ file is generated. It contains the
contribution score of each ICD9 code with respect to each visit for an individual
patient. The graph shows the contribution of the ICD9 code at a particular visit for
a particular patient.
As shown in Fig. 10, the accuracy of the RNN model was 82.5%. Some cases were
misclassified at this accuracy, but since the model is also interpretable, this
accuracy can be considered acceptable. Interpretability also lets the engineer see
where the model gives wrong predictions and must be improved, thus increasing
transparency.
Figure 11 plots the number of patients found to have a particular disease. For
example, Chronic Kidney Disease has the highest count, 1670 patients; similarly,
Acidosis is found in 1638 patients.
Figure 12 plots the diseases that, on average, contribute negatively and have the
least effect on heart failure. For example, Tobacco use disorder contributes as
little as −0.0134; similarly, Obesity contributes little, with an average of −0.0127.
Figure 13 plots the diseases that, on average, contribute positively and have the
most effect on heart failure. For example, Atrial fibrillation contributes the most,
with an average score of 0.0253; similarly, Congestive Heart Failure contributes
strongly, with an average of 0.0206.
Figure 14 shows the total number of times that particular disease was found in a
visit for a particular patient. For example, Chronic Kidney Disease was found 4029
times in total; similarly, Congestive Heart Failure was found 3742 times, etc.
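The statistics behind these plots (patients per disease, occurrences per disease, and average contribution per disease) can be derived from per-visit contribution scores with a simple aggregation. The sketch below assumes a hypothetical record layout of (patient id, visit index, code, contribution); the format of the generated '.txt' file is not given in the chapter, and the scores shown are made-up toy values, not results.

```python
from collections import defaultdict

def summarize(records):
    """Aggregate contribution records per diagnosis code."""
    totals = defaultdict(float)
    occurrences = defaultdict(int)
    patients = defaultdict(set)
    for patient_id, _visit, code, score in records:
        totals[code] += score
        occurrences[code] += 1
        patients[code].add(patient_id)
    return {
        code: {
            "mean_contribution": totals[code] / occurrences[code],
            "occurrences": occurrences[code],   # times the code appears in a visit
            "patients": len(patients[code]),    # distinct patients with the code
        }
        for code in occurrences
    }

# Toy records with made-up scores.
records = [
    ("p1", 0, "D_428", 0.02),
    ("p1", 1, "D_428", 0.04),
    ("p2", 0, "D_428", 0.03),
    ("p2", 0, "D_305", -0.01),
]
summary = summarize(records)
for code, stats in sorted(summary.items()):
    print(code, stats)
```

Sorting the summary by mean contribution would give the ordering used for the most positive and most negative contributors.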
Fig. 10 Graphical output for a patient using ante-hoc explanatory model: ATTENTION mechanism
5 Conclusions
Artificial Intelligence (AI), and neural networks in particular, have seen
unprecedented advancement in the last decade, mainly due to constantly improving
computational capabilities. However, these advancements have not been harnessed
from a business and social perspective due to a lack of trust in the models owing to
their black-box nature, with business applications still using relatively simple and
less accurate algorithms.
This conundrum signals an urgent need to bring explainability and interpretability
to deep neural networks. This chapter addresses this need by describing an
explainable neural network, which can explain its own predictions, and by comparing
it with a post hoc explainer, LIME. Predicting the possibility of heart failure in
an interpretable manner would give doctors an early warning and help reduce
readmission rates. The model using RNN gives 82.5% accuracy. This solution would
contribute towards building trust in AI, and towards putting neural networks into
widespread and constructive use. In future, the model can be extended to predict
other diseases. The efficacy of ensemble algorithms can be tested for more precise
predictions.
References
1. Goldberger, A.L., Amaral, L.A.N., Glass, L., Hausdorff, J.M., Ivanov, P.Ch., Mark, R.G., Mietus,
J.E., Moody, G.B., Peng, C.-K., Stanley, H.E.: PhysioBank, PhysioToolkit, and PhysioNet:
components of a new research resource for complex physiologic signals. Circulation
101(23), e215–e220 (Circulation Electronic Pages; http://circ.ahajournals.org/content/101/23/
e215.full) (2000, June 13)
2. MIMIC III dataset (Medical Information Mart for Intensive Care III). https://mimic.physionet.org/
3. Holzinger, A., Biemann, C., Pattichis, C.S., Kell, D.B.: What do we need to build explainable
AI systems for the medical domain (2017). arXiv:1712.09923v1
4. Zhao, C., Shen, Y., Yao, L.-P.: Convolutional neural network-based model for patient represen-
tation learning to uncover temporal phenotypes for heart failure (2017)
5. Choi, E., Bahadori, M.T., Kulas, J.A., Schuetz, A., Stewart, W.F., Sun, J.: RETAIN: an inter-
pretable predictive model for healthcare using reverse time attention mechanism. In: 30th con-
ference on neural information processing systems (NIPS), Barcelona, Spain (2016)
6. Guestrin, C., Singh, S., Ribeiro, M.T.: Why should I trust you? Explaining the predictions of
any classifier (2016). arXiv:1602.04938
7. Bahdanau, D., Cho, K.H., Bengio, Y.: Neural machine translation using attention mechanism
paper, ICLR (2015)
8. Choi, E., Bahadori, M.T., Song, L., Stewart, W.F., Sun, J.: GRAM: graph-based attention model
for healthcare. In: ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pp. 787–795 (2017)
9. Cleveland Heart Disease Dataset: (1988). https://archive.ics.uci.edu/ml/datasets/heart+Disease
Priyanka Gandhi is a Software Developer with active research interests in the field of AI and
Big Data Technologies. She received her Bachelor's degree in Computer Engineering from
V.E.S.I.T., University of Mumbai in 2019.
Gayatri Shinde is a Software Developer with active research interests in the field of AI and
Deep Learning. She received her Bachelor's degree in Computer Engineering from V.E.S.I.T.,
University of Mumbai in 2019.
Vignesh Subramanian is working as a Software Developer with active research interests in the
field of AI and Big Data Analytics. He received his Bachelor's degree in Computer Engineering
from V.E.S.I.T., University of Mumbai in 2019.
Deep Learning for Analysis of Electronic
Health Records (EHR)
1 Introduction
Over the past decade, hospital adoption of electronic health record (EHR) systems
has increased many fold, supported by $30 billion in incentives for medical
institutions, hospitals and physicians to adopt EHR systems [1]. According to the
most recent report, about 84% of hospitals have adopted at least a basic EHR system,
a 9-fold increase from 2008 [2]. Moreover, office-based physician adoption of basic
and certified EHRs has increased to 87% from 42% [3].
EHR systems store data on every patient encounter, including demographic
information, laboratory tests and results, diagnoses, prescriptions, clinical notes,
radiological images, and so forth [1]. While EHR systems are primarily designed for
improving healthcare efficiency from an operational standpoint, many studies have
found secondary uses for clinical data [4, 5]. In particular, the patient data
contained in EHR systems has been used for tasks such as medical concept extraction
[6, 7], disease inference, patient trajectory modeling, clinical decision support
systems, and more (Table 1).
Until the last few years, most techniques for analyzing rich EHR data relied on
traditional statistical and machine learning methods such as logistic regression, support
vector machines (SVM), and random forests [13]. More recently, deep learning techniques
have achieved great success in many domains by capturing long-range dependencies and
constructing deep hierarchical features from data [14]. Given the rise in popularity of
deep learning and the increasingly vast amount of patient data, there has also been an
increase in the number of publications that apply deep learning to EHR data for clinical
informatics tasks. These approaches yield better performance than conventional techniques
and require less time-consuming pre-processing and feature engineering.
This chapter reviews the specific deep learning techniques employed for EHR data analysis
and inference, and discusses the concrete clinical applications enabled by these advances.
Unlike other recent surveys, which reviewed deep learning in the broad context of health
informatics applications ranging from genome analysis to biomedical image analysis, this
chapter focuses only on deep learning methods applied to EHR data [15].
Use of EHR systems has expanded considerably in both hospital and ambulatory care
settings [2]. EHR use in hospitals and clinics can improve patient care by reducing
errors, increasing efficiency, and improving care coordination, while also providing a
rich source of data for researchers. EHR systems vary in functionality, and are typically
categorized into basic EHR without clinical notes, basic EHR with clinical notes, and
comprehensive systems.
152 P. S. Gangwar and Y. Hasija
While lacking more advanced functionality, even basic EHR systems can provide a wealth of
information on a patient's medical history, complications, and medication use. Because
EHRs were primarily designed for internal hospital administrative tasks, several
classification schemes exist for recording relevant medical information and events.
Examples include diagnosis codes, procedure codes, laboratory observations, and
medication codes. These codes can vary between institutions, with partial mappings
maintained by external resources. Given the wide array of schemata, harmonizing and
analyzing data across terminologies and between institutions is an ongoing area of
research. Several of the deep EHR systems in this chapter propose forms of clinical code
representation that lend themselves more readily to cross-institution analysis and
application. EHR systems store several types of patient information, including
demographics, diagnoses, physical exams, sensor measurements, laboratory test results,
prescribed or administered medications, and clinical notes [15]. EHR data is
heterogeneous and includes the following data types:
(1) Numerical quantities, for instance body mass index (BMI),
(2) Date-time objects, for instance date of birth or time of admission,
(3) Categorical values, and
(4) Natural language free text, for instance progress notes or discharge summaries.
In addition, these data types can be ordered chronologically to form the basis for
(5) Derived time series, for instance perioperative vital signs or multimodal patient
history.
While other biomedical data, such as medical images or genomic information, exist and are
covered in other recent review articles, in this review we focus on these five data types
found in most modern EHR systems.
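As a minimal sketch of how these five data types might sit side by side in code, the following Python snippet defines a hypothetical `PatientRecord` container and turns part of it into a flat feature vector. The class, the field names, the placeholder codes and all values are illustrative assumptions, not part of any EHR standard or of the systems reviewed here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class PatientRecord:
    """Hypothetical container with one field per EHR data type."""
    bmi: float                      # (1) numerical quantity
    birth_date: datetime            # (2) date-time object
    diagnosis_codes: List[str]      # (3) categorical values (placeholder codes)
    discharge_summary: str          # (4) natural language free text
    # (5) derived time series: (timestamp, heart rate) measurements
    heart_rate: List[Tuple[datetime, float]] = field(default_factory=list)


def multi_hot(codes: List[str], vocabulary: List[str]) -> List[int]:
    """Encode a set of categorical codes as a 0/1 vector over a fixed vocabulary."""
    present = set(codes)
    return [1 if code in present else 0 for code in vocabulary]


def age_in_years(birth_date: datetime, at: datetime) -> int:
    """Convert a date-time field into a numerical feature (completed years)."""
    had_birthday = (at.month, at.day) >= (birth_date.month, birth_date.day)
    return at.year - birth_date.year - (0 if had_birthday else 1)


record = PatientRecord(
    bmi=27.4,
    birth_date=datetime(1960, 5, 17),
    diagnosis_codes=["CODE_B", "CODE_D"],
    discharge_summary="Patient stable at discharge.",
    heart_rate=[(datetime(2020, 1, 1, 8, 0), 72.0),
                (datetime(2020, 1, 1, 9, 0), 80.0)],
)
vocabulary = ["CODE_A", "CODE_B", "CODE_C", "CODE_D"]
reference_time = datetime(2020, 1, 1)

features = [record.bmi, float(age_in_years(record.birth_date, reference_time))]
features += multi_hot(record.diagnosis_codes, vocabulary)
print(features)
```

The free text and the time series are left untouched here on purpose: they usually need their own encoders before they can be combined with the tabular features.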
Machine learning approaches can be broadly divided into two major categories: supervised
and unsupervised learning. Supervised learning techniques involve inferring a mapping
function y = f(x) from inputs x to outputs y. Examples of supervised learning tasks
include regression and classification, with algorithms including logistic regression and
support vector machines. In contrast, the goal of unsupervised learning techniques is to
learn interesting properties of the distribution of x. Examples of unsupervised learning
tasks include clustering and density estimation. The representation of inputs is a
fundamental issue spanning all types of machine learning frameworks. For each data point,
a set of attributes known as features is extracted for use as input to ML techniques. In
traditional ML, these features were hand-crafted based on domain knowledge. One of the
core principles of deep learning is automatic, data-driven feature extraction.
$$E(\theta, D) = -\sum_{i=0}^{D} \left[ \log P(Y = y_i \mid x_i, \theta) \right] + \lambda \lVert \theta \rVert_p \tag{1}$$
The first term in Eq. (1) minimizes the sum of the log loss over the whole training
dataset D; the second term minimizes the p-norm of the learned model parameters θ and is
controlled by a tunable parameter λ. This second term is known as regularization, a
technique used to prevent a model from overfitting and to improve its ability to
generalize to new, unseen examples. The loss function is typically optimized using
backpropagation, a mechanism for weight optimization that minimizes the loss from the
final layer backwards through the network.
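The loss in Eq. (1) can be evaluated directly once a model has produced a probability for the true label of each training example. The sketch below is a plain-Python illustration under that assumption; the function name `regularized_nll` and the toy numbers are hypothetical, and the penalty is the p-norm of a flat parameter list.

```python
import math


def regularized_nll(probs_of_true_label, theta, lam, p=2):
    """Negative log-likelihood over the dataset plus lam times the p-norm of theta."""
    # First term of Eq. (1): summed log loss over all training examples.
    nll = -sum(math.log(q) for q in probs_of_true_label)
    # Second term of Eq. (1): regularization penalty on the parameters.
    penalty = lam * sum(abs(t) ** p for t in theta) ** (1.0 / p)
    return nll + penalty


# Two examples, each assigned probability 0.5 for its true label.
probs = [0.5, 0.5]
theta = [3.0, 4.0]          # 2-norm of theta is 5
loss = regularized_nll(probs, theta, lam=0.1)
print(loss)
```

Setting `lam=0` recovers the unregularized log loss, which makes the contribution of the penalty easy to inspect.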
In the rest of this section, several common types of deep learning models used for deep
EHR applications are reviewed, all of which are based on the ANN architecture.
Fig. 2 A fundamental
neural-network [16]
A multilayer perceptron (MLP) is a type of ANN consisting of multiple hidden layers, in
which every neuron in layer i is fully connected to every neuron in layer i + 1.
Typically, these networks are limited to a few hidden layers, and data flows in only one
direction, unlike recurrent or undirected models. Extending the idea of a single-layer
ANN, each hidden unit computes a weighted sum of the outputs from the previous layer,
followed by a nonlinear activation σ of the computed sum, as in Eq. (2). Here, d is the
number of units in the previous layer, x_j is the output of the previous layer's jth
node, and w_ij and b_ij are the weight and bias terms associated with each x_j.
Traditionally, sigmoid or tanh were chosen as the nonlinear activation functions, but
modern networks use functions such as rectified linear units (ReLU) [17].
$$h_i = \sigma\left( \sum_{j=1}^{d} x_j w_{ij} + b_{ij} \right) \tag{2}$$
After the hidden layer weights have been optimized during training, the network has
learned an association between input x and output y. As more hidden layers are added,
the input data is expected to be represented in an increasingly abstract manner because
of each hidden layer's nonlinear activation. While the MLP is one of the simplest models,
other architectures often incorporate fully connected neurons as well.
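To make Eq. (2) concrete, the following sketch runs a forward pass through a tiny two-layer MLP in plain Python. It assumes one bias per hidden unit (the per-connection bias terms of Eq. (2) summed into a single number); the weights, layer sizes and function names are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def relu(a):
    return max(0.0, a)


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def dense_layer(x, weights, biases, activation):
    """One fully connected layer: h_i = activation(sum_j x_j * w_ij + b_i)."""
    return [
        activation(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(weights, biases)
    ]


def mlp_forward(x, layers):
    """Feed x through the layers in order; data flows in one direction only."""
    for weights, biases, activation in layers:
        x = dense_layer(x, weights, biases, activation)
    return x


x = [1.0, 2.0]
hidden_layer = ([[1.0, -1.0], [0.5, 0.5]], [0.0, -0.5], relu)   # 2 inputs -> 2 units
output_layer = ([[2.0, 1.0]], [-1.0], sigmoid)                  # 2 units -> 1 output

hidden = dense_layer(x, *hidden_layer)
output = mlp_forward(x, [hidden_layer, output_layer])
print(hidden, output)
```

The first hidden unit receives a negative sum and is zeroed by ReLU, which is the kind of nonlinearity that lets stacked layers represent more than a single linear map.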
Convolutional neural networks (CNNs) have become a very popular tool in recent years,
especially in the image processing community. CNNs impose local connectivity on the raw
data. For instance, rather than treating a 50 × 50 image as 2500 unrelated pixels, more
meaningful features are extracted by viewing the image as a collection of local pixel
patches. Similarly, a one-dimensional (1D) time series can be viewed as a collection of
local signal segments. The 1D convolution is given in Eq. (3), where x is the input
signal and w is the weighting function, or convolutional filter.
Fig. 3 The most widely recognized architectures of deep learning for examining EHR information [16]
$$C_{1d} = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a) \tag{3}$$
CNNs have sparse connections, since the filters are typically smaller than the input,
which results in a comparatively small number of parameters. Convolution also encourages
parameter sharing, since each filter is applied across the entire input. In a CNN, a
convolutional layer consists of several convolutional filters as described above, all
receiving the same input from the previous layer, which ideally learn to extract
different lower-level features. Following this, a subsampling or pooling layer is
typically applied to aggregate the extracted features (Fig. 4).
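For a finite signal, the infinite sum in Eq. (3) reduces to a short double loop, because x and w are zero outside their supports. The sketch below implements that discrete convolution and a simple non-overlapping max-pooling step in plain Python; the signal and filter values are hypothetical.

```python
def conv1d(x, w):
    """Discrete 1D convolution: out[t] = sum_a x[a] * w[t - a], zero outside the signals."""
    out = [0.0] * (len(x) + len(w) - 1)
    for a, x_a in enumerate(x):
        for k, w_k in enumerate(w):
            out[a + k] += x_a * w_k      # index t = a + k, so w_k = w[t - a]
    return out


def max_pool1d(values, size):
    """Aggregate features by taking the maximum over non-overlapping windows."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


signal = [1.0, 2.0, 3.0, 4.0]
edge_filter = [1.0, 0.0, -1.0]       # the same three weights are reused at every position

feature_map = conv1d(signal, edge_filter)
pooled = max_pool1d(feature_map, 2)
print(feature_map, pooled)
```

Only three weights are needed regardless of the signal length, which is the parameter sharing described above.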
Common RNN variants include the long short-term memory (LSTM) and gated recurrent unit
(GRU) models, both referred to as gated RNNs. Whereas standard RNNs are composed of
interconnected hidden units, each unit in a gated RNN is replaced by a special cell that
contains an internal recurrence loop and a system of gates that controls the flow of
information. Gated RNNs have shown benefits in modeling longer-term sequential
dependencies, among other advantages.
$$z = \sigma(W x + b) \tag{5}$$
$$\bar{x} = \sigma(W' z + b') \tag{6}$$
Once an autoencoder (AE) has been trained, a single input is fed through the network,
with the innermost hidden layer's activations serving as the input's encoded
representation. AEs serve to transform the input data into a format in which only the
most important derived dimensions are stored. In this way, they resemble standard
dimensionality reduction techniques such as principal component analysis (PCA) and
singular value decomposition (SVD), but with a significant advantage for complex
problems because of the nonlinear transformations performed by each hidden layer's
activation functions.
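The encoding and decoding steps of Eqs. (5) and (6) can be sketched as two small functions. The example below is a forward pass only, with separate decoder parameters as an assumption and hand-picked hypothetical weights; training (adjusting the weights to reduce the reconstruction error) is not shown.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def affine(W, x, b):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * x_j for w, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def encode(x, W, b):
    """Eq. (5): compress the input into a lower-dimensional code z."""
    return [sigmoid(a) for a in affine(W, x, b)]


def decode(z, W_dec, b_dec):
    """Eq. (6): reconstruct the input from the code z."""
    return [sigmoid(a) for a in affine(W_dec, z, b_dec)]


def reconstruction_error(x, x_bar):
    """Squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_bar))


x = [0.5, 0.5]
W, b = [[1.0, -1.0]], [0.0]                       # 2 inputs -> 1 code unit
W_dec, b_dec = [[2.0], [-2.0]], [-1.0, 1.0]       # 1 code unit -> 2 outputs

z = encode(x, W, b)
x_bar = decode(z, W_dec, b_dec)
print(z, x_bar, reconstruction_error(x, x_bar))
```

The code `z` is what would be passed on as the learned representation of the input.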
Another unsupervised deep learning architecture for learning input data representations
is the restricted Boltzmann machine (RBM). The purpose of RBMs is similar to that of
autoencoders, but RBMs instead take a stochastic perspective by estimating the
probability distribution of the input data. For this reason, RBMs are often viewed as
generative models, attempting to model the underlying process by which the data was
generated. The canonical RBM is an energy-based model with binary visible units v and
hidden units h, with the energy function given in Eq. (7).
$$E(v, h) = -b^{T} v - c^{T} h - v^{T} W h \tag{7}$$
In a Boltzmann machine (BM), all units are fully connected, whereas in an RBM there are
no connections between any two visible units or between any two hidden units. Training
an RBM is typically accomplished through stochastic optimization, such as Gibbs sampling.
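As an illustration of Eq. (7), the sketch below evaluates the RBM energy for a given pair of binary vectors and performs one Gibbs sampling step. It reads the interaction term as the usual bilinear form v^T W h, with `W[i][j]` linking visible unit i to hidden unit j; the parameter values are hypothetical, and no weight update is performed.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def energy(v, h, W, b, c):
    """E(v, h) = -b.v - c.h - v^T W h for binary visible v and hidden h."""
    visible_term = sum(b_i * v_i for b_i, v_i in zip(b, v))
    hidden_term = sum(c_j * h_j for c_j, h_j in zip(c, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - interaction


def hidden_probs(v, W, c):
    """P(h_j = 1 | v); hidden units are independent given v (no hidden-hidden links)."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    """P(v_i = 1 | h); visible units are independent given h (no visible-visible links)."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def gibbs_step(v, W, b, c, rng):
    """Sample h given v, then a new v given h."""
    h = [1 if rng.random() < p else 0 for p in hidden_probs(v, W, c)]
    v_new = [1 if rng.random() < p else 0 for p in visible_probs(h, W, b)]
    return v_new, h


W = [[2.0], [-1.0]]      # 2 visible units, 1 hidden unit
b = [0.5, 0.0]           # visible biases
c = [-1.0]               # hidden biases

print(energy([1, 0], [1], W, b, c))
v_new, h = gibbs_step([1, 0], W, b, c, random.Random(0))
```

The absence of within-layer connections is what allows each layer to be sampled in a single pass.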
In this section, we review the current state of the art in clinical applications
resulting from recent advances in deep EHR learning. A summary of deep EHR learning
projects and their target tasks is shown in Table 2, where we also present task and
subtask definitions based on a logical grouping of current research. Many of the
applications and results in the rest of this section are based on private EHR datasets
belonging to independent healthcare institutions. However, several studies made use of a
freely available critical care database, as well as public clinical note datasets.
Evaluation method
Precision, recall, and F1 score are the primary classification metrics for tasks
involving single concept extraction, temporal event extraction [18], and clinical
relation extraction [19]. The study on clinical abbreviation expansion used accuracy as
its evaluation metric. Although several studies share similar tasks and evaluation
metrics, their results are not directly comparable because they are based on different
datasets.
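The three metrics can be computed from the overlap between the gold-standard annotations and the system's predictions. The sketch below assumes each extracted concept is represented as a hashable item, here a hypothetical (term, position) pair; the example annotations are invented for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall and F1 for a set of predicted items against a gold set."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return precision, recall, f1


# Hypothetical (term, token position) annotations for one clinical note.
gold = [("aspirin", 0), ("fever", 3), ("cough", 5), ("asthma", 9)]
predicted = [("aspirin", 0), ("fever", 3), ("rash", 7)]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(precision, recall, f1)
```

Here two of the three predictions are correct and two of the four gold concepts are found, so precision and recall differ and F1 sits between them.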
Currently, hand-crafted schemas are used for mapping between structured medical
concepts, where each concept is assigned a distinct code by its relevant ontology. These
static hierarchical relationships fail to quantify the inherent similarities between
concepts of different types and coding schemes. Recent deep learning approaches have
been used to enable more detailed analysis and more accurate predictive tasks.
In this section, we first describe deep EHR methods for representing discrete medical
codes as real-valued vectors of arbitrary dimension. These efforts are largely
unsupervised and focus on natural relationships and clusters.
Several recent studies have applied deep unsupervised representation learning tech-
niques to derive EHR concept vectors that capture the latent similarities and natural
clusters between medical concepts. We refer to this area as EHR concept representation;
its primary goal is to derive vector representations from sparse medical codes such that
similar concepts lie close together in vector space.
Latent Encoding: Aside from NLP-inspired methods, other common deep representation
learning techniques have also been used for representing EHR concepts. Tran et al.
devise a modified RBM which uses a structured training procedure to increase the
interpretability of the representation. They evaluated the strength of relationships
between different medical concepts, and found that training linear models on
representations obtained through AEs vastly outperformed traditional linear models
alone, achieving state-of-the-art performance.
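Whether similar concepts really are close together in vector space is commonly checked with a similarity measure such as the cosine of the angle between two vectors. The sketch below assumes three already-learned concept vectors; the code names and the vector values are hypothetical placeholders, not output from any of the cited models.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_concepts(query, embeddings, k=1):
    """Return the k concepts most similar to `query`, most similar first."""
    scores = [
        (name, cosine_similarity(embeddings[query], vector))
        for name, vector in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical 3-dimensional concept vectors.
embeddings = {
    "CODE_DIABETES_A": [0.9, 0.1, 0.0],
    "CODE_DIABETES_B": [0.8, 0.2, 0.1],
    "CODE_FRACTURE": [0.0, 0.1, 0.9],
}

neighbours = nearest_concepts("CODE_DIABETES_A", embeddings, k=1)
print(neighbours)
```

With these placeholder values the two related codes end up as nearest neighbours, while the unrelated code is almost orthogonal to both.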
As the amount and availability of detailed clinical health records has exploded in
recent years, there is a large opportunity for revisiting and refining broad disease and
diagnosis definitions and boundaries. The underlying rationale is to let the data speak
for itself by discovering latent relationships and hierarchical concepts from the raw
data, without human supervision or prior bias. With the availability of huge amounts of
clinical data, many recent studies have employed deep learning techniques for
computational phenotyping. Computational phenotyping has two main applications:
(1) Discovering and stratifying new subtypes;
(2) Discovering specific phenotypes for improving classification under existing disease
boundaries and definitions.
Both areas seek to discover new data-driven phenotypes; the former is a largely
unsupervised task that is difficult to evaluate quantitatively, whereas the latter is
inherently tied to a supervised learning task whose results can be readily validated.
In a similar effort, Shweta et al. investigate different RNN architectures and word
embedding techniques for identifying potentially identifiable named entities in clinical
text.
6 Interpretability
While deep learning techniques have gained notoriety for producing state-of-the-art
performance on a wide variety of tasks, a major criticism is that the resulting models
are difficult to interpret naturally. Accordingly, many deep learning frameworks are
often referred to as "black boxes", where only the input and output predictions convey
meaning to a human observer. The primary cause of this lack of model transparency is
exactly what makes deep learning so effective: the layers of nonlinear data
transformations that uncover hidden factors of entanglement in the input. This issue
presents a trade-off between performance and transparency (Table 3).
In the clinical domain, model transparency is of utmost importance, given that
predictions may be used to influence patient treatment and real-world medical decision
making. This is why interpretable linear models such as logistic regression still
dominate applied clinical informatics. In this section, we review attempts to make
clinical deep learning more interpretable.
A popular approach within the image processing community is to examine the types of
inputs that result in the maximum activation of each of a model's hidden units. This
represents an attempt to examine what exactly the model has learned, and can be used to
assign importance to the raw input features. This approach has been adopted by several
of the studies included in our review.
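A simple version of this idea searches a set of candidate inputs for the one that activates each hidden unit most strongly. The sketch below does so for a single hypothetical ReLU layer; the weights are invented, and the candidate inputs are one-hot vectors so that each hidden unit can be read as "responding most to" one raw input feature.

```python
def hidden_activations(x, weights, biases):
    """ReLU activations of one hidden layer for input x."""
    return [
        max(0.0, sum(w * x_j for w, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(weights, biases)
    ]


def top_activating_inputs(inputs, weights, biases):
    """For each hidden unit, return the index of the candidate input that activates it most."""
    activations = [hidden_activations(x, weights, biases) for x in inputs]
    return [
        max(range(len(inputs)), key=lambda n: activations[n][unit])
        for unit in range(len(biases))
    ]


# Hypothetical layer with 3 input features and 2 hidden units.
weights = [[0.2, 0.9, 0.1],
           [0.7, 0.0, 0.3]]
biases = [0.0, 0.0]

# Candidate inputs: each one switches on a single raw feature.
candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

best = top_activating_inputs(candidates, weights, biases)
print(best)
```

In practice the candidates would be real patient records or optimized synthetic inputs rather than one-hot vectors, but the search over activations is the same.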
6.2 Constraints
In EHR concept representation and phenotyping studies, several works point to a more
indirect notion of interpretability by examining natural clusters of the resulting
vector representations. In a similar manner, Nguyen et al. project distributed
representations of clinical event and patient vectors into two dimensions via t-SNE,
allowing for a qualitative comparison of similar diagnoses and patient subgroups.
The issue of deep model transparency was addressed directly in the Interpretable Mimic
Learning framework. First, a deep neural network is trained on raw patient data with
associated class labels, which yields a vector of class probabilities for every sample.
A gradient boosting tree (GBT) is then trained on the raw patient data, but with the
deep network's probability prediction used as the target label. Because GBTs are
interpretable models, feature importance can be assigned to the raw input features while
harnessing the power of deep networks. The mimic learning method achieves similar or
better performance than both baseline linear and deep models on several phenotyping and
mortality prediction tasks, while retaining the desired feature transparency.
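The core of the mimic learning idea, training an interpretable student on the soft predictions of a deep teacher, can be sketched in a few lines. In the example below both models are deliberately simplified stand-ins: `teacher_probability` is a hypothetical fixed function playing the role of the trained deep network, and a single regression stump replaces the gradient boosting tree. It is meant to show the data flow, not the published method.

```python
import math


def teacher_probability(x):
    """Stand-in for a trained deep network: a soft score that depends only on feature 0."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] - 1.5)))


def fit_stump(X, targets):
    """Fit a one-split regression tree to the soft targets by minimizing squared error."""
    best = None
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [t for row, t in zip(X, targets) if row[feature] <= threshold]
            right = [t for row, t in zip(X, targets) if row[feature] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((t - left_mean) ** 2 for t in left)
                   + sum((t - right_mean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, feature, threshold, left_mean, right_mean)
    _, feature, threshold, left_mean, right_mean = best
    return feature, threshold, left_mean, right_mean


# Hypothetical raw patient features (two features per patient).
X = [[0.0, 1.0], [0.2, 0.0], [0.8, 1.0], [1.0, 0.0]]

# Step 1: the teacher produces soft targets instead of hard class labels.
soft_targets = [teacher_probability(x) for x in X]

# Step 2: the interpretable student is trained to mimic those soft targets.
feature, threshold, left_mean, right_mean = fit_stump(X, soft_targets)
print(feature, threshold, left_mean, right_mean)
```

The student picks the feature the teacher actually relies on, which is the sense in which importance can be read off the mimic model.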
This chapter provides a brief overview of current deep learning research as it pertains
to EHR analysis. This is an emerging area, as evidenced by the fact that most of the
work reviewed in this chapter was published in the past two years [1].
Tracing back the deep learning-based advances in image and natural language
processing, we see a clear chronological similarity to the progression of current
EHR-driven deep learning research. In particular, a majority of the studies in this
review are concerned with the idea of representation learning, i.e., how best to
represent the vast amounts of raw patient data that have suddenly become available in
the past decade. Fundamental image processing research is concerned with increasingly
complex and hierarchical representations of images composed of individual pixels.
Likewise, NLP focuses on word, sentence, and document-level representations of language
composed of individual words or characters. In a similar manner, various schemes for
representing patient health data are being explored, starting from individual medical
codes, demographics, and vital signs [1].
References
1. Shickel, B., Tighe, P.J., Bihorac, A., Rashidi, P.: Deep EHR: a survey of recent advances in
deep learning techniques for electronic health record (EHR) analysis. IEEE J. Biomed. Health
Inform. 22(5), 1589–1604 (2018)
2. Birkhead, G.S., Klompas, M., Shah, N.R.: Uses of electronic health records for public health
surveillance to advance public health. Annu. Rev. Public Health 36(1), 345–359 (2015)
3. Charles, D., Gabriel, M., Searcy, T., Carolina, N., Carolina, S.: Adoption of Electronic Health
Record Systems Among U.S. Non-federal Acute Care Hospitals: 2008–2014. The Health Infor-
mation Technology for Economic and Clinical Health (HITECH) Act of 2009 Directed the
Office of the National Coordinator for Health, vol. 4, no. 23, pp. 2008–2014 (2015)
4. Jamoom, E., Yang, N.: Table of electronic health record adoption and use among office-based
physicians in the U.S., by state. In: 2015 National Electronic Health Records Survey, pp. 1–2
(2016)
5. Botsis, T., Hartvigsen, G., Chen, F., Weng, C.: Secondary use of EHR: data quality issues and
informatics opportunities. In: AMIA Joint Summits Translational Science Proceedings, vol.
2010, pp. 1–5 (2010)
6. Skrøvseth, S.O., Augestad, K.M., Ebadollahi, S.: Data-driven approach for assessing utility of
medical tests using electronic medical records. J. Biomed. Inform. 53, 270–276 (2015)
7. Meystre, S.M., Savova, G.K., Kipper-Schuler, K.C., Hurdle, J.F.: Extracting information from
textual documents in the electronic health record: a review of recent research. In: Yearbook of
Medical Informatics, pp. 128–144 (2008)
8. Ekbal, A., Saha, S., Bhattacharyya, P.: Deep learning architecture for patient data de-
identification in clinical records. In: Proceedings of the Clinical Natural Language Processing
Workshop, pp. 32–41 (2016)
9. Choi, Y., Chiu, C.Y.-I., Sontag, D.: Learning low-dimensional representations of medical con-
cepts. In: AMIA Joint Summits Translational Science Proceedings, vol. 2016, pp. 41–50 (2016)
10. Nguyen, P., Tran, T., Wickramasinghe, N., Venkatesh, S.: Deepr: a convolutional net for medical
records. IEEE J. Biomed. Health Inform. 21(1), 22–30 (2017)
11. Choi, E., et al.: Multi-layer representation learning for medical concepts. In: Proceedings of
the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
pp. 1495–1504, 13–17 Aug 2016
12. Pham, T., Tran, T., Phung, D., Venkatesh, S.: DeepCare: a deep dynamic memory model for
predictive medicine. In: Lecture Notes in Computer Science (including Subser. Lecture Notes
in Artificial Intelligence Lecture Notes Bioinformatics), vol. 9652 LNAI, pp. 30–41, Feb 2016
13. Jiang, M., et al.: A study of machine-learning-based approaches to extract clinical entities and
their assertions from discharge summaries. J. Am. Med. Inform. Assoc. 18(5), 601–606 (2011)
14. Borovcnik, M., Bentz, H.-J., Kapadia, R.: A Probabilistic Perspective (1991)
15. Goodfellow, I., Bengio, Y., Courville, A.: Deep learning. MIT Press (2016)
16. Cheng, Y., Wang, F., Zhang, P., Hu, J.: Risk prediction with electronic health records: a deep
learning approach. In: 16th SIAM International Conference on Data Mining 2016 (SDM 2016),
pp. 432–440 (2016)
17. Wong, C., Deligianni, F., Berthelot, M., Andreu-perez, J., Lo, B., Yang, G.: Deep learning for
health informatics. IEEE J. Biomed. Health Inform. 21(1), 4–21 (2017)
18. Fries, J.A.: Brundlefly at SemEval-2016 task 12: recurrent neural networks vs. joint inference
for clinical temporal information extraction. In: SemEval 2016—10th International Workshop
Semantic Evaluation Proceedings, pp. 1274–1279 (2016)
19. Lv, X., Guan, Y., Yang, J., Wu, J.: Clinical relation extraction with deep learning. Int. J. Hybrid
Inf. Technol. 9(7), 237–248 (2016)
20. Dernoncourt, F., Lee, J.Y., Uzuner, O., Szolovits, P.: De-identification of patient notes with
recurrent neural networks. J. Am. Med. Inform. Assoc. 24(3), 596–606 (2017)
21. Tran, T., Nguyen, T.D., Phung, D., Venkatesh, S.: Learning vector representation of medical
objects via EMR-driven nonnegative restricted Boltzmann machines (eNRBM). J. Biomed.
Inform. 54, 96–105 (2015)
22. Lasko, T.A.: Phenotype Discovery from Electronic Medical Records How Do You Perceive a
Chessboard? (2017)
23. Che, Z., Purushotham, S., Khemani, R., Liu, Y.: Interpretable deep models for ICU outcome
prediction. In: AMIA … Annual Symposium Proceedings. AMIA Symposium, vol. 2016,
pp. 371–380 (2016)
Mr. Pawan Singh Gangwar is a dynamic individual currently pursuing his master's degree
in bioinformatics at Delhi Technological University. Having completed his bachelor's
degree in biotechnology, he is highly motivated towards computational research in the
life sciences and possesses in-depth knowledge of the field. In his free time, Pawan
likes to play badminton and chess and to listen to music.
Dr. Yasha Hasija, a master of many fields, is an Associate Professor at Delhi
Technological University. She holds bachelor's and master's degrees in biotechnology and
a Ph.D. in bioinformatics. Besides having a sound academic foundation, Dr. Yasha is a
vibrant individual and a very good orator. She specializes in genome informatics and its
interaction with human diseases; her research interests include genetic analysis of
dermatological disorders, tuberculosis, and the role of human genetic variation in
age-related disorders.
Application of Deep Architecture
in Bioinformatics
Abstract Recent discoveries in the field of biology have transformed it into a data-rich
domain. This has invited multiple machine learning applications, in particular deep
learning, a set of methodologies that has evolved rapidly over the last couple of
decades. Deep learning (DL) is extensively used in many domains, including
bioinformatics, for the analysis and classification of biomedical imaging data, sequence
data from omics, and biomedical signals. It has been used to predict protein structures,
uncover gene expression regulation, classify anomalies and understand functionalities of
the brain. Basic deep neural networks, which contain stacked layers of non-linear
processing units, are quite versatile and have been used extensively in almost every
domain of bioinformatics. Convolutional neural networks have proved to be quite effective
when working with image data and are used in classifying biomedical images such as
histopathology images, cell images, X-ray images, magnetic resonance images and so on.
They have been used for anomaly classification, recognition, and segmentation. For areas
that require dealing with sequential data, such as protein structure prediction and brain
decoding, recurrent neural networks have been used extensively. Besides these, many new
architectures are currently being explored to address some of the common drawbacks of
deep learning. Fuzzy systems have been incorporated into deep learning in an attempt to
improve the performance of such models. Multimodal learning is enabling modern
architectures to work with heterogeneous data.
1 Introduction
Deep learning (DL) is an important area of machine learning that has received a lot of
attention recently. Deep learning methods work by progressively extracting complex
features from the input data and mapping those features to the output. The learning
algorithm can efficiently build up complex relationships between an input and the
desired output. It has therefore been used extensively in areas such as computer vision
and pattern recognition, self-driving cars, robotics, weather forecasting, earthquake
prediction, and even the generation of deep neural networks. These innovations were
fueled not only by recent algorithmic advances in deep architectures but also by the
availability of high-throughput data. Training deep learning models effectively requires
copious amounts of data. With modern devices, sequencing techniques and improved imaging
technologies, biology has become a data-rich field. Omics data alone makes up a major
fraction of the accumulated data. There are also vast repositories of image data and
signal data available. Deep learning provides a well-suited set of tools for making good
use of this data. Contemporary deep learning models are used to solve diverse biological
problems such as protein structure prediction, protein-protein interaction analysis,
protein function prediction, bioimage analysis, brain signal analysis and so on.
Previously, many other well-known algorithms were used to make sense of biological data,
such as support vector machines (SVMs), hidden Markov models (HMMs), random forests,
Gaussian networks, Bayesian networks, and so on. These have been heavily used in
proteomics, genomics, systems biology, and many other related domains. No matter which
algorithm is used, the performance of the method depends heavily on which features are
presented as the input. Features represent the input data which, once processed by
machine learning algorithms, yield the relevant output. But selecting the right features
can be quite a difficult task, especially in the domain of omics. Learning features
automatically has been a major contribution of deep learning, one that has helped drive
massive progress not only in other domains but in bioinformatics too.
Deep learning has shown great promise in real-world applications where the majority of
machine learning algorithms have failed. Most early machine learning approaches relied
heavily on the knowledge of domain experts for feature engineering, the task of crafting
the inputs for the machine learning model. This was a great limitation, since processing
raw data to create features was tedious. Deep learning takes care of this by generating
the features on its own and mapping them to the outputs. This is done by multiple layers
of nonlinear processing units, called artificial neurons or perceptrons. Each neuron of a
layer is connected with all the neurons of the preceding and the succeeding layers, but
with none of the neurons in its own layer. Layers of these neurons stacked one after the
other form a deep neural network. Each neuron can be tuned through a set of parameters,
called weights and biases. As the deep neural network is trained, these parameters are
adjusted by the learning algorithm so that the error is minimized. As each layer of
neurons is trained over successive iterations, it gets better at extracting the relevant
features from the input data. Most deep learning models are built on this technique.
Deep learning architectures can be broadly classified into three groups: deep neural
networks (DNN), convolutional neural networks (CNN) and recurrent neural networks (RNN).
The term deep neural network is a generic one that is often used to refer to all deep
learning architectures, but here it refers to multilayer perceptrons (MLPs), restricted
Boltzmann machines (RBMs) and stacked autoencoders (SAEs). CNNs have long been used for
computer vision problems, and here they are applied extensively to the analysis of
medical and microscopic images. RNNs are used for predicting or analyzing sequential
data, such as biomedical signals and biological sequences.
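The way weights and biases are adjusted so that the error is minimized can be illustrated with a single artificial neuron. The sketch below trains one sigmoid neuron by batch gradient descent on a hypothetical toy dataset (the logical OR of two binary inputs); the learning rate, the number of epochs and the cross-entropy error are assumptions chosen for the illustration.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def predict(x, w, b):
    """Output of one neuron: a weighted sum of the inputs passed through a sigmoid."""
    return sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)


def train_neuron(samples, epochs=1000, lr=0.5):
    """Adjust weights and bias by gradient descent to reduce the cross-entropy error."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    losses = []
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        total = 0.0
        for x, y in samples:
            p = predict(x, w, b)
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            error = p - y                       # gradient of the loss w.r.t. the weighted sum
            grad_w = [g + error * x_i for g, x_i in zip(grad_w, x)]
            grad_b += error
        losses.append(total / len(samples))
        w = [w_i - lr * g / len(samples) for w_i, g in zip(w, grad_w)]
        b -= lr * grad_b / len(samples)
    return w, b, losses


# Toy dataset: logical OR of two binary inputs.
samples = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]

w, b, losses = train_neuron(samples)
print(losses[0], losses[-1])
```

A deep network repeats the same kind of update for every neuron in every layer, with backpropagation supplying the per-neuron error terms.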
Proteins are essentially polymers of amino acids [1, 2]. Once an amino acid sequence has
been produced through transcription and translation, the chain of amino acids takes on
different shapes as it folds onto itself. The sequence of amino acids in the chain
represents the primary structure of a protein. This chain is formed by peptide bonds
during the synthesis of the protein; hence, amino acid chains are also called
polypeptides. These polypeptides fold into simple structures such as loops, sheets or
helices, which are known as secondary structures. Secondary structures are regular local
sub-structures on the polypeptide backbone chain. Different structures form depending on
the amino acids present in the chain. Formally, the structures are mainly of three
types: α-helices, β-sheets and loops [1, 2]. They are determined by the hydrogen bonds
between the carbonyl oxygen and amine hydrogen atoms in the peptide backbone. The
tertiary structure of a protein is its 3D structure, which is formed as the secondary
structure folds onto itself. The tertiary structure has a single polypeptide chain
backbone with one or more protein secondary structures (PSS). Different bonding and
non-bonding energies determine the tertiary structure [1, 2]. These include covalent
bond energy, Hydrogen-Hydrogen (H–H) interaction energy, electrostatic energy, van der
Waals forces and other intra-molecular forces. Multiple tertiary structures combine to
form the quaternary structure. This happens when multiple protein structures bind
together to reach a global minimum energy state [3].
Protein structure prediction is a difficult task because of the many parameters at play.
Predicting secondary structure from the primary sequence is not that difficult with
contemporary methods. The Chou-Fasman algorithm is a statistical method that was
initially used to predict a protein's secondary structure from its polypeptide
sequence [4]. Now, multiple methods are available that can perform this prediction
170 S. Sen et al.
with a higher accuracy. For instance, an RNN can easily outperform the Chou-Fasman
algorithm. However, predicting the 3D structure is quite a challenge. Two types of
computational techniques, template-based and ab initio, are used to predict the
three-dimensional structure. Among them, the template-based technique is older and
depends on sequence similarity with a known structure, e.g., homology modelling.
In either case, the ultimate target is to find a structure with global minimum energy.
To date, multiple machine-learning-based algorithms have been designed [5, 6]. Most of
them approximate the result with multiple candidate structures and then optimize that
result further.
The function of a protein depends on its structure. A protein interacts with other
elements (mostly other proteins, except in the case of DNA-binding proteins) through
its binding site. The interaction partners of a protein are determined by the
compatibility of binding sites among proteins. The protein-protein interaction (PPI)
network is in turn derived from multiple interaction partners [7, 8]. PPI networks are
important at the biological level because proteins are the main functional elements of
any biological system. Their behaviour depends on their functions and interaction
partners, which help to define the position of a protein in a biological pathway
[7, 8]. Predicting functions and interaction partners are a few of the known challenges
in computational biology. Hidden Markov models [9] and genetic algorithms have already
been applied to these problems with some success. However, their processing time is
long, so there is scope for algorithmic improvement, and a few deep architectures have
been applied in this field; e.g., Zhao and Gong [10] describe a deep model to predict
protein-protein interaction pairs.
Deep architectures have had a great impact on image processing [11]. Therefore, deep
learning approaches are applied to medical images to diagnose diseased conditions
[12, 13]. In recent research, different deep architectures have been applied to MRI
data [13], hyperspectral images [14] and so on.
Predicting protein structures is one of the oldest problems in the domain of bioinfor-
matics. Since structures are determined by the sequence, recurrent neural networks
intuitively appear to be the suitable choice. However, other networks such as CNN
and generative stochastic networks (GSN) have also been used. These are discussed
below.
Application of Deep Architecture in Bioinformatics 171
uses a feed-forward network for PSS prediction using softmax prediction function
[24]. Equations 1–8 give the detailed description of the LSTM architecture which
is used for protein secondary structure prediction [24].
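$$F_t = \sigma(W_F h_{t-1} + W_F a_t + b_F) \tag{1}$$
$$I_t = \sigma(W_I h_{t-1} + W_I a_t + b_I) \tag{2}$$
$$\tilde{G}_t = \tanh(a_t W_G + h_{t-1} W_G + b_G) \tag{3}$$
$$M_t = F_t \odot M_{t-1} + I_t \odot \tilde{G}_t \tag{4}$$
$$h_t = Y_t \odot \tanh(M_t) \tag{6}$$
$$\sigma(x) = \frac{1}{1 + \exp(-x)} \tag{8}$$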
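The gate structure above can be sketched as a single LSTM step in plain Python. This is a minimal illustration, not the implementation of [24]: it assumes that the output gate Y_t has the same form as the forget and input gates, that each gate has separate recurrent and input weight matrices, and that σ is the standard logistic function. The names `lstm_step`, `affine`, `init_params` and the toy dimensions are hypothetical.

```python
import math
import random

def sigmoid(x):
    """Standard logistic function, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def affine(W_h, W_a, b, h_prev, a_t):
    """Compute W_h @ h_prev + W_a @ a_t + b for list-of-lists matrices."""
    out = []
    for row_h, row_a, bias in zip(W_h, W_a, b):
        s = bias
        s += sum(w * h for w, h in zip(row_h, h_prev))
        s += sum(w * a for w, a in zip(row_a, a_t))
        out.append(s)
    return out

def lstm_step(params, h_prev, m_prev, a_t):
    """One LSTM step; params maps a gate name to (W_h, W_a, b)."""
    f = [sigmoid(z) for z in affine(*params["F"], h_prev, a_t)]    # forget gate F_t
    i = [sigmoid(z) for z in affine(*params["I"], h_prev, a_t)]    # input gate I_t
    g = [math.tanh(z) for z in affine(*params["G"], h_prev, a_t)]  # candidate values
    y = [sigmoid(z) for z in affine(*params["Y"], h_prev, a_t)]    # output gate (assumed form)
    # Memory cell: keep part of the old memory, add part of the candidate.
    m = [ft * mp + it * gt for ft, mp, it, gt in zip(f, m_prev, i, g)]
    # Hidden state: gated, squashed memory.
    h = [yt * math.tanh(mt) for yt, mt in zip(y, m)]
    return h, m

def init_params(n_hidden, n_input, seed=0):
    """Small random weights and zero biases for the four gates (toy initialisation)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {name: (mat(n_hidden, n_hidden), mat(n_hidden, n_input), [0.0] * n_hidden)
            for name in "FIGY"}

# Run the cell over a toy sequence of three one-hot inputs.
params = init_params(n_hidden=4, n_input=3)
h, m = [0.0] * 4, [0.0] * 4
for a_t in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    h, m = lstm_step(params, h, m, a_t)
```

Because the hidden state is a product of a sigmoid output and a tanh output, every component of `h` stays strictly between −1 and 1, whatever the weights are.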
Sonderby and Winther [24] modified their LSTM architecture for protein
secondary structure prediction by introducing a feed-forward network between
recurrent hidden states, as in Eq. 7. This approach mainly focuses on 8-class
secondary structure [31] prediction, which is more informative than the traditional
3-class prediction. The 8-class secondary structure labels were assigned using the
DSSP program [32]. The DSSP program classifies each residue into eight classes
(C: loops and irregular elements (corresponding to the blank characters output by
DSSP), E: β-strand, H: α-helix, B: β-bridge, G: $3_{10}$-helix, I: π-helix, T: turn,
S: bend). The model uses 3 layers with 300 or 500 LSTM units per layer. The FF
network is implemented using two ReLU layers with a similar number of units per
layer. The outputs of the forward and backward passes of the bidirectional network
are concatenated into a vector that is passed through two ReLU layers with 300 or
500 hidden units. This approach achieved accuracy of
67.40% [24], better than the GSN approach [33], which achieved an accuracy of 66.40%.
The LSTM network also performs much better than the bidirectional RNN approach [34],
which achieved an accuracy of 51.10%.
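The accuracies quoted here are per-residue accuracies: the fraction of residues whose predicted class matches the DSSP label. The sketch below computes this measure for an 8-class labelling and for a 3-class reduction of the same labels. The reduction mapping (H, G, I to helix; E, B to strand; everything else to loop) is a commonly used convention assumed here, not one stated in the text, and the sequences and function names are made up for illustration.

```python
# Eight DSSP classes as listed in the text (C stands for loops/irregular elements).
DSSP_CLASSES = "CEHBGITS"

# Assumed 8-to-3 reduction: helices -> H, strands/bridges -> E, everything else -> C.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H",
                  "E": "E", "B": "E",
                  "T": "C", "S": "C", "C": "C"}

def q_accuracy(true_labels, predicted_labels):
    """Fraction of residues whose predicted class equals the true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label strings must have the same length")
    if not true_labels:
        raise ValueError("label strings must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def reduce_to_three(labels):
    """Map an 8-class label string to the assumed 3-class alphabet."""
    return "".join(EIGHT_TO_THREE[c] for c in labels)

# Toy example: 18 residues, with two G->H confusions and one B->C confusion.
true_ss = "CCHHHHGGTTEEEEBSCC"
pred_ss = "CCHHHHHHTTEEEECSCC"

q8 = q_accuracy(true_ss, pred_ss)                                    # 15 of 18 correct
q3 = q_accuracy(reduce_to_three(true_ss), reduce_to_three(pred_ss))  # 17 of 18 correct
```

The toy example shows why 8-class accuracy is the harder target: confusing a $3_{10}$-helix with an α-helix counts as an error under eight classes but disappears after the reduction to three.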
Generative Stochastic Networks (GSN) [35] have recently been used [36] to learn
generative models of the data distribution without specifying an explicit probabilistic
graphical model. A GSN is trained by backpropagation [35, 37]. Rather than directly
parameterizing P(X), a GSN estimates the data distribution through the transition
operator of a Markov chain [38]. A GSN trains a stochastic computational graph to
reconstruct the input X [39]. The primary advantage of a GSN is that the computational
graph may have latent states, which makes it similar to generative models such as the
Deep Boltzmann Machine (DBM) [40]. The architecture is described below:
A convolutional GSN takes two inputs: a feature channel X and a label channel y.
Figure 2 shows the architecture of a convolutional GSN model. In the supervised
convolutional GSN, the computational graph corrupts the label channel and then
reconstructs it. The feature map is given as input to the first hidden layer, which
computes its activation [33]. The convolutional GSN includes an input layer and a
convolutional layer. Its computational graph uses layer-wise sampling, similar to a
DBM [40]. Each convolutional GSN layer in the computational graph must contain a
convolutional layer, while the pooling layer is optional. Convolutional layers can be
stacked to build deeper architectures [36]. The convolutional GSN approach to PSS
prediction focuses on 8-state secondary structure [31] prediction, which gives more
structural information than 3-state secondary structure. Unlike the 3-class scheme,
the 8-class scheme can distinguish between 3-helix and 4-helix, and it can be used to
describe different types of loop regions. A position-specific scoring matrix (PSSM) is
also used for predicting the secondary structure of a protein [41]. The PSSM is a
matrix of size n × b, where n is the protein length and b is the number of amino acid
types. The PSSM is generated using the UniRef90 data set and is used as input to the
convolutional GSN model [33]. The PSSM scores are transformed into the range 0–1 using
the sigmoid function [42]. The protein data set is generated by the PISCES Cull PDB
server [43]. It consists of 6128 proteins, divided randomly into a training set of
5600 proteins, a validation set of 256 proteins and a test set of 272 proteins [33].
The 8-state secondary structure labels are determined from the 3D Protein Data Bank
(PDB) structure by the database of secondary structure assignments (DSSP) program [32].
The training data contain both labels and features. To inject noise into the input
labels, half of them were randomly set to zero. The convolutional GSN is trained
globally by backpropagation [35]. Sigmoid activation is used in the visible layer,
while tanh activation is used in all other layers. This convolutional GSN approach to
protein secondary structure prediction [33] achieved a Q8 accuracy of 66.40%, better
than CNF/Raptor-SS8 [44], which achieved a Q8 accuracy of 64.90%. The main
disadvantage of the convolutional GSN approach is that the convolutional structure is
hard-coded, so it may sometimes fail to capture the spatial organization of the
protein sequence.
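Two of the preprocessing steps described above are easy to illustrate: squashing PSSM scores into the range 0–1 with a sigmoid, and corrupting the label channel by zeroing half of the labels. The sketch below is only an illustration under stated assumptions: the PSSM scores are random toy values, labels are one-hot vectors over 8 classes, and the corruption zeroes whole residues (the text does not say whether zeroing is per residue or per entry). All function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    """Standard logistic function; maps any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def scale_pssm(pssm):
    """Apply the sigmoid element-wise to an n x b score matrix."""
    return [[sigmoid(score) for score in row] for row in pssm]

def corrupt_labels(one_hot_labels, fraction=0.5, seed=0):
    """Set the label vectors of a random subset of residues to all zeros."""
    rng = random.Random(seed)
    n = len(one_hot_labels)
    zeroed = set(rng.sample(range(n), int(n * fraction)))
    return [[0] * len(vec) if idx in zeroed else list(vec)
            for idx, vec in enumerate(one_hot_labels)]

# Toy data: 6 residues, 20 amino acid types, 8 secondary structure classes.
rng = random.Random(1)
n_residues, n_amino, n_classes = 6, 20, 8
toy_pssm = [[rng.randint(-5, 5) for _ in range(n_amino)] for _ in range(n_residues)]

toy_labels = []
for _ in range(n_residues):
    k = rng.randrange(n_classes)  # true class of this residue (made up)
    toy_labels.append([1 if c == k else 0 for c in range(n_classes)])

features = scale_pssm(toy_pssm)            # feature channel X, values in (0, 1)
noisy_labels = corrupt_labels(toy_labels)  # corrupted label channel y
```

During training, a network of this kind would receive `features` together with `noisy_labels` and be asked to reconstruct the uncorrupted labels.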
A deep architecture using a CNN was employed to implement a latent deep learning
system for predicting protein structure. This architecture has two levels. First, a
stacked sparse autoencoder is used to extract initial protein features; the extracted
features are then used as input to the latent CNN architecture. A detailed description
of the two levels is given below.
Stacked Sparse Autoencoder Approach to Extract Initial Protein Features. An
autoencoder is an unsupervised feature extraction model. It consists of three layers
of artificial neurons, where the intermediate hidden layer has fewer nodes than the
input layer, while the output layer and the input layer have the same number of
nodes. The goal of an autoencoder is to replicate the input at the output. Since the
data is passed through a smaller number of intermediate nodes, the features are
compressed and the output is represented by only the most dominant features of the
input. This is how the important features are automatically extracted by the
autoencoder. A stacked autoencoder is made of multiple consecutive autoencoders, where
the features extracted by one autoencoder are passed as input to the succeeding
autoencoder [45]. The sparse autoencoder is used to extract the initial level of
protein features. Used in conjunction with a CNN, it yields a better set of features.
In the architecture, the sparse autoencoder works as a preprocessing and feature
extraction unit. For preprocessing, the available protein dataset is separated into
two parts, training data and validation data. The sequence string is mapped to a
binary representation. For each position, the entry for the amino acid present, out of
the 20 possible amino acids, is set to 1 and all other entries are set to 0. Twenty
binary strings are needed, where
each string represents one amino acid. So the size of the input data is 20 × M, where
M is the number of amino acids in the chain [5]. The same encoding is used for the
output: the α-helix is represented as [1 0 0], whereas the β-sheet and the loop are
represented as [0 1 0] and [0 0 1] respectively [5]. The input data of dimension
20 × M is fed into the autoencoder to detect the initial features from the training
data. Using these features, the softmax classifier is trained [46] to predict the
secondary structure [47].
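The binary encoding described above can be sketched in a few lines. The ordering of the 20 amino acids (alphabetical one-letter codes) and the label letters H, E and C for helix, sheet and loop are assumptions made for illustration; the example sequence and the function names are hypothetical.

```python
# 20 standard amino acids in one-letter code; the ordering is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Three-class output codes as given in the text: helix, sheet, loop.
SS_CODES = {"H": [1, 0, 0], "E": [0, 1, 0], "C": [0, 0, 1]}

def encode_sequence(seq):
    """Return a 20 x M binary matrix: row = amino acid type, column = position.

    A residue outside the 20-letter alphabet yields an all-zero column.
    """
    return [[1 if residue == aa else 0 for residue in seq] for aa in AMINO_ACIDS]

def encode_structure(ss):
    """Return one three-element code per residue of a secondary structure string."""
    return [list(SS_CODES[s]) for s in ss]

seq = "MKVLAT"                    # toy primary sequence, M = 6
X = encode_sequence(seq)          # 20 rows, 6 columns
Y = encode_structure("CHHHEC")    # 6 label vectors
```

Each column of `X` contains exactly one 1, in the row of the amino acid found at that position, which is the 20 × M layout described in the text.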
Deep Learning Implemented Using Latent CNN Structure. CNN-based deep architectures
are motivated by the animal visual cortex [48]. Al-Azzawi [47] describes a latent deep
learning architecture based on the stacked sparse autoencoder. A CNN is a neural
network composed of layers of artificial neurons that learn shared weights and
biases. A CNN is trained with the backpropagation algorithm [49, 50]. A local
receptive field, or local filter, scans the entire input data. The units of this local
filter share the same weights and biases, which means that all the neurons in the
initial hidden layer learn the same feature [49, 50]. Feature extraction is done by
convolving the input data with the filter, adding a bias term, and then passing the
result through an activation function. The CNN applies learned filters to convolve the
feature maps from the previous layer. The second operation is pooling. The pooling
layer performs subsampling to decrease the size of the output. Max-pooling is a common
subsampling method: the entire map is divided into small, equally sized regions and
the maximum value from each region is taken [49, 50]. The final layer is a fully
connected layer. As mentioned before, the latent CNN structure combines the stacked
sparse autoencoder and the deep CNN. Al-Azzawi [47] used the SCRATCH protein dataset,
which contains primary and secondary structures with their three-class descriptions.
The performance of the PSS prediction system is measured as the ratio of the number of
correct predictions (true positives) to the total number of predictions [47]. Using
the stacked sparse autoencoder alone, the training accuracy is 62.67% and the testing
accuracy is 61.04% [47]. The latent deep learning approach for protein secondary
structure prediction achieved an accuracy of 90.31% on the SCRATCH protein dataset
[47], while the machine learning approach proposed by Magnan and Baldi [51] achieved
84.51%.
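The two CNN operations described above, convolution with a shared filter plus bias followed by an activation, and max-pooling over equally sized regions, can be sketched on a one-dimensional input. This is a toy illustration, not the architecture of [47]: the input, the filter values and the choice of ReLU as activation are assumptions, and the function names are hypothetical.

```python
def conv1d(signal, kernel, bias):
    """Slide one shared filter over the input (valid positions only) and add a bias.

    Like most deep learning code, this computes a cross-correlation:
    the kernel is not flipped.
    """
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectified linear activation (assumed; the text does not name the activation)."""
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    """Split the map into non-overlapping regions and keep each region's maximum.

    Trailing values that do not fill a whole region are dropped.
    """
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

signal = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0, 0.0, 2.0, 4.0]
kernel = [1.0, 0.0, -1.0]          # the same weights are used at every position

feature_map = relu(conv1d(signal, kernel, bias=0.5))
# feature_map == [1.5, 3.5, 0.0, 0.0, 3.5, 0.0, 0.0]

pooled = max_pool(feature_map, size=2)
# pooled == [3.5, 0.0, 3.5]
```

Weight sharing is visible in `conv1d`: the three kernel values are reused at all seven positions, so the layer has three weights and one bias regardless of the input length.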
Protein function prediction is key to understanding molecular mechanisms.
Under the structure-function paradigm, the functional dependencies of proteins are
associated with the structures of proteins at the cellular and subcellular level.
Organism-specific prediction of protein function from structure or biophysical
properties is a machine-learning-based modeling problem. Following that, the interaction
Protein function prediction methods are techniques used to determine the biological
and biochemical role of proteins. DCNN, a high-performance machine learning model
[49, 50], is used to design a predictive model for protein function prediction
(Fig. 3). A DCNN consists of convolutional and pooling layers. The depth of each
filter increases from the start to the end of the network. The last stage is made of
one or more fully connected layers. The DCNN architecture can be used to predict the
function of a protein [52]. Protein function is associated with the 3D structure of
the protein, and the binding site also influences function. A domain in a protein is a
structural motif which folds into a definite structure. CATH is a hierarchical
classification of protein domain structures [53]. SCOP was introduced to provide a
detailed and elaborate description of the structures of known proteins and the
relationships between them [54]. For tertiary structure recognition, feature
extraction is a vital step. One conventional method for identifying the 3D structure
of a protein is to extract feature vectors and then compare them by some distance
measure [55]. But this distance-based method is very sensitive and may fail to
retrieve similar structures of certain types. The 3D structure is based
Fig. 3 The DCNN architecture for tertiary protein structure prediction [52]
Fig. 4 A schematic diagram to show the workflow of deep convolution neural network
PPIs are biochemical events that involve two or more protein molecules. PPIs play
an important role in the functioning of the cell. It is important to identify PPI sites
as they show which amino acid residues contribute most to the protein–protein
interactions. This also makes them potential drug targets. Furthermore, they give
insight into metabolic and signal transduction networks. Domains such as protein
engineering, protein design and drug design rely heavily on the understanding of PPI.
To predict PPIs correctly, methods have been designed to predict the binding sites of
monomer proteins [63]. There are mainly four kinds of approaches: machine learning
[64], template-based [65], correlated mutations [66] and structural models [67]. The
last is widely used but has some limitations [68]. Deep architectures have now become
one of the popular approaches [69–71] for PPI interface residue pair prediction [10].
More precisely, LSTM is applied [15–17]. As mentioned before, LSTM is an RNN
architecture that remembers values over arbitrary intervals. Unlike in a plain RNN,
the summation units are replaced by memory cells, which can remember previously seen
information. The output layer used for an RNN can also be used for an LSTM. The RNN
consists of an input, a hidden and an output layer. The input propagates through these
layers in order. This is the forward pass. There are mainly two methods for adjusting
the weights of the network: real-time recurrent learning and backpropagation. For
backpropagation, the partial derivative of the loss (error) function with respect to
the output of the network is calculated first. To change the weights, the partial
derivatives of the loss with respect to the weights are then calculated. Finally,
applying the chain rule gives the direction in which each weight is adjusted. This
procedure is called the backward pass. In an LSTM, the output depends on the cell
state: a sigmoid layer decides which parts of the cell state are output. The cell
state is passed through a tanh activation function, which squashes the values into the
range −1 to 1, and the result is multiplied by the output of the sigmoid gate to give
the final output. This LSTM model was trained, validated and tested on data from the
International Critical Assessment of Protein–protein Interaction Prediction [72, 73].
The method achieved an accuracy of 90% for the prediction of protein–protein
interaction interface residue pairs.
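The forward and backward passes described above can be illustrated on a deliberately tiny network: one tanh hidden unit and one linear output, with a squared-error loss. This is not the LSTM model of [10]; it is only a sketch of how the chain rule turns the derivative of the loss with respect to the output into derivatives with respect to each weight. The parameter values, learning rate and function names are made up, and the analytic gradient is checked against finite differences.

```python
import math

def forward(params, x):
    """Forward pass: input -> hidden (tanh) -> output (linear)."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    y = w2 * h + b2
    return h, y

def loss(params, x, target):
    """Squared-error loss for a single example."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2

def backward(params, x, target):
    """Backward pass: apply the chain rule from the loss back to every weight."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dL_dy = y - target              # derivative of the loss w.r.t. the output
    dL_dw2 = dL_dy * h
    dL_db2 = dL_dy
    dL_dh = dL_dy * w2              # propagate to the hidden activation
    dL_dz = dL_dh * (1.0 - h * h)   # tanh'(z) = 1 - tanh(z)^2
    dL_dw1 = dL_dz * x
    dL_db1 = dL_dz
    return [dL_dw1, dL_db1, dL_dw2, dL_db2]

def numerical_grad(params, x, target, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return grads

params = [0.5, -0.1, 0.8, 0.2]      # w1, b1, w2, b2 (arbitrary starting values)
x, target, lr = 1.5, 1.0, 0.1

grads = backward(params, x, target)
# One gradient-descent step: move each weight against its gradient.
new_params = [p - lr * g for p, g in zip(params, grads)]
```

After the update, `loss(new_params, x, target)` is smaller than `loss(params, x, target)`, which is the sense in which the backward pass gives the direction of adjustment.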
CNN is a powerful tool for solving problems in computer vision. A DCNN can
automatically learn mid-level and high-level abstractions from raw input data.
Accurate disease diagnosis depends heavily on both image acquisition and image
interpretation. In 1996, CNN was applied to medical image processing for breast
cancer detection [74].
Magnetic resonance imaging (MRI) plays a crucial role in medical diagnosis, especially
when diagnosing issues with the brain. Structural variation of the brain may be a
symptom of many diseases. Medical image segmentation is the process of automatic or
semi-automatic recognition of boundaries within a 2D or 3D image. The high variability
in such images is the biggest challenge. Not only is there huge variation in the
anatomy of different humans, but the different medical imaging methods, such as CT,
PET and X-ray, have their own distinguishing characteristics. MRI provides quite
detailed imaging, and MRI images have therefore been used for automatic segmentation.
In modern medical research, segmentation of brain MRI plays an important role. The
seriousness of some brain diseases can be evaluated by observing structural variation,
measured through the volumes of regions of interest [75]. Several segmentation methods
are available, which are basically edge-based and contour-based [12]. However, it is
quite challenging to achieve good accuracy on brain MRI segmentation with these
methods. CNN is a suitable method because it can work with multi-dimensional inputs;
both gray-scale and color images can be processed using CNNs [76–78]. Conventional
approaches to brain MRI segmentation are also very time consuming, and the
availability of training data is a major problem. To overcome these difficulties, Cui
et al. [79] implemented brain MRI segmentation using a patch-based CNN architecture.
They used the public data set from the CANDI neuroimaging access point. The dataset
contains 103 MRI scans from four diagnostic groups: bipolar disorder with psychosis,
bipolar disorder without psychosis, schizophrenia spectrum and healthy controls [80].
Cui et al. [79] extract a few sets of MRI data, where each set consists of 4 to 5 MRI
scans. The 256 × 256 images are divided into 32 × 32 and 13 × 13 patches. The training
set has nearly a hundred thousand image patches. This method uses a CNN for
pixel-based automatic segmentation of brain MRI [81]. In image segmentation tasks,
each image patch has a label, and the labels of these patches are used to create a new
segmented MRI image. The proposed CNN architecture, which uses multiple 5 × 5 kernels,
achieved an accuracy of 90.83% [79]. It is compared with five other architectures:
three CNNs (CNN1, CNN2 and CNN3) and two artificial neural networks (ANN1 and ANN2).
The layer structure of CNN1 and CNN2 is identical to that of the proposed CNN; the
only difference is that CNN1 and CNN2 use fewer feature maps. The input patch size for
both CNN1 and CNN2 is 32 × 32. The activation function is replaced by a sigmoid
function in the convolutional layer. The third CNN architecture, CNN3, uses a 13 × 13
input patch size. CNN3 contains 4 convolutional layers and a fully connected layer,
and has no max-pooling layer. Of the two ANNs, ANN1 is a 3-layer architecture and ANN2
is a 5-layer architecture. In ANN1,
Table 1 A list of applied machine learning approaches for different biological problems, along with their performance

| Biological problem | Applied machine learning approach | Accuracy (%) |
|---|---|---|
| Protein secondary structure prediction | Latent CNN [47] | 90.3126 |
| | Machine learning and structural similarity [51] | 84.51 |
| | LSTM [24] | 67.4 |
| | GSN [33] | 67.4 |
| | Stacked sparse autoencoder [47] | 62.674 |
| | CNF/Raptor-SS8 [44] | 64.9 |
| | RNN [34] | 51.1 |
| Protein function prediction | Protein folding [59] | 89 |
| | Deep CNN [52] | 88 |
| | Graph kernel [62] | 84.04 |
| | Hierarchical classification [61] | 80 |
| Protein–protein interaction interface residue pair prediction | LSTM [10] | 90 |
| Brain MRI segmentation | Proposed CNN ((conv, pool) = 48, (conv, pool) = 96, conv = 700, conv = 19, softmax = 19) [79] | 90 |
| | CNN2 ((conv, pool) = 20, (conv, pool) = 50, conv = 500, conv = 19, softmax = 19) [79] | 90.83 |
| | CNN1 ((conv, pool) = 20, (conv, pool) = 50, conv = 500, conv = 19, softmax = 19) [79] | 90.81 |
| | CNN3 (conv = 40, conv = 160, conv = 500, conv = 19, softmax = 19) [79] | 89.97 |
| | ANN1 (3 layers, 1024-150-10) [79] | 86.25 |
| | ANN2 (5 layers, 1024-800-400-150-10) [79] | 74.94 |
| Cell classification | Deep CNN (Bloodcell-3, size 973 × 799 × 33) [14] | 93 |
| | Deep CNN (Bloodcell-2, size 462 × 451 × 33) [14] | 89.92 |
| | SVM (Bloodcell-2) [14] | 63.11 |
| | SVM (Bloodcell-3) [14] | 56.35 |
| Alzheimer's disease recognition | Deep CNN [13] | 96.8588 |
| | SVM [83] | 84 |
| Identifying metastatic breast cancer | Deep CNN [84] | 98.4 |
| Annotating the pathogenicity of genetic variants | DNN [85] | 66.1 |
| Classifying and segmenting microscopy images | DCNN [86] | 72.3 |
the first, second and third layers contain 1024, 150 and 10 neurons respectively. In
ANN2, the first to fifth layers contain 1024, 800, 400, 150 and 10 neurons
respectively. The accuracies achieved by the five architectures CNN1, CNN2, CNN3, ANN1
and ANN2 are 89.97%, 90.18%, 86.25%, 76.68% and 74.94% respectively [79]. The
proposed CNN performs best because of its larger number of feature maps. The Dice
ratio (DR) [82] is also used to measure segmentation accuracy; a larger value
indicates higher segmentation accuracy. The proposed CNN achieved a DR of 95.19%,
while CNN1, CNN2 and CNN3 achieved DRs of 94.12%, 94.83% and 92.62% respectively [79].
The proposed CNN can segment complex edge pixels successfully. However, some pixels
are still wrongly classified (Table 1).
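Two steps of the patch-based pipeline lend themselves to a short sketch: cutting an image into square patches and scoring a segmentation with the Dice ratio, 2|A ∩ B| / (|A| + |B|). The code below is an illustration under assumptions, not the procedure of [79]: patches are taken as a non-overlapping tiling (the text does not say whether patches overlap), the image is synthetic, the Dice ratio is computed for a single binary mask, and the function names are hypothetical.

```python
def extract_patches(image, size):
    """Tile a 2D image (list of rows) into non-overlapping size x size patches."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            patches.append([row[c:c + size] for row in image[r:r + size]])
    return patches

def dice_ratio(pred_mask, true_mask):
    """Dice ratio between two binary masks: 2 * |A and B| / (|A| + |B|)."""
    pred = [v for row in pred_mask for v in row]
    true = [v for row in true_mask for v in row]
    intersection = sum(1 for p, t in zip(pred, true) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in true if t)
    # Two empty masks agree perfectly by convention.
    return 2.0 * intersection / total if total else 1.0

# Synthetic 256 x 256 gray-scale image (values are arbitrary).
image = [[(r * 256 + c) % 255 for c in range(256)] for r in range(256)]
patches = extract_patches(image, 32)   # 8 x 8 = 64 patches of 32 x 32 pixels

# Toy 4 x 4 masks: the prediction misses one of the five labelled pixels.
pred = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
true = [[1, 1, 1, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
dr = dice_ratio(pred, true)            # 2 * 4 / (4 + 5) = 8 / 9
```

A flattened 32 × 32 patch has 1024 values, which is consistent with the 1024-neuron first layer listed for ANN1 and ANN2.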
References
1. Pauling, L., Corey, R.B., Branson, H.R.: The structure of proteins: two hydrogen-bonded helical
configuration of the polypeptide chain. Proc Natl Acad Sci 37(4), 205–211 (1951)
2. Ivar, B.C.: Introduction to Protein Structure. Garland Publishing, New York (1999)
3. Patel, M., Shah, H.: Protein secondary prediction using support vector machine. In: International
Conference on Machine Intelligence and Research Advancement, pp. 594–598 (2013)
4. Chou, P.Y., Fasman, G.D.: Prediction of the secondary structure of proteins from their amino
acid sequence. Trends Biochem. Sci. 2, 128–131 (1977)
5. Hasic, H., Buza, E., Akagic, A.: A hybrid method for prediction of protein secondary structure
based on multiple artificial neural networks, pp. 1195–1200. MIPRO, Opatija (2017)
6. Cheng, J., Tegge, A.N., Baldi, P.: Machine learning method for protein structure prediction.
IEEE Rev. Biomed. Eng. 1, 41–49 (2008)
7. Andreopoulos, W., Labudde, D.: Protein-protein interaction networks. In: Protein Purification
and Analysis I: Methods and Applications. iConcept Press (2013)
8. Jaimovich, A.: Understanding protein-protein interaction network. Ph.D. Thesis. Hebrew Uni-
versity (2010)
9. Asai, K., Hayamizu, S., Handa, K.I.: Prediction of protein secondary structure by the hidden
Markov model. Bioinformatics 9(2), 141–146 (1993)
10. Zhao, Z., Gong, X.: Protein-protein interaction interface residue pair prediction based on deep
learning architecture, IEEE/ACM Trans. Comput. Biol. Bioinform. (2017)
11. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional
neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
12. Cireşan, D.C., et al.: Mitosis detection in breast cancer histology images with deep neural
networks. In: International Conference on Medical Image Computing and Computer-assisted
Intervention. Springer, Berlin, Heidelberg (2013)
13. Sarraf, S., Tofighi, G.: Deep learning-based pipeline to recognize alzheimers disease using
fMRI Data. In: IEEE, Future Technologies Conference, pp. 816–820, 2016
14. Li, X., Li, W., Xu, X., Hu, W.: Cell classification using convolutional neural networks in medical
hyperspectral imagery. In: 2nd International Conference on Image, Vision and Computing,
pp. 501–504 (2017)
15. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780
(1997)
16. Greff, K., Kumar Srivastava, R., Koutin, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a
search space Odyssey (2017). arXiv:1503.04069v1
17. Gers, F.A., Schraudolph, N.N., Schmidhuber, J.: Learning precise timing with LSTM recurrent
networks. J. Mach. Learn. Res. pp. 115–143 (2002)
18. Svozil, D., Kvasnicka, V., Pospichal, J.: Introduction to multi-layer feed forward neural net-
work. Chemom. Intell. Lab. Syst. 39, 43–62 (1997)
19. Toh, K.-A., Lu, J., Yau, W.-Y.: Global feedforward neural network learning for classification
and regression. In: International Workshop on Energy Minimization Methods in Computer
Vision and Pattern Recognition, pp. 407–422 (2001)
20. Bishop, C.M.: Neural network for pattern recognition. Oxford University Press Inc., New York
(1995)
21. Schmidt, W.F., Kraaijveld, M.A., Duin, R.P.W.: Feed forward neural networks with ran-
dom weights. In: 11th IAPR International Conference on Conference B: Pattern Recognition
Methodology and Systems, Proceedings, vol. 2, pp. 1–4 (1992)
22. Graves, A.: Supervised Sequence Labelling with Recurrent Neural Networks. Springer, Berlin,
pp. 5–13 (2012)
23. Pascanu, R., Gulcehre, C., Cho, K., Bengio, Y.: How to construct deep recurrent neural networks
(2013). arXiv preprint arXiv:1312.6026
24. Sonderby, S.K., Winther, O.: Protein secondary structure prediction with long short term mem-
ory networks (2015). arXiv:1412.7828v2
25. Hochreiter, S., Heusel, M., Obermayer, K.: Fast model-based protein homology detection
without alignment. Bioinformatics 23(14), 1728–1736 (2007)
26. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Brief. Bioinf. 18(5), 851–869
(2017)
27. Baldi, P., Brunak, S., Frasconi, P., Soda, G., Pollastri, G.: Exploiting the past and the future in
protein secondary structure prediction. Bioinformatics 15(11), 937–946 (1999)
28. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is
difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
29. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Pro-
cess. 45(11), 2673–2681 (1997)
30. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and
other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)
31. Yaseen, A., Li, Y.: Template-based prediction of protein 8-state secondary structures. In:
IEEE 3rd International Conference on Computational Advances in Bio and Medical Sciences
(ICCABS), pp. 1–2 (2013)
32. Wolfgang, K., Christian, S.: Dictionary of protein secondary structure: pattern recognition of
hydrogen bond and geometrical features. Biopolymers 22(12), 2577–2637 (1983)
33. Zhou, J., Troyanskaya, O.G.: Deep supervised and convolutional generative stochastic network
for protein secondary structure prediction. In: Proceeding of the 31st International Conference
on Machine Learning, Beijing, China, JMLR: W&CP, vol. 32, pp. 745–753 (2014)
34. Pollastri, G., Przybylski, D., Rost, B., Baldi, P.: Improving the prediction of protein secondary
structure in three and eight classes using recurrent neural networks and profiles. Proteins:
Struct. Funct. Genet. 47(2), 228–235 (2002)
35. Bengio, Y., Thibodeau-Laufer, E., Alain, G.: Deep generative stochastic networks trainable by
backprop. In: International Conference on Machine Learning, pp. 226–234 (2014)
36. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)
37. Du, C., Zhu, J., Zhang, B.: Learning deep generative models with doubly stochastic gradient
MCMC. IEEE Trans. Neural Netw. Learn. Syst. (2017)
38. Ozair, S., Yao, L., Bengio, Y.: Multimodal transitions for generative stochastic network. arXiV:
1312.5578v4 (2014)
39. Bengio, Y., Yao, L., Alain, G., Vincent, P.: Generalized denoising auto-encoders as generative
models. In: Advances in Neural Information Processing Systems, pp. 899–907 (2013)
40. Salakhutdinov, R., Hinton, G.: Deep Boltzmann machines. Appearing in Proceedings of the
12th International Conference on Artificial Intelligence and Statistics (AISTATS), Clearwater
Beach, Florida, USA, vol. 5 of JMLR: W&CP 5 (2009)
41. Jones, D.T.: Protein secondary structure prediction based on position-specific scoring matrices.
J. Mol. Biol. 292(2), 195–202 (1999)
42. Jamel, T.M., Khammas, B.M.: Implementation of sigmoid activation function for neural net-
work using FPGA. In: 13th Scientific Conference of Al-Ma’moon University College (2012)
43. Wang, G., Dunbrack Jr., R.L.: PISCES: a protein sequence culling server. Bioinformatics 19,
1589–1591 (2003)
44. Wang, Z., Zhao, F., Peng, J., Xu, J.: Protein 8-class secondary structure prediction using con-
ditional neural fields. Proteomics 11(19), 3786–3792 (2011)
45. Ng, A.: Sparse Autoencoder. CS294A Lecture notes, vol. 72 (2011)
46. Ng, A.: Supervised learning. CS229 Lecture Notes, pp. 1–3 (2000)
47. Al-Azzawi, A.: Deep learning approach for secondary structure protein prediction based on
first level features extraction using a latent cnn structure. Int. J. Adv. Comput. Sci. Appl. 8(4),
5–12 (2017)
48. Hubel, D.H., Wiesel, T.N.: Receptive fields and functional architecture of monkey striate cortex.
J. Physiol. 195, 215–243 (1967)
49. LeCun, Y., Bengio, Y.: Convolutional Networks for Image, Speech and Time-Series. AT and
T Bell Laboratories, Dept Imformatique Recherche (1995)
50. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.:
Handwritten digit recognition with a back-propagation network. Adv. Neural Inf. Process. Syst.
396–404 (1990)
51. Magnan, C.N., Baldi, P.: SSpro/ACCpro 5: almost perfect prediction of protein secondary
structure and relative solvent accessibility using profiles, machine learning and structural
similarity. Bioinformatics 30(18), 2592–2597 (2014)
52. Tavanaei, A., Maida, A.S., Kaniymattam, A., Loganantharaj, R.: Towards recognition of pro-
tein function based on its structure using deep convolutional network. In: IEEE International
Conference on Bioinformatics and Biomedicine (BIBM), pp. 145–149 (2016)
53. Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., Thornton, J.M.: CATH a
hierarchic classification of protein domain structures. Structure 5(8), 1093–1109 (1997)
54. Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of pro-
teins database for the investigation of sequences and structures. J. Mol. Biol. 247(4), 536–540
(1995)
55. Karim, R., Al-Aziz, M.M., Shatabda, S., Rahman, M.S., Mia, M.A.K., Zaman, F., Rakin, S.:
CoMOGrad and PHOG: from computer vision to fast and accurate protein tertiary structure
retrieval. Sci. Rep. 5, 1–11 (2015)
56. Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., Ferrin,
T.F.: UCSF chimera a visualization system for exploratory research and analysis. J. Comput.
Chem. 25(13), 1605–1612 (2004)
57. Kraulis, P.K.: MOLSCRIPT: a program to produce both detail and semantic plots of protein
structures. J. Appl. Crystallogr. 24, 946–950 (1991)
58. Nooruddin, F., Turk, G.: Simplification and repair of polygonal models using volumetric tech-
niques. In: IEEE Trans. Vis. Comput. Graph. 9(2), 191–205 (2003)
59. Zakeri, P., Jeuris, B., Vandebril, R.: Protein fold recognition using geometric kernel data fusion.
Bioinformatics 30(13), 1850–1857 (2014)
60. Brylinski, M., Lingam, D.: eThread: a highly optimized machine learning based approach to
meta threading and the modeling of protein tertiary structure. PLoS One 7(11), e50200 (2012)
61. Lin, C., Zou, Y., Qin, J., Jiang, Y., Ke, C., Zou, Q.: Hierarchical classification of protein folds
using a novel ensemble classifier. PLoS One 8(2), e56499 (2013)
62. Borgwardt, K.M., Ong, C.S., Schonauer, S., Vishwanathan, S.V.N., Smola, A.J., Kriegel, H.-P.:
Protein function prediction via graph kernels. Bioinformatics 21, i47–i56 (2005)
63. Giard, J., Ambroise, J., Gala, L.J.: Regression applied to protein binding site prediction and
comparison with classication. BMC Bioinform. 10(1), 1–12 (2009)
64. Cheng, J., Baldi, P.: Improved residue contact prediction using support vector machines and a
large feature set. BMC Bioinform. 8(2), 1–9 (2007)
65. Ohue, M., Matsuzaki, Y., Shimoda, T.: Highly precise protein-protein interaction prediction
based on consensus between template-based and de novo docking methods. BMC Proc. 7(7),
S6 (2013)
184 S. Sen et al.
66. Gobel, U., Sander, C., Schneider, R.: Correlated mutations and residue contacts in proteins.
BMC Proc. 7(7), S6 (2013)
67. Singh, R., Park, D., Xu, J., Hosur, R., Berger, B.: Struct2Net: a web service to predict pro-
tein–protein interactions using structure based approach. Nucleic Acids Res. 38(2), 508–515
(2010)
68. Moult, J.B., Fidelis, K., Rost, B.: Critical assessment of methods of protein structure prediction,
CASP, Round 6. Proteins (2010)
69. Lena, D.P., Nagata, K., Baldi, P.: Deep architectures for protein contact map prediction. Bioin-
formatics 28(19), 2449–2457 (2012)
70. Larochelle, H., Bengio, Y., Louradour, J.: Exploring strategies for training deep neural net-
works. J. Mach. Learn. Res. 1–40 (2009)
71. Alessandro, L., Gianluca, P., Pierre, B.: Deep architectures and deep learning in chemoinfor-
matics: the prediction of aqueous solubility for drug-like molecules. J. Chem. Inform. Model.
53(7), 1563–1575 (2013)
72. Vreven, T., Moal, H.I., Vangone, A.: Updates to the integrated protein–protein interaction
benchmarks: docking benchmark version 5 and affinity benchmark version 2. J. Mol. Biol.
427(19), 3031–3041 (2015)
73. Janin, J., Henrick, K., Moult, J.: Assessment of predicted interactions. CAPRI: a critical assess-
ment of predicted interactions. Proteins Struct. Funct. Bioinform. 52(1), 2–9 (2003)
74. Sahiner, B.: Classification of mass and normal breast tissue: a convolution neural network
classifier with spatial domain and texture images. Proteins Struct. Funct. IEEE Trans. Med.
Imag. 15(5), 598610 (1996)
75. Shaun, P.: Brain MRI Segmentation, Computational Surgery and Dual Training, pp. 45–73.
Springer, US (2010)
76. Yue-Hei Ng, J., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., Toderici,
G.: Beyond short snippets: deep networks for video classification. In: IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), pp. 4694–4702 (2015)
77. Ye, H., Wu, Z., Zhao, R.-W., Wang, X., Jiang, Y.-G., Xue, X.: Evaluating two-stream CNN for
video classification. In: Proceedings of the 5th ACM on International Conference on Multime-
dia Retrieval, pp. 435–442 (2015)
78. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Largescale video
classification with convolutional neural networks. In: Proceedings of the IEEE Conference on
International Computer Vision and Pattern Recognition, pp. 1725–1732 (2014)
79. Cui, Z., Yang, J., Qiao, Y.: Brain MRI segmentation with patch-based CNN approach. In:
Proceedings of the 35th Chinese Control Conference, pp. 27–29 (2016)
80. Kennedy, N.D., Haselgrove, C., Hodge, M.S.: CANDIShare: a resource for pediatric neu-
roimaging data. Neuroinformatics 10(3), 319–322 (2012)
81. Leena Silvoster, M., Govindan, V.K.: Convolutional neural network based segmentation. In:
Computer Networks and Intelligent Computing: 5th International Conference on Information
Processing, ICIP, vol 157, pp. 190 (2011)
82. Zhang, W., Li, R., Deng, H., Wenlu, L., Lin, W., Ji, S., Shen, D.: Deep convolutional neural
networks for multi-modality isointense infant brain image segmentation. NeuroImage 214–224
(2015)
83. Tripoliti, E.E., Fotiadis, D.I., Argyropoulou, M.: A supervised method to assist the diagnosis
and classification of the status of alzheimers disease using data from an FMRI experiment.
In: Engineering in Medicine and Biology Society. EMBS 2008. 30th Annual International
Conference of the IEEE, pp. 4419–4422 (2008)
84. Wang, D., Khosla, A., Gargeya, R., Irshad, H., Beck, A.H.: Deep learning for identifying
metastatic breast cancer (2016). arXiv preprint arXiv:1606.05718
85. Quang, D., Chen, Y., Xie, X.: DANN: a deep learning approach for annotating the pathogenicity
of genetic variants. Bioinformatics 31(5), 761–763 (2014)
86. Kraus, O.Z., Grys, B.T., Ba, J., et al.: Automated analysis of high-content microscopy data
with deep learning. Mol. Syst. Biol. 13(924 (2017)
Application of Deep Architecture in Bioinformatics 185
Abstract Sensor-based health data collection and remote access to health data for
real-time advice are the key advantages of smart and remote healthcare. Such
health monitoring and support are becoming immensely popular among both patients
and doctors because they do not require physical travel, which is not always possible
for elderly people, many of whom live alone in the current socio-economic situation.
Healthcare informatics plays a key role in such circumstances. The huge amount of raw
data emanating from sensors needs to be processed with machine learning and
deep learning algorithms to extract useful information and build an intelligent
knowledge base that can provide an appropriate solution as and when required. The
real challenge lies in storing and retrieving data while preserving security, privacy,
reliability and availability requirements. Health data in an Electronic Medical Record
(EMR) is generally saved in a client-server database where a central coordinator
controls access, that is, permission to create, access, update, or delete health records.
But in smart and remote healthcare supported by enabling technologies such as sensors,
Internet of Things (IoT), cloud, deep learning, big data, etc., the EMR needs to be
accessed in a distributed manner by the multiple stakeholders involved, such as
hospitals, doctors, research labs, patients’ relatives, insurance providers, etc. Hence,
health data must be protected from unauthorized access, specifically to maintain data
integrity, using advanced distributed security techniques such as blockchain.
1 Introduction
Smart and remote healthcare for elderly care [1] and patient monitoring is becoming
increasingly popular among researchers because of its applicability and acceptance
in today’s socio-economic scenario, where average human life expectancy has
increased, leaving many people to live with age-related ailments and without personalized care.
Electronic Medical Records (EMR) [2] have traditionally been saved in distributed
databases which mostly follow a client-server architecture. A central authority,
called the administrator, supervises and manages the permissions of end-users to create,
access, update, or delete health records. Health data sensed by the sensors are prone
to security attacks and vulnerabilities [3]. In the sensing unit, several tiny sensors
(wearable, implanted, ambient, etc.) acquire data. These devices are prone to damage
from falls during manual handling, leading to lost or erroneous data; they may also
be compromised by an adversary to steal or tamper with data, or be replaced by
illegitimate devices. In the communication unit, sensor data travel through
heterogeneous communication links, e.g., Bluetooth, Wi-Fi, Zigbee and WiMax, which
vary in link quality, security measures, etc. In the storage and processing unit,
data are stored in cloud servers for further access, processing, knowledge building,
feedback or advice generation, etc. [4]. The first phase of this process comprises
sensing data, transmitting it and saving it in the cloud. The second phase comprises
access to the data in the cloud, analysis or modification of the data, and updating or
deletion of data in the cloud by the multiple stakeholders in healthcare (Fig. 1).
Now, for healthcare data, privacy and integrity are
two important properties to be ensured so that people do not worry about their
sensitive data being revealed through unauthorized access. Integrity of health data is
important because advice generated from incorrect data is inaccurate. Relying on the
underlying security measures and principles of the IoT and cloud enabled healthcare
framework helps to avoid the additional computational complexity, and hence resource
consumption, of implementing encryption algorithms separately. As multiple parties
are involved in health data access, encrypting data at the sender and decrypting it
at each corresponding receiver would increase latency, which is not desirable in a
time-critical application like healthcare. Moreover, devices in IoT enabled smart
healthcare systems are heterogeneous, with resource levels ranging from sensors
through smartphones, tablets and laptops to high-end workstations and servers. So,
unilateral encryption-decryption algorithms like Data Encryption Standard (DES),
Advanced Encryption Standard (AES) or Rivest-Shamir-Adleman (RSA) cannot be
applied at all levels of a smart healthcare architecture. Thus, direct confidentiality
may be difficult to implement, but confidentiality may still be ensured by implementing
authentication and integrity. In blockchain [5], stakeholders (hospitals, doctors,
research labs, and insurance providers) are connected to each other in a decentralized
network and are called blockchain nodes. Health sensors collect data and send it to a
Personal Digital Assistant (PDA), which then forwards the data to the blockchain
network through access points. Data forwarded to the blockchain network in one
session is called a block. The hash of the previous data is bound with the new data so
that the blockchain network can validate the new block of data. Once the block is
validated, the hash of the data is stored in the blockchain nodes and the health
information is stored in the cloud database in encrypted format.
Intelligent, Secure Big Health Data Management … 189
Fig. 1 The traditional 3-tier architecture of wireless body area network (WBAN) based healthcare [8]
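The hash binding described above can be illustrated with a minimal sketch. It is not the implementation used in this chapter: the choice of SHA-256, the all-zero starting hash and the names block_hash, append_block and chain_is_valid are assumptions made only for illustration.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for "no previous block"


def block_hash(payload: dict, prev_hash: str) -> str:
    """Hash one session's data together with the previous block's hash."""
    record = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def append_block(chain: list, payload: dict) -> dict:
    """Bind the new data to the hash of the previous block and append it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    block = {"payload": payload, "prev_hash": prev_hash,
             "hash": block_hash(payload, prev_hash)}
    chain.append(block)
    return block


def chain_is_valid(chain: list) -> bool:
    """Recompute every hash and check that each block points at its predecessor."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["payload"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, {"patient": "P-001", "pulse": 72})
append_block(chain, {"patient": "P-001", "pulse": 75})
valid_before = chain_is_valid(chain)

chain[0]["payload"]["pulse"] = 190  # tamper with an already stored reading
valid_after = chain_is_valid(chain)
```

Because each hash covers the previous one, changing a stored reading makes the recomputed hash disagree with the recorded one, so the tampering is detected during validation.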
Big health data [6] stored in a cloud database requires further analysis using
machine learning techniques for knowledge extraction. Deep learning techniques
are now widely used in healthcare; popular applications include early
disease detection, DNA (deoxyribonucleic acid) analysis, prediction of new drug
effectiveness, personalized treatments, etc. One of the big challenges of using deep
learning techniques in health informatics is the need for a huge amount of labeled
data. But an EMR may contain unlabeled data, for example, X-ray images that are not
annotated with medical conditions such as cancer or fibrosis. In such cases,
unsupervised learning techniques can be used to label the data using data mining. For
labeled data, supervised learning can be used. For a combination of labeled and
unlabeled data, semi-supervised learning is to be applied. Convolutional Neural
Network (CNN) has had a high impact compared with other deep learning techniques such
as Deep Neural Network (DNN), Deep Autoencoder, Deep Belief Network (DBN) and
Recurrent Neural Network (RNN), as health data are predominantly image-based nowadays.
DNNs have been successfully implemented in real-time applications such as healthcare
with the parallelism support of Graphics Processing Units (GPUs) [7].
2 Related Works
This section describes some of the works related to big health data and issues related
to intelligent processing of them using deep learning techniques. Also, the security
190 S. Saif et al.
3 Preliminaries
In this section, we briefly discuss the Internet of Things (IoT), Big Data, various
deep learning techniques, and blockchain technology. Then, the proposed
architecture is discussed in detail.
The concept behind IoT is to connect humans with the internet, which can be achieved
by connecting machines and other physical things to the internet [30]. This
technology is growing rapidly and is being adopted in healthcare. IoT based
technologies have helped physicians and patients a lot. For example, a patient can
take advice from doctors without physically visiting clinics, and patients who need
real-time monitoring do not need to visit hospitals. Using biological sensors and the
internet, doctors can observe the physiological parameters of patients. The wireless
body area network (WBAN) is one of the core technologies supporting remote healthcare.
It basically consists of battery-powered, lightweight wireless sensors that can be
wearable or implantable. These sensors are connected to an access point using
short-range communication, and that access point forwards the data to a medical
facility such as a clinic or hospital. These IoT systems produce massive amounts of
data which qualify as “Big Data”. These data need to be handled in a secure and
efficient way so that they can be accessed by all stakeholders.
Big Data is a large dataset which may contain data in structured, unstructured and
semi-structured formats. Structured data are typically stored in databases
or spreadsheets in a tabular format. Images, video and audio belong to the
unstructured category, and these data are very difficult to analyze. Semi-structured
data, such as XML, do not follow any strict standard. These data can be used in
emerging applications such as clinical decision support, disease prediction, etc.
through various machine learning technologies. The healthcare sector produces a huge
amount of data such as sensor data, previous health records and drug records. These
enormous data are difficult to manage using traditional software or hardware systems.
Use of a cloud platform reduces the cost of efficient storage and sharing.
Various deep learning techniques are available; we have to choose wisely the best
technique for a specific problem. Table 3 shows some popular methods which have
been used in health informatics.
Table 3 (continued)
Method: Recurrent neural network
Description
• It has the ability to analyze streaming type data. Useful for applications where
  the output is dependent on previous inputs
• Each hidden layer has its own weights and biases
Advantage
• It can store sequential events in the form of activations if a feedback connection
  is present
Disadvantage
• Training can be difficult if tanh and ReLU activation functions are used
Architecture
(diagram not reproduced)
Convolutional Neural Network (CNN) is one of the most popular deep learning
methods and is inspired by the human visual cortex. It is a kind of feed-forward
network consisting of many layers, in which feed-forward layers with convolutional
filters are interleaved. When input data are passed through the layers,
high-level features are extracted in each layer. This technique is highly helpful in the
era of medical imaging. For example, tumors can be classified from irregularities
in tissue morphology. CNN can be applied to read patterns, which is a difficult task
for human experts. For example, early stages of many diseases can be detected from
tissue samples.
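To make the idea of a convolutional filter concrete, the following minimal sketch slides one small filter over a toy image and applies a ReLU non-linearity. The image, the filter values and the function names conv2d and relu are hypothetical and chosen only for illustration; a real CNN learns its filters from data and stacks many such layers.

```python
def conv2d(image, kernel):
    """'Valid' 2-D filtering (cross-correlation, as used in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the filter.
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        output.append(row)
    return output


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, value) for value in row] for row in feature_map]


# Toy 4x4 "image": dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-set filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = relu(conv2d(image, kernel))
```

The feature map is non-zero only in the column where the edge lies, which is the sense in which a filter extracts a feature from its input.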
Recurrent Neural Network (RNN) is another useful technique for healthcare because
it supports streaming data, which can be analyzed further. Fixed-size input vectors
are used here, and data such as speech, text or DNA sequences can be provided
as input where the output depends on previous inputs. In the RNN architecture,
perceptrons are connected to themselves, which acts as a memory for consecutive
inputs. In a healthcare scenario, RNN can be applied to the analysis of medical text
such as anamnesis. For instance, a pool of patients may have the same disease with
different symptoms. RNN can scan a set of text files to find the similarities; this
can help a physician diagnose an illness.
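The memory effect of the self-connection can be seen in a minimal sketch of a single recurrent unit. Scalars are used instead of vectors for brevity, and the weights w_x, w_h and b are hypothetical values, not trained parameters.

```python
import math


def rnn_forward(inputs, w_x, w_h, b):
    """Run one recurrent unit over a sequence and return every hidden state."""
    hidden = 0.0  # initial memory
    states = []
    for x in inputs:
        # The new state depends on the current input and on the previous state.
        hidden = math.tanh(w_x * x + w_h * hidden + b)
        states.append(hidden)
    return states


# A single non-zero input at the first step, followed by silence.
states_with_event = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
# The same unit on an all-zero sequence.
states_without_event = rnn_forward([0.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
```

In the first sequence the hidden state stays above zero at steps two and three even though those inputs are zero, so the output at a later step still reflects the earlier input.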
Deep Autoencoders
Recent studies show that there is no universal set of features which works accurately
on various datasets. Feature extraction using a data-driven learning method is more
accurate, which motivates the autoencoder neural network. In an autoencoder, the
number of outputs equals the number of inputs, so that the network recreates its input
vector instead of assigning a class label. This is an unsupervised technique.
Typically, the hidden layer is smaller than the input/output layers. To extract the
relevant features, it encodes the data in a lower-dimensional space, but if the input
data has high dimensionality then a single hidden layer is not sufficient.
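The bottleneck structure can be sketched with a forward pass through a tiny autoencoder with four inputs, two hidden units and four outputs. The weights below are set by hand purely for illustration (a real autoencoder learns them by minimizing the reconstruction error), and the names ENCODER, DECODER and reconstruction_error are hypothetical.

```python
# Hand-set weights: 4 inputs -> 2 hidden units -> 4 outputs.
ENCODER = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
DECODER = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode(x):
    """Map the input to the lower-dimensional hidden code."""
    return matvec(ENCODER, x)


def decode(code):
    """Recreate an input-sized vector from the hidden code."""
    return matvec(DECODER, code)


def reconstruction_error(x):
    """Mean squared difference between the input and its reconstruction."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


on_pattern = [2.0, 2.0, 5.0, 5.0]   # fits the structure the code can represent
off_pattern = [1.0, 3.0, 4.0, 4.0]  # does not fit it exactly

code = encode(on_pattern)
error_on = reconstruction_error(on_pattern)
error_off = reconstruction_error(off_pattern)
```

The two-value code is the extracted feature vector. Inputs that match the structure captured by the hidden layer are recreated exactly, while other inputs are recreated with some error.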
Deep Boltzmann Machine
Machine Learning (ML) has various successful applications in the area of health
informatics, whereas Deep Learning (DL) techniques are more recent and their adoption
is slow. However, DL is progressing rapidly and its results can be promising in spite
of the challenges. We can divide the medical applications of DL into three categories.
• Predictive healthcare, e.g., prediction of treatment effectiveness for various
diseases.
• Medical Decision Support, e.g., various diseases can be detected and diagnosed
using the physiological information of the patient.
• Personalized treatments, e.g., personalized drugs can be designed as per the needs
of individual patients.
Predictive Healthcare
Personalized Treatments
the data. Personalized treatments can be offered based on various data. For example,
biomarkers can be determined by DNA analysis and genome mining. A biomarker is a
measurable indicator of a biological state (disease). Every disease develops within
the human body itself; biomarkers can indicate the probability of its development,
which can help medical experts provide better prediction and diagnosis. Genomics
helps to identify the gene allele which is responsible for the development of an
illness. Drug effectiveness can be determined by evaluating the differences in genes
when the drug is applied; this is called pharmacogenomics. This helps to reduce
dosage levels as well as the side effects of the drug. Deep learning techniques
perform very well in cancer classification from gene expression data. For example, to
predict splicing patterns, feature extraction from ribonucleic acid (RNA) and micro
ribonucleic acid (miRNA) data can be done efficiently using DL.
So, DL can help us analyze data from EMR and offer personalized medicines.
Challenges
There are many challenges for deep learning in the domain of health data. Depending
on the nature of medicine, there are requirements of security, availability,
reliability and efficiency. For example, a health sensor must work continuously
without any interruption so that emergencies can be handled. Some recent works show
that weight filters can be used in CNN for extraction of high-level features, but the
entire learning module may become non-interpretable. Most researchers use DL
techniques without knowing the likelihood of success; if misclassification problems
occur, they do not have the ability to modify the model. As discussed in the previous
sections, large datasets are required to train an effective and reliable model.
Nowadays enormous healthcare data is available, but disease-specific data is still
limited. So DL is not well suited to applications involving rare diseases. Another
common issue in training a deep neural network is overfitting when a small training
dataset is used. This happens when the number of samples in the training set is small
relative to the number of parameters in the network. Overfitting can be mitigated by
regularization techniques such as dropout during the training process. DNN does not
support raw data directly as input, so some preprocessing is needed or the input
domain needs to be changed. Choosing the hyperparameters which control the
architecture of a DNN, for example the number of filters in a CNN, is a blind
exploration process, and accurate validation is very much required. Finding an optimal
set of hyperparameters and correctly preprocessing raw data are challenging tasks and
can lead to a long training process. Another important issue in DL is that many DNNs
can be fooled easily; if a minor change is made to the input data (e.g., adding
imperceptible noise to an image) then the samples will be misclassified [41].
It can be noted that most machine learning algorithms can be affected by this
issue. If the value of a particular feature is set very high or very low,
misclassification can arise in logistic regression. In decision trees, if a single
binary feature is switched in the final layer, then it will produce incorrect results.
So, we can say that any machine learning technique is also vulnerable to security
attacks, as a simple alteration can lead the system to produce wrong results.
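The logistic regression case mentioned above can be illustrated with a small sketch. The weights, bias and feature values are hypothetical and are not taken from any model in this chapter; the point is only that pushing one feature to an extreme value flips the predicted class.

```python
import math


def predict_proba(weights, bias, features):
    """Probability of the positive class under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical trained parameters for a two-feature model.
weights = [0.8, -0.5]
bias = -1.0

original = [1.0, 2.0]    # z = -1.2, classified as negative
tampered = [50.0, 2.0]   # first feature set very high, z = 38.0

p_original = predict_proba(weights, bias, original)
p_tampered = predict_proba(weights, bias, tampered)

label_original = int(p_original >= 0.5)
label_tampered = int(p_tampered >= 0.5)
```

Only one input value was altered, yet the decision changes from the negative to the positive class, which is why input integrity matters for any model placed behind a health data pipeline.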
In general, there are various types of blockchain, depending on the data managed,
the availability of the data and the actions a user can perform. We can categorize
them into three types:
• Public Blockchain (permissionless)
• Consortium (public permissioned)
• Private Blockchain.
From the types of blockchain, it is clear that blockchains which are accessible
and visible to the public are public blockchains. However, the entire data may not be
accessible by the public, since some part of the data can be in an encrypted format
to keep participants anonymous [43]. In public blockchains, anyone can join the
blockchain and act as a node, or can become a miner; hence, no approval is required.
Cryptocurrency networks fall into this category, where a miner gains some economic
incentive. For instance, Bitcoin, Ethereum and Litecoin are cryptocurrency networks
based on the public blockchain.
In consortium blockchains, only selected nodes are allowed to participate
in the distributed consensus process [43]. Any kind of industry can use this kind of
blockchain. Sometimes consortium blockchains are developed for a particular industry
(e.g., the healthcare sector) but are open for public use subject to approval.
Private blockchains are decentralized networks [43] which only permissioned
nodes can join. The tasks of the nodes, such as performing transactions,
executing smart contracts or acting as a miner, are controlled in private blockchains.
Typically, a trusted organization manages the blockchain. Platforms like Ripple [44]
and Hyperledger Fabric [45] only support private blockchain networks.
Sharing data among different healthcare providers in a fast and effective way is a
challenging task. Non-collaboration and lack of coordination are barriers to
effective data sharing [27]. Patients and other healthcare stakeholders
may face problems in the data sharing and retrieval process.
Managing large health records and sharing them among healthcare providers is
not an easy task when integrating blockchain. Since health data is sensitive in nature,
it should be shared only with trusted parties. It must be ensured that no unauthorized
entity gets access to the data. National regulations and data privacy requirements
must be adhered to when adopting blockchain in healthcare.
One of the big concerns in adopting blockchain in healthcare is delivery time: data
must be delivered within the required time. Patients’ lives can be in danger if the
required data is not delivered on time. Since blockchain architecture is complex in
nature, incorporating blockchain may create computational delays. A lot of research
needs to be done to reduce delay and maintain QoS in terms of reliability before
incorporating blockchain in healthcare.
Biological sensors are important parts of healthcare, and these sensor devices
generate various kinds of traffic. In general, data traffic is classified into two
categories: emergency traffic and normal traffic. Traffic generated from data gathered
from patients in an emergency situation is emergency traffic, and data gathered by
sensors during regular monitoring is known as normal traffic. So, while implementing
blockchain in healthcare, a priority mechanism is very much required so that
emergency traffic experiences minimum delay compared to normal traffic.
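One simple way to realize such a priority mechanism is a priority queue at the forwarding device, sketched below. This is only an illustration under assumed names (TrafficQueue, EMERGENCY, NORMAL) and made-up packet contents; the chapter does not prescribe a particular scheduling scheme.

```python
import heapq
import itertools

EMERGENCY = 0  # lower number is served first
NORMAL = 1


class TrafficQueue:
    """Serve emergency packets before normal ones, first-in first-out within a class."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, priority, packet):
        heapq.heappush(self._heap, (priority, next(self._arrival), packet))

    def pop(self):
        _priority, _arrival, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self):
        return len(self._heap)


queue = TrafficQueue()
queue.push(NORMAL, "pulse=72")
queue.push(NORMAL, "temp=36.8")
queue.push(EMERGENCY, "ecg=arrhythmia")
queue.push(NORMAL, "bp=120/80")

served = [queue.pop() for _ in range(len(queue))]
```

The emergency packet is forwarded first although it arrived third, while the normal packets keep their original order.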
Latency
Since data generated by health sensors is huge in volume, the nodes of the blockchain
should be capable of storing these huge data. Health data may consist of medical
images, laboratory reports and drug history records, all of which require a large
amount of storage space. This issue could be solved by using cloud storage platforms.
Security
Another important issue for incorporating blockchain in healthcare is the reliability
of the data gathered. Although blockchain is popular because data stored in the
blockchain is immutable, data coming from the sensors may sometimes be corrupted, and
such data will then remain corrupt. Data received at the blockchain nodes may also
have been altered by security attacks such as fake data injection,
eavesdropping, etc. So, an effective security mechanism is needed to ensure
the integrity of the data.
Data Mining
Blockchain is based on validation of data blocks; each set of data that comes from
the sensors is considered a block, and data sent from the sensors each time needs to be
validated before being added to the chain. So, a problem arises when the number of
patients increases; in that case, mining will take more time because
the computational load increases. So, efficient mining is also a very challenging
issue when integrating blockchain in healthcare.
4 System Model
A typical IoT based healthcare system consists of three layers. The first layer
consists of different health sensors, such as ECG, pulse and blood pressure sensors.
Usually, these sensors are placed on the body of a patient. They are responsible for
sensing different physiological parameters from the patient’s body, and this
information is then sent to the PDA device. In the second layer, the PDA device
forwards the data to the medical server through an internet connection, and in the
third layer the doctor or medical facility gets access to the data. But data in
transit is vulnerable to various cyber-attacks. So, we must adopt security measures
such as confidentiality, authentication, integrity and access control. These
four parameters are well-known security requirements for healthcare applications
[47].
Attack Model
Traditional IoT based applications mainly face two types of attack: attacks against
confidentiality and attacks against integrity. Confidentiality means non-disclosure of
a patient’s private information, which is prone to different threats. Some common
security attacks on confidentiality are eavesdropping, impersonation attacks,
side-channel attacks, packet sniffing, etc. Therefore, it is very important to handle
security attacks against confidentiality. Integrity ensures the intactness of data
during communication. Nowadays, IoT based biological sensors gather physiological
information from a patient and that information is sent to medical facilities. Since
these data are sent through insecure wired/wireless links, it is easy for an adversary
to physically or remotely capture the forwarding device and manipulate the information
gathered by the sensors. As a result, it may lead to a wrong diagnosis. Some of the
common attacks against integrity are data modification attacks, fake data injection
attacks, replay attacks, etc. In our proposed framework we have considered blockchain,
which can defend against attacks on integrity due to its nature of working. To handle
attacks against confidentiality, various cryptographic schemes can be used to encrypt
data at the forwarder device, and the data can be decrypted using the secret key of
the medical service providers.
Proposed Architecture
Here, we propose a secure and smart framework to share data with different
medical facilities in an effective manner. The overall blockchain-based architecture
is shown in Fig. 4, where cloud storage is used to store the electronic medical record
(EMR).
In our proposed framework, data gathered through sensors are first sent to a PDA
device; this device generates the hash of the health data using a standard hash
algorithm, and the hash is then forwarded to a private blockchain network
through the internet. In the blockchain, each medical facility, such as hospitals,
labs, insurance companies, research labs, etc., acts as a blockchain node. The hash
sent from the PDA device is received by each node as a data block, which needs to be
validated and verified by the nodes. Verification is based on the received hash, which
is compared with the hash of the previously received data block. This is possible
because the data block generated by the PDA device also contains the previous block
hash. A majority of the blockchain nodes needs to verify the block. Once verified, the
block is added to the chain and a unique secret key and an identifier (ID number) are
generated. The key and ID are sent back to the PDA device. The PDA device encrypts
the actual health data using the key, and the encrypted data, the hash of the health
data and the ID are sent to the cloud-based database server. If someone tries to
tamper with the data of one block, then the subsequent blocks are also affected.
Whenever any medical facility needs to access health data stored in the cloud, the
data is first identified through the ID and then decrypted using the secret key. Once
decrypted, the health data becomes available and can be given as input to various deep
learning-based healthcare applications. We have discussed earlier the applications of
deep learning techniques in healthcare, such as predictive healthcare, medical
decision support, personalized treatments, etc. The main advantage of the
proposed architecture is that data is shared among the different medical facilities in
a secure manner and can be used for various healthcare applications. In our
architecture, the security requirements are maintained. Since a private blockchain is
used, data is stored and accessed by authenticated users only. Cryptographic
algorithms help to maintain the confidentiality of the data. Integrity is maintained
due to the working nature of the blockchain, and access control is based on the secret
key. Since the secret key is generated by the blockchain nodes, only they have
permission to decrypt the data.
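The validation part of this workflow can be sketched as follows. It is a simplified illustration under several assumptions that are not part of the proposed framework: SHA-256 as the hash algorithm, an all-zero hash before the first block, a strict-majority rule, and the hypothetical names Node and submit_block. The encryption of the health data and the upload to the cloud database are deliberately left out.

```python
import hashlib
import secrets

GENESIS_HASH = "0" * 64  # assumed value of "previous hash" before any block exists


class Node:
    """A medical facility acting as a blockchain node (hypothetical, minimal)."""

    def __init__(self):
        self.last_hash = GENESIS_HASH

    def verify(self, block):
        # Compare the previous-block hash carried in the block with the hash
        # of the block this node received last.
        return block["prev_hash"] == self.last_hash

    def accept(self, block):
        self.last_hash = block["data_hash"]


def submit_block(nodes, health_data: bytes, prev_hash: str):
    """PDA side: hash the data, ask the nodes to verify, return key and ID if accepted."""
    block = {"data_hash": hashlib.sha256(health_data).hexdigest(),
             "prev_hash": prev_hash}
    votes = sum(node.verify(block) for node in nodes)
    if votes * 2 <= len(nodes):  # no strict majority: block is rejected
        return None
    for node in nodes:
        node.accept(block)
    return {"block": block,
            "record_id": secrets.token_hex(8),      # identifier sent back to the PDA
            "secret_key": secrets.token_bytes(32)}  # key the PDA would encrypt with


nodes = [Node() for _ in range(5)]
first = submit_block(nodes, b"pulse=72", GENESIS_HASH)
second = submit_block(nodes, b"pulse=75", first["block"]["data_hash"])
forged = submit_block(nodes, b"pulse=190", "f" * 64)  # wrong previous hash
```

A block that does not carry the correct previous hash gathers no votes and is rejected, so no key or ID is issued for it.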
In this paper, we have described the role of blockchain and deep learning in health
informatics. Both of these emerging technologies face challenges which are open
research areas; proper research can mitigate these issues. As discussed earlier,
unilateral cryptographic algorithms like DES, AES and 3-DES are not a good choice for
a blockchain for healthcare applications, since applying them will increase
the latency of data sharing. So the design of low-complexity encryption-decryption
algorithms is an important research area. Key generation and key sharing should be
done efficiently so that they do not increase the complexity of a blockchain-based
health data-sharing platform. Moreover, as the number of stakeholders may keep
increasing in IoT enabled smart and remote healthcare, communication overhead
and storage overhead need to be taken care of while designing a blockchain-based
secure healthcare system. Health data stored in EMR are largely unlabeled,
incomplete, noisy, etc. So researchers should consider reconstructing
missing data, and data filtration is needed to remove noise. Also, health
data is big data due to the large sample size and volume of data. Budding researchers
can explore preparing their own databases based on their research context, besides
using existing benchmark databases, considering demography, geographical location,
concerned disease, etc. of the target subjects to achieve more realistic intelligent
data processing results. There are many research issues in this field; proper
exploration is needed to adopt deep learning and blockchain technology in health
informatics in an efficient way.
6 Conclusion
The present era is the era of smart and remote applications in various areas, healthcare
in particular, where multiple stakeholders are involved with big health data that needs
to be acquired, stored, and retrieved in a distributed manner using security techniques
such as blockchain, and processed intelligently by applying deep learning techniques.
Issues and challenges remain in applying deep learning techniques, as medical health
records are not always complete, may be erroneous, and may not be labeled.
Intelligent, Secure Big Health Data Management … 207
Also, as health records are huge in size and multiple parties are involved, executing all
steps of the blockchain method may lead to additional storage overhead, communication
overhead, and latency in processing a submitted request to access data, thus making
IoT-enabled real-time healthcare support unrealistic. This book chapter discusses the
relevant deep learning algorithms and tools, presents basic and fundamental concepts
related to big data, healthcare, security, IoT, etc., illustrates the blockchain-based
architecture, and defines the attack model, to give researchers in this domain a complete
view for exploration.
Acknowledgements This work has been carried out as a part of sanctioned research project from
Government of West Bengal, Department of Science & Technology and Biotechnology, project
sanction no. 230(Sanc)/ST/P/S&T/6G-14/2018.
References
1. Majumder, S., Aghayi, E., Noferesti, M., Memarzadeh-Tehran, H., Mondal, T., Pang, Z., Deen,
M.J.: Smart homes for elderly healthcare—Recent advances and research challenges. Sensors
17, 2496 (2017)
2. Bahga, A., Madisetti, V.K.: Healthcare data integration and informatics in the cloud. Computer
48(2), 50–57 (2015)
3. Movassaghi, S., Abolhasan, M., Lipman, J., Smith, D., Jamalipour, A.: Wireless body area
networks: a survey. IEEECommun. Surv. Tutor. 1–29 (2013)
4. Zhang, Y., Qiu, M., Tsai, C., Hassan, M.M., Alamri, A.: Health-CPS: healthcare cyber-physical
system assisted by cloud and big data. IEEE Syst. J. 11(1), 88–95 (2017)
5. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system (2008)
6. Andreu-Perez, J., Poon, C.C.Y., Merrifield, R.D., Wong, S.T.C., Yang, G.: Big data for health.
IEEE J. Biomed. Health Inf. 19(4), 1193–1208 (2015)
7. Ravi, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., Yang, G.: Deep
learning for health informatics. IEEE J. Biomed. Health Inf. 21(1), 2–41 (2017)
8. Karmakar, K., Saif, S., Biswas, S., Neogy, S.: WBAN security: study and implementation
of a biological key based framework. Inb: 2018 Fifth International Conference on Emerging
Applications of Information Technology (EAIT), pp. 1–6 (2018)
9. Xia, Q., Sifah, E.B., Asamoah, K.O., Gao, J., Du, X., Guizani, M.: MeDShare: trust-less medical
data sharing among cloud service providers via blockchain. IEEE Access 5, 14757–14767
(2017)
10. Hölbl, M., Kompara, M., Kamišalić, A., NemecZlatolas, L.: A systematic review of the use of
blockchain in healthcare. Symmetry 10, 470 (2018)
11. Shen, B., Guo, J., Yang, Y.: MedChain: efficient healthcare data sharing via blockchain. Appl.
Sci. 9, 1207 (2019)
12. Faust, O., Hagiwara, Y., Hong, T.J., Lih, O.S., Rajendra Acharya, U.: Deep learning for health-
care applications based on physiological signals: a review. Comput. Methods Progr. Biomed.
161, 1–13 (2018)
13. Griggs, K.N., Ossipova, O., Kohlios, C.P., et al.: Healthcare blockchain system using smart
contracts for secure automated remote patient monitoring. J. Med. Syst. 42, 130 (2018)
14. Chen, M., Hao, Y., Hwang, K., Wang, L., Wang, L.: Disease prediction by machine learning
over big data from healthcare communities. IEEE Access 5, 8869–8879 (2017)
15. Mozaffari-Kermani, M., Sur-Kolay, S., Raghunathan, A., Jha, N.K.: Systematic poisoning
attacks on and defenses for machine learning in healthcare. IEEE J. Biomed. Health Inf. 19(6),
1893–1905 (2015)
208 S. Saif et al.
16. Sun, W., Zheng, B., Qian, W.: Computer aided lung cancer diagnosis with deep learning algo-
rithms. In: Proceedings of SPIE 9785, Medical Imaging 2016: Computer-Aided Diagnosis,
97850Z, 24 Mar 2016
17. Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S.:
Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639),
115–118 (2017)
18. Abdel-Zaher, A.M., Eldeib, A.M.: Breast cancer classification using deep belief networks.
Expert Syst. Appl. 46, 139–144 (2016)
19. Fakoor, R., Ladhak, F., Nazi, A., Huber, M.: Using deep learning to enhance cancer diagnosis
and classification. In: Proceedings of the ICML Workshop on the Role of Machine Learning
in Transforming Healthcare, June 2013
20. Ramsundar, B., Kearnes, S., Riley, P., Webster, D., Konerding, D., Pande, V.: Massively mul-
titask networks for drug discovery. arXiv preprint arXiv:1502.02072 (2015)
21. Li, R., Zhang, W., Suk, H., Wang, L.: Deep learning based imaging data completion for
improved brain disease diagnosis. In: Proceedings of MICCAI 2014, pp. 305–312, Sept 2014
22. Mohsen, H., El-Dahshan, E.S.A., El-Horbaty, E.S.M., Salem, A.: Classification using deep
learning neural networks for brain tumors. Fut. Comput. Inf. J. 3(1), 68–71 (2018)
23. Amin, J., Sharif, M., Yasmin, M., Fernandes, S.: Big data analysis for brain tumor detection:
deep convolutional neural networks. Fut. Gener. Comput. Syst. 87, 290–297 (2018)
24. Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E., Greenspan, H.: Chest pathology
detection using deep learning with non-medical training. In: 2015 IEEE 12th International
Symposium on Biomedical Imaging (ISBI), New York, pp. 294–297 (2015)
25. Ronao, C.A., Cho, S.B.: Human activity recognition with smartphone sensors using deep
learning neural networks. Expert Syst. Appl. 59, 235–244 (2016)
26. Zhang, P., White, J., Schmidt, D.C., Lenz, G., Rosenbloom, S.T.: FHIRChain: applying
blockchain to securely and scalably share clinical data. Comput. Struct. Biotechnol. J. 16,
267–278 (2018)
27. Azaria, A., Ekblaw, A., Vieira, T., Lippman, A.: MedRec: using blockchain for medical data
access and permission management. In: 2016 2nd International Conference on Open and Big
Data (OBD), Vienna, pp. 25–30 (2016)
28. Peterson, K., Deeduvanu, R., Kanjamala, P., Boles, K.: A blockchain based approach to health
information exchange networks. In: Proceedings of NIST Workshop Blockchain Healthcare,
vol. 1, pp. 110 (2016)
29. Patel, V.: A framework for secure and decentralized sharing of medical imaging data via
blockchain consensus. Health Inf. J. 1–14 (2018)
30. Chun-Wei, T., Chin-Feng, L., Ming-Chao, C., Yang, L.T.: Data mining for internet of things:
a survey. IEEE Commun. Surv. Tutor. 16(1), 77–97 (2014)
31. Artelnics: Neural designer (2015). Available online: https://www.neuraldesigner.com
32. Chollet, F.: Keras (2016). Available online: https://keras.io/
33. Apache Software Foundation: Apache Singa (2016). Available online: https://singa.incubator.
apache.org
34. Skymind: Deeplearning4j (2016). Available online: http://deeplearning4j.org
35. Microsoft: Microsoft cognitive toolkit (2016). Available Online: https://github.com/microsoft/
cntk
36. Apache Software Foundation: Apache MXNet (2016). Available Online: https://mxnet.apache.
org/
37. Artelnics: OpenNN (2014). Avaiable Online: http://www.opennn.net
38. Paszke, A, Gross, S., Chintala, S., Chanan, G.: PyTorch (2016). Avaiable Online: https://
pytorch.org
39. Google: Tensorflow (2016). Available Online: https://www.tensorflow.org
40. Universite de Montreal: Theano (2019). Available Online: http://deeplearning.net/software/
theano/
41. Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., Muller, P.-A.: Adversarial attacks on
deep neural networks for time series classification. In: IEEE International Joint Conference on
Neural Networks (2019)
Intelligent, Secure Big Health Data Management … 209
42. Bahga, A., Madisetti, V.K.: Blockchain platform for industrial internet of things. J. Softw. Eng.
Appl. 09, 533–546 (2016)
43. Zheng, Z., Xie, S., Dai, H., Chen, X., Wang, H.: An overview of blockchain technology:
architecture, consensus, and future trends. In: Proceedings of the 2017 IEEE International
Congress on Big Data (BigData Congress), Boston, MA, USA, 11–14 Dec 2017, pp. 557–564
(2017)
44. Ripple: Ripple—one frictionless experience to send money globally (2018). Available online:
https://ripple.com
45. Androulaki, E., Manevich, Y., Muralidharan, S., Murthy, C., Nguyen, B., Sethi, M., Singh, G.,
Smith, K., Sorniotti, A., Stathakopoulou, C., et al.: Hyperledger fabric: a distributed operating
system for permissioned blockchains. In: Proceedings of the Thirteenth EuroSys Conference,
Porto, Portugal, 23–26 Apr 2018
46. Zhang, J., Xue, N., Huang, X.: A secure system for pervasive social network-based healthcare.
IEEE Access 4, 9239–9250 (2016)
47. Saif, S., Gupta, R., Biswas, S.: Implementation of cloud assisted secure data transmission in
WBAN for healthcare monitoring. In: Proceedings of International Conference on Advanced
Computational and Communication Paradigms (ICACCP 2017), Advances in Intelligent Sys-
tems and Computing, vol. 705, pp. 665–674 (2018)
Sohail Saif is a full-time Ph.D. Research Scholar at Maulana Abul Kalam Azad University
of Technology, West Bengal, India. He completed his B.Tech in Computer Science and
Engineering and M.Tech in Software Engineering at Maulana Abul Kalam Azad University of
Technology, WB in 2014 and 2018, respectively. His research interests are the internet of
things, network security and remote healthcare.
Suparna Biswas is an Associate Professor in the Department of Computer Science and
Engineering at Maulana Abul Kalam Azad University of Technology, WB. She completed her
ME and Ph.D. at Jadavpur University, West Bengal in 2004 and 2013, respectively. She was
an ERASMUS MUNDUS Post Doctoral Research Fellow on the cLINK project at Northumbria
University, Newcastle, UK during 2014–2015. Her research interests are the internet of
things, wireless body area networks, machine learning, network security and remote
healthcare. She has authored a number of research papers published in peer-reviewed
international journals and conferences of repute. She is currently PI of a WB DST funded
major research project on IoT-based secure remote healthcare.
1 Introduction
The world's leading causes of death include harmful diseases such as malaria. Malaria
is spread when an infected female Anopheles mosquito bites a person. It is one of the
predominant life-threatening diseases in the world and increases the mortality rate in
countries like India. Several species of malaria parasite, including P. falciparum,
P. ovale, P. vivax and P. malariae, can cause disease in humans, of which P. falciparum
is the deadliest. According to the WHO Malaria Report of 2015 [1], roughly 3.3 billion
people in different countries are estimated to be at risk of being affected by malaria,
of whom around 1.2 billion are at higher risk. There were an estimated 214 million cases
of malaria worldwide in 2015 and about 438,000 deaths. The impact was greatest in
Africa, where approximately 91% [2] of malaria deaths occurred; two thirds of all deaths
were of children below 5 years of age. Signs of malaria include muscle pain, vomiting
and chills, and in some critical cases it leads to coma, which can result in the
person's death.
Many medicines exist that make malaria a curable disease, but because of the lack of
modern equipment and the reliance on manual counting of blood cells, the number of
deaths is increasing rapidly. The standard method used worldwide for the diagnosis of
malaria is light microscopy of blood films. Although frequently used, this method has
serious drawbacks. It requires considerable expertise from the pathologist, whose
performance depends on the burden imposed by large-scale analysis, which is common
in malaria-prone areas. It involves counting parasites and red blood cells (RBCs)
manually, which is a labor-intensive and error-prone process, especially if patients
have to be tested several times a day. However, accurate counts are essential for
diagnosing malaria accurately, and are an important part of testing for drug
effectiveness and drug resistance and of estimating disease severity. This issue can be
addressed by training machines to do the work of pathologists. We can train the machine
using deep learning algorithms [3–5].
Deep learning, a fast-growing area, has been performing exceptionally well in the
medical field in recent years. In our model, we use a deep learning architecture
popularly known as the convolutional neural network (CNN) [4].
The main feature of a CNN model is that it automatically detects the important
features without human supervision, by training its learning layers on the input. The
CNN model also lends itself to visualization, which helps in understanding the relations
it learns. CNNs are computationally more efficient than many other models. Another
advantage is that they are easier to train and have fewer parameters than fully
connected networks with the same number of hidden units [6–8].
2 Background
CNNs are mainly used to categorize images, group images by similarity, and carry out
recognition operations such as image or object recognition. Their application is not
limited to any one field: they can detect anomalies in medical images, generate
character text, automate devices, and more [9].
Malaria Disease Detection Using CNN Technique with SGD … 213
Nowadays CNNs are applied widely. They are among the most sought-after deep learning
architectures, and the popularity and effectiveness of convnets increased the interest
in Deep Learning. Since AlexNet in 2012, the interest in CNNs has increased rapidly and
has kept growing to date.
CNNs are a leading solution for image-related problems. When it comes to an
image-related problem statement, a CNN is the usual go-to model because of its
accuracy. CNNs can also be applied in other areas, such as recommendation models and
natural language processing. Their main advantage over other algorithms is that they
automatically detect the features essential for classification, without those features
being specified by hand. For example, given pictures of two different objects, a CNN
automatically learns the features that differentiate the two classes. A CNN model
follows the architecture shown in Fig. 1.
First, the input image on which the operations will be performed is taken. Convolution
and pooling are performed on the input image, followed by a number of fully connected
layers. For multiclass classification, the output is obtained through a softmax.
Convolution
field when all these operations take place. The size of the receptive field is the same
as that of the filter. Figure 3 shows the convolutional layer.
For image-related problems, 3D convolution is performed. Here an image has three
dimensions, namely height, width and depth. The depth represents the colour (RGB)
channels of the image. In practice, we perform multiple convolutions using different
filters; the outcomes of these convolutions are then stacked together to form the
actual output of the convolution layer.
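The idea of sliding several filters over an image and stacking the results can be sketched in plain Python. This is a simplified illustration that assumes a single-channel image, no padding, and the cross-correlation form of convolution commonly used in deep learning libraries; the function names and the sample values are hypothetical.

```python
def conv2d(image, kernel, stride=1):
    """Slide one kernel over a 2D image (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Sum of element-wise products between the kernel and the image patch.
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


def conv_layer(image, kernels):
    """Apply every kernel and stack the feature maps into one output volume."""
    return [conv2d(image, kernel) for kernel in kernels]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernels = [[[1, 0], [0, 1]],   # responds to the main diagonal of each patch
           [[1, 1], [1, 1]]]   # sums each 2 x 2 patch

print(conv_layer(image, kernels))
# [[[6, 8], [12, 14]], [[12, 16], [24, 28]]]
```

With two filters the layer returns two 2 × 2 feature maps, so the depth of the output equals the number of filters.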
Non Linearity
Neural networks such as ANNs and autoencoders are powerful because of their
non-linearity. The weighted sum of the inputs is passed through an activation function
to produce the output. CNNs use the same technique. In a CNN, the output obtained from
the convolution layer is passed through the relu activation function. This means that
each value in the feature map is not just the sum of the element-wise products but has
relu applied to it. The relu activation function is applied after every convolution
performed, because without a non-linearity the network cannot be powerful [11].
Equation 1 defines the relu activation function mathematically.
y = max(0, x) (1)
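As a small illustration of Eq. 1, the sketch below applies relu to every value of a feature map. The function names and the sample values are hypothetical.

```python
def relu(x):
    """Eq. 1: y = max(0, x)."""
    return max(0.0, x)


def relu_map(feature_map):
    """Apply relu element-wise to a 2D feature map."""
    return [[relu(value) for value in row] for row in feature_map]


print(relu_map([[-1.0, 2.0], [0.5, -3.5]]))
# [[0.0, 2.0], [0.5, 0.0]]
```

Negative responses are set to zero while positive responses pass through unchanged.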
Stride is the number of positions by which the convolution filter slides at each step
of the convolution. The default value of the stride is 1. The bigger the stride, the
smaller the feature map.
Even with a stride of 1, the feature map is smaller than the input image, because the
convolution filter must fit entirely inside the image, and a larger stride reduces the
feature map further. To keep the dimensions of the feature map the same as those of the
image, we need padding around the image [11].
The padding can be all zeros, or it can repeat the values on the edges of the input
image. With padding we can obtain a feature map of the same size as the image. This is
why padding is used in CNNs: without it, the feature map may shrink with each step
performed.
216 A. Kumar et al.
Figure 5 illustrates how full padding and same padding are applied in a CNN.
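The effect of stride and padding on the feature map size can be made concrete with the standard output-size formula, floor((n + 2p − f) / s) + 1, for an input of side n, a filter of side f, padding p and stride s. The sketch below is only an illustration; the function name and the sample sizes are hypothetical.

```python
def feature_map_size(n, f, stride=1, padding=0):
    """Side length of the feature map for an n x n input and an f x f filter."""
    return (n + 2 * padding - f) // stride + 1


print(feature_map_size(5, 3))             # 3: no padding shrinks the map
print(feature_map_size(5, 3, padding=1))  # 5: padding of 1 keeps the input size
print(feature_map_size(5, 3, stride=2))   # 2: a larger stride shrinks it further
```

For a 3 × 3 filter with stride 1, a padding of one pixel on each side is enough to keep the feature map the same size as the input.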
Pooling
We perform pooling after convolution to reduce the size of the feature maps. This also
reduces the number of parameters, which in turn reduces the training time. Pooling
downsamples the feature maps by reducing their height and width while keeping the
depth (the number of channels) constant.
Max pooling is the most commonly used pooling technique. It slides a window over the
feature map and takes the maximum value in each pooling window. Pooling has no
parameters to learn. The size of the window and the stride are specified in
advance [11].
Figure 6 shows max pooling, in which a window slides over the input, as in a normal
convolution, and outputs the largest value in each window.
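A minimal sketch of max pooling on a single feature map follows. It assumes a square window and a stride equal to the window size, which is a common choice; the function name and the sample values are hypothetical.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value of each size x size window of a 2D feature map."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(feature_map[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 5, 8]]

print(max_pool2d(feature_map))
# [[6, 4], [7, 9]]
```

The 4 × 4 map is reduced to 2 × 2, and no weights are involved in the operation.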
Hyper parameters
If only convolution is considered, ignoring pooling, we have to take four important
hyperparameters into consideration:
• Filter size: a filter size of 3 × 3, 5 × 5 or 7 × 7 is generally used.
• Filter count: it is variable, generally within the range of 32–1024. The more filters
are used, the more powerful the network becomes. This has a limitation, however: as
the number of filters increases, so does the number of parameters, and with it the
risk of overfitting.
• Stride: the stride is usually kept at 1.
• Padding: padding is generally preferred.
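The link between filter count and parameter count can be sketched with the usual formula for a convolution layer, (f × f × input channels + 1) × filters, where the extra 1 is the bias of each filter. The function name and the example of a 3-channel RGB input are assumptions made for illustration.

```python
def conv_param_count(filter_size, in_channels, n_filters):
    """Trainable parameters of a convolution layer: weights plus one bias per filter."""
    return (filter_size * filter_size * in_channels + 1) * n_filters


for n_filters in (32, 64, 1024):
    print(n_filters, conv_param_count(3, 3, n_filters))
# 32 896
# 64 1792
# 1024 28672
```

The parameter count grows linearly with the number of filters, which is why a very large filter count raises the risk of overfitting.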
Fully Connected
After performing convolution and pooling, we add fully connected layers to complete
the CNN architecture. The output obtained after convolution and pooling is a 3D
volume, but a fully connected layer expects a 1D input. Therefore we flatten the 3D
output of the pooling layer into a 1D vector so that it can be the input to the fully
connected layer. Flattening simply converts a 3D volume into a 1D vector. Figure 7
shows the fully connected layer of a convolutional neural network.
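Flattening can be shown in a few lines. The sketch assumes the 3D volume is stored as nested Python lists; the function name and the sample volume are hypothetical.

```python
def flatten(volume):
    """Convert a 3D volume (nested lists) into a 1D list, keeping the element order."""
    return [value for plane in volume for row in plane for value in row]


volume = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]]]   # a 2 x 2 x 2 volume

print(flatten(volume))
# [1, 2, 3, 4, 5, 6, 7, 8]
```

A volume with 2 × 2 × 2 entries becomes a vector of length 8, which a fully connected layer can take as input.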
Training
A CNN is trained in the same way as an ANN, using backpropagation followed by
gradient descent. The mathematics is more involved because of the convolution
operations.
model.add(Dense(512, activation='relu', name='dense_1'))
model.add(Dense(128, activation='relu', name='dense_2'))
model.add(Dense(1, activation='sigmoid', name='output'))
In SGD, stochastic refers to a system or process that involves randomness. Instead of
the whole data set, a few samples are selected randomly from the dataset. SGD computes
the gradient of the parameters using only a single or a few training examples [12].
Equation 2 shows the update performed for each training example, where w denotes the
parameters, η the learning rate, and Q_i the loss on the i-th training example.
$$w := w - \eta \nabla Q_i(w) \tag{2}$$
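The update in Eq. 2 can be tried on a toy problem. The sketch below fits a single weight w so that y ≈ w · x, using the squared error on one randomly chosen example per step; the data, the learning rate and the function name are assumptions chosen only for illustration.

```python
import random


def sgd_fit(xs, ys, lr=0.05, steps=200, seed=0):
    """Fit y = w * x with SGD on the per-example loss Q_i(w) = (w * x_i - y_i) ** 2."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        i = rng.randrange(len(xs))               # pick one training example at random
        grad = 2 * (w * xs[i] - ys[i]) * xs[i]   # gradient of Q_i at the current w
        w = w - lr * grad                        # Eq. 2: w := w - eta * grad Q_i(w)
    return w


w = sgd_fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 4))  # 2.0
```

Each step looks at a single example, so the cost per update does not depend on the size of the dataset.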
2.3 RMSprop
The RMSprop optimizer is similar to the gradient descent algorithm with momentum.
It limits the oscillations in the vertical direction. Therefore, we can increase the
learning rate, and the algorithm can take larger steps in the horizontal direction,
converging quickly. The difference between RMSprop and gradient descent lies in how
the gradient is used to compute the update [13]. We calculate a running average of
the squared gradients, as shown in Eq. 3,
Given parameters $w^{(t)}$ and a loss function $L^{(t)}$, where the index t denotes the
current training iteration, the parameter update in ADAM is given by:
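$$m_w^{(t+1)} \leftarrow \beta_1 m_w^{(t)} + (1 - \beta_1)\,\nabla_w L^{(t)} \tag{5}$$
$$v_w^{(t+1)} \leftarrow \beta_2 v_w^{(t)} + (1 - \beta_2)\left(\nabla_w L^{(t)}\right)^2 \tag{6}$$
$$\hat{m}_w = \frac{m_w^{(t+1)}}{1 - (\beta_1)^{t+1}} \tag{7}$$
$$\hat{v}_w = \frac{v_w^{(t+1)}}{1 - (\beta_2)^{t+1}} \tag{8}$$
$$w^{(t+1)} \leftarrow w^{(t)} - \eta \frac{\hat{m}_w}{\sqrt{\hat{v}_w} + \epsilon} \tag{9}$$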
In Eqs. 5 and 6, β1 and β2 are the forgetting factors for the gradients and the second
moments of the gradients, respectively. In Eq. 9, ε is a small scalar used to prevent
division by 0.
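The ADAM update can be followed step by step on a one-dimensional problem. The sketch below minimizes (w − 3)², whose gradient is 2(w − 3); the objective, the hyperparameter values and the function name are assumptions chosen only for illustration.

```python
import math


def adam_minimize(grad, w=0.0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function with ADAM, given a function returning its gradient."""
    m = 0.0  # running average of the gradients
    v = 0.0  # running average of the squared gradients
    for t in range(steps):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g                # first-moment update
        v = beta2 * v + (1 - beta2) * g * g            # second-moment update
        m_hat = m / (1 - beta1 ** (t + 1))             # bias-corrected first moment
        v_hat = v / (1 - beta2 ** (t + 1))             # bias-corrected second moment
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)  # parameter update
    return w


w = adam_minimize(lambda w: 2 * (w - 3))
print(abs(w - 3) < 1e-2)  # True
```

The bias correction matters mostly in the first iterations, when the running averages are still close to their zero initial values.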
Deep learning can be instrumental in preventing wrong diagnostic decisions by
classifying cell images. Deep Learning, an area of machine learning, has performed
outstandingly well in fields other than medicine; its applications in the medical area
had been fewer because of a lack of domain expertise and because of privacy concerns.
In the last few years, however, the medical sector has started using deep learning [15].
The convolutional neural network (CNN), a well-known class of artificial neural
networks, has become highly influential in diverse computer vision tasks and has gained
recognition across a variety of domains, including medical science. A CNN model learns
spatial features through backpropagation, using different building blocks. Figure 9
depicts an example of a CNN model.
The CNN is a deep learning model designed especially for 2-dimensional data such as
images and videos. It lends itself to visualization, which helps in understanding the
relations it learns. Its main feature is that it automatically detects the important
features without human supervision, by training its learning layers on the input. CNNs
are computationally more efficient than many other models. Another advantage is that
they are easier to train and have fewer parameters than fully connected networks with a
similar number of hidden units [17–20].
The data used in the development of the system were taken from the official website of
the National Library of Medicine (NLM). The dataset contains 27,558 cell images,
divided into infected and uninfected cells. Figure 10 shows a sample of the dataset.
The technique in which an array of static and interactive graphics within a specific
context is used to help us understand and interpret a large amount of data is known
Preprocessing is the process of transforming the raw data before a machine learning or
deep learning algorithm is applied to it. It is an essential stage in machine learning,
because the quality of the data and of the information that can be extracted from it
affects the quality and accuracy of the model.
For example, if we train a convolutional neural network on raw images, it is likely to
give poor results. The preprocessing phase also helps to speed up the whole model. In
our model, images are processed in a Jupyter Notebook. Before feeding an image to the
CNN for training, we normalize it by dividing its pixel values by 255.
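The normalization step can be sketched as follows. The example assumes 8-bit pixel values in the range 0 to 255 stored as nested lists; the function name and the sample image are hypothetical.

```python
def normalize(image):
    """Scale 8-bit pixel values from the range [0, 255] to the range [0, 1]."""
    return [[pixel / 255 for pixel in row] for row in image]


image = [[0, 255],
         [51, 102]]

print(normalize(image))
# [[0.0, 1.0], [0.2, 0.4]]
```

After this step every input value lies between 0 and 1, which keeps the inputs to the first layer on a common scale.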
4 Proposed Model
The Convolutional Neural Network is one of the most effective neural networks for
working with images and making classifications. We used Keras to create the CNN model.
Figure 12 depicts the basic flow of our model.
Convolution 2D
MaxPool 2D
Pool_Size: It defines the size of the pooling window, that is, the number of pixel
values that are reduced to one value. We used a pool_size of 2.
Dropout
Flatten
It flattens the complete n-dimensional matrix to a single array.
Dense
It defines a densely connected neural network layer, for which we defined the following
parameters:
• Activation: It defines the activation function, which we set to relu.
• Units: It defines the number of neurons in the layer.
Model Training and Result Analysis
Using the fit method, we trained the model with x_train and y_train. We used 50 epochs,
that is, 50 passes over the complete training set, with a batch size of 50. We also set
aside a validation split of 0.1, so the model was trained on 90% of the training data
and validated on the remaining 10%. A summary of our experimental model is shown in
Table 1.
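To make these settings concrete, the sketch below works out how many samples are used for training and validation and how many weight updates a run performs. The training-set size of 20,000 images is a hypothetical figure chosen for the example, and the function name is not part of Keras.

```python
import math


def training_schedule(n_samples, validation_split=0.1, batch_size=50, epochs=50):
    """Return sample counts and the number of weight updates for a training run."""
    n_val = round(n_samples * validation_split)   # samples held out for validation
    n_train = n_samples - n_val                   # samples used for weight updates
    steps_per_epoch = math.ceil(n_train / batch_size)
    return {"train": n_train, "val": n_val,
            "steps_per_epoch": steps_per_epoch,
            "total_updates": steps_per_epoch * epochs}


print(training_schedule(20000))
# {'train': 18000, 'val': 2000, 'steps_per_epoch': 360, 'total_updates': 18000}
```

With a batch size of 50, each epoch performs one weight update per batch, so 50 epochs multiply the per-epoch step count by 50.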
We evaluated our model with different optimizers and obtained a different accuracy for
each.
Here, stochastic refers to a system or process that involves randomness. Instead of the
whole data set, a few samples are selected randomly from the dataset. SGD computes the
gradient of the parameters using only a single or a few training examples. When we
applied the SGD optimizer in our model, it gave an accuracy of 95.54% on the test set
and 95.33% on the train set.
The accuracy and log-loss (also known as the cost function) recorded during the
training of our model are plotted in Fig. 13.
The classification report obtained with the SGD optimizer in our model is given in
Table 2.
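The quantities that usually appear in a classification report can be computed from the true and predicted labels as sketched below. The labels in the example are made up for illustration and are not results of the model; label 1 is assumed to mean an infected cell, and the function name is hypothetical.

```python
def binary_report(y_true, y_pred):
    """Accuracy, precision, recall and F1 score for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_report(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Precision and recall separate the two kinds of error, which accuracy alone does not: a missed infection is a false negative, while a healthy cell flagged as infected is a false positive.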
The RMSprop optimizer is similar to the gradient descent algorithm with momentum.
It limits the oscillations in the vertical direction. Therefore, we can increase the
learning rate, and the algorithm can take larger steps in the horizontal direction,
converging quickly. The difference between RMSprop and gradient descent lies in how
the gradient is used to compute the update.
When we applied the RMSprop optimizer in our model, it gave an accuracy of 95.54% on
the test set and 95.32% on the train set. The accuracy and log-loss (also known as the
cost function) recorded during the training of our model are plotted in Fig. 14.
The classification report obtained with the RMSprop optimizer in our model is given in
Table 3.
After evaluating different optimizers on our dataset, we obtained different accuracies
on our train and test sets, which are plotted in Figs. 16 and 17. On both the test and
train sets, the ADAM optimizer worked best with our dataset, giving an accuracy of
96.62% on the test set and 96.88% on the train set.
[Figs. 16 and 17: bar charts of accuracy (in %) by optimizer (SGD, RMSProp, ADAM).
Labeled bars: 95.54 (SGD) and 95.54 (RMSProp) in one chart; 95.33 (SGD) and 95.32
(RMSProp) in the other.]
The purpose of the proposed method is to improve the quality of malaria detection,
helping microscopists to detect malaria easily and accurately so that proper medication
can start as soon as possible. Future work is directed towards improving the
performance of the algorithm and denoising the blood cell images for better detection
of malaria. Another direction is to implement this model in a single application that
can run on any smartphone to detect malaria easily.
References
1. Malaria Microscopy Quality Assurance Manual, version 2. World Health Organization (2016)
2. World Malaria Report. World Health Organization (2016)
3. O’Meara, W.P., Mckenzie, F.E., Magill, A.J., Forney, J.R., Permpanich, B., Lucas, C., Gasser,
R.A., Wongsrichanalai, C.: Sources of variability in determining malaria parasite density by
microscopy. Am. J. Trop. Med. Hyg. 73(3), 593–598 (2005)
4. Rajaraman, S., Antani, S.K., Xue, Z., Candemir, S., Jaeger, S., Thoma, G.R.: Visualizing
abnormalities in chest radiographs through salient network activations in deep learning. In:
Life Sciences Conference, IEEE, Australia, pp. 71–74 (2017)
5. Liang, Z., Powell, A., Ersoy, I., Poostchi, M., Silamut, K., Palaniappan, K., Guo, P., Hossain,
M.A., Sameer, A., Maude, R.J., Huang, J.X., Jaeger, S., Thoma, G.: CNN-based image analysis
for malaria diagnosis. In: International Conference on Bioinformatics and Biomedicine, IEEE,
China, pp. 493–496 (2016)
6. Dong, Y., Jiang, Z., Shen, H., Pan, W.D., Williams, L.A., Reddy, V.V.B., Benjamin, W.H.,
Bryan, A.W.: Evaluations of deep convolutional neural networks for automatic identification
of malaria infected cells. In: International Conference on Biomedical and Health Informatics,
IEEE, USA, pp. 101–104 (2017)
7. Shang, W., Sohn, K., Almeida, D., Lee, H.: Understanding and improving convolutional
neural networks via concatenated rectified linear units. In: International Conference on
Machine Learning, ACM, USA, pp. 2217–2225 (2016)
8. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V.,
Rabinovich, A.: Going deeper with convolutions. In: International Conference on Computer
Vision and Pattern Recognition, IEEE, USA, pp. 1–9 (2015)
9. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn.
Res. 13(1), 281–305 (2012)
10. Saha, S. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
11. Majumdar, S.: DenseNet Implementation in Keras. GitHub
12. Shang, F., Zhou, K., Liu, H., Cheng, J., Tsang, I.W., Zhang, L., Tao, D., Jiao, L.: VR-SGD: a
simple stochastic variance reduction method for machine learning. IEEE Trans. Knowl. Data
Eng. (2018)
13. Yazan, E., Talu, M.F.: Comparison of the stochastic gradient descent based optimization
techniques. In: International Artificial Intelligence and Data Processing Symposium. IEEE,
Turkey (2017)
14. Zhang, Z.: Improved Adam optimizer for deep neural networks. In: International Symposium
on Quality of Service. IEEE, Canada (2018)
15. Gopakumar, G.P., Swetha, M., Sai Siva, G., Sai Subrahmanyam, G.R.K.: Convolutional
neural network-based malaria diagnosis from focus stack of blood smear images acquired
using custom-built slide scanner. J. Biophotonics 11(3) (2017)
16. Saha, S.: A comprehensive guide to convolutional neural networks—the ELI5 way. Towards
Data Science
17. Prabhu, R.: Understanding of convolutional neural network (CNN)—deep learning
18. Bibin, D., Nair, M.S., Punitha, P.: Malaria parasite detection from peripheral blood smear
images using deep belief networks. IEEE, pp. 9099–9108 (2017)
19. Das, D.K., Maiti, A.K., Chakraborty, C.: Automated system for characterization and classifica-
tion of malaria-infected stages using light microscopic images of thin blood smears. J. Microsc.
257(3), 238–252 (2015)
20. Kumar, A., Sarkar, S., Pradhan, C.: Recommendation system for crop identification and pest
control technique in agriculture. In: International Conference of Communication and Signal
Processing. IEEE, India, pp. 185–189 (2019)
Avinash Kumar is a final-year student at KIIT DU, Bhubaneswar, India. His research interests
include Image Processing, Deep Learning and Machine Learning, and he is currently working in
different research domains.
Sobhangi Sarkar is a final-year student at KIIT DU, Bhubaneswar, India. Her research interests
include Deep Learning, Image Processing and Machine Learning, and she is currently working in
different research domains.
230 A. Kumar et al.
aforementioned issues. It is estimated that over 700 million people will possess wearable
devices that monitor every step they take. Data collected with these smart devices,
combined with other sources such as Electronic Health Records, nutrition data and survey
data, can be processed using Big Data analysis tools and fed to recommendation systems to
generate desirable recommendations. These data, after being encoded into an appropriate
format (the state), are fed to the Actor network, which learns a policy for prioritizing a
particular recommendation (the action). The state-action pair is fed to the Critic network,
which generates a reward associated with that pair. This reward is used to update the
policy of the Actor network. The Critic network learns using a pre-defined Expected Reward.
Hence, we find that tools for Big Data Analytics, together with intelligent approaches like
Deep Reinforcement Learning, can significantly improve recommendation results for health
care, aiding in the creation of seamlessly personalized systems.
1 Introduction
Recommender systems have begun to play a key role in industries including entertainment,
retail, education, tourism and many more. However, one of the largest industries exploiting
the potential of effective and accurate recommendation systems is the healthcare industry.
People have started to use the state-of-the-art technologies that we have, and can develop,
to improve the most important aspect of their lives: health. The facts and figures shown
in Sect. 2.2 are surprising. Studies by the World Health Organization conclude that there
are over 12,400 diseases, disorders and health-related ailments that can potentially strike
us on any given day. On the one hand, around half the population of the United States is
unaware of potential health threats such as obesity, diabetes and heart failure; on the
other, quintillions of bytes of health and fitness data are generated on a daily basis.
Personalized health recommendation systems are meant to bridge this gap.
In this chapter, we propose a deep reinforcement learning based framework for
generating personalized health recommendations. Sections 2 and 3 provide basic yet
important background on the topic. They discuss various facts and figures and the role of
big data and reinforcement learning in recommendation systems, elaborate on existing
recommendation systems, and argue the need for reinforcement learning based
recommendations. They also discuss the problems that we are trying to address: lack of
awareness, data not harnessed to its potential, security issues and the low
doctor-to-patient ratio. The next section deals with some of the limitations of the
existing solutions to these problems, which include the lack of an all-round
recommendation system, system biases and myopic recommendations.
Deep Reinforcement Learning Based Personalized … 233
Next, we use some of the features of Deep Reinforcement Learning, combined with standard
machine learning and data mining algorithms and techniques and the potential of Big Data,
to address the problems by overcoming the said limitations. Thereafter, the three-layer
deep reinforcement learning based framework is discussed to support our claims. The first
layer is the data integration and preprocessing layer. In this layer, we integrate the data
collected from various sources and process it, using big data [1] and data mining
techniques, so that it can be fed to the second layer. The second layer is the disease
probability prediction layer. It consists of 10 legacy machine-learning and deep-learning
algorithms that predict the probability of 10 commonly occurring diseases for which some
recommendations can be made. Finally, the third layer is the recommendation generation
layer. It consists of an actor-critic model that helps us make sequential decisions and
hence desirable recommendations. Towards the end, the method for processing the outputs of
the actor-critic model and putting them to use is described. Lastly, some sample
recommendations prescribed by a medical practitioner are provided for ready perusal.
2 Background
In this era of the internet and informatics, the amount of information being consumed and
generated has grown exponentially. Before the advent of the myriad of applications
generating and using information, it was not so difficult to manage information, and it
was fairly possible to deliver the right information to the right person. In the present
scenario, however, it has become extremely difficult to deliver personalized information
to a targeted audience. This is where powerful recommendation systems come into the
picture. Previous work on recommendation systems primarily includes content-based and
collaborative filtering techniques, deep learning models, factorization machines,
regression models, hybrid mechanisms, etc.
discussed above. The first one is called user-user collaborative filtering, and the other
is called item-item collaborative filtering.
Furthermore, a recommender system can also be built using a combination of the
aforementioned content-based and collaborative filtering approaches, called a hybrid
recommendation system [3]. A third approach, called property-based collaborative
filtering, can be used to solve some of the persistent issues of data sparsity,
over-specification, slow start, etc. The Health Aware REcommendation System, or HARE, is
an ontology-based model that uses levels of appeal as a basis for providing
recommendations.
All of the above-mentioned approaches are built from the perspective of customers, to help
them choose effectively something that they may or may not be looking for. However, we
could not find efficient models that provide recommendations in the discipline of health
and bioinformatics. Moreover, these methods have some issues that need to be fixed. They
fail to consider the long-term effects of the recommendations they make, whereas in the
health domain especially, the long-term results far outweigh the short-term successes.
Reinforcement learning based models, working together with neural networks, can operate in
a dynamic environment and thereby overcome the shortcomings of traditional recommendation
systems; this makes them stand apart. Healthcare recommendations have a dire need for
sequential decisions rather than spontaneous decisions [4]. The proposed framework helps
resolve such issues and provides a unique all-round perspective on effective health
recommendations.
For machines to learn and perform well, what they need is data; not just any data, but
relevant, complete, formatted and consistent data. According to the International Data
Corporation, an estimated 2314 exabytes (1 exabyte = 1 billion gigabytes) of data relating
to the healthcare industry alone will be produced annually by 2020, growing at a rate of
48% per annum. Given this, together with highly advanced algorithms for extracting valuable
information from the data and compatible, sophisticated hardware, we have an opportunity to
give something back to society. Having said this, the biggest questions that arise concern
the sources of this data and their authenticity.
We were not surprised to know about some of the following facts mentioned in the
Stanford Medicine 2017 Health Trends Report, titled “Harnessing the Power of data
in Health”. 84% of the patients are ready to share vital statistics like blood pressure
or basic lab test results and 75% of the people are willing to share information about
the health of internal organs. We have been hammered with buzzwords like IoT, Big
Data, Machine Learning, Deep Learning and so on. In numbers, wearable technology is going
to be a $34 billion market, generating quintillions of bytes of research-usable data
every day. The exponentially growing
pace of research in the health domain motivates many researchers to make significant
A few buzzwords have gained momentum in the past few decades, and one of them is Big Data.
Before defining it technically, let us give some reasons for raising this topic. Forbes has
reported that, every minute, approximately 4.15 million YouTube videos are watched, 456,000
tweets are sent on Twitter, 46,740 photos are posted on Instagram, and 510,000 comments are
posted and 293,000 statuses are updated on Facebook. Forbes has also reported that, at our
current pace, we are creating 2.5 quintillion bytes of data every day, and this pace is
only accelerating. The Internet of Things (IoT) is one of the major technologies playing a
vital role in this acceleration. Just imagine the volume of data being produced by these
activities. This rapid creation of data by social media, telecom, business applications and
various other domains is leading to the formation of Big Data.
‘Big Data is all about the size and volume of data’ is the biggest myth that people hold
about Big Data. In reality, it is not limited to the huge volume of data being collected;
it is a collection of large volumes of data coming from various sources in different
formats. Data were generated previously as well, but they were in well-defined formats,
which is why relational databases were capable of storing them. Because of the varied
nature of data today, it is no longer possible to store them in traditional formats. Big
Data comes in three formats: structured, unstructured and semi-structured.
2.3.1 Characteristics
The five V’s of Big Data [5] are as follows:
1. Volume: Huge amount of data
2. Variety: Different formats of data from various resources, being integrated
3. Velocity: Pace of generation of data
4. Value: Extraction of useful data
5. Veracity: Inconsistencies and uncertainty in data (Fig. 1).
Apart from storing this huge amount of data, there’s another vital problem associated
with it, which is to find useful information (knowledge) from this data collection. This
236 J. Mulani et al.
gives rise to Big Data Analytics. It is the complex process of examining big data in order
to uncover hidden information, interesting patterns, market trends and customer
preferences, which can help organizations shape their marketing strategies. It is a process
of refining the raw, unstructured data retrieved from various sources into useful
information. Various tools are available for performing this task, such as Hadoop, Spark,
Hive and Pig.
Present-day organizations realize that Big Data is powerful, but they are beginning to
understand that it is most valuable when it is paired with intelligent automation. With
enormous computational power, Machine Learning (ML) and Reinforcement Learning (RL)
frameworks help organizations manage, analyze and use their data far more effectively than
ever before. Machine Learning and Reinforcement Learning are also used to find hidden
information and patterns in huge amounts of data quickly and accurately using complex
algorithms.
Their capabilities are impacting almost every field. They have a profound effect on
healthcare, by providing personalized treatment plans and improving diagnostics. Predictive
analytics enables doctors and clinicians to concentrate on providing better service and
patient care, creating a proactive framework for addressing patient needs before patients
fall ill.
try to maximize the rewards in the long run. A policy may be defined as the plan of action
of an agent (Fig. 2).
A Markov Decision Process (MDP) is used to model reinforcement learning problems. It
models sequential decision problems mathematically.
The environment in a reinforcement learning problem consists of a set of states S, a set
of actions A, transition probabilities p(s_{t+1} | s_t, a_t), a probability distribution of
initial states p(s_0), a reward function r: S × A → ℝ (where ℝ is the set of real numbers)
and a discount factor γ ∈ [0, 1]. These components are used to formulate the Markov
Decision Process, which is defined as a tuple (S, A, p, r, γ). A policy π maps each state
to a corresponding action, π: S → A. Future rewards can be discounted with the discount
factor γ. The goal of the agent is then to maximize the expected return, where the return
G_t is defined in Eq. 1.
\[ G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \qquad (1) \]
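As a small illustration of Eq. 1, the following sketch computes the discounted return for a
finite list of future rewards. The function name and the sample rewards are hypothetical and
chosen only for this example; a real episode may of course be much longer.

```python
def discounted_return(rewards, gamma):
    """Return G_t = sum_k gamma**k * R_{t+k+1} for a finite list of future rewards."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    # Accumulate from the last reward backwards: G = R + gamma * G_next.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical rewards R_{t+1}, R_{t+2}, R_{t+3}: 1 + 0.5 * 0 + 0.25 * 2 = 1.5
example_return = discounted_return([1.0, 0.0, 2.0], gamma=0.5)
```

A smaller γ makes the agent short-sighted, while γ close to 1 gives distant rewards almost
the same weight as immediate ones.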
2.4.2 Q Learning
Q learning uses the action-value function of a policy π, which denotes how good it is
for an agent to take an action a while in state s. Equation 2 denotes the Q value
function to be used.
The basic version of Q learning maintains a table of Q values, one for each state-action
pair. The Bellman equation (Eq. 3) is used for learning the optimal
Q-value function by performing multiple iterations. The optimal policy obtained by
Finding Q values for every state-action pair is not feasible when the actions and states
are continuous and high-dimensional. Moreover, in recommender systems the number of states
is very large, so learning the Q value of each state-action pair becomes very slow as the
state space grows. Therefore, a parameterized value function Q(s, a; θ) is required to
approximate the Q values. Here, θ denotes the parameter vector that defines the Q values.
Various function approximators can be used, such as a linear combination of features,
neural networks, nearest neighbours and Fourier/wavelet bases.
Deep Q Network (DQN) [6], an algorithm used in Deep Q Learning, uses a neural network as
the value function approximator. The DQN outputs the Q values Q(s, a) for each of the
actions a that can be taken from the given state s. In a Deep Q Network, the dataset
consists of tuples of the form <s, a, r, s'>, where an action a is taken in state s and the
immediate reward r is observed on reaching the new state s'. Experience replay is performed
by sampling random tuples from those stored in memory once a sufficient number of
iterations has been completed. DQN uses an ε-greedy policy to collect information about
various states in the memory. The weights of the neural network are updated based on the
loss function given below.
\[ \text{loss} = \mathbb{E}\left[\left(Q(s, a; \theta) - \left(r(s, a) + \gamma \max_{a'} Q(s', a'; \theta^{-})\right)\right)^{2}\right] \qquad (4) \]
Here θ⁻ denotes previously stored (frozen) parameter values and θ denotes the newly
derived parameters. An improvement on DQN, called Dueling DQN, estimates the state-value
function V(s) and the advantage function A(s, a) with shared network parameters [7].
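To make the role of the frozen parameters concrete, here is a minimal sketch of the DQN
loss over a batch of <s, a, r, s'> tuples. It is an illustration under stated assumptions,
not the chapter's implementation: a linear Q-function (one weight vector per action)
stands in for the neural network, terminal states are not handled, and all names and
numbers are hypothetical.

```python
def q_values(theta, state):
    """Linear Q-function: one weight vector per action, Q(s, a) = theta[a] . s."""
    return [sum(w * x for w, x in zip(weights, state)) for weights in theta]


def dqn_loss(batch, theta, theta_frozen, gamma):
    """Mean squared TD error over a batch of (s, a, r, s_next) tuples."""
    total = 0.0
    for s, a, r, s_next in batch:
        # The target uses the frozen parameters and the greedy next action.
        target = r + gamma * max(q_values(theta_frozen, s_next))
        prediction = q_values(theta, s)[a]
        total += (prediction - target) ** 2
    return total / len(batch)


# Hypothetical two-feature, two-action problem.
theta = [[1.0, 0.0], [0.0, 1.0]]
theta_frozen = [[1.0, 0.0], [0.0, 1.0]]
batch = [([1.0, 0.0], 0, 1.0, [0.0, 1.0])]
loss = dqn_loss(batch, theta, theta_frozen, gamma=0.9)  # (1.0 - 1.9) ** 2
```

In practice the batch would be sampled at random from the replay memory, and the frozen
parameters would be refreshed from θ only every so often.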
The DQN method learns the state-action value function through the neural network and then
selects actions accordingly. The policy gradient method instead learns the policy directly
as a parameterized function π_θ(a | s) [8]. The value of the reward function depends on
this policy, and various algorithms can be applied to maximize the reward. The reward
function for continuous space can be defined as follows:
\[ J(\theta) = \sum_{s \in S} d^{\pi}(s) \sum_{a \in A} \pi_{\theta}(a \mid s)\, Q^{\pi}(s, a) \qquad (5) \]
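The objective in Eq. 5 can be evaluated directly for a small tabular problem. The sketch
below assumes a softmax policy over per-state preferences and treats the state
distribution and the Q values as fixed tables supplied by the caller; in a real system both
depend on the policy itself. All names and numbers are hypothetical.

```python
import math


def softmax(prefs):
    """Turn a list of action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def policy_objective(state_dist, theta, q):
    """J(theta) = sum_s d(s) * sum_a pi_theta(a|s) * Q(s, a)."""
    j = 0.0
    for s, d_s in enumerate(state_dist):
        pi = softmax(theta[s])
        j += d_s * sum(p * q_sa for p, q_sa in zip(pi, q[s]))
    return j


# Two states, two actions, uniform policy (all preferences equal).
state_dist = [0.5, 0.5]
theta = [[0.0, 0.0], [0.0, 0.0]]
q = [[1.0, 3.0], [2.0, 0.0]]
j_uniform = policy_objective(state_dist, theta, q)  # 0.5 * 2.0 + 0.5 * 1.0
```

Raising the preference of the better action in each state increases J(θ), which is what a
policy gradient step aims to do.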
Policy-based methods and value-based methods (Deep Q Learning) each have certain
drawbacks. The problem with policy-based methods is that it is very hard to find a good
score function that evaluates the policy generated by the algorithm. In value-based
methods, the policy is implicit in the value function approximation, so it is hard to
evaluate the behavior of the model.
The Actor-Critic model is a hybrid method that incorporates features of both policy-based
and value-based methods. It uses two neural networks: an actor network that controls the
behavior of the agent (policy based) and a critic network that evaluates the actions taken
by the actor (value based).
Figure 3 shows the architecture of the Actor-Critic model. The actor interacts with the
environment and updates the parameters θ of the actor network, which estimates the policy.
The critic evaluates the actions of the actor and updates the parameters of the value
function approximation based on the reward obtained.
3 Problems
The problems that we are trying to address are four-fold:
First, despite having so much information about people’s previous health records, and
knowledge of how these can affect a person’s present health given the environmental
conditions and the medication he or she is taking, we are not able to use it to its full
potential. In addition, the data that we have may be highly time critical: data that is
useful now may become obsolete at any point in time. Hence, it is important to make the
right use of the data and generate useful insights from it.
The second, and most important, perspective is that even with the advancements in
technology, most people are not fully aware that they are suffering from a disease. In
addition, primarily because of medical jargon, even people who get tested do not track the
results once the tests are done. Caught up in busy schedules, many people forget about the
health threats hanging right in front of them. Our approach provides an end-to-end solution
that collects the data, interprets it, and makes people more aware of their own health and
health issues.
A third perspective is that, even with high-end, state-of-the-art medical treatment
techniques and technology, doctors cannot attend to so many patients. We have a
doctor-to-patient ratio of less than 1:1000, making it quite difficult for doctors to
handle such a huge volume of patients in time. If we can develop smart machines that may
substitute for a doctor for lower-risk diseases, people will gain access to data-driven
intelligent decision systems that recommend methods to mitigate diseases or, in some cases,
even prevent them by predicting an illness that may strike, based on the available data and
the history of similar patients.
Finally, the fourth perspective concerns security. Data collected from various healthcare
sources can be used to provide people with better solutions to their health-related
problems. However, the security of the data should be of prime importance. One must ensure
that health-related data is used for the benefit and betterment of society by providing
health-related suggestions, and that it is not misused for the financial benefit of a
company. The framework that we have proposed ensures the security of the data. The
health-related datasets that we have collected are used only for giving recommendations
that improve people’s health. We have tried to avoid recommendations that involve financial
benefits for companies in the health sector, doctors or hospitals. The sole purpose of the
framework is to use health-related data and the knowledge of various intelligent Machine
Learning and AI algorithms for the betterment of society.
It is true that every individual is different, and that two people may not be given the
same medication even if they suffer from the same disease. However, steps other than
medication, such as a good diet or a better exercise routine, can also help conquer the
disease. Our objective here is to provide better recommendations on those aspects that can
be generalized and are beneficial to everyone irrespective of metabolic differences.
solutions. Moreover, the medical history and family details of a particular person are
also not considered for medicine recommendation and disease identification. Some diseases
are inherited within families. Hence, if family history is not taken into consideration,
the disease prediction can be wrong.
Similarly, various online systems suggest the food to be eaten and the diet to be followed
once they are given a person’s age, sex, weight, height and other required details.
However, these systems lack the ability to identify potential diseases.
The existing recommender systems deal with items and the feedback that users provide for
those items. When building a model, such systems take into consideration only the users’
feedback on items that the system has already recommended. This problem is called System
Bias. In our case, the system has to recommend content and detailed information based on
the given situation, and it need not take only the already recommended content into
account.
Reinforcement learning based algorithms make it possible to control the balance between
exploration (taking a random action) and exploitation (taking the greedy action). The
ε-greedy policy allows us to control the exploration of an agent. Moreover, an exploration
decay parameter is used while building Deep Reinforcement Learning models in order to
reduce the exploration of the agent after a certain number of iterations or actions.
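The interplay of ε-greedy action selection and exploration decay can be sketched in a few
lines. The multiplicative decay schedule, the floor value and the function names below are
assumptions made for illustration; the chapter does not prescribe a particular schedule.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))                    # exploration
    return max(range(len(q_row)), key=lambda a: q_row[a])  # exploitation


def decay_epsilon(epsilon, decay=0.995, epsilon_min=0.01):
    """Multiplicative exploration decay with a floor (values are assumptions)."""
    return max(epsilon_min, epsilon * decay)


# Hypothetical Q values for three actions in the current state.
q_row = [0.1, 0.9, 0.3]
epsilon = 1.0
for _ in range(1000):
    action = epsilon_greedy(q_row, epsilon)
    epsilon = decay_epsilon(epsilon)
```

Early on the agent acts almost at random; as ε shrinks towards its floor, it increasingly
picks the action with the highest estimated Q value.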
In the preceding section, we saw various diseases, along with some shocking statistics. It
is very clear that the problem persists. Here is how we can contribute to a possible
solution: a three-layer framework named “Deep Reinforcement Learning based Personalized
Health Recommendation Framework”. The first layer is the data preprocessing layer, the
second is the disease identification and prediction layer, and the third is the
recommendation generation layer. An overview is shown in Fig. 4.
We discussed earlier the fact that millions of gigabytes of data are being generated every
second. Websites, smartphones, wearable devices, hospital reports, etc. are the key
contributors. The accumulation of such a huge amount of data, which is varied, versatile,
voluminous, velocious and veracious in nature, is nowadays referred to as Big Data.
As shown in Fig. 4, data collected from various sources have to be integrated first. The
process of integration is cumbersome because of the irregular and inconsistent structure
and format of data gathered from a variety of sources. However, it is a necessary step. In
the proposed framework, the integration is done keeping patients as subjects. Each patient
can be assigned a patient ID that is unique worldwide, and all the data concerning that
patient can be stored in a semi-structured format. This gives us the flexibility to
accommodate structured, semi-structured and unstructured data, which may be obtained from
reports, wearable devices, health records, hospital patient records, etc.
Moreover, integration alone is not sufficient. Quality data mining techniques have to be
employed to preprocess the data before actually using it. Preprocessing and the usage of
the data are described in detail below.
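A minimal sketch of patient-centred integration might look as follows. JSON is used here
only as one example of a semi-structured format, and the source names, field names and
patient IDs are hypothetical; a production system would also need to handle ID matching
and access control.

```python
import json
from collections import defaultdict


def integrate_by_patient(sources):
    """Merge records from several sources into one document per patient ID."""
    patients = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            fields = dict(record)  # copy so the input is not mutated
            patient_id = fields.pop("patient_id")
            # Keep each source under its own key and append to preserve history.
            patients[patient_id].setdefault(source_name, []).append(fields)
    return dict(patients)


# Hypothetical inputs; field names are illustrative only.
sources = {
    "wearable": [{"patient_id": "P001", "steps": 8200, "date": "2019-01-05"}],
    "lab_report": [{"patient_id": "P001", "glucose_mg_dl": 98}],
    "hospital": [{"patient_id": "P002", "note": "routine check-up"}],
}
documents = integrate_by_patient(sources)
as_json = json.dumps(documents, indent=2, sort_keys=True)
```

Because each source keeps its own key inside the patient document, new sources can be
added later without changing the records that are already stored.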
As discussed earlier, we can collect huge volumes of data from a myriad of sources. All
these data, however, are raw and cannot be used directly. The data consist of many
heterogeneous parameters. Some of the common issues with the raw data are:
1. Missing Data: It is not possible to obtain all the details about all people,
especially patients, so we have to deal with missing data. There are several
alternatives for doing so. Some of them are:
Replace with a constant: If we think about the reason behind the missing data, there
is a high probability that the person does not suffer from the disease in question,
so no test results for that attribute are available. The opposite may also be true:
the person may not be aware of any such test, or even of the possibility that he or
she may suffer from such a disease in the foreseeable future. To cover both
scenarios, we duplicate the record. In the first copy, we replace the missing value
with the value of that attribute for a healthy person and use it to predict the
disease. In the second copy, we ignore that parameter or, where possible, use an
alternative, possibly less correlated, parameter for the prediction. Finally, the
two predicted probabilities can be combined, either by choosing the maximum or by
taking the mean of both predictions as the final result.
Interpolation: Another possible reason for missing data is that the person did not
undergo a particular test in a particular year, while the data for the preceding and
succeeding time periods are available. Different interpolation techniques can be
employed to fill in the missing information.
2. Data formats: The data that we plan to collect come from different sources, are
collected and maintained by different organizations and hospitals, concern different
diseases, and are stored under different models (unstructured, structured or
semi-structured). The best way to deal with such data is to convert them to a format
that accommodates not just the presently available data but also the data generated
and collected in the years to come. XML and JSON are the best formats for this
purpose. Many document-based databases help convert data from different formats to
these formats.
3. Normalization: Deep learning and machine learning algorithms require data in
normalized form.
4. Data Integration: The data that we collect and preprocess have to be integrated
in a format that is compliant with the input formats of machine learning and deep
learning algorithms. Hence, integration of the data is also an important step before
using the data.
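The preprocessing ideas in the list above can be sketched with three small helpers. They
are illustrations under stated assumptions: linear interpolation and min-max scaling are
just one choice each among many interpolation and normalization techniques, and the
function names and sample values are hypothetical.

```python
def combine_predictions(p_filled, p_ignored, how="mean"):
    """Merge the two predictions from the duplicated-record strategy."""
    if how == "max":
        return max(p_filled, p_ignored)
    return (p_filled + p_ignored) / 2.0


def interpolate_missing(series):
    """Linearly fill None values that lie between two known values."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def min_max_normalize(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical yearly readings with two missing years in between.
readings = interpolate_missing([100, None, None, 130])
scaled = min_max_normalize(readings)
```

Values that are missing at the very start or end of a series are left as None by this
sketch, since there is no neighbour on one side to interpolate from.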
After preprocessing the data, we move to the disease prediction module. The processed and
integrated data are fed to the disease prediction layer. In this layer, we employ the most
accurate existing machine learning based algorithms to predict the chances of occurrence of
some of the common diseases that we target for recommendation generation.
6.2.1 Diseases
1. Obesity
Obesity has increased at an alarming rate over the last few decades. A survey
conducted in the USA by the Centers for Disease Control and Prevention (CDC)
reveals that around 39.8% of the US population is obese. Obesity can lead to heart
attack, Type-2 diabetes and certain types of cancer [10]. The CDC has initiated many
campaigns to make people aware of it. Research shows that if obesity is detected
before the age of 5, the necessary steps can be taken to prevent it.
A Support Vector Machine (SVM) serves best for determining whether a person is
suffering, or will suffer, from obesity [11]. It is tested upon the National
reaction. Moreover, the swelling makes the movement of air to and from the lungs
difficult, causing trouble while breathing. Annual U.S. expenditure on asthma is
$56 billion. Around 8.3% of people suffer from one form or another of asthma. These
numbers clearly justify the need for an effective method of delivering an
intervention that identifies severe exacerbations before the patient actually
experiences them.
The authors of [15] built an efficient prediction system that helps address this
alarming issue. Using data prepared by Daily Asthma Diary with an Adaptive Bayesian
Network algorithm, they achieved a sensitivity of 73.8% and a specificity of 71.4%.
7. Dementia
Dementia is a neurodegenerative brain condition that causes the death of nerve cells.
The damage to nerve cells interferes with the ability of the cells to communicate
with each other. Dementia is not a specific disease; rather, the term describes a
group of symptoms associated with a decline in memory or other skills that hinders
a person’s ability to perform daily tasks. Alzheimer’s disease accounts for 60–80%
of cases, followed by vascular dementia. A Naive Bayes classifier [16] is advised
for predicting the disease from the available data.
8. Thyroid
In India, around 42 million people suffer from thyroid disorders, mainly
hypothyroidism. One in every 10 adults suffers from it. Most of the patients are
women, and 3 out of every 10 women suffering from this disease are unaware of it.
It is often confused with obesity.
With the help of an Artificial Neural Network (ANN), we try to determine whether a
person is suffering from it. We have used the Thyroid Disease Dataset from the UCI
Machine Learning Repository for our framework.
9. Urinary Tract Infection
Around 150 million people are reported to be diagnosed with Urinary Tract
Infection (UTI) per year. It is a common disease among women because of their
urethral anatomy. It can pose a serious danger to life, and people suffering from
it should be examined frequently.
UTI is very hard to diagnose because most of its symptoms are similar to those
caused by inflammation, etc. Using a Back Propagation Neural Network, we try to
predict this disease despite its complex symptoms [17]. The algorithm works on the
following parameters:
Anamnesis: Gender, Age, Fever, Sudation, Chill, Low back pain, Suprapubic
Pain, Malaise, Dysuria, Pollakiuria
The actor network, also called the policy network, is shown in the left part of Fig. 5.
The actor network generates the action a based on the given state s and learns the policy
by adjusting the parameters of the neural network. Here, the actor approximates the policy
that gives health-related recommendations. The details of the recommendations generated
are specified in Sect. 6.3.3. The actor network receives a state as input from the State
Representation Module. Based on this input, the actor predicts the corresponding
recommendations to be given. The parameters of the actor network are updated using the Q
values of each state-action pair (s, a) produced by the critic network. The Policy
Gradient algorithm is used for updating the parameters of the actor network (Fig. 6).
Deep Reinforcement Learning Based Personalized … 249
The critic network, also called the target network, is shown on the right part of
Fig. 5. The critic network tries to approximate the value function of the system based
on the rewards r(s, a) obtained from the environment after the actor takes an action a
from the current state s. The reward function mainly depends on the environment in
which the system is implemented. The environment can give a positive reward for
pertinent recommendations and a negative reward for irrelevant ones.
Temporal Difference (TD) learning is used for updating the parameters of the critic
network: the TD error is calculated from the reward obtained and the Q values
predicted. The output of the critic network, Q(s, a), is also used for evaluating the
actions of the actor and for updating the parameters of the actor network, as
mentioned earlier.
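The interaction between actor and critic described above can be sketched in a few
lines of NumPy. This is a minimal illustration under stated assumptions, not the
chapter's implementation: the class name ActorCritic, the linear critic, the
single-layer tanh actor, the learning rates and the example state vectors are all
hypothetical choices made for brevity. The actor step follows the gradient of Q with
respect to the action, which is one simple way to use the critic's Q values in a
policy gradient update.

```python
import numpy as np


class ActorCritic:
    """Toy actor-critic: tanh actor, linear critic (illustrative assumptions only)."""

    def __init__(self, state_dim, action_dim, gamma=0.9,
                 lr_actor=0.01, lr_critic=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.state_dim = state_dim
        self.gamma = gamma
        self.lr_actor = lr_actor
        self.lr_critic = lr_critic
        # Actor weights: one row per output neuron (one recommendation each).
        self.W_actor = rng.normal(scale=0.1, size=(action_dim, state_dim))
        # Critic weights start at zero, so Q(s, a) is 0 everywhere initially.
        self.w_critic = np.zeros(state_dim + action_dim)

    def act(self, state):
        """Policy: map a state to an action vector with entries in (-1, 1)."""
        return np.tanh(self.W_actor @ state)

    def q_value(self, state, action):
        """Linear estimate of Q(s, a) on the concatenated state-action vector."""
        return float(self.w_critic @ np.concatenate([state, action]))

    def update(self, state, action, reward, next_state):
        """Run one TD update of the critic and one actor step; return the TD error."""
        # TD error: r + gamma * Q(s', a') - Q(s, a), with a' chosen by the actor.
        next_q = self.q_value(next_state, self.act(next_state))
        td_error = reward + self.gamma * next_q - self.q_value(state, action)
        # Critic: semi-gradient TD(0) step on the linear weights.
        self.w_critic += self.lr_critic * td_error * np.concatenate([state, action])
        # Actor: ascend dQ/da, back-propagated through the tanh output layer.
        dq_da = self.w_critic[self.state_dim:]
        grad_pre_activation = dq_da * (1.0 - self.act(state) ** 2)
        self.W_actor += self.lr_actor * np.outer(grad_pre_activation, state)
        return td_error


agent = ActorCritic(state_dim=4, action_dim=3)
state = np.array([0.2, -0.1, 0.5, 1.0])        # hypothetical patient state
next_state = np.array([0.1, 0.0, 0.4, 1.0])
action = agent.act(state)
td_error = agent.update(state, action, reward=1.0, next_state=next_state)
```

Because the critic starts at zero, the first TD error equals the reward itself; after
the update, the critic assigns a positive value to the rewarded state-action pair.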
Since the activation function is tanh(x), the outputs range from −1 to 1. The output
is an array whose size equals the number of output neurons. Each output neuron
represents a health-related recommendation, for example walking, jogging, playing a
particular sport, a recommended diet, or a specific set of physical activities to be
carried out. The number of output neurons depends on the scale of the application
and the number of diseases that are targeted. This raises the question of how this
array of numbers helps in generating actual recommendations.
The following steps are suggested to generate apt recommendations:
Step 1: Categorize
The probabilities and age groups may be categorized as shown in Table 1.
These categories can be altered and adjusted as per the requirements. They are
made to help the end users (the ones for whom the recommendations are generated).
Then we generate the recommendations based on these categories.
Step 2: Generate
After categorizing, recommendations can be generated by combining the outputs of
the Disease Prediction layer, the age groups and the probabilities of occurrence, as
shown in Table 2. The next question is how these recommendations are communicated
to the target user. If a dedicated medical portal is available, pop-ups can be shown
when the user is logged in. In the absence of such a facility, push notification
services over media such as e-mail or SMS can be used.
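The categorization in Step 1 can be sketched as a pair of lookup functions. This is
an illustrative sketch only: the function names are hypothetical, the boundaries are
taken from Table 1, a probability of exactly 75% is assumed to fall in the "High"
band, and ages outside the tabulated 10–80 range are assumed to have no category.

```python
from bisect import bisect_left

# Upper bounds of each band, as listed in Table 1.
PROBABILITY_UPPER_BOUNDS = [25, 50, 75, 100]
PROBABILITY_LABELS = ["Very low", "Low", "High", "Very high"]

AGE_UPPER_BOUNDS = [20, 30, 40, 50, 60, 80]
AGE_LABELS = ["Adolescent", "Young", "Adult", "Middle aged", "Old", "Veteran"]


def probability_category(percent):
    """Map a disease probability in percent (0-100) to its Table 1 category."""
    if not 0 <= percent <= 100:
        raise ValueError("probability must be between 0 and 100")
    return PROBABILITY_LABELS[bisect_left(PROBABILITY_UPPER_BOUNDS, percent)]


def age_category(age):
    """Map an age in years to its Table 1 category, or None if outside 10-80."""
    if not 10 <= age <= 80:
        return None
    return AGE_LABELS[bisect_left(AGE_UPPER_BOUNDS, age)]


# Example: a 45-year-old user with a predicted 60% probability of a disease.
example = (age_category(45), probability_category(60))
```

The resulting pair of categories is what Step 2 would combine with the predicted
disease to look up a recommendation.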
7 Future Improvements
The proposed system has scope for several improvements in the future. Some of the
possible improvements that we aim to make are mentioned below.
The approach presented here uses the Actor-Critic model for generating
recommendations related to health. However, with advancements in Deep Reinforcement
Learning based algorithms, various improved and more efficient algorithms can be used
for generating recommendations by interacting with the environment. For example,
Hindsight Experience Replay (HER) uses the mistakes committed by the model to learn a
better policy [20]. Hence, the current model can be combined with HER to achieve
higher accuracy by learning from the negative rewards obtained for wrong
recommendations.

Table 1 Categories of probabilities and age

Categories of probability of occurrence of disease
Probability of occurrence of disease (%)    Category
0–25                                        Very low
26–50                                       Low
51–75                                       High
76–100                                      Very high

Categories of age group
Age group    Category
10–20        Adolescent
21–30        Young
31–40        Adult
41–50        Middle aged
51–60        Old
61–80        Veteran
7.2 Recommendations
The proposed system mainly gives general recommendations for physical activities
and food. However, various other health-related recommendations can be incorporated
in order to give efficient and personalized recommendations to each patient.
The system aims to collect readily available data from hospitals, wearable devices
and laboratories. However, the availability of slightly more specific data, such as
the family health history of a particular patient, the type of physical activities
that a patient performs daily and other details about the patient’s daily routine, can
significantly improve the disease prediction accuracy of the model. Hence, more
specific and accurate recommendations can be generated based on the disease and the
daily routine of the patient. Information retrieval systems can be employed for
collecting data in a much more efficient manner.
8 Conclusion
back. The proposed approach is a step closer towards building such systems by
exploiting the technology we have. In this chapter, we have tried to use the concepts
of Recommendation Systems, Reinforcement Learning, Machine Learning and Big
Data for the same. The increasing health consciousness among people, along with
gigantic growth of data and improvements in technology make this framework a
promising work for the future. Because of its personalized solutions, this can even
be a propitious business model.
References
1. Elgendy, N., Elragal, A.: Big data analytics: a literature review paper. In: Industrial Conference
on Data Mining, pp. 214–227. Springer, Cham (2014, July)
2. Pan, C., Li, W.: Research paper recommendation with topic analysis. In: 2010 International
Conference On Computer Design and Applications, vol. 4, pp. V4–264. IEEE (2010, June)
3. Han, Q., Ji, M., de Troya, I.M.D.R., Gaur, M., Zejnilovic, L.: A hybrid recommender system
for patient-doctor matchmaking in primary care. In: 2018 IEEE 5th International Conference
on Data Science and Advanced Analytics (DSAA), pp. 481–490. IEEE (2018, Oct)
4. Wiesner, M., Pfeifer, D.: Health recommender systems: concepts, requirements, technical
basics and challenges. Int. J. Environ. Res. Public Health 11(3), 2580–2607 (2014).
https://doi.org/10.3390/ijerph110302580
5. Patgiri, R., Ahmed, A.: Big data: the v’s of the game changer paradigm. In: 2016 IEEE 18th
International Conference on High Performance Computing and Communications; IEEE 14th
International Conference on Smart City; IEEE 2nd International Conference on Data Science
and Systems (HPCC/SmartCity/DSS), pp. 17–24. IEEE (2016, Dec)
6. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller,
M.: Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602. (2013)
7. Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., De Freitas, N.: Dueling network
architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581. (2015)
8. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforce-
ment learning with function approximation. In: Advances in Neural Information Processing
Systems, pp. 1057–1063. (2000)
9. Zhao, X., Zhang, L., Ding, Z., Yin, D., Zhao, Y., Tang, J.: Deep reinforcement learning for
list-wise recommendations. CoRR, vol. abs/1801.00209. (2018)
10. Mokdad, A.H., Ford, E.S., Bowman, B.A., Dietz, W.H., Vinicor, F., Bales, V.S., Marks, J.S.:
Prevalence of obesity, diabetes, and obesity-related health risk factors, 2001. JAMA 289(1),
76–79 (2003)
11. Montañez, C.A.C., Fergus, P., Hussain, A., Al-Jumeily, D., Abdulaimma, B., Hind, J., Radi,
N.: Machine learning approaches for the prediction of obesity using publicly available genetic
profiles. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 2743–2750.
IEEE (2017, May)
12. Sharmila, R., Chellammal, S.: A conceptual method to enhance the prediction of heart diseases
using the data techniques. Int. J. Comput. Sci. Eng. (2018, May)
13. Sisodia, D., Sisodia, D.S.: Prediction of diabetes using classification algorithms. Proc. Comput.
Sci. 132, 1578–1585 (2018)
14. Sindhuja, D., Priyadarsini, R.J.: A survey on classification techniques in data mining for ana-
lyzing liver disease disorder. Int. J. Comput. Sci. Mob. Comput. 5(5), 483–488 (2016)
15. Finkelstein, J.: Machine learning approaches to personalize early prediction of asthma exacer-
bations. Ann. New York Acad. Sci. 1387(1), 153–165 (2017)
16. Jammeh, E.A., Camille, B.C., Stephen, W.P., Escudero, J., Anastasiou, A., Zhao, P., Chenore,
T., Zajicek, J., Ifeachor, E.: Machine-learning based identification of undiagnosed dementia in
primary care: a feasibility study. BJGP open 2(2). bjgpopen18X101589. (2018)
17. Ozkan, I.A., Koklu, M., Sert, I.U.: Diagnosis of urinary tract infection based on artificial
intelligence methods. Comput. Methods Progr. Biomed. 166, 51–59 (2018)
18. Chae, S., Kwon, S., Lee, D.: Predicting infectious disease using deep learning and big data.
Int. J. Environ. Res. Public Health 15(8), 1596 (2018)
19. Liu, F., Tang, R., Li, X., Zhang, W., Ye, Y., Chen, H., et al.: Deep reinforcement learning
based recommendation with explicit user-item interactions modeling. arXiv preprint
arXiv:1810.12027. (2018)
20. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., et al.: Hindsight
experience replay. In: Advances in Neural Information Processing Systems, pp. 5048–5058.
(2017)
Jayraj Mulani is pursuing a B. Tech at the Institute of Technology, Nirma University,
and is currently in his penultimate year. He was a versatile student at high school,
receiving the Best Student award for all-round performance at Divine Child School.
Being passionate about learning new things, he has always looked to explore and work
on technologies ranging from development to modelling to machine learning. His areas
of interest include data science, image processing, recommendation techniques,
machine learning, deep learning, reinforcement learning and information retrieval. He
is a computer science enthusiast and aims to pursue higher education at one of the
best universities across the globe. His interest in recommendation systems, combined
with the alarming health-related issues people face, led him and his friends to
conceive this personalized health recommendation system. He hopes that it will change
the way personalized recommendations are generated and that the technology will
become domain independent. He strongly believes in solving data-driven problems using
the state-of-the-art tools prevalent in the Indian market.
Sachin Heda is a third-year undergraduate pursuing B. Tech from Institute of Technology, Nirma
University. Since childhood he has engaged himself in solving real-life problems. He
is an enthusiastic learner and has high conceptual clarity. He was awarded the
position of Head Boy of his school. Being a computer science enthusiast, he has
explored and worked on many technologies including data science, recommendation
techniques and big data. He is an extrovert and thinks that people are now becoming
more aware of their health. He and his friends think that this personalized health
system will bring a huge positive change in the lives of people.
Prof. Jitali Patel is working as an Assistant Professor in Computer Science and Engineering
Department at Institute of Technology, Nirma University. She obtained post-graduation degree
ME(CE) from Dharmsinh Desai University in the year 2010. She has an experience of more than
10 years in the field of Teaching. She has taught Artificial Intelligence, Information Retrieval, Data
Mining, Object Oriented Programming and Data Structure. Her area of interest and research are
Machine Learning and its applications. She has published more than 10 peer-reviewed
research articles.
Prof. Jigna Patel is working as an assistant professor in Computer Science and Engineering
Department at Institute of Technology, Nirma University. She obtained post graduate degree ME
from Dharmsinh Desai University in the year 2008. She has experience of more than 10 years
in the field of teaching. She has taught Theory of Computation, Cyber Security, Artificial Intel-
ligence, Big Data Analytics, Principles of Programming Language, Mathematical Foundation for
Computer Science and C Programming. Her area of interest and research are Data Warehousing,
Data Mining and Big Data Analytics.
Using Deep Learning Based Natural
Language Processing Techniques
for Clinical Decision-Making with EHRs
R. Zhu (B)
Information Retrieval and Knowledge Management Research Lab, Department of Electrical
Engineering and Computer Science, York University, Toronto, Canada
e-mail: [email protected]
X. Tu (B)
School of Computer Science, Central China Normal University, Wuhan, China
e-mail: [email protected]
J. Huang (B)
Information Retrieval and Knowledge Management Research Lab, School of Information
Technology, York University, Toronto, Canada
e-mail: [email protected]
present the status quo of the healthcare industry applications, and provide several
possible directions of future research.
1 Introduction
Health problems remain a central issue in human lives. Healthcare is a concept that
usually refers to a set of services provided by health professionals to help patients
improve or maintain their health through physical examination, diagnosis, treatment,
restoration, prevention, etc. By its nature, the healthcare sector produces data in
many different forms and structures, such as DNA sequences, medical scans, and
electronic health records (EHRs), at a large scale and an unprecedented speed, driven
by the continuously growing number of patients, medical facilities, and healthcare
providers. However, the data provided by these healthcare units are often neither
well processed nor well understood because of their high volume, velocity, variety
and complexity. Thus, the adoption of deep learning-based natural language processing
technologies in healthcare studies has become increasingly common over the past few
decades.
Deep learning, a subfield of machine learning, is a class of algorithms that can
extract features directly from raw inputs using hidden layers, without human
intervention. In the past few years, this class of algorithms has caught the attention
of researchers because of its promising and robust results across a wide variety of
tasks and domains. It is now widely accepted that deep learning approaches, such as
convolutional neural networks (CNNs) and recurrent neural networks (RNNs), can perform
well and achieve robust results on different data structures and under different
circumstances. In healthcare, for example, convolutional neural networks are more
commonly applied to radiology images, whereas recurrent neural networks are more
commonly used for text-based medical notes.
Electronic Health Record (EHR) systems, as a component of the healthcare industry,
have gradually drawn more attention recently. An EHR system collects all of a
patient’s health information and stores it in digital format. Researchers and
healthcare professionals consider EHRs to be a very important source of medical data
that provides insights into domain problems. However, as the data types in EHRs vary
extensively and the systems contain a great amount of free text, it has been
challenging for traditional models to tackle these problems. Thus, deep learning
models, which can extract features without human intervention, are particularly well
suited to solving EHR problems. The useful information in an EHR system includes, but
is not limited to, patient demographics, lab results, medical scan images,
prescriptions, medical history, and clinical notes. Hospitals initially adopted EHR
systems to store all patients’ data for tracking care, as well as for administrative
and billing purposes. In fact, these EHRs can provide otherwise undiscoverable
insights to capture disease trends, make clinical decisions and draw medical
conclusions.
Among all the information that an EHR system contains, text-based clinical notes
are one of the most important resources in a patient’s EHR. However, doctors, nurses,
physiotherapists and pharmacists usually complete each section separately, using
different medical wordings and representations, which creates difficulties for
information alignment. Hence, the biggest challenge in current EHR systems is that
EHRs are structurally different. Aligning the information in medical profiles is
already time consuming, let alone creating a pipeline for processing EHR notes,
extracting terms, and building or aligning embeddings over years and across hospitals
or other healthcare providers. Therefore, the current goal of EHR management is to
increase clinical efficiency by empowering physicians with better and more
user-friendly EHR systems. Moreover, such systems have to help lower the cost of
clinical diagnosis and minimize the possibility of medical misdiagnosis.
In this chapter, we present the current status of deep learning-based methods
adopted in the healthcare sector. We then discuss the unique challenges and outline
both clinical and technical opportunities for future work. In Sect. 2, we give an
overview of the existing deep learning methods and cover the background and
motivation for applying deep learning approaches to the medical domain. In Sect. 3, we
examine the current deep learning-based NLP techniques and categorize them by three
major purposes: representation learning, information extraction and clinical
prediction. We also compare the experimental results of these deep learning-based NLP
models in the medical domain and focus on the novelty and diversity of these
techniques, as well as their evaluation metrics. In Sect. 4, we present some other
related application themes and issues of deep learning in healthcare, followed by a
discussion of legal and ethical considerations as well as industry applications.
Finally, in Sect. 5, we acknowledge that the recent progress made by these proposed
methods is a promising start, and we provide several possible directions for future
research.
Natural language processing is an active research field in computer science and
artificial intelligence that focuses on the interactions between human languages and
computers. Specifically, it studies how to represent, process and analyze large
amounts of natural language data.
Neural networks are powerful learning models in general. Deep learning approaches
have achieved impressive successes in image and speech recognition. In the past few
years, natural language processing has also taken great advantage of deep learning
algorithms and methods to achieve great advances. Attention has shifted from
traditional machine learning models, such as Support Vector Machines (SVM) and
Logistic Regression, towards deep neural network models such as CNN and RNN. These
deep learning approaches eliminate the time-consuming work of hand-crafting features
and replace it with automatic feature learning.
In this section, we present the main deep learning approaches and architectures
applied in natural language processing research. The major approaches include
distributed representations, which are the foundation of deep learning models, CNNs,
RNNs and Transformer-based neural networks.
In the past, local representations were often used to store memories and to
represent each entity directly with a single element. Such a structure is easy to
understand and implement but very inefficient, as each unit is associated with only
one represented thing [1]. A distributed representation, in contrast, uses more than
one representational element to represent each entity. As the representations of
different entities overlap in the neural network, the network can respond to a new
input based on its generalization capability, and it can output significant features
automatically after pretraining on raw data inputs. In most cases, researchers do not
have enough annotated data to build features for classification tasks. Therefore, an
unsupervised approach such as distributed representation is needed to pretrain on the
data and to embed words with similar meanings into similar vectors.
2.1.2 Word2Vec
Word2Vec, created by Tomas Mikolov [2–4], refers to a group of models that take a
large text corpus as input and produce word embeddings, typically with hundreds of
dimensions. It maps each unique word in the text corpus to a corresponding vector in
the space. The embedded word vectors are positioned so that words appearing in similar
contexts are located near each other. These models are structured as two-layer neural
networks that aim to reconstruct the context of words across the entire text corpus.
Although Word2Vec is not a deep neural network (DNN), it transforms text into
numerical values that a DNN can process without human intervention.
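In general, Word2Vec trains each input word against its neighboring words in context
in one of two ways: Skip-Gram or Continuous Bag of Words. Skip-Gram predicts the
surrounding context from a given word. Specifically, the goal of the Skip-Gram model
is to maximize the average log probability

$$\frac{1}{T}\sum_{t=1}^{T}\ \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t) \tag{1}$$

where T is the number of training words and c is the size of the context window. The
basic Skip-Gram formulation defines this probability with the softmax function

$$p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)} \tag{2}$$

where $v_w$ and $v'_w$ are the input and output vector representations of the word w,
and W is the number of words in the vocabulary.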
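The Skip-Gram objective can be evaluated directly for a toy corpus. The sketch below
is an illustration only, assuming a full softmax over the vocabulary (practical
Word2Vec implementations use approximations such as negative sampling or hierarchical
softmax); the function name, the random embeddings and the tiny corpus are
hypothetical.

```python
import numpy as np


def skipgram_avg_log_prob(word_ids, V_in, V_out, c):
    """Average of log p(w_{t+j} | w_t) over positions t and offsets 0 < |j| <= c.

    word_ids: sequence of integer word indices (the training text)
    V_in:     (W, d) input vectors, one row per vocabulary word
    V_out:    (W, d) output vectors, one row per vocabulary word
    """
    T = len(word_ids)
    total = 0.0
    for t, center in enumerate(word_ids):
        scores = V_out @ V_in[center]                      # one score per word
        log_probs = scores - np.logaddexp.reduce(scores)   # log-softmax
        for j in range(-c, c + 1):
            if j != 0 and 0 <= t + j < T:                  # skip out-of-range context
                total += log_probs[word_ids[t + j]]
    return total / T


rng = np.random.default_rng(0)
vocab_size, dim = 5, 8
V_in = rng.normal(scale=0.1, size=(vocab_size, dim))
V_out = rng.normal(scale=0.1, size=(vocab_size, dim))
corpus = [0, 3, 1, 4]                                      # toy text as word indices
objective = skipgram_avg_log_prob(corpus, V_in, V_out, c=1)
```

Training consists of adjusting V_in and V_out so that this quantity increases, which
pushes words that share contexts towards similar vectors.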
Continuous Bag of Words (CBOW) predicts a target word from its given context. It
is built on top of the Bag of Words (BoW) concept, a simplifying representation in NLP
that treats a text as a bag of its words, disregarding word order and grammar. BoW is
often used for training a classifier to classify documents or texts, with the
frequency of each word used as a feature. CBOW represents an unbounded number of
features with a fixed-size vector when the number of features is unknown in advance.
It works in a very similar way to BoW, summing or averaging the embedding vectors of
the corresponding features while ignoring word order information:
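$$\mathrm{CBOW}(f_1, f_2, \ldots, f_k) = \frac{1}{k}\sum_{i=1}^{k} v(f_i) \tag{3}$$

A weighted variant, WCBOW, associates a weight $a_i$ with each feature $f_i$:

$$\mathrm{WCBOW}(f_1, f_2, \ldots, f_k) = \frac{1}{\sum_{i=1}^{k} a_i}\sum_{i=1}^{k} a_i\, v(f_i) \tag{4}$$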
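Both the plain and the weighted averages are short to compute. The sketch below is
illustrative; the function names and the toy vectors are hypothetical, and the weights
could be, for example, TF-IDF scores, which is an assumption rather than something the
text specifies.

```python
import numpy as np


def cbow(vectors):
    """Unweighted CBOW: the mean of the feature embedding vectors."""
    return np.mean(np.asarray(vectors, dtype=float), axis=0)


def weighted_cbow(vectors, weights):
    """Weighted CBOW: weighted sum of the vectors divided by the sum of weights."""
    vectors = np.asarray(vectors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * vectors).sum(axis=0) / weights.sum()


# Three toy 2-dimensional feature embeddings v(f_1), v(f_2), v(f_3).
embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
plain = cbow(embeddings)
weighted = weighted_cbow(embeddings, [1.0, 1.0, 2.0])
```

Whatever the number of input features, the output has the fixed dimensionality of a
single embedding, which is what makes the representation convenient as classifier
input.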
Chalapathy et al. [5] propose an RNN approach that uses a bidirectional long
short-term memory (LSTM) model with conditional random field (CRF) decoding together
with word embeddings, namely GloVe [6] and Word2Vec [3]. For concept representation
and extraction, every word within a sentence is first fed into this bidirectional
LSTM-CRF model and mapped to a randomly initialized word embedding vector. Then, the
model applies the word embedding training methods GloVe, Skip-Gram and CBOW to the
entire data collection in order to generate vector representations. These sequences of
vectors are fed into the RNN-based LSTM model, which is good at processing sequential
data, to produce a class of medical concepts. As the LSTM tends to favor the most
recent input data, Chalapathy et al. compute both the forward and the backward hidden
states to eliminate possible biases.
Besides GloVe, the other most popular way of learning medical concept
representations among current researchers is to generate distributed embeddings with
Skip-Gram [4, 7]. Distributed representations of words in a vector space group similar
words together effectively, which improves the performance of learning algorithms in
natural language processing. The Skip-Gram technique [4] introduced by Mikolov et al.
is an efficient method for learning vector representations of words from large
quantities of unstructured text data. Because it is trained to predict the context of
a word, it can capture relations between words.
Learning high quality representations has never been an easy task. In the past few
years, contextualized word embeddings have achieved impressive results and have
been adopted in many of the recent deep learning NLP models. These contextualized
word embedding approaches are considered to be ideal pre-training word representa-
tion models that can accurately capture complicated characteristics of syntactical and
semantic word use, and understand how these uses change across different natural
language contexts.
ELMo, or Embeddings from Language Models, is one of the deep contextualized word
representation models. The word vectors are learnt functions of all the internal
layers, or internal states, of a deep bidirectional language model (biLM) that is
pretrained on a large text corpus. In other words, for each end task, the vectors
stacked above each input word are linearly combined to produce the final
representation. This design can boost model performance significantly: when these
representations are added to existing models, they can yield significant improvements
on existing problems in the NLP domain. The biggest difference between commonly seen
word embeddings and ELMo is that ELMo word representations take the entire sentence as
input. The embeddings are computed on top of a biLM with character convolutions and
are represented as a linear function of the internal network states. The
representations generated by the model are contextual, deep, and character-based,
meaning that each word representation depends on the entire context, combines all
layers of the deep neural network, and allows the network to form robust
representations for out-of-vocabulary tokens. The model is formulated as follows:
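$$\mathrm{ELMo}_k^{task} = E\left(R_k; \Theta^{task}\right) = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, h_{k,j}^{LM} \tag{5}$$

$$R_k = \left\{ x_k^{LM},\ \overrightarrow{h}_{k,j}^{LM},\ \overleftarrow{h}_{k,j}^{LM} \mid j = 1, \ldots, L \right\} = \left\{ h_{k,j}^{LM} \mid j = 0, \ldots, L \right\} \tag{6}$$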
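The task-specific combination in Eq. (5) is a weighted sum over layers. The sketch
below illustrates it for a single token, assuming, as in the original ELMo
formulation, that the layer weights are obtained by a softmax over unnormalized
scalars; the function name and the toy layer states are hypothetical, and in practice
the scalars and the scale factor are learnt together with the downstream task.

```python
import numpy as np


def elmo_mix(layer_states, scalar_params, gamma=1.0):
    """Combine the L+1 biLM layer vectors of one token into a single vector.

    layer_states:  (L + 1, dim) array, row j holding h_{k,j} for token k
    scalar_params: (L + 1,) unnormalized layer scores (softmax-normalized here)
    gamma:         task-specific scale factor
    """
    layer_states = np.asarray(layer_states, dtype=float)
    scalar_params = np.asarray(scalar_params, dtype=float)
    s = np.exp(scalar_params - scalar_params.max())   # numerically stable softmax
    s /= s.sum()
    return gamma * (s[:, None] * layer_states).sum(axis=0)


# Token with a 2-dimensional state in each of three layers (j = 0, 1, 2).
states = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mixed = elmo_mix(states, scalar_params=[0.0, 0.0, 0.0], gamma=1.0)
```

With equal scalar parameters the result is the plain average of the layers; learning
the scalars lets each task emphasize the layers that are most useful for it.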
size window running over the sentence or text corpus, the filter function automatically
extracts k grams features from the learning experience.
The connections between nodes in an RNN form a directed graph along a temporal
sequence, which allows the network to exhibit temporal dynamic behavior. Because RNNs
can process sequences of inputs, RNN architectures are commonly used for NLP tasks
such as information extraction and speech recognition.
The simple recurrent neural network architecture is sensitive to the sequential
ordering of elements. Mikolov explored it in 2012 and applied it to language modeling
[9]. As suggested in that work, a basic RNN model is a network of nodes organized in
successive layers, where each node in a layer has a one-way directed connection to
every node in the next layer. The nodes consist of input nodes, output nodes and
hidden nodes. Each neuron has a real-valued activation that changes over time, and
each connection has a real-valued weight that can also change over time. By forming a
directed graph along a temporal sequence with the connected nodes, the RNN model can
exhibit dynamic behavior, using its internal memory state to process input sequences.
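A single recurrent layer can be sketched as follows. This is a minimal illustration
that assumes the common Elman-style update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h);
the function name, the weight names and the toy inputs are hypothetical.

```python
import numpy as np


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a simple RNN over a sequence and return the hidden state at every step."""
    h = np.zeros(W_hh.shape[0])        # internal memory state, initially empty
    states = []
    for x_t in inputs:
        # The new state depends on the current input and on the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)


sequence = np.array([[1.0, 0.0], [0.0, 1.0]])
W_xh = np.eye(2)
W_hh = 0.5 * np.eye(2)
b_h = np.zeros(2)

forward_states = rnn_forward(sequence, W_xh, W_hh, b_h)
reversed_states = rnn_forward(sequence[::-1], W_xh, W_hh, b_h)
```

Feeding the same elements in reverse order produces a different final state, which
is the order sensitivity mentioned above.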
It is worth noting that the basic RNN structure introduced above is hard to train
effectively because of the vanishing gradient problem. As gradients from later steps
diminish quickly during backpropagation and cannot reach earlier input signals, the
basic RNN model can hardly capture long-range dependencies.
LSTM was the first architecture to introduce a gating mechanism to solve the
vanishing gradient problem. It is one of the most successful types of RNN architecture
in the research field. The feedback connections that distinguish LSTM models from
standard feedforward neural networks allow them to process not only single data points
but also entire sequences of data. A classic LSTM architecture consists of a memory
cell and three regulator gates, namely an input gate, a forget gate and an output
gate. The memory cell keeps track of the elements in the input sequence and their
dependencies. The input gate controls the extent to which a new input value flows into
the memory cell. The forget gate controls the extent to which an existing value
remains in the memory cell. The output gate controls the extent to which the value in
the memory cell is used to compute the output activation of the unit. The gates
themselves typically use the logistic sigmoid function as their activation.
In other words, LSTM splits the state vector into a memory cell, which aims to
preserve memory and error gradients across time, and a working memory. Smooth
mathematical functions that simulate logical gates control these memory cells. At each
input step, gates decide to what extent the new input should be incorporated into the
memory cell and to what extent the existing content of the memory cell should be
forgotten.
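The role of the three gates can be made concrete with one step of an LSTM cell. The
sketch below assumes the widely used formulation with sigmoid gates and a tanh
candidate; the function names, the stacked weight layout and the gate ordering are
illustrative choices, not taken from the chapter.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W: (4 * hidden, input_dim + hidden) stacked weights, b: (4 * hidden,) biases.
    Assumed block order: input gate, forget gate, output gate, candidate.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:hidden])                  # how much new content enters the cell
    f = sigmoid(z[hidden:2 * hidden])        # how much old content is kept
    o = sigmoid(z[2 * hidden:3 * hidden])    # how much of the cell is exposed
    g = np.tanh(z[3 * hidden:])              # candidate content
    c = f * c_prev + i * g                   # updated memory cell
    h = o * np.tanh(c)                       # working memory (output)
    return h, c


rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 2
W = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
b = np.zeros(4 * hidden_dim)
x = rng.normal(size=input_dim)
h, c = lstm_step(x, np.zeros(hidden_dim), np.zeros(hidden_dim), W, b)
```

Because the cell update is additive, a forget gate close to 1 and an input gate close
to 0 carry the memory cell through a step almost unchanged, which is how the
architecture preserves information and gradients over long spans.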
To address the problem of vanishing gradients and long-range dependencies,
gating-based architectures were proposed: the LSTM, by Hochreiter and Schmidhuber
[11], and later the GRU, by Cho et al. [10]. The GRU architecture uses two gate
vectors, an update gate and a reset gate, to decide which information should be
passed on to the output. The GRU can be trained to keep information from long ago
without it being washed out over time, and to discard information that is irrelevant
to future predictions. Compared to LSTM, it has no separate memory component and uses
fewer gates. In fact, the architecture of the gated recurrent unit is similar to a
long short-term memory model with a forget gate.
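One step of a GRU can be sketched in the same style. This is an illustration under
assumptions: the function and parameter names are hypothetical, and the convention
used here lets the update gate weight the new candidate state (some formulations swap
the roles of z and 1 − z).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU time step; each W_* has shape (hidden, input_dim + hidden)."""
    xh = np.concatenate([x, h_prev])
    z = sigmoid(W_z @ xh + b_z)              # update gate
    r = sigmoid(W_r @ xh + b_r)              # reset gate
    # Candidate state: the reset gate scales how much past state is consulted.
    h_tilde = np.tanh(W_h @ np.concatenate([x, r * h_prev]) + b_h)
    # Interpolate between the old state and the candidate; no separate memory cell.
    return (1.0 - z) * h_prev + z * h_tilde


rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 2
shape = (hidden_dim, input_dim + hidden_dim)
W_z, W_r, W_h = (rng.normal(scale=0.1, size=shape) for _ in range(3))
b_z = b_r = b_h = np.zeros(hidden_dim)
h_new = gru_step(rng.normal(size=input_dim), np.zeros(hidden_dim),
                 W_z, W_r, W_h, b_z, b_r, b_h)
```

When the update gate is near 0 the previous state is copied forward unchanged, which
is the mechanism that lets the unit retain old information.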
768 hidden units and 12 attention heads, while the BERT-Large model is a much
bigger model consisting of 24 transformer blocks, 1024 hidden units and 16 attention
heads [12].
GPT-2, proposed by OpenAI, stole some of BERT’s thunder shortly afterwards. The
model is a large transformer-based language model with 1.5 billion parameters trained
on a dataset of 8 million web pages [13]. It is trained with a simple objective: to
predict the next word, given all previous words in the context, on 40 GB of Internet
text. It is capable of generating astonishing and promising results that have further
advanced NLP research. In particular, it demonstrates an unprecedented capability for
generating synthetic text samples, and the results show that it generally outperforms
other language models trained on specific domains, without itself being trained on
those domain-specific datasets. The developers decided not to release the data or
parameters of the biggest model, so it is not elaborated further here.
It is worth noting that, compared to RNN and LSTM models, Transformer-based neural
networks (NN) are more hardware friendly. The problem with RNN and LSTM models is
that they are difficult to train, since their computation is bound by memory
bandwidth. This is a headache for many hardware designers, since training the network
takes up a lot of resources in the cloud, and these resources do not scale easily.
Thus, the applicability of these solutions is limited. For example, running an LSTM
model requires four linear layers to be computed for each cell at each time step of
the sequence, which takes up a great amount of memory bandwidth. For
Transformer-based approaches, in contrast, only a 2D convolution-based NN with causal
convolution is required, and the generated results can be even better.
In the past few years, we have seen a rising trend of applying deep learning-based NLP
techniques to medical information processing. The current goal of EHR management
is to increase clinical efficiency by empowering physicians with better and more
user-friendly EHR systems. Moreover, the system has to help lower the cost of clinical
diagnoses and minimize the possibility of medical misdiagnoses. In fact, in
the past few years, NN-based representation learning has achieved promising results
in many fields, and many natural language processing applications of representation
learning have been developed. Hinton [14] introduced distributed representations for
symbolic data in his 1986 paper. The idea is to form a word embedding layer by
learning a distributed representation for each word in the given text. Meanwhile,
Bengio et al. [15] presented it in the context of statistical language modeling, under
the name neural net language models [16]. How good a learnt representation is
normally depends on how expressively it captures the features of the
huge number of inputs behind it [17]. In 2006, Hinton [18] initiated a breakthrough
in representation learning with greedy layerwise unsupervised pretraining, and many
other scholars quickly followed the same track [19–23]. The proposed method
uses unsupervised feature learning to learn each level of features separately, and
then it builds on the results from the previous layer. Specifically, by stacking
the learnt layers one on top of another, the model builds the deep neural network
by learning representations in an unsupervised way. Thus, a final deep supervised
predictor is generated from the raw data inputs directly.
To study the current techniques that have achieved the purposes stated
above, we classify the existing deep learning-based NLP techniques into three major
groups: representation learning, information extraction and clinical predictions.
These three are considered to be the key technologies and applications adopted in
current EHR systems.
representations of the raw data inputs for feature extraction, and to build predictors
or classifiers. The tasks of representation learning can be supervised or unsuper-
vised. The supervised tasks involve feature learning with labeled inputs, whereas the
unsupervised tasks are the ones with unlabeled input data. In general, as the studies
conducted both in academia and in industry have grown rapidly, representation learning
has been nourished by new discoveries and has gained empirical successes overall.
Generally, there are three different ways of using learnt word embeddings in
the existing literature. First, some scholars choose to train the entire model directly as a
supervised task, end to end, with a randomly initialized embedding matrix. This
method is easy to adopt; however, it completely skips the word embedding learning
process, and thus could cause problems such as overfitting. Second, some scholars
use part of their data to learn word embeddings, and freeze them while training the
rest of the model. Third, most of the research conducted in the past few years chooses
to use and train the entire word embeddings end to end. As deep learning
approaches can perform better with larger amounts of raw input data, and most of the
recent studies fall into this category, we focus only on the literature of the
third approach in this chapter.
In the medical domain, representation learning usually involves learning from a list of
medical codes or notes, which serve administrative purposes or diagnosis and medication
needs in the patient’s EHR system. Unlike sentences, which contain an ordered sequence
of words, medical codes in a patient’s profile are unordered. To use
these codes and notes as inputs to machine learning models, representation
learning is adopted to turn them into meaningful representations. Skip-gram,
GloVe, CBOW, stacked autoencoders and BERT are commonly used NLP techniques
for learning distributed embeddings nowadays. In this section, the trending
deep learning based NLP methods will be discussed in the following three subcategories:
representations for learning medical concepts, representations for learning
patients, and representation learning for clinical abbreviation disambiguation
and expansion.
Both medical codes and clinical notes contain plenty of valuable information
for physicians to make medical predictions and decisions. In a regular patient’s
EHR profile, unstructured data takes up a considerable proportion of the
file. Doctors, nurses, physiotherapists and pharmacists are each in charge of one
section of the general profile and fill in the relevant information in an unstructured
format, known as free text. The difficulty for patients in approaching these notes is
understanding the medical jargon and medical instructions. For researchers,
these free texts are valuable for producing effective clinical predictions,
but they are also difficult to process. In reality, due to the different structures of
EHR systems across institutions, as well as the wide variety of medical jargon used
by different healthcare providers, extracting useful information from the big pool of
clinical notes remains an unsolved problem.
Using Deep Learning Based Natural Language … 269
The heterogeneous nature of the medical data elements and the high volume of
unstructured data make clinical care and medical analytics studies difficult. Most
of the existing literature learns features and representations by applying ontology
mappings, or by exploiting information directly from the raw data inputs, for example
medical notes or codes. Although higher-level medical features such as disease
phenotypes can reduce the feature space to some extent, they may still not
capture the meaningful information embedded in patient data across the entire EHR
system.
Medical concept learning from patients’ medical notes is a dominant research
subfield. Many existing approaches to concept representation in the medical domain
still face data inefficiency challenges: they
depend heavily on hand-crafted features and extensive domain knowledge that are
difficult to define. To solve the problem, Choi et al. [24] take advantage of the
relationships encoded among medical codes, which are inherently multilevel in the EHR
system, to construct their novel approach. Specifically, they propose a Multilevel
Medical Embedding (MiME) architecture to learn embeddings of the EHR data
at multiple levels, and to make clinical predictions based on the inherent EHR structure
without the help of external labels. The approach is evaluated on two
separate prediction tasks, namely the prediction of heart failure and the prediction
of sequential disease. As a result, the proposed MiME consistently outperforms all
other baseline models by a significant margin.
Escudie et al. [25] demonstrate a way of learning low-dimensional
representations of patients’ visits using a deep neural network to predict International
Classification of Diseases (ICD) diagnosis categories when these codes are not provided.
The deep neural network approach adopted in this paper takes both structured/semi-structured
data and unstructured free-text notes in MIMIC-III as inputs. These learnt
codes are pertinent to the medical domain; meanwhile, they can be used directly as inputs
to DL or ML algorithms for predicting patients’ future health status and for prevention.
Choi et al. [26] proposed a different data-driven approach that leverages EHR
data directly for medical concept learning. Specifically, the method maps similar
medical concepts to concept vectors that are close to each other, based on temporal
co-occurrence relationships among the raw data inputs. Furthermore, it is capable of
transforming heterogeneous patient medical data in the EHR system into clinically
meaningful features. Hence, the patient vectors are constructed at the same time from the
related clinical concept vectors. As a result, their method manages
to generate patient representations by learning representations of medical concepts.
In their paper [26], the authors present a method based on Skip-gram [4, 7]
to learn multi-dimensional real-valued vectors that capture the latent relationships
between diagnoses, medications and procedures.
De Vine et al. [27] utilize UMLS concepts to learn representations
from free-text patient records and journal abstracts. Basically, rather than
directly learning representations from terms in free text, they propose a variation of
neural language modeling that learns concepts from structured ontologies and extracts
information from free text by preprocessing the medical texts to map words to
medical concepts in the UMLS. Then, the Skip-gram model is adopted to learn
270 R. Zhu et al.
bills and costs, and applied medicines etc. A patient usually receives his/her own EHR
report with a list of demographic codes serving each hospital’s administrative
purposes, a good deal of medical jargon with medical codes, as well as lab test
values. The most common medical codes include but are not limited to CPT Codes
(Current Procedural Terminology), HCPCS Codes (Healthcare Common Procedure
Coding System), ICD Codes (International Classification of Diseases), ICF Codes
for Disabilities, Diagnostic Related Grouping (DRG), NDC Codes (National Drug
Codes), CDT Codes (Code on Dental Procedures and Nomenclature), and DSM-IV-TR
Codes for Psychiatric Illnesses. However, all the medical codes that seem to be
common knowledge for health providers are difficult for the public to
understand. Hence, it is necessary for researchers and scholars to feed these
codes into models to generate information that the public can understand,
as well as to produce credible clinical predictions.
In general, there are two approaches for physicians to make clinical decisions
with medical codes extracted from patients’ medical profiles. The more straightforward
approach is static: it predicts medical outcomes by feeding models a single set
of inputs only once. For example, Choi et al. [26] propose to feed in the EHR
data directly for models to learn heterogeneous concept and patient representations
based on co-occurrence patterns. This effective method of medical concept and
patient representation learning uses a single set of inputs to predict
possible heart failure (HF). Meanwhile, it serves to link up relevant concepts and to
boost the performance of predictive modeling.
The more complicated approach is dynamic: it predicts medical outcomes by
feeding models a sequence of inputs. The models are capable of producing clinical
decisions after each raw EHR input is fed in or after the entire sequence of EHR
data points is learnt. For example, Choi et al. [31] leveraged a large EHR dataset
to develop a temporal predictive model, Doctor AI, for learning observed
medical conditions and medication uses, which will be discussed further in Sect. 3.3.
In 2016, Choi et al. [32] approached this issue by proposing an algorithm named
Med2Vec and structuring a dataset that consists of patient visit records, diagnosis
codes (ICD-9), lab test results (LOINC) and drug usage (NDC). Since Skip-Gram
predicts the context and captures the relationships between words by learning word
representation vectors, it is necessary to convert the medical codes used in the study
into (target, context) pairs. Thus, they define the (target, context)
pairs at the level of each patient’s profile, instead of at the level of a sequence of
medical codes. By doing so, Choi et al. aim to learn medical concept representations
effectively and efficiently. Besides, the authors were also able to predict a patient’s
neighboring visits by representing his/her medical records as binary vectors and
feeding them into a two-layer neural network.
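The pairing step and the binary encoding can be illustrated with a minimal sketch. It
assumes that the codes recorded together in one record form an unordered set and that
every ordered pair of distinct codes becomes a (target, context) pair; the function
names and the code strings are hypothetical placeholders and are not taken from [32].

```python
from itertools import permutations


def code_pairs(codes):
    """Return every ordered (target, context) pair of distinct codes in one unordered set."""
    unique = sorted(set(codes))
    return list(permutations(unique, 2))


def multi_hot(codes, vocabulary):
    """Encode a set of codes as a binary vector over a fixed vocabulary."""
    present = set(codes)
    return [1 if code in present else 0 for code in vocabulary]


# Hypothetical record mixing a diagnosis, a lab test and a drug code (placeholders).
visit = ["DX:heart_failure", "LAB:creatinine", "DRUG:diuretic"]
vocabulary = ["DX:diabetes", "DX:heart_failure", "LAB:creatinine", "DRUG:diuretic"]

pairs = code_pairs(visit)
vector = multi_hot(visit, vocabulary)
print(len(pairs), vector)
```

The binary vector is the kind of input that could then be passed to a small
feedforward network; the network itself is omitted here.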
Another popular representation learning technique applied to medical concept and
event extraction is bag of words (BOW). Li et al. [33] propose an embedding learning
method that incorporates words’ distributional characteristics into medical event
extraction. Their model uses BOW features as the baseline, and the results generated
from the word embedding feature learning are promising, since the n-gram effectively
enriches the context information.
Tang et al. [34] apply feature learning procedures such as bag-of-words (BOW)
and part-of-speech (POS) in their study, which differ from the GloVe and Skip-Gram
approaches. In their experiment, Tang et al. adopt a neural language model to
generate word embedding vectors from a biomedical corpus, and the experimental
results are slightly better than those of existing works.
Gong et al. [23] adapt the BOW representation learning method by altering
it to bag of events (BOE) in their study. The BOE represents the number of events
that occurred in the first 24 h of a patient’s stay. In their paper, the authors aim to
map database-specific representations to a shared list of medical concepts. Hence, the
model can be transferred across databases. Meanwhile, the Item ID feature is constructed
as a new identifier pair consisting of each patient’s unique (ID, text value). Lastly, the
representations are converted to UMLS concepts by a frequently used tool for
identifying UMLS concepts.
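A bag of events can be sketched as a simple count over a time window. The example
below is only an illustration under our own assumptions: events are given as
(hours since admission, event name) tuples, and the event names are invented.

```python
from collections import Counter


def bag_of_events(events, window_hours=24.0):
    """Count each event type observed within the first `window_hours` of a stay."""
    return Counter(name for hours, name in events if 0 <= hours < window_hours)


# Hypothetical stay: (hours since admission, event name).
stay = [
    (0.5, "heart_rate"),
    (2.0, "creatinine"),
    (6.0, "heart_rate"),
    (23.9, "heart_rate"),
    (30.0, "creatinine"),  # outside the 24 h window, so it is ignored
]

boe = bag_of_events(stay)
print(boe)
```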
Indeed, whether medical codes are fed as a single set of inputs or as a
sequence of inputs, they are, like medical notes, a common source of data
for medical concept representation and the final clinical decision-making
process. However, since the medical decision-making process is complicated,
researchers and scholars should never rely on a single type of data as
input to generate effective clinical predictions.
vectors, from the conversion of all medical concepts in his/her profile to medical
concept vectors, to a single representation vector.
Dligach et al. [36] consider an alternative way of learning patient representations
by using text variables only with a deep neural network. In their proposed work,
Dligach et al. use billing codes, for example ICD-9 or CPT, as a source of supervision
to learn patient vectors first. Then, they train the proposed model on a set
of UMLS concept unique identifiers (CUIs) generated from the clinical notes in the
patient’s profile to predict all billing codes that might be associated with the patient.
The experimental results show that the representations learnt with the new
method are good enough to match existing performance on standard
comorbidity detection tasks.
Zhang et al. [37] propose a computational framework named Patient2Vec to learn
patient representations while overcoming the interpretability problem of deep
learning architectures. It learns each patient’s personalized deep representation of
longitudinal EHR data. To evaluate their proposed method, they use
it to predict future hospitalizations with EHR data from real hospitals. Moreover,
they also compare the method’s performance on clinical predictions to that of other
baseline models. The proposed architecture consists of five parts: learning vector
representations of medical codes with Skip-Gram; learning within-subsequence
self-attention with a one-sided convolution operation with a filter and a nonlinear
activation function; learning subsequence-level self-attention with a bidirectional
GRU-based RNN; constructing the aggregated deep representation by adding
patient characteristics such as demographic information and static medical
conditions; and predicting the outcome with a linear and a softmax layer. Indeed, the
Patient2Vec model is able to produce meaningful structures in the vector space and to
outperform baseline models by a significant margin.
Denaxas et al. [38] propose a method of learning word embeddings for disease
diagnoses and medical procedures using global vectors (GloVe) based on the national
UK EHR system. They leverage the learnt patient representations to evaluate
performance on identifying patients who are more likely to be hospitalized due to
congestive heart failure. Specifically, they apply the GloVe model to four different
corpora that they created themselves to learn the word embeddings for concepts. After
that, they evaluate the learnt and normalized patient-level embeddings by treating the
prediction of heart failure onset as a supervised binary classification task with linear
SVM classifiers. The experiments show marginally improved performance
on clinical predictions compared to conventional one-hot models. Thus,
the approach can potentially enable us to build robust EHR-based disease risk prediction
models in the near future.
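One simple way to obtain a normalized patient-level embedding from concept embeddings
is sketched below. The source does not state how concept vectors are aggregated, so
mean pooling followed by L2 normalization is purely our assumption, and the concept
names and two-dimensional vectors are invented for illustration.

```python
import math


def patient_embedding(concepts, concept_vectors):
    """Average the vectors of a patient's concepts, then scale to unit L2 norm."""
    vectors = [concept_vectors[c] for c in concepts if c in concept_vectors]
    if not vectors:
        raise ValueError("no known concepts for this patient")
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean


# Hypothetical 2-dimensional concept embeddings.
concept_vectors = {"heart_failure": [3.0, 0.0], "echocardiogram": [0.0, 4.0]}
patient = patient_embedding(["heart_failure", "echocardiogram"], concept_vectors)
print(patient)
```

The resulting fixed-length vector could then serve as the feature vector of a linear
classifier such as an SVM.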
Wei et al. [39] propose an end-to-end clinical decision support system
that is able to generate and retrieve relevant information and literature for target
patients with distant supervision. The experiment uses GloVe trained on Wikipedia
texts and Word2Vec trained on biomedical texts in order to train a model for ICD code
prediction from raw text inputs. Note that all the raw input data are drawn from the
MIMIC-III data collection. Then, the Deep Relevance Matching Model (DRMM)
is adopted as a semantic matching model to learn the terms. After that, the user’s query
and the candidate documents are split into paragraphs, while the word
embeddings are replaced with paragraph-level convolutional embeddings. Lastly,
cosine similarity is computed as the direct semantic similarity score to
output the final results. Their experiment shows promising results with substantial
improvement in the information retrieval tasks.
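The final scoring step relies on cosine similarity, which can be sketched as follows.
The paragraph names and two-dimensional vectors are hypothetical; in practice the
vectors would be the learnt paragraph-level embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates):
    """Sort candidate names by decreasing cosine similarity to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical query and paragraph embeddings.
query = [1.0, 0.0]
candidates = {
    "paragraph_a": [2.0, 0.0],
    "paragraph_b": [0.0, 3.0],
    "paragraph_c": [1.0, 1.0],
}
ranking = rank_by_similarity(query, candidates)
print(ranking)
```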
Zhu et al. [40] introduced both supervised and unsupervised methods to evaluate
patient similarity, including temporal matching of patients’ longitudinal
data in the EHR system. The authors define the context of a medical concept by
the medical events that happen before and after it, and thus use fixed-length
vectors to express medical concept embeddings and to make further
predictions. With the embedded matrix of patient representations, the supervised and
unsupervised methods are adopted to measure similarity. Specifically, the supervised
approach adopts a CNN architecture to learn an optimal representation of a patient’s
medical record in the EHR system, with convolutional filters mapping it to a
fixed-length feature vector; whereas the unsupervised approach applies the RV
and dCov coefficients to capture, respectively, the linear and nonlinear relationships
between patients. As a result, the experiments on testing data outperform the
baseline models significantly. The authors also suggest possibilities for future study in
the same direction.
Liu et al. [41] tackle the medical event and patient representation problem
differently by distinguishing long time scale medical events with strong temporal patterns
from short time scale medical events with disordered co-occurrences. Thus, to
accommodate clinical events that happen on different time scales, Liu et al. propose a
model that learns hierarchical representations of the event sequence, which adapt to
events in different time ranges and can capture core temporal dependencies. In detail,
the proposed model first splits the entire sequence of medical events into several groups
of events with an adaptive, RNN-based event sequence segmentation module. Second,
the model learns the hierarchical representations of the event sequences with two
different mechanisms, namely event attention with a GRU-based event group aggregation
function and temporal attention with a GRU-based sequence representation. The model
outperforms most of the existing models and shows promising prediction results
on deaths and ICU admissions.
tend to use their own abbreviations to denote certain diagnoses, treatments or
medications in the patient’s medical profile. Thus, studying the ambiguous abbreviations
in the EHR system becomes one of the most difficult tasks, especially in the intensive care
unit (ICU), where medical notes are taken under heavy workload pressure and limited
time. Therefore, a deeper understanding of the abbreviations in clinical notes would
not only help medical researchers to understand diseases better, but also enhance
healthcare service quality more effectively and efficiently.
In [42], Liu et al. set out to learn word embeddings for clinical abbreviation
expansion by exploiting task-oriented resources. They explore the domain for two
purposes: (1) to effectively reduce misinterpretation of clinical abbreviations by
normalizing all abbreviations used in ICU documentation; (2) to allow the public to
better understand the abbreviations in medical free text. Specifically, based on
the intuition introduced by Harris in 1954 [2], Liu et al. use word embeddings,
or distributional semantic representations, to learn the meaning of an abbreviation
in the given medical context without labelled input data. They use Word2Vec [3] to
learn word embeddings first. After that, they follow the traditional approach of using
regular expressions to detect all medical abbreviations in the ICU notes; the possible
candidate expansions are then generated from a domain-specific knowledge
base [42]. They compute a representation for each candidate expansion, which is a
multi-word phrase in most cases, by defining candidate Ci as the group of words in the
candidate list, followed by similarity computation. Although Liu et al. did not apply deep
learning in their experiment, their method still significantly outperforms all
baseline methods and achieves 82.27% accuracy.
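The detect-then-rank pipeline can be sketched in a few lines. Everything specific
here is an assumption made for illustration: the regular expression (a standalone
token of two to five capital letters), the use of mean pooling to embed a multi-word
candidate, and the toy two-dimensional embeddings. None of it reproduces the exact
rules or resources of [42].

```python
import math
import re

# Assumed detection rule: a standalone token of 2 to 5 capital letters.
ABBREVIATION = re.compile(r"\b[A-Z]{2,5}\b")


def find_abbreviations(note):
    """Return all tokens in the note that match the assumed abbreviation pattern."""
    return ABBREVIATION.findall(note)


def mean_vector(words, embeddings, dim):
    """Average the embeddings of the known words (assumed pooling for phrases)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_expansion(context_words, candidates, embeddings, dim):
    """Pick the candidate phrase whose vector is most similar to the context vector."""
    context = mean_vector(context_words, embeddings, dim)
    return max(
        candidates,
        key=lambda phrase: cosine(context, mean_vector(phrase.split(), embeddings, dim)),
    )


# Toy embeddings: first axis ~ cardiology, second axis ~ rheumatology (hypothetical).
embeddings = {
    "echo": [1.0, 0.0], "dilated": [0.9, 0.1], "valve": [1.0, 0.1],
    "right": [0.8, 0.2], "atrium": [1.0, 0.0],
    "rheumatoid": [0.0, 1.0], "arthritis": [0.1, 1.0],
}
note = "Echo shows dilated RA with normal valve function."
found = find_abbreviations(note)
choice = best_expansion(
    ["echo", "dilated", "valve"],
    ["right atrium", "rheumatoid arthritis"],
    embeddings,
    dim=2,
)
print(found, choice)
```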
Wu et al. [43] examine the use of neural word embeddings in clinical
abbreviation disambiguation and develop two new word embedding methods,
named LR_SBE and MAX_SBE, to generate word sense disambiguation
representations from a large unlabelled medical corpus. Li et al. [44] proposed the
surrounding based embedding feature (SBE) in 2014, which serves as the foundation
for the next step of Wu et al.’s study. The SBE representation of a target word is learnt
by consolidating the embedding row vectors of all neighboring words that
fall within a window of the given size k. Building on top of SBE, Wu et al. assume that
direction would help to learn better word representations. Similarly, the MAX_SBE
representation takes the same approach but takes the maximum value of each
embedding dimension over the surrounding words. The authors’ intuition is that
the higher the score of a latent feature, the more important the word should
be.
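The three surrounding-based features can be sketched as below. We assume that
"consolidating" means an element-wise sum, and that the direction-aware variant
(LR_SBE) concatenates separate sums for the left and right sides of the window; both
points, as well as the toy tokens and embeddings, are our assumptions rather than
details confirmed in the text above.

```python
def window(tokens, index, k):
    """Return the up-to-k words to the left and to the right of position `index`."""
    left = tokens[max(0, index - k):index]
    right = tokens[index + 1:index + 1 + k]
    return left, right


def sum_vectors(words, emb, dim):
    """Element-wise sum of the embeddings of the given words (unknown words add zero)."""
    total = [0.0] * dim
    for word in words:
        for i, value in enumerate(emb.get(word, [0.0] * dim)):
            total[i] += value
    return total


def sbe(tokens, index, k, emb, dim):
    left, right = window(tokens, index, k)
    return sum_vectors(left + right, emb, dim)


def lr_sbe(tokens, index, k, emb, dim):
    # Assumed form: left-side sum concatenated with right-side sum.
    left, right = window(tokens, index, k)
    return sum_vectors(left, emb, dim) + sum_vectors(right, emb, dim)


def max_sbe(tokens, index, k, emb, dim):
    left, right = window(tokens, index, k)
    vectors = [emb[w] for w in left + right if w in emb]
    if not vectors:
        return [0.0] * dim
    return [max(v[i] for v in vectors) for i in range(dim)]


# Hypothetical tokens and 2-dimensional embeddings; "ra" is the target abbreviation.
tokens = ["pt", "has", "ra", "on", "echo"]
emb = {"pt": [0.5, 0.5], "has": [1.0, -2.0], "on": [3.0, 0.5], "echo": [0.0, 1.0]}

features_sbe = sbe(tokens, 2, 1, emb, 2)
features_lr = lr_sbe(tokens, 2, 1, emb, 2)
features_max = max_sbe(tokens, 2, 1, emb, 2)
print(features_sbe, features_lr, features_max)
```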
data volume, the development of information extraction remains confined to narrow
and restricted domains because of its relatively higher degree of difficulty.
Indeed, information extraction in the medical domain has always been one of the
most important tasks, especially after the adoption of the electronic health record
system. In healthcare, extracting information from a patient’s EHR profile involves
learning and extracting medical information from the doctor’s notes, ambulance
records, prescriptions etc. to be used in machine learning or deep learning models.
It is not only necessary for all related health practitioners to learn the patient’s health
conditions more efficiently and thoroughly, but also important for researchers,
scholars and policy makers in the healthcare sector to understand the diseases and
patient groups better in order to provide better diagnoses, treatment, intervention
and even prevention.
In the EHR system, text-based clinical notes are one of the most important and
informative resources for studying patients. However, as we mentioned before,
the biggest challenge here is that doctors, nurses, physiotherapists and pharmacists
normally complete each section of the same patient’s file separately without referring
to each other’s notes. On the one hand, it is time consuming and unnecessary for them
to flip through the documents; on the other hand, each health practitioner has his/her
own preferred medical jargon, which is inherently hard to align. Therefore, these
different medical wordings and representations create difficulties for information
alignment in the patient’s EHR system. In other words, the most challenging task in
the current EHR system is to align all the EHRs that are structurally different. In fact,
it is already time consuming to align the information in medical profiles, let alone to
create a pipeline for processing these EHR notes, extracting terms, learning embeddings,
or aligning records over years and across hospitals or other health institutions.
Thus, it is necessary for current research to find out what the entities
mean in different contexts. In general, there are three ways of doing information
extraction on medical corpora. The traditional way of extracting information
is to follow a rule-based approach and do it manually. However, this method is not only
financially costly but also time consuming. The second way of doing information
extraction is to use traditional machine learning approaches. These methods are more
efficient than the rule-based approach but still require a certain degree of human
involvement, so they can be costly as well. Thus, a more effective and efficient way of
extracting information has been developed recently, known as the deep learning-based
approach. Indeed, many recent papers and studies on medical information extraction
tasks have started using this third approach, deep learning, as it benefits greatly from
lower cost and far less human intervention.
The current approaches to entity recognition, including deep learning (DL) ones, are
categorized into three major groups: the classical rule-based approach, which usually
applies keyword matching and assigns document-level labels; the RNN approach, which
requires large datasets of annotated entities; and the transfer learning approach,
which uses language modeling for biomedical named entity recognition
(NER). To remove the need for large amounts of annotated entities as a
prerequisite for training, most of the recent studies adopt the second and the third
approaches for entity recognition.
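The rule-based baseline, keyword matching with document-level labels, can be sketched
as follows. The keyword lists, label names and example reports are hypothetical and
only illustrate the mechanism.

```python
import re

# Hypothetical keyword lists; a real system would use curated clinical lexicons.
RULES = {
    "stroke": ["stroke", "infarct", "haemorrhage"],
    "heart_failure": ["heart failure", "chf"],
}


def label_document(text, rules=RULES):
    """Return the sorted labels whose keywords occur as whole words or phrases."""
    lowered = text.lower()
    labels = set()
    for label, keywords in rules.items():
        for keyword in keywords:
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                labels.add(label)
                break
    return sorted(labels)


report = "CT head: acute infarct in the left MCA territory. No haemorrhage."
labels = label_document(report)
print(labels)
```

Note that plain keyword matching ignores negation ("No haemorrhage" still matches),
which is one reason why practical rule-based systems need additional handcrafted rules.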
Gligic et al. [45] introduce a novel approach with transfer learning that can
overcome the neural network model’s dependency on large labelled datasets
and the data scarcity issue in named entity recognition. Specifically, neural networks,
as a more robust structure, are adopted for experiments on all datasets released by I2B2
(2007–2012). Then, both CBOW and Continuous Skip-Gram (CSG) are adopted to
train embeddings that are fed into three term classification architectures, namely a
context-free feedforward NN, a context-aware feedforward NN, and an RNN-based LSTM
model. The introduced method first extracts information on medications, dosages, modes,
frequencies, durations and reasons individually, and then studies the relationships
between them with a sequence-to-sequence bidirectional RNN model comprising
one hundred GRUs versus a bidirectional LSTM encoder-decoder framework.
Sachan et al. [46] tackle the problem by using unlabelled text data to improve
the results of NER models. Specifically, they train a bidirectional language model
(BiLM) on unannotated data from PubMed abstracts as a transfer learning approach
to pretrain the weights of an NER model that shares the BiLM architecture. This
pretraining yields a better parameter initialization for the NER
model, improving both F1 scores and the speed of convergence with less input data.
Hence, the transferred weights of the proposed model, along with the pretrained
word embeddings, allow the authors to practise end-to-end learning on the
supervised NER tasks.
Gorinski et al. [47] take a different perspective by comparing three dominant
systems: (1) the rule-based EdIE-R, (2) EdIE-N, a bidirectional Long Short-Term Memory
network combined with a deep learning-based conditional random field, and (3) the transfer
learning based SemEHR with GATE Bio-YODIE. They evaluate these three
architectures on named entity recognition from patients’ stroke records
in brain imaging reports. By training on a common data set, the experiment is
able to identify the advantages and disadvantages of these three systems.
Moreover, it can also construct rules and empirically evaluate the performance of
each system. As a result, they conclude that although machine learning approaches can
be easier to adopt, the handcrafted rule-based system remains the most accurate and
trustworthy means of labeling EHR contents automatically.
Other related research on entity extraction from genomic data includes Yin et al.
[48], Huang et al. [49, 50], and An et al. [51]. An et al.’s work [51], built on top
of [48–50], proposes a new metric to evaluate the novelty and relevancy of a medical
term in information retrieval based on the aspect-level performance measure provided
by the TREC Genomics Track. The experimental results show that the proposed geNov
metric is superior to the existing metrics in discovering novelty, redundancy
and relevancy in the ranking process. Moreover, it is considerably sensitive to the novelty
and relevancy of a medical term, and its three parameters are highly tunable
according to different evaluation requirements.
Both entity recognition and relationship extraction are standard tasks in natural
language processing. In biomedical research, besides named entity recognition, it is also
necessary to extract the relationships between biomedical entities from texts. Much of
the existing literature applies feature-based pipeline models to relationship extraction,
which can cause problems such as error propagation, subtasks extracted without
interaction, and heavy feature engineering work. To overcome these issues,
deep learning based natural language processing techniques are commonly
applied.
Li et al. [52] present a competitive and effective neural joint model for
relationship extraction that minimizes the work on feature engineering. This
novel approach first uses a CNN to encode the characters of each word into a
corresponding character-level representation. Then, the generated character-level
representation, word embeddings and part-of-speech embeddings are input into an RNN-based
BiLSTM model to learn entity representations and the related context for medical
entity recognition. After that, the relationship representation along the shortest
dependency path (SDP) between the two target entities is learnt by a second BiLSTM model
to obtain relationship classifiers. The parameters of the LSTM units in both BiLSTM RNN
models are shared; therefore, the parameters used in the first part can affect the
second during training on the entity recognition and relation classification tasks.
Mehryary et al. [53] also propose an extraction approach based on an LSTM model with
syntactic dependency graphs and a Skip-Gram model to obtain word embeddings.
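The shortest dependency path between two entities can be found with a breadth-first
search over the dependency graph treated as undirected. The sketch below assumes the
parse is already available as (head, dependent) edges; the example sentence and its
hand-written parse are hypothetical and do not come from any of the cited systems.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Breadth-first search over an undirected graph built from (head, dependent) edges."""
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)

    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # the two tokens are not connected


# Hand-written (hypothetical) parse of "Aspirin may increase the effect of warfarin".
edges = [
    ("increase", "Aspirin"),
    ("increase", "may"),
    ("increase", "effect"),
    ("effect", "the"),
    ("effect", "warfarin"),
    ("warfarin", "of"),
]
path = shortest_dependency_path(edges, "Aspirin", "warfarin")
print(path)
```

The tokens on the returned path are what a sequence model such as a BiLSTM would read
to build the relationship representation.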
Similar to Li et al.’s work, Quan et al. [54] presented a multichannel CNN in 2016
for automatic relation extraction in the medical domain, tackling the drug-drug
interaction (DDI) and protein-protein interaction (PPI) extraction problems. The
proposed method also eases the complicated feature engineering work through
CNN-based automated feature learning. CBOW was used in the study
to capture information from the entire medical corpus on Medline, while all other
word embeddings are borrowed from Pyysalo et al.’s study [55]. As a next step,
five versions of the word embeddings from PubMed, PMC, MedLine and Wikipedia
are consolidated within the multichannel word embedding input layer. The model with
multichannel word embeddings is reported to outperform the best DDI extraction
models at the time by 5.1%. In the convolutional layer, filters with different window
sizes are applied to the embeddings to extract n-gram features. The max pooling layer
then keeps the most important local features while reducing the feature dimensions
effectively. Finally, the softmax layer performs the relationship classification
based on the pooled features.
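The convolution, max pooling and softmax steps described above can be illustrated with a minimal sketch. It is not the implementation of Quan et al. [54]: it uses a single embedding channel instead of several, and the toy embeddings, the two filters and the class weights are made-up values chosen only to show the data flow.

```python
import math

def conv1d_over_tokens(embeddings, kernel, bias=0.0):
    """Slide one filter over consecutive token windows (an n-gram detector)."""
    window = len(kernel)  # number of tokens covered by the filter
    features = []
    for start in range(len(embeddings) - window + 1):
        total = bias
        for offset in range(window):
            token_vec = embeddings[start + offset]
            weights = kernel[offset]
            total += sum(x * w for x, w in zip(token_vec, weights))
        features.append(max(0.0, total))  # ReLU non-linearity
    return features

def max_over_time(features):
    """Keep only the strongest response of a filter across the sentence."""
    return max(features)

def softmax(scores):
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-token sentence with 2-dimensional word embeddings.
sentence = [[0.1, 0.3], [0.9, -0.2], [0.4, 0.8], [-0.5, 0.6]]

# Two hypothetical filters with window sizes 2 and 3.
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
]

# One pooled feature per filter, regardless of sentence length.
pooled = [max_over_time(conv1d_over_tokens(sentence, f)) for f in filters]

# Hypothetical linear layer mapping pooled features to two relation classes.
class_weights = [[1.0, -1.0], [-1.0, 1.0]]
logits = [sum(p * w for p, w in zip(pooled, row)) for row in class_weights]
probs = softmax(logits)
```

Because max pooling returns one value per filter, sentences of different lengths are mapped to a feature vector of fixed size, which is what allows the final softmax layer to be shared across inputs.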
Cheng et al. [56] focus on medical information extraction from patients’ EHRs
for disease phenotyping. The authors construct a temporal matrix representation
for each patient in the EHR system, with time on one dimension and events on the
other. The deep learning approach then adopts a four-layer CNN to
extract phenotypes and predict future medical events. Specifically, the first layer of
the framework consists of the temporal matrices of the EHR. The second layer then performs
a one-side convolution to extract features from the first layer. Similar to the works
presented above, the max pooling layer on the third level discards sparse
data points and keeps only the most important ones. Lastly, a fully connected layer
with a softmax activation function outputs the prediction results.
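To make the temporal matrix representation concrete, the following sketch builds such a matrix from a list of time-stamped events and applies a convolution along the time axis only. The event names, the kernel values and the reading of "one-side convolution" as a filter that spans all event rows and slides over time are assumptions made for illustration, not details taken from Cheng et al. [56].

```python
def build_temporal_matrix(records, event_vocab, num_steps):
    """Return a matrix with one row per event type and one column per time step."""
    row_of = {event: i for i, event in enumerate(event_vocab)}
    matrix = [[0] * num_steps for _ in event_vocab]
    for time_step, event in records:
        if event in row_of and 0 <= time_step < num_steps:
            matrix[row_of[event]][time_step] = 1  # mark that the event occurred
    return matrix

def one_side_convolution(matrix, kernel):
    """Slide a filter along the time axis; the filter covers every event row."""
    num_events = len(matrix)
    num_steps = len(matrix[0])
    width = len(kernel[0])  # number of time steps covered by the filter
    responses = []
    for start in range(num_steps - width + 1):
        total = 0.0
        for e in range(num_events):
            for k in range(width):
                total += matrix[e][start + k] * kernel[e][k]
        responses.append(total)
    return responses

# Hypothetical patient history: (time step, event) pairs.
records = [(0, "lab_glucose_high"), (1, "rx_metformin"), (3, "lab_glucose_high")]
event_vocab = ["lab_glucose_high", "rx_metformin"]

matrix = build_temporal_matrix(records, event_vocab, num_steps=4)

# Hypothetical filter: a high glucose lab followed by metformin one step later.
kernel = [[1.0, 0.0], [0.0, 1.0]]
responses = one_side_convolution(matrix, kernel)
strongest = max(responses)  # max pooling over time
```

The filter responds most strongly at the first time step, where the lab result is followed by the prescription, and max pooling keeps that response while discarding the empty windows.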
Zeng et al. [57] study the relationship between medical notes in the EHR and the
identification of distant recurrences of breast cancer. To overcome the reliance on
manual chart reviews to discover possible distant recurrences of breast cancer, they
design a hybrid model that works only with clinical narratives and structured data
from the EHR system. Specifically, the model first extracts features from medical
narratives with MetaMap while retrieving the structured EHR data from the system
directly. Second, a support vector machine with a linear kernel is adopted as the
prediction model to identify patients who potentially have distant recurrences of
breast cancer. Four baseline classifiers are used to learn the different types of
features, both structured and unstructured, and to find the best results. In general,
the model gives promising results by combining features extracted from unstructured
clinical notes with structured data in the EHR system to identify distant
recurrences of breast cancer.
Galko et al. [58] approach relationship extraction at a broad scale by retrieving
relevant passages from publicly available data and BioASQ tasks. In other words, they
retrieve passages in a question answering setting. In detail, they use neural
network word embeddings to propose a weighting scheme for cosine distance retrieval.
The paper first projects the terms into semantically meaningful vector spaces learnt
with Word2Vec or GloVe, so that both the query questions and the candidate
passages are represented as fixed-length vectors. The representations can then be
compared with each other using the cosine distance function. Lastly, with the given
representation and similarity measure, the passages are clustered and ranked
in a list to generate the final results. The proposed cosine distance text matching
scheme is reported to outperform traditional models significantly, and
future work in this direction could be applied to a broader range of topical
domains.
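The retrieval idea can be sketched in a few lines: each text is reduced to a weighted average of its word vectors, and passages are ranked by cosine similarity to the query. The two-dimensional vectors and the term weights below are invented for illustration, and the weighting shown here (a simple per-term weight, in the spirit of IDF) is an assumption rather than the scheme proposed by Galko et al. [58].

```python
import math

def weighted_average(tokens, vectors, weights):
    """Fixed-length representation: weighted mean of the known word vectors."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in vectors:
            continue  # out-of-vocabulary terms are skipped
        w = weights.get(tok, 1.0)
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_passages(query, passages, vectors, weights):
    """Return (score, passage) pairs sorted from most to least similar."""
    q_vec = weighted_average(query.split(), vectors, weights)
    scored = [
        (cosine_similarity(q_vec, weighted_average(p.split(), vectors, weights)), p)
        for p in passages
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical embeddings (in practice learnt with Word2Vec or GloVe).
vectors = {
    "insulin": [1.0, 0.0],
    "glucose": [0.9, 0.1],
    "fracture": [0.0, 1.0],
    "bone": [0.1, 0.9],
}
weights = {"insulin": 2.0}  # hypothetical term weights

ranked = rank_passages(
    "insulin glucose",
    ["bone fracture healing", "glucose insulin therapy"],
    vectors,
    weights,
)
```

In this toy setting the passage about insulin therapy is ranked first because its averaged vector points in nearly the same direction as the query vector.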
Li et al. [59] also utilize a convolutional neural network and distributed semantic
representations for binary event relation extraction tasks. Specifically, the study
employs a CNN to model raw inputs, represented as word embeddings of medical texts,
through a convolutional layer and a max pooling layer. As a result, the most important
features are generated automatically by the max pooling layer and contribute directly
to the relation extraction tasks.
Among the existing literature on deep learning NLP techniques applied to medical
information extraction, event extraction holds an important place in the research
subfield. A medical event refers to a change that has been made in a patient’s medical
records. Such medical events can be insightful and useful for discovering abnormal
clinical decisions and applications that could cause serious problems, such as
patients’ negative reactions to certain treatments and medications.
Rahul et al. [60] apply a bidirectional recurrent neural network (RNN) to sequence
labeling for medical event extraction and the understanding of unstructured clinical
texts in the EHR system. The proposed RNN model avoids time-consuming
handcrafted features generated by NLP toolkits, and is able to extract higher-level
features directly from the sentences in the text corpus to achieve comparable F1-scores
on the Multi Level Event Extraction (MLEE) corpus. The input layer uses embedded
representations of words to learn higher-level feature representations, layer by
layer, until the final classification is reached. Specifically, for the input feature
layer, the proposed method extracts two types of features for each word in the text
corpus, namely the word and the entity. Then, in the embedding layer, each input
feature is mapped to a dense feature vector for the next layer to use. In the
bidirectional RNN layer, each word is processed by both a forward and a backward RNN
to capture representations of the past and of the future. In this way, the entire
context is learnt within the neural network.
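The bidirectional layer can be illustrated with a small sketch in which a plain tanh RNN is run once from left to right and once from right to left, and the two hidden states of each token are concatenated. The weights and inputs are arbitrary toy values, and a simple tanh cell is assumed here for brevity; this is not the architecture or the trained parameters of Rahul et al. [60].

```python
import math

def rnn_pass(inputs, w_in, w_rec):
    """Run a simple tanh RNN and return the hidden state at every step."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
        states.append(hidden)
    return states

def bidirectional_states(inputs, fwd_params, bwd_params):
    """Concatenate forward (past) and backward (future) states per token."""
    forward = rnn_pass(inputs, *fwd_params)
    # Process the reversed sequence, then restore the original token order.
    backward = rnn_pass(inputs[::-1], *bwd_params)[::-1]
    return [f + b for f, b in zip(forward, backward)]

# Hypothetical parameters: 2-dimensional inputs, 2 hidden units per direction.
params = ([[0.5, -0.3], [0.1, 0.8]], [[0.2, 0.0], [0.0, 0.2]])

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = bidirectional_states(tokens, params, params)

# Changing only the last token alters the backward half of the first token's
# state, while the forward half (which sees only the past) stays the same.
altered = tokens[:2] + [[-1.0, 2.0]]
states_altered = bidirectional_states(altered, params, params)
```

Comparing `states[0]` with `states_altered[0]` shows why both directions are needed: only the backward half of the first token's representation reacts to a change at the end of the sentence.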
Similarly, Jagannatha et al. [61] in 2016 tackle the EHR semantic understanding
problem with sequence labeling for medical event extraction. Conditional
Random Fields (CRF) are used as a baseline model against which the experimental
results are compared. Initially, to ensure unbiased representations of infrequent
words, the system trains word vectors on a large data corpus with the Skip-Gram
technique for the embedding layer. Then, once the words are mapped to their
corresponding vectors, they are fed into a bidirectional long short-term memory
(LSTM) model that is trained in both forward and backward directions. The output of
the bidirectional LSTM layer, a combined representation of each word and its related
context, is then fed into a feedforward layer with a softmax function that produces
the label probabilities. Meanwhile, the paper also trains another recurrent neural
network variant, the GRU [62], on the same input data in the same structure as the
LSTM. The experiments show that RNN models in general are valuable techniques for
extracting medical events from large EHR corpora. The improvements achieved by these
models, especially the GRU, suggest that the capability of RNN models to remember
information across different ranges and dependencies of context is very important
for effective information extraction.
Natural language generation is one of the NLP tasks that focuses on producing natural
language from structured data sources such as a knowledge base or a linguistic
logical form. The technique can be applied to either long or short texts, such as
content summaries and news reports, or product descriptions on online shopping
websites, respectively. In the past few years, DL approaches have made huge progress
on language generation tasks. Ideally, a natural language generation system
is trained as an end-to-end NN model consisting of an encoder and a decoder. The
encoder will serve to produce the hidden representation of the source text while the
decoder aims to generate the target text.
Choi et al. [63] apply natural language generation techniques to tackle the
problem of data scarcity. They propose a deep learning based generative adversarial
network (GAN) model to synthesize data in EHR systems. The model is capable
of learning the distributions of count-valued and binary-valued variables with two
neural networks. The first one generates fake records, while the other
distinguishes real records from fake ones. The advantage of
this system is that the GAN model is able to generate the patient-level records
needed for a study while keeping the patients’ personal information private.
However, the GAN system proposed by Choi et al. can only produce discrete data,
and cannot produce the free-text records of the EHR system, which are valuable for the
research community.
Lee [64] introduces an end-to-end DL encoder-decoder algorithm to build synthetic
chief complaints from discrete variables in the electronic health record, including
age, gender and diagnosis. The generated synthetic chief complaints benefit from
the optimization process of the model, which eliminates comparably uncommon medical
abbreviations and misspellings, while protecting the patients’ privacy through
de-identification, since no personally identifiable information (PII) is preserved.
Those chief complaints are preprocessed with the LSTM
model to downsize the word embedding matrix. The encoder is constructed with
a single feedforward network layer that compresses the records for the LSTM cell,
while the decoder is a single-layer LSTM model following Vinyals et al. [65].
Following the same concept, the word embedding matrix is adopted to transform the
complaints from integer sequences to dense vector sequences, while a feedforward
layer with a softmax activation function delivers the final output of
the predicted word probabilities.
Besides natural language generation, text summarization is also a common
problem and research subfield in natural language processing. Text summarization
refers to the creation of a brief, accurate and fluent summary of a longer piece of
text. Being able to summarize text automatically helps not only to discover
and extract relevant information more easily, but also to consume it more efficiently.
In general, there are two different ways of summarizing texts in natural language
processing. The first is extraction-based summarization. Extraction-based
techniques refer to the set of algorithms and models that pull out key terms and
phrases from the source document and join them fluently into a summary.
The second approach is abstraction-based summarization. This approach is
based on paraphrasing and shortening the information contained in the source
documents. In other words, abstractive summarization methods are able to create or
rewrite terms, phrases and sentences, like human beings do, to relay the most
important information from the source documents. Because abstraction-based methods
are more difficult to develop and adopt, algorithms for extractive summarization are
still relatively more popular.
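As a minimal illustration of the extraction-based idea, the sketch below scores each sentence by the average corpus frequency of its content words and returns the top-scoring sentences verbatim. This frequency heuristic, the small stopword list and the example text are assumptions chosen for brevity; it is a classical baseline, not one of the neural methods reviewed in this chapter.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "has", "was", "for", "and", "of", "to", "in"}

def content_words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def extractive_summary(text, num_sentences=1):
    """Pick the sentences whose content words are most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # keep the original sentence order
    return " ".join(sentences[i] for i in chosen)

note = (
    "The patient has diabetes. "
    "Diabetes medication was adjusted for the patient. "
    "The weather was mild."
)
summary = extractive_summary(note, num_sentences=1)
```

Every sentence in the output is copied unchanged from the source, which is the defining property of extractive methods; an abstractive system would instead generate new wording.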
Liu et al. [66] apply the extractive summarization technique to EHR data
in the medical domain. They use an unsupervised pseudo-labeling approach to
study how to make use of the intrinsic correlation between different data in the EHR.
Their proposed method is capable of generating pseudo-labels and training
supervised models without any external source of annotated data. To find a subset
that gives the best summary of the entire document of a patient’s
information, they train a supervised model without direct human annotations. The
intrinsic correlation between the medical notes and the patient is then used to find
pseudo-labels and produce summaries that answer the three research questions
(RQ1 to RQ3) they propose [66]. The system answers
RQ1 by learning the clinical entities related to a specific disease. For RQ2, the model
generates binary label vectors for the notes and applies Integer Linear Programming to
train on data with pseudo-labels while optimizing the results. For RQ3, the medical
records are summarized by a supervised neural model, a two-layer bidirectional
GRU. In general, the study confirms the effectiveness of the proposed model in the
text summarization task by showing that it outperforms existing unsupervised
baselines. It could also be improved in the future to further help physicians
understand patients’ medical histories while reducing clinical costs.
Datta et al. [67] released a scoping review of the existing medical NLP techniques
applied in cancer studies in 2019. It aims to provide a valuable resource of
cancer frame annotations as well as the related general-purpose natural language
processing tools. The paper summarizes the trending NLP techniques that
are able to learn useful features related to cancer from the EHR system, across a
wide range of data collections. By reviewing 79 papers, the authors create frame
semantic principles covering information on cancer diagnosis, tumor
descriptions, cancer procedures, breast cancer diagnosis, prostate cancer diagnosis
and pain in prostate cancer patients [67]. The study finds that most of the recent
work specializes in information extraction for treatment and breast
cancer diagnosis, while cancer diagnosis is the top focus of all the
reviewed papers, with 36 out of 79.
patient to predict the likelihood of certain diseases, diagnoses or outcomes. The
probability scores provided by physicians are heavily correlated with their clinical
experience. Therefore, computer-based technologies that provide data to physicians
act as an assistant to the human physician in the prediction process. These
technologies include but are not limited to understanding medical codes, reading
clinical notes, interpreting time-series data, and handling medical scans. In the
existing literature, the clinical prediction tasks are split into two subfields:
general clinical predictions and predictions targeted at specific diseases.
Zeng et al. [68] review and compare the traditional NLP methodologies and
the DL-based NLP techniques used for disease phenotyping with EHR systems in
the past few years. The paper gives a thorough review of the current applications
of EHR-based computational phenotyping, as well as of NLP-based computational
phenotyping methods. On the one hand, traditional keyword search and rule-based
approaches give promising results for the prediction task; however, these methods
require manual human effort, which is very costly. Supervised machine learning
models that are able to classify data patterns and structures are
popular among researchers instead. On the other hand, as
DL methods grow more important in the natural language processing field,
more studies adopt deep learning approaches because of their power to generate
novel phenotypes. Besides outlining the opportunities and
challenges that remain in the field, the paper also concludes that combining
multiple sources of data from the EHR system produces better
performance in general.
Rajkomar et al. [69] present a deep learning-based representation of a patient built
from his or her entire medical record in the EHR system, using the Fast Healthcare
Interoperability Resources (FHIR) format. With the help of this sequential format and
a patient de-identification procedure, they study 46,864,534,945 data points generated
from a sample of 216,221 adult patients who had been hospitalized for at least 24 h
in two American academic medical centers. Since the patient records
differ in length and density, the authors propose three
different deep learning models to tackle the issue. The first model is an RNN-based
LSTM; the second is an attention-based time-aware NN model; and the third
is a NN built with boosted time-based decision stumps. The experimental results
show that the proposed method outperforms the traditional predictive models
used in current clinical studies, and is able to predict multiple clinical events
across multiple medical centers accurately without site-specific data
harmonization. In the future, this method has the potential to extend to a variety
of scenarios due to its promising accuracy and scalability.
Zhang et al. [70] present a novel meta-learning approach, named MetaPred, to predict
clinical risks from longitudinal patient EHRs. MetaPred uses a list of related
risk prediction tasks to train the meta-learner to learn a good predictor
for predicting target risks where patient data is limited. Specifically, the MetaPred
framework is built on the model-agnostic meta-learning strategy to generate a risk
predictor for a specific domain. Meanwhile, the meta-learner can directly serve as
input to the risk prediction function, while the limited data can help to boost the
model performance further through fine-tuning. The risk prediction models adopted in
the experiments are built on either a CNN or an LSTM-based RNN. The experimental
results on real patient data provided by Oregon Health and Science
University show that the CNN-based and RNN-based MetaPred predictors significantly
outperform the other predictors trained with limited samples.
Hosseini et al. [71] introduce a heterogeneous information network model named
HeteroMed to make accurate and robust clinical diagnosis predictions from the
high-dimensional data and abundant relationships within EHR data. With heterogeneous
network embedding, the suggested model can capture higher-level semantic relationships
between words and terms in the EHR system for disease diagnosis, while handling
missing values and heterogeneous data directly. Furthermore, its joint embedding
framework can adapt the representations of
medical events to the goal of disease diagnosis. As the very first study to model
clinical data and disease diagnoses with a Heterogeneous Information Network (HIN),
HeteroMed achieves significantly better results than the existing literature in
diagnosis code extraction and disease prediction.
Avati et al. [72] present Survival-CRPS, a scoring rule that generalizes the
continuous ranked probability score (CRPS) to survival prediction, with two variants
for right-censored and interval-censored data. In addition, to evaluate the quality
of event predictions over time, the Survival-AUPRC evaluation
metric is proposed, which computes a precision-recall like curve. To demonstrate the
effectiveness of the introduced techniques, the paper runs experiments on EHR data
with a multilayer deep RNN as the prediction model for patient survival. The model
takes a sequence of features as input and predicts the mortality probability
over time. The results of the extensive experiments indicate that the proposed
RNN method with log-normal parameterization performs well in large-scale survival
prediction.
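A numerical sketch may help to show what such a score rewards. It assumes the right-censored variant takes the form of the integral of F(z)^2 from 0 to y, plus, for uncensored cases only, the integral of (1 - F(z))^2 from y onward, where F is the predicted cumulative distribution of the time to event and y is the observed or censoring time. This form is our reading of the score and should be checked against Avati et al. [72]; the log-normal parameters, the integration horizon and the midpoint rule are illustrative choices.

```python
import math

def lognormal_cdf(z, mu, sigma):
    """CDF of a log-normal distribution with log-scale mean mu and std sigma."""
    if z <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(z) - mu) / (sigma * math.sqrt(2.0))))

def survival_crps_right(y, censored, mu, sigma, horizon=200.0, steps=20000):
    """Assumed right-censored score, approximated with the midpoint rule.

    Mass placed before y is always penalized; mass placed after y is
    penalized only when the event was actually observed at y.
    """
    dz = horizon / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * dz
        f = lognormal_cdf(z, mu, sigma)
        if z < y:
            total += f * f * dz
        elif not censored:
            total += (1.0 - f) ** 2 * dz
    return total

# Hypothetical patient with an observed event at time 10; both predictions
# have median 10 but differ in how sharp they are.
sharp_score = survival_crps_right(10.0, False, math.log(10.0), 0.1)
wide_score = survival_crps_right(10.0, False, math.log(10.0), 1.0)

# Same wide prediction, but the patient was censored at time 10.
censored_score = survival_crps_right(10.0, True, math.log(10.0), 1.0)
```

Lower scores are better. The sharp prediction centered on the true event time receives a lower score than the wide one, and a censored observation is not penalized for probability mass placed after the censoring time.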
Chung et al. [73] shift the scope of the research from large populations back
to an individualized and reliable patient-centric prediction model. The proposed
framework aims to extract useful information from the EHR system to make promising
predictions and to provide tailored clinical services for disease diagnosis, treatment,
intervention and prevention with time-series data. The framework consists of two parts:
(1) a globally developed section that captures trends across various groups of
patients; (2) an individualized section that models tailored services for each
patient. To combine the two sections, an RNN model that captures global patient
trends and a Gaussian process probability model that captures individual patient
characteristics are built together on top of a deep RNN foundation to make clinical
predictions more accurate and credible.
Heo et al. [74] introduce the notion of input-dependent uncertainty to the attention
mechanism, observing that attention can sometimes be unreliable
when it is generated by weakly supervised networks. Indeed, the newly
proposed notion can build attention to each feature by learning the input noise level
effectively. The general framework of the study is based on a stochastic attention
mechanism, in which the attentions are generated with
input-adaptive Gaussian noise and variance inference. After that, an attentional RNN
model with both timestep attention and feature attention is adopted to learn the
prediction probabilities. As a result, the uncertainty-aware attention mechanism shows
significantly better performance on the training datasets than the baseline models.
Wang et al. [75] take a different route and combine supervised learning with
reinforcement learning to generate treatment recommendations for patients.
They present a novel architecture, Supervised Reinforcement
Learning with Recurrent Neural Network (SRL-RNN), that acts as an off-policy
actor-critic framework to deal with the complicated relationships between different
types of data in the EHR system. The indicator signal and the evaluation signal then
co-supervise the actor in SRL-RNN to generate effective prescriptions with a low
mortality rate. In real-world medical settings, the states are never fully
observed. For this reason, the RNN is further adopted to
tackle the resulting Partially-observed Markov Decision Process (POMDP) problem. The
paper conducts experiments on MIMIC-III data, and the results show that the
proposed architecture achieves good accuracy in matching doctors’ prescriptions
while lowering the estimated mortality rate.
Pham et al. [76] present DeepCare, an end-to-end DNN approach that can
interpret clinical records, store a patient’s entire medical history, infer the current
medical conditions and make clinical predictions based on the given information.
DeepCare is built on the LSTM recurrent neural network, which can store memories
of past experiences. At the micro data level, DeepCare uses the LSTM
model to read the given input, update the memory cell and represent the care
episodes as output. It also suggests medical interventions to
help patients with current illnesses and future clinical risks. At the macro health
state level, DeepCare learns and aggregates the recorded health states by applying
multiscale temporal pooling and feeds them into a deep dynamic neural network
for future estimations. The experiments are conducted on two chronic conditions,
diabetes and mental health, to show that the proposed method is capable of modeling
disease progression, recommending possible clinical interventions, improving general
modeling and making clinical predictions accurately.
Ma et al. [77] emphasize the importance of incorporating prior medical knowledge
into risk prediction tasks. In this study, they therefore propose PRIME, a new deep
learning framework that uses posterior regularization to incorporate
prior knowledge into the predictive models. Specifically, the paper introduces the
PRIMEr and PRIMEc models, based on LSTM and CNN respectively, to perform
the prediction. Besides, the prior knowledge applied in the risk prediction model
requires no human intervention when modeling the disease distribution; in other
words, the knowledge does not need to be processed manually to set boundaries. By
modeling the prior knowledge with a log-linear model, the PRIME framework can even
learn the importance of each piece of prior knowledge automatically.
Suresh et al. [78] focus on real-time clinical predictions using data from intensive
care units (ICUs) in MIMIC-III. Different from previous studies, this work integrates
ICU data from all available sources to learn insightful representations
for predicting clinical interventions. In particular, the authors compare the
two most commonly used approaches, LSTM and CNN, on five clinical intervention
tasks: invasive ventilation, non-invasive ventilation,
vasopressors, colloid boluses, and crystalloid boluses [78]. The experiments
report strong results compared with other state-of-the-art work.
Choi et al. [31] propose an RNN model for clinical prediction. This RNN-based
model takes historical ICD code data in the EHR system as raw input to perform
multilabel predictions over a period of time. Specifically, the proposed model predicts
the patients’ possible future visits, the possible future diagnoses made by
physicians, and the possible future use of medications. Choi et al. apply Skip-Gram
embeddings of the ICD codes as an initialization scheme for the inputs of the
recurrent neural network model.
These high-dimensional input vectors are projected to a lower-dimensional space
and passed through an RNN with gated recurrent units. Finally, the patient’s potential
next visit is predicted by a rectified linear unit (ReLU), while diagnosis codes and
medication codes are predicted by a softmax layer. Thus, the medical concepts and the
patients are better represented in the proposed architecture. Meanwhile, the
experiments in this paper suggest the potential of adapting RNN-based models to other
medical systems through transfer learning, as well as the opportunity for medical
systems with insufficient patient data to improve clinical predictions for their
smaller patient base.
Lasko et al. [79] propose a computational phenotype discovery method for clinical
EHR data. Since medical data in EHR systems are by nature unstructured, noisy
and sparse, the method adopts a deep learning approach with longitudinal probability
densities inferred from Gaussian process regression to study these clinical data. As a
result, the study produces continuous phenotypic features that accurately indicate
multiple population subtypes in the data collection.
In mid 2019, Liang et al. [80] propose a deep belief network-based model to tackle
computer-aided medical decision making (CAMDM), with a focus on clinical
decision-making support and medical data analysis in traditional Chinese
medicine. The model adopts an unsupervised learning algorithm, a
seven-layer deep belief network (DBN), to obtain feature representations, followed
by a supervised support vector machine (SVM) model on top of the deep
belief network. The experimental results suggest that the novel DBN + SVM deep
learning model outperforms simple decision tree and SVM models in computer-aided
medical decision-making tasks.
Earlier, in 2014, Liang et al. [81] used a convolutional deep belief network trained
on electronic medical records to support clinical decision making. The experiments
were run on a hypertension dataset retrieved from a HIS system, and on a dataset of
Chinese medical diagnoses and treatment prescriptions from a manually converted EHR
system. In the experiments, the model performs significantly better than
conventional shallow models in discovering previously unknown medical concepts.
Apart from the studies on general disease, clinical or patient trend predictions,
much of the existing literature narrows its research field down to the prediction of
specific diseases. Among these works, diabetes has been a very popular topic
in the past few years.
Mei et al. [82] take raw data from the EHR system as input to construct their
proposed “Deep Diabetologist” model, which uses an RNN to model sequential EHR data.
The goal of their study is to generate personalized clinical predictions,
specifically on hypoglycemia medicines, for diabetic patients. The data preprocessing
was done by linking patient IDs with event IDs. Then, an RNN medication
prediction model is adopted to generate the probabilities, followed by a
hierarchical RNN medication prediction model that works across the time steps. The
hierarchical RNN model outperforms the baseline models significantly,
while providing more useful insights for physicians and researchers conducting
secondary studies.
While most of the existing work takes raw data from the EHR system as input
to the proposed model, Sousa et al. [83] study chronic diabetes solely with
financial records from health plan providers to make predictions on the evolution of
the disease. They regard financial data as a way of aligning with the international
standard in which the records can encode medical procedures. The proposed
experiment is carried out with a self-attentive RNN model, in which the most relevant
sentences are expected to be selected. Specifically, the input embedding layer is
pretrained with Word2Vec Skip-Grams. Then, the model’s embedding layer is
connected to two fully connected layers of a BiLSTM model along with a self-attention
mechanism. The experimental results show that this is an effective
way of predicting diabetes; however, a full paper on the task is yet to
be published.
Section 3 gives a clear picture of the current status of the published literature
applying deep learning-based natural language processing techniques to the medical
domain. It is also worth reviewing the existing problems and challenges in the
research that remain to be solved. The challenges include but are not limited to:
(1) Data Volume: Deep learning-based techniques need a huge amount of data to perform
well. In the healthcare sector, the volume of data available for research is largely
affected by the limited accessibility of primary healthcare in certain areas and by
the fact that most patients regard medical records as highly private information
and are not willing to share them.
(2) Data Variability: The task of collecting a wide and unbiased variability of data
is difficult.
(3) Data Quality: Unlike the data generated in other domains, healthcare data are
by nature “dirty” data that are heterogeneous, unstructured, noisy, incomplete
and ambiguous. Data preprocessing for deep learning models is challenging and
sometimes very time-consuming.
(4) Uncertainty: Diseases and viruses develop and evolve in uncertain patterns
all the time. Therefore, it is important to design deep learning based natural
language processing techniques that are tailored to this temporal data characteristic.
(5) Causal Inference: Identifying a reasonable and rational causal relationship
between viruses and diseases, or between treatments and patients’ body reactions, is
never easy. Kale et al. [84] initially proposed a DNN to approach the causal
inference issue by analyzing the relationships between hidden feature
representations and generated outputs.
(6) Interpretability: Interpretability remains one of the biggest challenges
for DL approaches in the medical domain, although they have delivered promising
results. Scholars often refer to deep learning methods as a black box, because it is
hard to interpret how and why the proposed algorithms perform so well.
Since the medical results generated are closely tied to matters of life and death,
it is still hard to convince healthcare providers to practice exactly
what the machines recommend.
(7) Legal and Ethical Issues: As Choi et al. [63] discussed,
data privacy and synthetic patient EHRs are emerging issues in the medical domain. In
many countries, patients’ EHRs are confidential data that may not
be shared across health institutions or across industries. Research institutions,
including government bodies, usually find it hard to study ongoing diseases because
the real patients’ data are in the hands of the hospitals. As the owners of
the data, hospitals therefore usually need to form their own teams to conduct research
in a specific medical domain. However, this legal and ethical restriction on data sharing
limits not only the scale of a study but also the diversity of the data,
which leads to weaker experimental results. A few papers have
focused on this issue. For example, Choi et al. [63] address the problem of
limited data availability by proposing a deep learning
approach, medical GAN (medGAN), that generates synthetic patient records for
medical research. Given real patients’ medical records, the proposed model
generates high-dimensional discrete variables by combining autoencoders and
GANs. Furthermore, they use minibatch averaging to avoid mode collapse,
and batch normalization and shortcut connections to improve learning efficiency.
In sum, the approach is able to produce synthetic patient records that achieve
performance comparable to real data on many medical prediction tasks.
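The minibatch averaging step mentioned in item (7) can be illustrated with a short sketch. This is a minimal NumPy illustration of the general idea (each sample shown to the discriminator is concatenated with the mean of its minibatch), not the implementation of Choi et al. [63]; the function name `minibatch_average` and the toy binary records are hypothetical and chosen only for this example.

```python
import numpy as np

def minibatch_average(batch):
    """Append the minibatch mean to every sample in the batch.

    batch   : shape (n, d), n records with d binary codes each
    returns : shape (n, 2d), each record followed by the batch mean
    """
    batch = np.asarray(batch, dtype=float)
    mean = batch.mean(axis=0, keepdims=True)         # shape (1, d)
    tiled = np.repeat(mean, batch.shape[0], axis=0)  # shape (n, d)
    return np.concatenate([batch, tiled], axis=1)    # shape (n, 2d)

# Hypothetical binary "patient records": 4 patients, 3 diagnosis codes.
collapsed = np.array([[1, 0, 0]] * 4)  # a generator stuck on one record
diverse = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]])

print(minibatch_average(collapsed)[0])
print(minibatch_average(diverse)[0])
```

In the collapsed batch every appended mean is exactly 0 or 1, whereas the diverse batch yields fractional means. This gives a discriminator a simple cue for detecting a generator that keeps producing the same record.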
This book chapter provides a thorough review of deep learning-based NLP techniques
applied to clinical research. It presents the current status of deep learning-based
NLP models and their recent changes, and reviews the adoption of these technologies
in specific medical NLP tasks. Unlike other existing surveys, we use
a novel task-oriented categorization that splits the published deep learning-based
NLP techniques for clinical decision making into three major groups:
representation learning, information extraction and clinical predictions.
From the experimental results presented in this literature, we believe that it is
still too early for deep learning approaches to revolutionize the healthcare
industry, owing to inherent challenges such as uncertainty, interpretability
and ethical issues. However, the recent advances made by
the proposed deep learning-based NLP models suggest a promising start.
Further research in various directions in the medical domain
is therefore necessary. Possible directions for future work include, but are
not limited to, the following:
(1) Feature enrichment: To address the data volume and variability problems, we should
enrich models by capturing as many features as possible to obtain a good representation
of patients, and build more robust models to process the growing number
of features.
(2) Privacy control: Governments could collect medical data and provide inference on
them at the federal level, protecting patients’ privacy while allowing
necessary research projects to be conducted efficiently.
(3) Incorporating expert knowledge into current deep learning approaches: Owing to
all the challenges presented above and the limited amount of diverse data, human
experts will continue to play a dominant role in the healthcare sector in the near
future. Incorporating their invaluable knowledge into current
deep learning processes can therefore not only produce better results but also train the
machines to learn more accurately.
(4) Improving model interpretability: In the healthcare sector, the performance of DL
models and the interpretability of that performance are equally important.
It is a serious ethical problem for healthcare providers to adopt a system
that they do not understand. Future studies therefore need to find
logical and reasonable explanations of how and why the black box of the
DNN performs well on given tasks.
Acknowledgements This work is supported by the Natural Sciences and Engineering Research
Council (NSERC) of Canada, an NSERC CREATE award in ADERSIM (http://www.yorku.ca/adersim),
the York Research Chairs (YRC) program and an ORF-RE (Ontario Research Fund-Research
Excellence) award in BRAIN Alliance (http://brainalliance.ca).
References
24. Choi, E., Xiao, C., Stewart, W.F., Sun, J.: MiME: multilevel medical embedding of electronic
health records for predictive healthcare. arXiv. https://arxiv.org/pdf/1810.09593.pdf
25. Escudie, J.-B., Saade, A., Coucke, A., Lelarge, M.: Deep representation for patient visits from
electronic health records. arXiv. https://arxiv.org/pdf/1803.09533.pdf
26. Choi, E., Schuetz, A., Stewart, W.F., Sun, J.: Medical concept representation learning from
electronic health records and its application on heart failure prediction. arXiv.
https://arxiv.org/abs/1602.03686 (2017)
27. De Vine, L., Zuccon, G., Koopman, B., Sitbon, L., Bruza, P.: Medical semantic similarity
with a neural language model. In: Proceedings of the 23rd ACM International Conference
on Information and Knowledge Management (CIKM ’14), 3–7 Nov 2014, Shanghai, China,
pp. 1819–1822. ACM, New York, NY, USA
28. Choi, E., Chiu, C.Y., Sontag, D.: Learning low-dimensional representations of medical
concepts. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001761/pdf/2381736.pdf (2016)
29. Miñarro-Giménez, J.A., Marín-Alonso, O., Samwald, M.: Exploring the application
of deep learning techniques on medical text corpora. Studies in health technology and
informatics (2013)
30. Liu, J., Zhang, Z., Razavian, N.: Deep EHR: chronic disease prediction using medical notes.
arXiv. https://arxiv.org/pdf/1808.04928.pdf (2018)
31. Choi, E., Bahadori, M.T., Schuetz, A., Stewart, W.F., Sun, J.: Doctor AI: predicting clinical
events via recurrent neural networks. arXiv. https://arxiv.org/abs/1511.05942 (2016)
32. Choi, E., Bahadori, M.T., Searles, E., Coffey, C., Thompson, M., Bost, J., Tejedor-Sojo, J.,
Sun, J.: Multi-layer representation learning for medical concepts. In: Proceedings of the 22nd
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD
’16), 13–17 Aug 2016, San Francisco, CA, USA, pp. 1495–1504. ACM, New York, NY, USA
(2016)
33. Li, C., Song, R., Liakata, M., Vlachos, A., Seneff, S., Zhang, X.: Using word embedding
for bio-event extraction. In: Proceedings of the 2015 Workshop on Biomedical Natural
Language Processing (BioNLP 2015), Beijing, China, 30 July 2015, pp. 121–126. Association
for Computational Linguistics, Stroudsburg, PA (2015)
34. Tang, B., Cao, H., Wang, X., Chen, Q., Xu, H.: Evaluating word representation features in
biomedical named entity recognition tasks. Biomed. Res. Int. 2014, 1–6 (2014).
https://doi.org/10.1155/2014/240403
35. Miotto, R., Li, L., Kidd, B.A., Dudley, J.T.: Deep patient: an unsupervised representation to
predict the future of patients from the electronic health records. Sci. Rep. 6, 26094 (2016).
https://doi.org/10.1038/srep26094
36. Dligach, D., Miller, T.: Learning patient representations from text. arXiv.
https://arxiv.org/pdf/1805.02096.pdf
37. Zhang, Z., Kowsari, K., Harrison, J.H., Lobo, J.M., Barnes, L.E.: Patient2Vec: a personalized
interpretable deep representation of the longitudinal electronic health record. arXiv.
https://arxiv.org/pdf/1810.04793.pdf
38. Denaxas, S., Stenetorp, P., Riedel, S., Pikoula, M., Dobson, R., Hemingway, H.: Application
of clinical concept embeddings for heart failure prediction in UK EHR data. arXiv.
https://arxiv.org/pdf/1811.11005.pdf
39. Wei, X., Eickhoff, C.: Embedding electronic health records for clinical information retrieval.
arXiv. https://arxiv.org/pdf/1811.05402.pdf
40. Zhu, Z., Yin, C., Qian, B., Cheng, Y., Wei, J., Wang, F.: Measuring patient similarities via a
deep architecture with medical concept embedding. arXiv.
https://arxiv.org/pdf/1902.03376.pdf
41. Liu, L., Li, H., Hu, Z., Shi, H., Wang, Z., Tang, Z., Zhang, M.: Learning hierarchical
representations of electronic health records for clinical outcome prediction. arXiv.
https://arxiv.org/pdf/1903.08652.pdf
42. Liu, Y., Ge, T., Mathews, K., Ji, H., McGuinness, D.: Exploiting task-oriented resources
to learn word embeddings for clinical abbreviation expansion. In: Proceedings of the 2015
Workshop on Biomedical Natural Language Processing (BioNLP 2015), Beijing, China, 30
July 2015. Association for Computational Linguistics, Stroudsburg, PA, pp. 92–97 (2015)
43. Wu, Y., Xu, J., Zhang, Y., Xu, H.: Clinical abbreviation disambiguation using neural word
embeddings. In: Proceedings of the 2015 Workshop on Biomedical Natural Language
Processing (BioNLP 2015), Beijing, China, 30 July 2015. Association for Computational
Linguistics, Stroudsburg, PA, pp. 171–176 (2015)
44. Li, C., Ji, L., et al.: Acronym disambiguation using word embedding. In: Proceedings of the
29th AAAI Conference on Artificial Intelligence (2014)
45. Gligic, L., Kormilitzin, A., Goldberg, P., Nevado-Holgado, A.: Named entity recognition in
electronic health records using transfer learning bootstrapped neural networks. arXiv.
https://arxiv.org/pdf/1901.01592.pdf
46. Sachan, D.S., Xie, P., Sachan, M., Xing, E.P.: Effective use of bidirectional language modeling
for transfer learning in biomedical named entity recognition. arXiv.
https://arxiv.org/pdf/1711.07908.pdf (2018)
47. Gorinski, P.J., Wu, H., Grover, C., Tobin, R., Talbot, C., Whalley, H., Sudlow, C., Whiteley, W.,
Alex, B.: Named entity recognition for electronic health records: a comparison of rule-based
and machine learning approaches. arXiv. https://arxiv.org/pdf/1903.03985.pdf
48. Yin, X., Huang, X.J., Li, Z., Zhou, X.: A survival modeling approach to biomedical
search result diversification using Wikipedia. IEEE Trans. Knowl. Data Eng. (TKDE) 25(6),
1201–1212
49. Huang, X., Zhong, M., Si, X.: York University at TREC 2005: genomics track. In: Proceedings
of the Fourteenth Text REtrieval Conference (TREC), Gaithersburg, Maryland, USA, 15–18
Nov (2005)
50. Huang, X., Hu, Q.: A Bayesian learning approach to promoting diversity in ranking for
biomedical information retrieval. In: Proceedings of the 32nd Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval (SIGIR), pp. 307–314.
Boston, MA, USA, 19–23 July (2009)
51. An, X., Huang, X.: geNov: a new metric for measuring novelty and relevancy in biomedical
information retrieval (Special Issue on Biomedical Information Retrieval). Nov 2017, 68(11),
2620–2635 (2017)
52. Li, F., Zhang, M., Fu, G., Ji, D.: A neural joint model for entity and relation extraction
from biomedical text. BMC Bioinformatics 18, 1 (2017).
https://doi.org/10.1186/s12859-017-1609-9
53. Mehryary, F., Björne, J., Pyysalo, S., Salakoski, T., Ginter, F.: Deep learning with minimal
training data: TurkuNLP entry in the BioNLP shared task 2016. In: Proceedings of the 4th
BioNLP Shared Task Workshop, 13 Aug 2016, Berlin, Germany, pp. 73–81. Association for
Computational Linguistics, Stroudsburg, PA (2016)
54. Quan, C., Hua, L., Sun, X., Bai, W.: Multichannel convolutional neural network for
biological relation extraction. Biomed. Res. Int. 2016, 1–10 (2016).
https://doi.org/10.1155/2016/1850404
55. Pyysalo, S., Ginter, F., Moen, F., Salakoski, T.: Distributional semantics resources for
biomedical text processing. In: Proceedings of the Languages in Biology and Medicine (LBM ’13),
pp. 39–44, Tokyo, Japan, Dec 2013 (2013)
56. Cheng, Y., Wang, F., Zhang, P., Hu, J.: Risk prediction with electronic health record: a deep
learning approach. SDM 2016. https://astro.temple.edu/tua87106/sdm16.pdf (2016)
57. Zhang, Z., Roy, A., Li, X., Espino, S., Clara, S., Khan, S., Luo, Y.: Using clinical narratives
and structured data to identify distant recurrences in breast cancer. arXiv.
https://arxiv.org/pdf/1806.04818.pdf
58. Galkó, F., Eickhoff, C.: Biomedical question answering via weighted neural network passage
retrieval. arXiv. https://arxiv.org/pdf/1801.02832.pdf
59. Li, H., Zhang, J., Wang, J., Lin, H., Yang, Z.: DUTIR in BioNLP-ST 2016: utilizing
convolutional network and distributed representation to extract complicate relations. In: Proceedings
of the 4th BioNLP Shared Task Workshop, 13 Aug 2016, Berlin, Germany, pp. 93–100.
Association for Computational Linguistics, Stroudsburg, PA (2016)
60. Rahul, P.V.S.S., Sahu, S.K., Anand, A.: Biomedical event trigger identification using
bidirectional recurrent neural network based models. arXiv. https://arxiv.org/abs/1705.09516v1
(2017)
61. Jagannatha, A.N., Yu, H.: Bidirectional RNN for medical event detection in electronic health
records. In: Proceedings of the Conference Association for Computational Linguistics. North
American Chapter. Meeting. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5119627/
62. Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine
translation: encoder-decoder approaches. arXiv preprint. arXiv:1409.1259
(2014)
63. Choi, E., Biswal, S., Malin, B., Duke, J., Stewart, W.F., Sun, J.: Generating multi-label discrete
electronic health records using generative adversarial networks. arXiv.
https://arxiv.org/abs/1703.06490v1 (2017)
64. Lee, S.: Natural language generation for electronic health records. arXiv.
https://arxiv.org/pdf/1806.01353.pdf
65. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator.
In: Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on 2015 Jun
7, pp. 3156–3164. IEEE (2015)
66. Liu, X., Xu, K., Xie, P., Xing, E.: Unsupervised pseudo-labeling for extractive summarization
on electronic health records. arXiv. https://arxiv.org/pdf/1811.08040.pdf
67. Datta, S., Bernstam, S.V., Roberts, K.: A frame semantic overview of NLP-based information
extraction for cancer-related EHR notes. arXiv. https://arxiv.org/pdf/1904.01655.pdf
68. Zeng, Z., Deng, Y., Li, X., Naumann, T., Luo, Y.: Natural language processing for EHR-based
computational phenotyping. arXiv. https://arxiv.org/pdf/1806.04820.pdf
69. Rajkomar, A., Oren, E., Chen, K., Dai, A.M., Hajaj, N., Liu, P.J., Liu, X., Sun, M., Sundberg,
P., Yee, H., et al.: Scalable and accurate deep learning for electronic health records. arXiv
preprint. arXiv:1801.07860 (2018)
70. Zhang, X.S., Tang, F., Dodge, H., Zhou, J., Wang, F.: MetaPred: meta-learning for clinical
risk prediction with limited patient electronic health records. arXiv.
https://arxiv.org/pdf/1905.03218.pdf
71. Hosseini, A., Chen, T., Wu, W., Sun, Y., Sarrafzadeh, M.: HeteroMed: heterogeneous
information network for medical diagnosis. arXiv. https://arxiv.org/pdf/1804.08052.pdf
72. Avati, A., Duan, T., Jung, K., Shah, N.H., Ng, A.: Countdown regression: sharp and calibrated
survival predictions. arXiv. https://arxiv.org/pdf/1806.08324.pdf
73. Chung, I., Kim, S., Lee, J., Hwang, S.J., Yang, E.: Mixed effect composite RNN-GP: a
personalized and reliable prediction model for healthcare. arXiv.
https://arxiv.org/pdf/1806.01551.pdf
74. Heo, J., Lee, H.B., Kim, S., Lee, J., Kim, K.J., Yang, K., Hwang, S.J.: Uncertainty-aware
attention for reliable interpretation and prediction. arXiv.
https://arxiv.org/pdf/1805.09653.pdf
75. Wang, L., Zhang, W., He, X., Zha, H.: Supervised reinforcement learning with recurrent neural
network for dynamic treatment recommendation. arXiv. https://arxiv.org/pdf/1807.01473.pdf
76. Pham, T., Tran, T., Phung, D., Venkatesh, S.: DeepCare: a deep dynamic memory model for
predictive medicine. arXiv. https://arxiv.org/abs/1602.00357v2 (2016)
77. Ma, F., Gao, J., Suo, Q., You, Q., Zhou, J., Zhang, A.: Risk prediction on electronic health
records with prior medical knowledge. In: KDD ’18: The 24th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining, 19–23 Aug 2018, London, United
Kingdom. ACM, New York, NY, USA, p. 10. https://doi.org/10.1145/3219819.3220020 (2018)
78. Suresh, H., Hunt, N., Johnson, A., Celi, L.A., Szolovits, P., Ghassemi, M.: Clinical intervention
prediction and understanding with deep neural networks. In: Machine Learning for Healthcare
Conference, pp. 322–337 (2017)
79. Lasko, T.A., Denny, J.C., Levy, M.A.: Computational phenotype discovery using unsupervised
feature learning over noisy, sparse, and irregular clinical data. PLoS ONE 8, e66341 (2013).
https://doi.org/10.1371/journal.pone.0066341
80. Liang, Z., Liu, J., Ou, A., Zhang, H., Li, Z., Huang, X.: Deep generative learning for automated
EHR diagnosis of traditional Chinese medicine. Comput. Methods Progr. Biomed. 174, 17–23
(2019)
81. Liang, Z., Zhang, G., Huang, X., Hu, Q.: Deep learning for healthcare decision making
with EMRs. In: Proceedings of 2014 IEEE International Conference on Bioinformatics and
Biomedicine (BIBM), pp. 556–559
82. Mei, J., Zhao, S., Jin, F., Xia, E., Liu, H., Li, X.: Deep diabetologist: learning to prescribe
hypoglycemia medications with hierarchical recurrent neural networks. arXiv.
https://arxiv.org/pdf/1810.07692.pdf
83. Sousa, R.T., Pereira, L.A., Soares, A.S.: Predicting diabetes disease evolution using financial
records and recurrent neural networks. arXiv. https://arxiv.org/pdf/1811.09350.pdf
84. Kale, D.C., Che, Z., Bahadori, M.T., Li, W., Liu, Y., Wetzel, R.: Causal phenotype discovery
via deep networks. AMIA Annual Symposium Proceedings.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765623/ (2015)
85. Ghassemi, M., Naumann, T., Schulam, P., Beam, A.L., Ranganath, R.: Opportunities in
machine learning for healthcare. arXiv. https://arxiv.org/pdf/1806.00388.pdf (2018)
86. Lyu, X., Huser, M., Hyland, S.L., Zerveas, G., Ratsch, G.: Improving clinical predictions
through unsupervised time series representation learning. arXiv.
https://arxiv.org/pdf/1812.00490.pdf (2018)
87. Nickel, M., Kiela, D.: Poincaré embeddings for learning hierarchical representations. arXiv
preprint. arXiv:1705.08039 (2017)
88. Greenland, S., Robins, J.M., Pearl, J.: Confounding and collapsibility in causal inference.
Stat. Sci., pp. 29–46 (1999)
89. Miotto, R., Wang, F., Wang, S., Jiang, Z., Dudley, J.T.: Deep learning for healthcare: review,
opportunities and challenges. Brief. Bioinform. 375, 4 (2017).
https://doi.org/10.1093/bib/bbx044
90. Wei, C.-H., Harris, B.R., Kao, H.-Y., Lu, Z.: tmVar: a text mining approach for extracting
sequence variants in biomedical literature. Bioinformatics 29, 1433–1439 (2013).
https://doi.org/10.1093/bioinformatics/btt156
91. Liu, S., Tang, B., Chen, Q., Wang, X.: Effects of semantic features on machine learning-based
drug name recognition systems: word embeddings vs. manually constructed dictionaries.
Information 6, 848–865 (2015). https://doi.org/10.3390/info6040848
92. Mohan, S., Fiorini, N., Kim, S., Lu, Z.: Deep learning for biomedical information retrieval:
learning textual relevance from click logs. In: Proceedings of the BioNLP 2017 Workshop,
Vancouver, Canada, 4 Aug 2017, pp. 222–231. Association for Computational Linguistics,
Stroudsburg, PA (2017)
93. Ohno-Machado, L.: Realizing the full potential of electronic health records: the role of natural
language processing. J. Am. Med. Inform. Assoc. 18, 539 (2011).
https://doi.org/10.1136/amiajnl-2011-000501
94. de Bruijn, B., Cherry, C., Kiritchenko, S., Martin, J., Zhu, X.: Machine-learned solutions for
three stages of clinical information extraction: the state of the art at i2b2 2010. J. Am. Med.
Inform. Assoc. 18, 557–562 (2011). https://doi.org/10.1136/amiajnl-2011-000150
95. Yoon, H.-J., Ramanathan, A., Tourassi, G.: Multi-task deep neural networks for automated
extraction of primary site and laterality information from cancer pathology reports. In:
Advances in big data, INNS 2016, 23–25 Oct 2016, Thessaloniki, Greece; Angelov, P.,
Manolopoulos, Y., Iliadis, L., Roy, A., Vellasco, M. (eds.) Advances in Intelligent Systems
and Computing, vol. 529. Springer, Cham (2016)
96. Beaulieu-Jones, B.K., Greene, C.S.: Semi-supervised learning of the electronic health record
for phenotype stratification. J. Biomed. Inform. 64, 168–178 (2016).
https://doi.org/10.1016/j.jbi.2016.10.007
97. Bowman, S.: Impact of electronic health record systems on information integrity: quality and
safety implications. Perspect. Health Inf. Manag. 10, 1c (2013)
98. Beaulieu-Jones, B.K., Wu, Z.S., Williams, C., Byrd, J.B., Greene, C.S.: Privacy-preserving
generative deep neural networks support clinical data sharing. bioRxiv.
https://doi.org/10.1101/159756 (2017)
99. Letham, B., Rudin, C., McCormick, T.H., Madigan, D., et al.: Interpretable classifiers using
rules and Bayesian analysis: building a better stroke prediction model. Ann. Appl. Stat. 9(3),
1350–1371 (2015)
100. Robins, J.M.: Robust estimation in sequentially ignorable missing data and causal inference
models. Proc. Am. Stat. Assoc. 1999, 6–10 (2000)
101. Robins, J.M., Rotnitzky, A., Scharfstein, D.O.: Sensitivity analysis for selection bias and
unmeasured confounding in missing data and causal inference models. In: Statistical models
in epidemiology, the environment, and clinical trials. Springer, pp. 1–94 (2000)
102. Papernot, N., McDaniel, P., Sinha, A., Wellman, M.: Towards the science of security and
privacy in machine learning. arXiv. https://arxiv.org/abs/1611.03814v1 (2016)
103. Xu, Z., Chou, J., Zhang, X.S., Luo, Y., Isakova, T., et al.: Identification of predictive
sub-phenotypes of acute kidney injury using structured and unstructured electronic health record
data with memory networks. arXiv. https://arxiv.org/pdf/1904.04990.pdf
104. Chou, E., Nguyen, T., Beal, J., Haque, A., Fei-Fei, L.: A fully private pipeline for deep learning
on electronic health records. arXiv. https://arxiv.org/pdf/1811.09951.pdf
105. Banerjee, I., Gensheimer, M.F., Wood, D.J., Henry, S., Chang, D., Rubin, D.L.: Probabilistic
prognostic estimates of survival in metastatic cancer patients (PPES-Met) utilizing free-text
clinical narratives. arXiv. https://arxiv.org/pdf/1801.03058.pdf
106. Kayali, I.: Expert system for diagnosis of chest diseases using neural networks. arXiv.
https://arxiv.org/pdf/1802.06866.pdf
107. de la Torre, J., Valls, A., Puig, D.: A deep learning interpretable classifier for diabetic
retinopathy disease grading. arXiv. https://arxiv.org/pdf/1712.08107.pdf
108. Holzinger, A., Malle, B., Kieseberg, P., Roth, P.M., Müller, H., Reihs, R., Zatloukal, K.:
Towards the augmented pathologist: challenges of explainable-ai in digital pathology. arXiv.
https://arxiv.org/pdf/1712.06657.pdf
Runjie Zhu is currently a Ph.D. student in the Electrical Engineering and Computer Science
program at the Lassonde School of Engineering, York University. Her research interests are in
information retrieval and natural language processing, with a specialization in biomedical
information retrieval, electronic health records, and clinical decisions and predictions.
Xinhui Tu is currently an Associate Professor at the School of Computer, Central China Normal
University. He received his Ph.D., master’s and bachelor’s degrees from Central China Normal
University in 2012, 2006 and 2001, respectively. His current research interests include information
retrieval and natural language processing. He has published more than 30 papers in leading
journals and conferences, such as SIGIR and CIKM.
Jimmy Huang holds a York Research Chair Professorship at the School of Information Technology.
His research focuses on information retrieval, AI and big data analytics with complex structures
and their applications to the Web and healthcare. He has published 230+ papers in top-tier venues
(e.g. ACM Transactions on Information Systems, IEEE Transactions on Knowledge & Data
Engineering, ACM SIGIR, CIKM, KDD, ACL, IJCAI and AAAI). His contributions in
developing and applying probabilistic modeling techniques to large-scale data analysis have had
significant impact on both academia and industry. He was General Chair of the 19th
CIKM and will be General Chair of the 43rd SIGIR.
Deep Learning for Medical Image
Processing
Diabetes Detection Using ECG Signals:
An Overview
1 Introduction
Biosignals are biological signals extracted from the human body. The most commonly used
biosignals are electrical in nature, but nonelectrical biosignals also exist.
Examples of biosignals include electrocardiography (ECG),
which measures the electrical activity of the heart; electroencephalography (EEG), which
measures brain activity; and photoplethysmography (PPG), which depicts the volumetric
changes of an organ. The ECG signal is employed for the noninvasive diagnosis of diabetes.
Clinicians use ECG to measure the rhythm of the heart electrically by attaching
electrodes to the skin surface. ECG depicts the complete electrical patterns of the
heart, including atrial depolarization and ventricular repolarization. Heart rate
variability (HRV) data are extracted from the ECG signal. HRV is a simple but powerful
signal that clearly reflects the condition of the cardiovascular system.
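How HRV data are derived from an ECG can be sketched as follows. The sketch assumes that the R-peak times (in seconds) have already been detected in the ECG, and it computes the RR-interval series together with two widely used time-domain HRV summaries, SDNN and RMSSD. The function name `hrv_from_r_peaks`, the choice of these two metrics and the example values are assumptions made for illustration; they are not taken from the methods reviewed in this chapter.

```python
import numpy as np

def hrv_from_r_peaks(r_peak_times_s):
    """Derive RR intervals and simple HRV summaries from R-peak times.

    r_peak_times_s : increasing R-peak times in seconds
    returns        : (rr_ms, heart_rate_bpm, sdnn_ms, rmssd_ms)
    """
    r = np.asarray(r_peak_times_s, dtype=float)
    rr_ms = np.diff(r) * 1000.0            # time between successive beats
    heart_rate_bpm = 60000.0 / rr_ms       # instantaneous heart rate
    sdnn_ms = rr_ms.std(ddof=1)            # sample standard deviation of RR intervals
    rmssd_ms = np.sqrt(np.mean(np.diff(rr_ms) ** 2))  # RMS of successive differences
    return rr_ms, heart_rate_bpm, sdnn_ms, rmssd_ms

# Hypothetical R-peak times (seconds) for five consecutive beats.
rr_ms, hr_bpm, sdnn_ms, rmssd_ms = hrv_from_r_peaks([0.0, 0.8, 1.6, 2.5, 3.3])
print(rr_ms, hr_bpm, sdnn_ms, rmssd_ms)
```

For these hypothetical R-peak times the RR intervals are 800, 800, 900 and 800 ms, giving an SDNN of 50 ms and an RMSSD of about 81.6 ms. The RR-interval series itself is the one-dimensional signal that HRV-based classifiers would take as input.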
Traditionally, biosignals have been processed and analysed mainly by
extracting features and then classifying them. These processes are performed by
computer-aided diagnosis (CAD) systems. The features are selected manually
and need to be optimal, so identifying suitable features requires
domain knowledge. The performance of these approaches becomes unsatisfactory
as the complexity of the data increases. Complex, high-dimensional,
real-world data can be analysed effectively using deep learning. Deep learning
architectures are typically made up of a very large number of
hidden layers and contain millions of neurons interconnected in a structure similar
to a 2D matrix. These complex networks are capable of handling and analysing
complex, very large and very high-dimensional data. Raw data (or data that have
undergone very little signal processing) can be fed directly into these networks. Each
layer of the network produces at its output representations that are designed
automatically by the deep learning network using a general learning method. This replaces
the manually decided feature extraction of typical machine learning based
neural networks, which are very small and very simple in structure compared
to deep learning networks. Although deep learning networks are commonly used
for two-dimensional image analysis problems, they can also be used very effectively for
one-dimensional data. We review the application of deep learning based methods
to one-dimensional HRV data. The main bottleneck in applying deep learning to
biosignals in general is the present non-availability of the very large training data sets
from the medical domain that are required for training deep learning networks
with a gigantic number of parameters.
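To make concrete how a network can consume one-dimensional data directly, the following sketch applies a single one-dimensional convolutional layer with a ReLU activation to a short RR-interval sequence. It is a minimal NumPy illustration under stated assumptions: the function name `conv1d_relu`, the two hand-picked kernels and the input values are hypothetical. In a real network the kernel weights would be learned from data and many such layers would be stacked.

```python
import numpy as np

def conv1d_relu(signal, kernels, bias):
    """Valid 1D convolution (cross-correlation form) followed by ReLU.

    signal  : shape (n,)        one-dimensional input, e.g. RR intervals
    kernels : shape (c, k)      c filters of width k
    bias    : shape (c,)        one bias per filter
    returns : shape (n-k+1, c)  one feature map per filter
    """
    signal = np.asarray(signal, dtype=float)
    kernels = np.asarray(kernels, dtype=float)
    k = kernels.shape[1]
    # Slide a window of width k over the signal.
    windows = np.stack([signal[i:i + k] for i in range(len(signal) - k + 1)])
    return np.maximum(windows @ kernels.T + np.asarray(bias, dtype=float), 0.0)

# Hypothetical RR intervals in milliseconds.
rr_ms = [800, 810, 790, 805, 795]
# Hand-picked filters for illustration: a beat-to-beat difference and a local mean.
kernels = [[-1.0, 1.0], [0.5, 0.5]]
features = conv1d_relu(rr_ms, kernels, bias=[0.0, 0.0])
print(features.shape)  # (4, 2)
```

The first filter responds only where the RR interval increases from one beat to the next, and the second smooths neighbouring beats. Learned filters play the same role as such hand-designed features, but are fitted to the training data rather than chosen by an expert.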
Diabetes mellitus, usually called diabetes, is a long-term metabolic disorder
in which the body is incapable of metabolizing glucose (sugar) properly. This
creates a very high level of glucose in the blood, a condition known as
hyperglycaemia. Insulin is a hormone that is necessary for the body cells to absorb
blood glucose (produced from the carbohydrates in the food we eat) and to store
glucose for future needs. Diabetes arises either because the body cannot
generate sufficient insulin or because body
cells do not respond to the insulin that is generated. Medically, there is no cure for
diabetes, so it must be properly controlled. The different types of diabetes are described below.
Type 1 diabetes is the type of diabetes most often found in children. In type 1 diabetes,
the immune system of the body destroys its own beta cells, resulting in a deficiency
of insulin. Type 2 diabetes is the common type of diabetes that develops in adults,
usually above the age of 40. The cells generally become insensitive to the insulin
produced or are unable to use it properly. This is known as
insulin resistance. Gestational diabetes is glucose intolerance that develops during
pregnancy. Of these three types, type 2 diabetes is the most
prevalent. In this chapter, the word diabetes refers to type 2 diabetes.
A 2017 estimate indicates that 8.8% of people worldwide have diabetes. The
rise is more alarming in underdeveloped countries. According to the National Diabetes
Statistics Report 2017 (pertaining to the United States), about 9.4% of the U.S. population
had diabetes in 2015. Of these, about 23% were not aware of or did not report having
diabetes (their diabetes was undiagnosed).
According to the statistics of the International Diabetes Federation, India has a diabetic
population of 6.9 crores (69 million), the second largest
in the world. Kerala is among the Indian states with the largest number
of people affected by diabetes. According to recent statistics of the Indian Medical
Association (IMA), 138 out of every 1000 people in Kerala are newly diagnosed
with diabetes every year. The World Health Organization (WHO) has summarized some of
the consequences of diabetes as follows. In
2015, approximately 1.6 million deaths globally were directly caused by diabetes.
Almost 50% of these deaths occur before 70 years of age because of increased
blood sugar.
Diabetes causes damage to nerves, known as diabetic neuropathy. Diabetes
increases the likelihood of heart ailments and stroke. About 50% of people with
diabetes die of heart-related complications. Neuropathy in the feet can lead to
amputation of a limb. Another complication is diabetic retinopathy,
in which diabetes can cause heavy damage to the blood
vessels in the retina; this may impair vision (10% of diabetic people) and may also lead to
blindness (2% of diabetic people). Kidney failure is the cause of death
in, on average, 15% of diabetic people. Thus, over time, uncontrolled diabetes leads to
serious damage to many vital organs and parts of the body, such as the heart, blood vessels,
kidneys (nephropathy), nerves, feet and eyes. Diabetes deaths are mainly due to the
complications caused by the disease.
A less severe form of hyperglycaemia is known as impaired glucose tolerance. This
condition carries a high risk of large blood vessel disease and may lead to
complications like myocardial infarction. Unlike diabetes induced hyperglycaemia,
impaired glucose tolerance does not lead to microvascular disease to any considerable
extent.
All the above data and reports underline the necessity and challenges of developing
effective diabetes detection and management methods. Some of the symptoms of
hyperglycaemia due to diabetes are excessive urine excretion, intense thirst, hunger
and fatigue. Weight loss and impaired vision are also likely. In terms of diagnosis,
the major challenge is that these symptoms are not marked at the onset of diabetes.
Symptoms become pronounced only after diabetes worsens to the extent of leading to
complications. To minimize such complications, early detection of diabetes is
important. Methods should be developed that help to prevent or delay diabetes, and
effective ways should be developed for diagnosis and treatment of this disease. A
further challenge is developing methods that can predict diabetes much earlier and in
a cost effective way, so that corrective steps and treatment can be given in time to
avert diabetes, thus also saving the person from the serious complications to which
undetected or poorly managed diabetes can lead.
Here, we review non-invasive methods that diagnose diabetes with high accuracy using
HRV signals derived from ECG signals. Heart rate value based diabetes detection has
been observed to be more computationally efficient than the decision theoretic
approach and hence has been heavily explored. Initially, machine learning techniques
were extensively used for HRV based diabetes detection. Deep learning methods are now
increasingly used in healthcare analytics, and deep learning architectures have the
potential to improve the accuracy of diabetes detection by capturing minute variations
in ECG. A further big stride possible in the future is the prediction of diabetes,
provided that a sufficiently large amount of training and testing data is made
available.
In this chapter, Sects. 2 and 3 provide discussion of the relevant medical aspects of
diabetes and its detection methods. Sections 4 and 5 detail the machine learning and
deep learning methods used by researchers for diabetes detection. Section 6 gives
the detailed literature survey of works using ECG-derived-HRV as input for diabetes
detection. A sample architecture and implementation details are described in Sect. 7.
The limitations and challenges of deep learning methods are discussed in Sect. 8.
The chapter concludes with Sect. 9.
2 Diabetes
Glucose homeostasis is the natural regulation mechanism of the body by which the
blood glucose (blood sugar) levels are maintained within a narrow range. Diabetes
refers to a group of conditions which indicates that blood glucose balance in the body
has gone out of control. For proper functioning of the body, the blood glucose values
have to stay strictly within a very narrow range (70 mg/dl to 110 mg/dl; mg is
milligram and dl is decilitre). The pancreatic endocrine hormones insulin and
glucagon make this happen. Insulin and glucagon are vital hormones secreted by
pancreatic islet cells in response to the level of blood sugar, but they act in
opposite ways.
The beta cells of the pancreas secrete insulin. Glucose is the main source of energy
for the body cells. But glucose is a large molecule which cannot pass through
the cell membrane by simple diffusion. Insulin enables glucose
transport into the cells. A very low base level of insulin is always secreted.
When we take food, carbohydrates are converted to glucose and most of it is sent
to the blood. When blood glucose is high, then a proportional amount of insulin is
produced. When insulin is present, the cells of the body can absorb glucose out of the
blood thus leading to the reduction of blood glucose level. The cells use the absorbed
glucose for getting energy for carrying out their assigned functions. When the blood
glucose decreases to the normal level, then the amount of insulin secreted also goes
down to the base minimum. Thus high blood glucose serves as a signal to the pancreas
to release insulin into the blood. If the level of blood glucose remains high even
after cell absorption, insulin facilitates the storage of the excess glucose in the
cells of the liver in the form of a substance known as glycogen, by the process called
glycogenesis.
The alpha cells of the pancreas secrete glucagon whose action is opposite to that
of insulin. Glucagon production is inversely proportional to the amount of blood
glucose. If blood glucose is high, no glucagon is produced. If blood glucose is low
(for example, when a long time has passed since the last meal), a large amount of
glucagon is secreted. Glucagon induces the liver to release its stored glucose by
converting the
glycogen to glucose by the process called glycogenolysis. Thus, the level of blood
glucose is increased. Glucagon also induces liver and some muscle cells to produce
glucose from other nutrients such as protein. The above mentioned processes are
summarized in Fig. 1.
Type 1, type 2 and gestational diabetes are the commonly seen categories of diabetes.
Type 1 is mainly found in children. It is characterized by the inability of the
body to generate insulin, mainly because of autoimmune damage to the beta cells
in the pancreas which produce insulin. People with this type of diabetes have to live
their whole life with the support of insulin injections; otherwise complications will
occur due to the increased blood glucose. People with type 1 diabetes commonly show
symptoms of rapid weight loss, polydipsia (abnormally high thirst), polyuria (large
amount of urine production) and the associated nocturia (tendency to urinate more
often during the night). Ketone bodies will be present in the urine (a condition
known as ketonuria).
Fig. 1 Mechanism of maintaining desired blood glucose levels

Table 1 Important distinguishing features of type 1 and 2 diabetes

Different features                       Type 1                    Type 2
Age of the start of disease              <40 years                 >50 years
Duration of symptoms                     Weeks                     Months to years
Body weight                              Normal or low             Above normal
Ketonuria                                Present                   Absent
If insulin treatment is not given        Can lead to rapid death   Does not pose immediate threat to life
Complications at the time of diagnosis   No                        Around 25%
Family history of diabetes               Need not be there         More likely to be there

Type 2 diabetes is the state of decreased sensitivity to the action of insulin. Diabetic
patients need external insulin support for maintaining the proper balance of blood
glucose. If not treated properly, the diabetes is likely to progress. This is the most
prevalent type of diabetes (Table 1).
Uncontrolled diabetes over a long duration can lead to many complications. Type
2 diabetes doesn’t show noticeable symptoms at the initial stage. Because of this,
about 25% of people already show evidence of diabetic complications by the time of
diagnosis.
70% of the deaths in diabetes are due to cardiovascular diseases. Statistics from
the USA indicate that, among people aged 20 and above, diabetic people have 1.7 times
higher cardiovascular death rates than their non-diabetic counterparts. The chances
of diabetic people being affected by myocardial infarction and stroke are 1.8 and 1.5
times higher, respectively, than those of non-diabetic people. The effects of
cardiovascular risk factors like smoking and hypertension get magnified by the
presence of diabetes.
Macrovascular (large blood vessel) disease caused by diabetes leads to fatal
complications like angina, stroke, myocardial infarction, cardiac failure,
intermittent claudication (cramping pain in the leg) etc. Diabetic people suffer from
atherosclerosis (deposit of fatty material on the inner walls of the arteries) much
earlier and with much greater severity than non-diabetic people. Diabetes also affects
the small blood vessels in the body. This condition is known as microvascular disease
(also known as diabetic microangiopathy); it leads to thickening of the basement
membrane of the capillaries and further to an increase in vascular permeability
throughout the body.
Retinopathy induced by diabetes is the most common form of vision related
impairment in adults. Capillary occlusion (blockage) due to hyperglycaemia
increases local vascular endothelial growth factor (VEGF) in retina. The occlusion
of a lot of capillaries leads to the growth of new vessels in retina. There will be
swellings called microaneurysms in capillary vessels in the retina, which leak fluid
and blood, resulting in retinal haemorrhages. The most serious form of diabetic
retinopathy is called proliferative retinopathy, which, if left untreated, causes
extensive visual damage in the form of retinal detachment and frequent haemorrhages.
Diabetic nephropathy refers to the damage caused to the kidneys which may
finally lead to kidney failure. The kidney is made up of microscopic units called
nephrons which filter out impurities from the blood. Diabetes induced hyperglycaemia
impairs the filtering function of the nephrons. Diabetic nephropathy is a
prominent cause of long-term kidney disease and end-stage renal disease (ESRD),
in which the kidneys do not work properly. ESRD is the last stage of diabetic
nephropathy, at which the person cannot survive without dialysis.
Diabetic neuropathy is an important cause of morbidity and mortality in diabetes.
In peripheral neuropathy, peripheral nerves are affected, resulting
in problems like deficiencies in motor and sensory functions. Weakening of the
proximal muscles (muscles close to the body’s midline), abnormal gait and pain
in the limbs and feet can occur. In autonomic neuropathy, parasympathetic or
sympathetic nerves may be affected in many visceral systems. Autonomic neuropathy has
numerous clinical features affecting different systems of the body, like the
cardiovascular system (e.g. resting tachycardia), the gastrointestinal system (e.g.
constipation, abdominal fullness, nocturnal diarrhoea), the pupillary system (e.g.
reduced reflexes to light, reduction in pupil size) etc. All the above described
complications are shown in Fig. 2.
According to epidemiological studies, overeating, physical inactivity and obesity may
lead to diabetes in middle-aged people. People with a body
mass index (BMI) larger than 30 kg/m² are 10 times more prone to getting type 2
diabetes. Middle-aged and elderly people are also at greater risk of diabetes.
Ethnic origin is another major risk factor of diabetes. It is found that in the USA,
only 5.5% of the Alaskan people are affected by diabetes, while the figure is 7.1% for
non-Hispanic white people and 13% for non-Hispanic black people. The highest value of
33% is for Native Americans in the USA. These ethnicity based disparities may be
due to a variety of known and unknown factors, such as lifestyle and BMI.
Proper treatment and effective blood glucose monitoring and control are essential
in preventing the complications caused by diabetes. A popular treatment is the oral
intake of effective drugs in order to maintain a proper blood glucose level in
diabetic people. Another mode of treatment is insulin injection, applied
subcutaneously, commonly to the upper arms, thighs and buttocks, with a disposable
plastic syringe and a sharp needle. Injections are normally given in multiple doses
several times a day. In acute cases, especially in type 1 diabetes, continuous
subcutaneous insulin therapy (or insulin pump) is administered. A further improvement
of the insulin pump which incorporates a closed loop system is known as the artificial
pancreas. The artificial pancreas is an integrated system working in closed loop,
consisting of insulin pumps along with continuous glucose monitoring systems (CGMS).
The CGMS system can be considered to include interstitial glucose measurement done
every 5–15 min, a personal glucose monitor which uses the glucose information to
calculate the amount of insulin to be delivered into the body by the insulin pump, and
finally the insulin pump that delivers insulin.
It is important to adopt a healthy lifestyle by doing regular physical activity
and maintaining a proper BMI. A healthy diet is very important. Alcohol consumption,
smoking and stress have to be avoided. Many of the important medical aspects
discussed in this chapter are taken from the book Davidson’s Principles and Practice
of Medicine [1].
As said initially, the blood glucose level has to be maintained between 70 and
110 mg/dl in the fasting condition. If it is below 70 mg/dl, the condition is
hypoglycaemia. If food has been taken within the last two or three hours, the glucose
level can exceed 110 mg/dl. Irrespective of the amount of food one has taken, blood
sugar should not exceed 180 mg/dl in the normal case. If it is more than 180 mg/dl,
the condition is hyperglycaemia, indicative of diabetes.
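The ranges quoted above can be written as a small decision rule. The following is a minimal sketch only, not a diagnostic tool; the function name and the label "above fasting range" (for a fasting reading between 110 and 180 mg/dl, which the text does not name) are assumptions made for illustration.

```python
def classify_glucose(mg_dl, fasting):
    """Label a blood glucose reading (mg/dl) using the ranges quoted in the text."""
    if mg_dl < 70:
        return "hypoglycaemia"
    if mg_dl > 180:
        return "hyperglycaemia"
    # Between 70 and 180 mg/dl: only a fasting reading is expected to stay <= 110.
    if fasting and mg_dl > 110:
        return "above fasting range"  # hypothetical label, not from the text
    return "within range"


print(classify_glucose(95, fasting=True))
print(classify_glucose(200, fasting=False))
```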
All the commonly used methods for detecting diabetes are invasive in nature. They
generally involve extracting a blood sample from the person and testing it for the
possible anomaly. Popular invasive tests for detecting diabetes and assessing its
severity are explained below. Table 2 also highlights the importance of these tests in
diabetes detection.
OGTT is mainly done to check for gestational diabetes in pregnant women. A
prescribed amount of a sugar-containing drink is given to the person under test. Blood
samples are tested at the prescribed time intervals. A blood glucose measurement
greater than 200 mg/dl indicates the presence of diabetes. If diabetes is undetected
in pregnant women, it may lead to complications.
HbA1c blood test gives the average blood sugar value for the past three months.
HbA1c means glycated haemoglobin. Haemoglobin is a protein contained in red
blood cells whose task is to carry oxygen throughout the body. Haemoglobin is
glycated when haemoglobin combines with blood glucose. HbA1c greater than 6.5%
indicates diabetes.
Diabetes can cause severe autonomic impairments. Diabetes induced high blood
glucose/sugar (hyperglycaemia) causes cardiovascular malfunction and precapillary
damage. This damage affects the normal working of the endothelial cells and blocks
the normal route of passage of nitric oxide (NO) [2]. NO is essential for vasodilation.
Diabetes induced hyperglycaemia causes reduced activation of the phosphorylation
cascade, leading to less endothelial NO synthase, which is required to synthesize NO.
Diabetes thus leads to a reduction in the availability of NO. The endothelial cell
damage due to diabetes causes the blood vessels to be vasoconstricted, which affects
the normal blood circulation.
Hyperglycaemia results in the production of free oxygen radicals, which inactivate
NO (derived from endothelium) and activate protein kinase C, which boosts
vasoconstrictive prostanoid production [3]. Hyperglycaemia leads to endothelial
damage and increases the activity and aggregability of the platelets [3, 4].
Eventually, monocytes, leukocytes and platelets adhere strongly to the endothelium.
Blood coagulability is increased and fibrinolytic activity is decreased.
Thus, fatty material is increasingly deposited on the inner side of the blood vessel
wall due to the high blood glucose condition. The deposit leads to blockages
and hardening of blood vessels (atherosclerosis), obstructing the flow of blood
through the blood vessels. Two major types of cardiovascular disease are coronary
artery disease and cerebral vascular disease. Coronary artery disease (ischemic heart
disease) is caused by thickening of the blood vessels that go to the heart by deposits
of fatty material. The heart’s blood flow is thus decreased or blocked, leading to a
heart attack. Increased blood sugar levels not only damage blood vessels, but also
change the level of blood lipid. Diabetic people are at least twice as likely to
develop heart disorders or stroke as non-diabetic people. Heart attacks in people with
diabetes are more serious (more likely to result in death).
60–70% of diabetic patients have some form of neuropathy caused by diabetes.
Diabetic neuropathy can be further grouped as autonomic, focal, peripheral and
proximal neuropathy. Our focus is on the diabetic neuropathy affecting the nerves
connected with the functioning of the heart, known as cardiovascular
autonomic neuropathy (CAN). Heart rate and blood pressure are affected by CAN.
High glucose level associated with diabetes causes serious problems in different
organs of the body. All the autonomic microvascular damages also cause decrease
in local reflexes. CAN leads to diminished HRV indicative of diabetic neuropathy
[5]. Diabetes induced CAN may cause ECG alterations like ST-T changes, sinus
tachycardia, heart rate variability changes, long QTc etc. It was also confirmed that
QT, QTc and ST dispersions are predictors of death in diabetic patients [6, 7]. Among
these ECG alterations, we are concentrating on the HRV signal which can be used
for diabetes diagnosis since HRV is indicative of cardiac disorders developed due to
diabetes.
The ECG reflects the role of the autonomic nervous system (ANS) in regulating the
heart’s natural rhythm. The ECG signal is generated as follows. The heartbeat
originates as an electric impulse from the sino-atrial (SA) node. This impulse
contracts both atria, then activates the atrioventricular (AV) node and spreads
through both ventricles. The complete activity is represented in the ECG waveform
(Fig. 3).
SA node functions as the heart’s pacemaker. The cardiac impulse generated here
is influenced by the parasympathetic and sympathetic nervous systems.
Cardio-acceleration is caused by enhanced activity of the sympathetic nervous system
(SNS) or decreased parasympathetic nervous system (PNS) activity. Cardio-deceleration
is
caused by decreased SNS or increased PNS activity. Thus the status of the ANS is
clearly understood from HRV signals. The SNS and PNS are the two branches of
the ANS which together control the heart rate. Thus HRV can give a clear picture
about sympathetic-parasympathetic balance. The instantaneous heart rate, together
decided by the SNS and PNS, is strongly influenced by different kinds of neural,
myocardial and hormonal factors [11].
The analysis of non-invasive HRV data has numerous applications in the clinical
areas of cardiology, physiology and pharmacology. HRV based analysis of cardiological
impairment is of real significance. HRV measurements are simple and non-invasive, and
can detect impairments which have not yet reached the stage of showing clear symptoms.
If an impairment is detected, the patient can go in for detailed clinical tests.
Research has shown that non-invasive HRV measurements are also reproducible if done
under standard conditions [12, 13].
The heart rate signal contains the RR interval information ordered in time. The
variation of RR intervals is known as HRV. The variations in the ANS due to
hyperglycaemia can be represented well by HRV signals. Shape is an irrelevant feature
for the discrete HRV signal. The HRV data available (i.e. instantaneous heart rate
against the time axis) can be analysed by different methods. HRV analysis can serve as
an excellent and accurate non-invasive technique to understand the state of the ANS,
which regulates the cardiac activity and heart rate.
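As an illustration of how the discrete HRV series is obtained, the sketch below converts R-peak times into RR intervals and instantaneous heart rate. It assumes the R-peak times (in seconds) have already been detected from the ECG; the function names and the sample peak times are hypothetical.

```python
import numpy as np


def rr_intervals(r_peak_times_s):
    """RR intervals (s): differences between successive R-peak times."""
    return np.diff(np.asarray(r_peak_times_s, dtype=float))


def instantaneous_heart_rate(rr_s):
    """Instantaneous heart rate (beats per minute) for each RR interval."""
    return 60.0 / np.asarray(rr_s, dtype=float)


# Hypothetical R-peak times in seconds (illustrative values, not measured data).
r_peaks = [0.0, 0.8, 1.6, 2.5, 3.3]
rr = rr_intervals(r_peaks)
hr = instantaneous_heart_rate(rr)
print(rr)
print(hr)
```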
Before deep learning techniques emerged, biosignals were analysed mainly using
machine learning (ML) techniques. ML applies artificial intelligence (AI) to systems
to make them capable of automatic learning without explicit rule-based programming
and without human assistance. In the anomaly detection case, the ML algorithm finds by
itself a mathematical function that produces the correct outcome (anomaly present
or absent) from the input training data (data from diagnostic tests like ECG, HRV)
by learning the hidden patterns in the input data. With this learned mathematical
function, it should be able to predict the output state for a new set of input data
with high accuracy.
Extensive domain knowledge of the human system and its intricate mechanisms,
coupled with a deep understanding of the biosignal variations that occur during the
anomaly, is imperative to decide what type of features have to be extracted from the
biosignal and analysed. So the initial step is the selection of desirable features
which can be effectively used for the purpose of anomaly detection. These
features are then extracted and fed to classifiers to detect the presence of
anomalies. In the case of diabetes detection using HRV, the initial research used
different methods such as time, frequency and nonlinear methods. All these methods
gave different parameter ranges for normal and abnormal signals. These distinctive
ranges enabled classifiers to classify with accuracy above 85%. The nonlinear methods
are specifically suited to biosignals like ECG, which are inherently nonlinear
and nonstationary in nature. The important methods of HRV analysis for diabetes
detection using ML techniques are discussed briefly below. The features belonging
to the domains described below are then passed through suitable classifiers.
Time domain measures involve statistical operations such as calculating the
mean and variance of the RR intervals of HRV data. Important time domain parameters
are the average heart rate, RMSSD (root mean square of successive RR interval
differences) and SDNN (standard deviation of the RR intervals). Parameters like RMSSD
are indicators of high frequency changes affecting heart rate and thus reflect the
state of parasympathetic activity. The shortcoming of time domain measurements is that
they are very sensitive to outliers and artifacts. Hence, these artifacts must be
eliminated before the data are analysed.
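The time domain parameters named above can be computed directly from the RR series. The sketch below is a minimal illustration assuming RR intervals in milliseconds that have already been cleaned of artifacts; the function name, the dictionary keys and the sample values are hypothetical, and NN50 (the count of successive differences above 50 ms) is included because it appears later in the literature survey.

```python
import numpy as np


def time_domain_features(rr_ms):
    """Basic time domain HRV features from RR intervals given in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)  # successive RR differences
    return {
        "mean_hr_bpm": float(np.mean(60000.0 / rr)),
        "sdnn_ms": float(np.std(rr, ddof=1)),            # sample standard deviation
        "rmssd_ms": float(np.sqrt(np.mean(diff ** 2))),
        "nn50": int(np.sum(np.abs(diff) > 50.0)),
    }


# Illustrative RR values only; the single 860 ms beat shows how one outlier
# dominates RMSSD, which is why artifact removal matters.
features = time_domain_features([800, 810, 790, 860, 800])
print(features)
```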
The traditional frequency domain techniques are unable to provide exact time
localization in a typical nonstationary biosignal. To overcome this, better techniques
were developed. Wavelet analysis, which shows very good performance, involves
comparing the signal with a selected wavelet of limited duration and computing
parameters from the result. HRV analysis can thus be performed effectively using the
wavelet transform, which can also be used to obtain the time related information of
various frequency bands [15].
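To make the idea of comparing the signal with a short wavelet concrete, the sketch below implements a multi-level discrete wavelet decomposition with the Haar wavelet and reports the energy in each band. The choice of the Haar wavelet, the function names and the sample series are assumptions made for illustration; the works surveyed here may use other wavelets and features.

```python
import numpy as np


def haar_dwt(signal, levels):
    """Return [detail_1, ..., detail_levels, approximation] Haar coefficients."""
    approx = np.asarray(signal, dtype=float)
    coeffs = []
    for _ in range(levels):
        if approx.size % 2:          # drop the last sample if the length is odd
            approx = approx[:-1]
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2.0))  # detail (high-pass) band
        approx = (even + odd) / np.sqrt(2.0)        # approximation (low-pass) band
    coeffs.append(approx)
    return coeffs


def band_energies(coeffs):
    """Energy (sum of squared coefficients) of each band."""
    return [float(np.sum(c ** 2)) for c in coeffs]


# Illustrative RR series in milliseconds (not measured data).
rr_ms = np.array([800, 810, 790, 860, 800, 805, 795, 815], dtype=float)
coeffs = haar_dwt(rr_ms - rr_ms.mean(), levels=3)
print(band_energies(coeffs))
```

Each detail coefficient is tied to a specific position in the series, which is the time localization that a plain Fourier spectrum lacks.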
Nonlinear methods are well suited for analysing nonlinear and nonstationary
biosignals like ECG. Some of the important nonlinear parameters used for HRV
analysis are approximate entropy (ApEn), higher order spectrum (HOS), detrended
fluctuation analysis (DFA), correlation dimension (CD), recurrence quantification
analysis (RQA) features and empirical mode decomposition (EMD) features.
DFA (Peng et al.) is very useful in assessing the fractal scaling characteristics of
HRV data [16]. The fluctuation inherent in the data is represented by parameter α
\[ CD = \lim_{r \to 0} \frac{\log[C(r)]}{\log(r)} \tag{1} \]
The RR signal of normal people yields a higher CD value than the diabetic
signal because the normal RR signal has higher RR variability.
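In practice the limit in Eq. (1) is usually approximated by the slope of log C(r) against log r over a range of small radii. The sketch below assumes that C(r) is the correlation integral, i.e. the fraction of pairs of delay-embedded points closer than r; the embedding dimension, delay, radii and function names are illustrative choices, not values taken from the cited works.

```python
import numpy as np


def delay_embed(x, dim=2, delay=1):
    """Delay-embed a 1-D series into points of dimension `dim`."""
    x = np.asarray(x, dtype=float)
    n = x.size - (dim - 1) * delay
    return np.column_stack([x[i * delay:i * delay + n] for i in range(dim)])


def correlation_integral(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    upper = np.triu_indices(len(points), k=1)
    return float(np.mean(dist[upper] < r))


def correlation_dimension(x, radii, dim=2, delay=1):
    """Slope of log C(r) versus log r, a finite-data estimate of Eq. (1)."""
    points = delay_embed(x, dim, delay)
    radii = np.asarray(radii, dtype=float)
    c = np.array([correlation_integral(points, r) for r in radii])
    keep = c > 0  # log is undefined where no pairs fall within r
    slope, _ = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
    return float(slope)


# Sanity check on a straight ramp: the embedded points lie on a line,
# so the estimate should come out close to 1.
ramp = np.linspace(0.0, 1.0, 400)
radii = np.logspace(np.log10(0.05), np.log10(0.3), 8)
cd_ramp = correlation_dimension(ramp, radii)
print(cd_ramp)
```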
ApEn is a measure of disorder in HR signal [18]. The value of ApEn is larger for more
complex or irregular data (the normal case) and vice versa for cardiac impairment
(diabetic) cases.
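A compact way to compute ApEn follows the usual template-matching definition. The sketch below is an illustration under common default assumptions (template length m = 2 and tolerance r = 0.2 times the standard deviation of the series); these defaults and the function name are not taken from the text.

```python
import numpy as np


def approximate_entropy(x, m=2, r=None):
    """Approximate entropy of a 1-D series (self-matches are counted)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if r is None:
        r = 0.2 * np.std(x)  # common default tolerance (an assumption here)

    def phi(length):
        templates = np.array([x[i:i + length] for i in range(n - length + 1)])
        # Chebyshev distance between every pair of templates
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        c = np.mean(dist <= r, axis=1)  # always > 0 because of the self-match
        return np.mean(np.log(c))

    return float(phi(m) - phi(m + 1))


rng = np.random.default_rng(0)
regular = np.sin(2 * np.pi * np.arange(300) / 50)   # highly regular series
irregular = rng.standard_normal(300)                # irregular series
apen_regular = approximate_entropy(regular)
apen_irregular = approximate_entropy(irregular)
print(apen_regular, apen_irregular)
```

The regular sine wave yields a lower value than the random series, matching the statement that ApEn is larger for more irregular data.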
The recurrence plot (by Eckmann et al.) is a graphical aid to identify concealed
recurrences in a time domain signal which may not be pronounced [19]. It measures the
nonstationarity of the time-series. Several important parameters can be calculated
from the recurrence plot. Examples of these parameters are laminarity (LAM), mean
diagonal line length, recurrence rate (RR), determinism (DET), entropy and trapping
time (TT).
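The recurrence plot itself is a binary matrix that marks which pairs of states are close to each other. The sketch below builds that matrix from a delay-embedded series and computes the recurrence rate; the embedding parameters, the threshold eps and the function names are illustrative assumptions, and the main diagonal is included in the rate here although conventions differ.

```python
import numpy as np


def recurrence_matrix(x, eps, dim=2, delay=1):
    """Binary recurrence matrix: entry (i, j) is 1 when states i and j are within eps."""
    x = np.asarray(x, dtype=float)
    n = x.size - (dim - 1) * delay
    states = np.column_stack([x[i * delay:i * delay + n] for i in range(dim)])
    dist = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    return (dist <= eps).astype(int)


def recurrence_rate(rm):
    """Fraction of recurrent points in the matrix (main diagonal included)."""
    return float(rm.mean())


# A strictly alternating series revisits the same two states again and again.
rm = recurrence_matrix([0, 1, 0, 1, 0, 1, 0, 1], eps=0.1)
print(rm)
print(recurrence_rate(rm))
```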
HOS is very useful in the dynamical analysis of nonlinear, nonstationary and
non-Gaussian biosignals. HOS (also called polyspectra) represents the cumulants and
moments of order three and above. HOS can be effectively used for the analysis of
HRV signals. Several useful HOS features can be extracted from HRV data and fed
to different classifiers for the purpose of diabetes detection.
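The third order member of the polyspectra family is the bispectrum. As a hedged illustration, the sketch below uses a direct FFT-based estimate, B(k1, k2) = E[X(k1) X(k2) X*(k1 + k2)], averaged over non-overlapping segments; the segment length, the absence of windowing and the function name are simplifying assumptions rather than the procedure of any cited work.

```python
import numpy as np


def bispectrum(x, nfft=64):
    """Direct bispectrum estimate averaged over non-overlapping segments."""
    x = np.asarray(x, dtype=float)
    n_seg = x.size // nfft
    k = np.arange(nfft // 2)
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    acc = np.zeros((k.size, k.size), dtype=complex)
    for s in range(n_seg):
        seg = x[s * nfft:(s + 1) * nfft]
        spec = np.fft.fft(seg - seg.mean())
        # triple product X(k1) X(k2) conj(X(k1 + k2))
        acc += spec[k1] * spec[k2] * np.conj(spec[(k1 + k2) % nfft])
    return acc / n_seg


# Synthetic test signal with components at frequency bins 4, 6 and 4 + 6 = 10.
t = np.arange(256)
x = (np.cos(2 * np.pi * 4 * t / 64)
     + np.cos(2 * np.pi * 6 * t / 64)
     + np.cos(2 * np.pi * 10 * t / 64))
b = bispectrum(x, nfft=64)
peak = np.unravel_index(np.argmax(np.abs(b)), b.shape)
print(peak)
```

The magnitude peaks at the bin pair whose sum frequency is also present in the signal, which is the kind of phase-coupled structure that bispectrum-based features summarize.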
EMD splits the input signal into intrinsic mode functions (IMFs). The features
generated from the IMFs are well suited to capturing the nonlinearity and
nonstationarity characteristics of biosignals like HRV.
A variety of time, frequency, wavelet and nonlinear features, along with classifiers,
have been used for detecting diabetes in previous works. Our focus in this
chapter is on deep learning. Deep learning is an advancement of machine learning
and is particularly suited to high dimensional data and to complex artificial
intelligence problems. The shortcomings of machine learning led to the development of
deep learning [20].
All the explicit feature-related processes found in conventional machine learning
networks are performed implicitly in deep learning networks. Deep networks
self-learn from the data, and their efficiency is much better than that of traditional
feature extraction networks.
Deep learning networks use cascaded layers of nonlinear processing units. These
units do the task of feature extraction and transformation. The output of one unit is
fed as input to the succeeding unit. The learning can be performed in a supervised or
unsupervised manner. Training normally uses some kind of gradient descent method, with
the gradients computed by back propagation. Popular deep learning networks are briefly
explained below.
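Before the individual networks, the training procedure just described (cascaded nonlinear layers, gradient descent, back propagation) can be sketched with a tiny two-layer network. Everything in this sketch is an assumption made for illustration: the synthetic data, the layer sizes, the tanh and sigmoid nonlinearities, the learning rate and the function names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data (illustrative only, not HRV data).
x = rng.standard_normal((64, 3))
y = (x.sum(axis=1, keepdims=True) > 0).astype(float)


def forward(params, x):
    """Two cascaded layers: tanh hidden units followed by a sigmoid output."""
    w1, b1, w2, b2 = params
    h = np.tanh(x @ w1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))
    return h, p


def loss(params, x, y):
    """Mean binary cross-entropy."""
    _, p = forward(params, x)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))


def gradient_step(params, x, y, lr=0.1):
    """One gradient descent update with gradients from back propagation."""
    w1, b1, w2, b2 = params
    h, p = forward(params, x)
    dz2 = (p - y) / x.shape[0]        # gradient at the output pre-activation
    dw2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ w2.T) * (1 - h ** 2)  # propagate back through tanh
    dw1, db1 = x.T @ dz1, dz1.sum(axis=0)
    return (w1 - lr * dw1, b1 - lr * db1, w2 - lr * dw2, b2 - lr * db2)


params = (rng.normal(0, 0.5, (3, 4)), np.zeros(4),
          rng.normal(0, 0.5, (4, 1)), np.zeros(1))
initial_loss = loss(params, x, y)
for _ in range(100):
    params = gradient_step(params, x, y)
final_loss = loss(params, x, y)
print(initial_loss, final_loss)
```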
(Fig. 5). This property of LSTM made it of wide use in complex tasks like language
modelling. Generally, it is of wide use in areas where long time series data analysis
is required.
Memory block in LSTM can be considered as a complex processing centre built of
memory cells. The input and output gates are multiplicative gates which can permit
or block the flow of cell activation through the memory unit to nodes coming further
in the path. A set of modifiable multiplicative gates manages all the processes
happening in the memory block. Peephole connections and the forget gate were later
additions to the LSTM architecture as research progressed. The forget gate can be
used in place of the CEC (constant error carousel). These three gates also help the
memory cell to store information across many time steps.
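The role of the three gates can be seen in a single LSTM time step. The sketch below uses the widely used formulation with input, forget and output gates and no peephole connections; the weight layout (four blocks stacked in one matrix) and the function names are assumptions made for compactness.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, w, u, b):
    """One LSTM step. w: (4H, D), u: (4H, H), b: (4H,), blocks ordered as
    input gate, forget gate, output gate, candidate."""
    hidden = h_prev.size
    z = w @ x + u @ h_prev + b
    i = sigmoid(z[0:hidden])                 # input gate
    f = sigmoid(z[hidden:2 * hidden])        # forget gate
    o = sigmoid(z[2 * hidden:3 * hidden])    # output gate
    g = np.tanh(z[3 * hidden:4 * hidden])    # candidate cell content
    c = f * c_prev + i * g                   # memory cell update
    h = o * np.tanh(c)                       # gated cell output
    return h, c


# With the input gate closed and the forget gate open, the cell keeps its content.
D, H = 2, 3
w, u = np.zeros((4 * H, D)), np.zeros((4 * H, H))
b = np.concatenate([-50.0 * np.ones(H), 50.0 * np.ones(H), np.zeros(2 * H)])
c_prev = np.array([0.5, -1.0, 2.0])
h, c = lstm_step(np.ones(D), np.zeros(H), c_prev, w, u, b)
print(c)
```

Running the step repeatedly with these gate settings leaves the cell state unchanged, which is how the memory cell carries information across many time steps.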
GRU is an improved variety of LSTM with fewer parameters. GRU enables
each recurrent unit to capture dependencies corresponding to different time scales
in an adaptive manner. GRU has gating units that modulate the flow of information
inside its memory but, unlike LSTM, it doesn’t have separate memory cells. The
memory consumption and computational cost of GRU are much smaller than those of
LSTM.
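For comparison with LSTM, the sketch below shows one GRU time step and a simple parameter count. It uses one common formulation (update gate z, reset gate r, new state (1 - z) * h_prev + z * candidate); sign conventions for z differ between papers, and the weight layout and function names are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(x, h_prev, w, u, b):
    """One GRU step. w: (3H, D), u: (3H, H), b: (3H,), blocks ordered as
    update gate, reset gate, candidate."""
    hidden = h_prev.size
    zx = w @ x + b
    z = sigmoid(zx[0:hidden] + u[0:hidden] @ h_prev)                    # update gate
    r = sigmoid(zx[hidden:2 * hidden] + u[hidden:2 * hidden] @ h_prev)  # reset gate
    cand = np.tanh(zx[2 * hidden:] + u[2 * hidden:] @ (r * h_prev))     # candidate
    return (1.0 - z) * h_prev + z * cand  # no separate memory cell


def recurrent_param_count(n_blocks, input_dim, hidden_dim):
    """Weights and biases for n_blocks gate/candidate blocks."""
    return n_blocks * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)


D, H = 2, 3
gru_params = recurrent_param_count(3, D, H)   # update, reset, candidate
lstm_params = recurrent_param_count(4, D, H)  # input, forget, output, candidate
print(gru_params, lstm_params)

# With the update gate closed, the previous state passes through unchanged.
b = np.concatenate([-50.0 * np.ones(H), np.zeros(2 * H)])
h_prev = np.array([0.2, -0.4, 0.9])
h = gru_step(np.ones(D), h_prev, np.zeros((3 * H, D)), np.zeros((3 * H, H)), b)
```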
information of signals like ECG (The details of experimental analysis and topology
of work using CNN and CNN-LSTM are explained in Sect. 7).
In the case of hybrid architectures like CNN-LSTM, the CNN is made up of
convolutional1D and maxpooling1D layers alone. The maxpooling layer’s output is passed
as input to the subsequent network.
\[ y_i = \mathrm{CNN}(x_i) \tag{2} \]
The input and output of the CNN are x_i and y_i respectively. Each input x_i has
an associated class label. y_i is the output vector of the maxpooling layer in the
CNN. y_i is fed to the next deep learning network placed after the CNN. This network
can be an RNN, LSTM or GRU.
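Equation (2) can be illustrated with a one-dimensional convolution followed by max pooling. The sketch below is a minimal NumPy version; the ReLU nonlinearity, the kernel values, the pool size and the function names are assumptions, and as in most deep learning libraries the "convolution" is computed as a sliding dot product.

```python
import numpy as np


def conv1d(x, kernels, bias):
    """Valid 1-D convolution with ReLU. x: (T,), kernels: (F, K), bias: (F,).
    Returns an array of shape (T - K + 1, F)."""
    x = np.asarray(x, dtype=float)
    k = kernels.shape[1]
    idx = np.arange(x.size - k + 1)[:, None] + np.arange(k)[None, :]
    windows = x[idx]                                    # (T - K + 1, K)
    return np.maximum(windows @ kernels.T + bias, 0.0)  # ReLU (an assumption)


def maxpool1d(features, pool):
    """Non-overlapping max pooling along the time axis."""
    t = (features.shape[0] // pool) * pool
    return features[:t].reshape(-1, pool, features.shape[1]).max(axis=1)


def cnn(x, kernels, bias, pool=2):
    """y_i = CNN(x_i): convolution layer followed by a max-pooling layer."""
    return maxpool1d(conv1d(x, kernels, bias), pool)


# A ramp input and a single difference kernel, chosen so the result is easy to check.
x_i = np.arange(10.0)
kernels = np.array([[-1.0, 0.0, 1.0]])
y_i = cnn(x_i, kernels, bias=np.zeros(1))
print(y_i.shape)
```

The rows of y_i form a shorter sequence of feature vectors, which is what a subsequent recurrent network would consume one time step at a time.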
6 Literature Survey
HRV signals were earlier analysed using the time, frequency and nonlinear
parameters described above. Evidence suggests that the heart does not oscillate
periodically under normal conditions [25]. Thus, nonlinear techniques, capable of
extracting and analysing nonlinear features from HRV signals, are also widely used.
Nonlinear features like the Lyapunov exponent (Rosenstein et al.), 1/f slope
(Kobayashi et al.), approximate entropy (ApEn) (Pincus) and detrended fluctuation
analysis (DFA) (Peng et al.) can be extracted from the HRV signals for further
analysis [16, 18, 26, 27]. The range of the feature values gives an indication of the
possible anomaly. HRV signal classification is also done by nonlinear techniques
[28, 29]. Nonlinear techniques are employed in cardiac signal analysis for developing
cardiac arrhythmia detection algorithms [30, 31].
coronary disorder turned out to be diabetic patients too [37, 38]. This is because
diabetes results in early development of coronary disease and atherosclerosis. All
these results proved that HRV analysis can be used to identify diabetes.
Diabetes induced CAN can be very damaging. Hence, early detection of CAN
due to diabetes is very important. Ahsan et al. showed that HRV analysis using
features like sample entropy (SampEn) and Poincare plots is very useful in detecting
CAN present in diabetic people [39].
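Poincare plots place each RR interval against the next one, and are commonly summarized by two descriptors: SD1 (spread perpendicular to the line of identity) and SD2 (spread along it). The sketch below computes both under that standard definition; the function name and the sample RR values are hypothetical, and the specific features used in [39] may differ.

```python
import numpy as np


def poincare_sd(rr_ms):
    """SD1 and SD2 of the Poincare plot of RR(n+1) against RR(n)."""
    rr = np.asarray(rr_ms, dtype=float)
    x, y = rr[:-1], rr[1:]
    sd1 = float(np.std((y - x) / np.sqrt(2.0), ddof=1))  # short-term variability
    sd2 = float(np.std((y + x) / np.sqrt(2.0), ddof=1))  # long-term variability
    return sd1, sd2


# Illustrative RR values in milliseconds (not measured data).
sd1, sd2 = poincare_sd([800, 810, 790, 860, 800])
print(sd1, sd2)
```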
Kirvela et al. performed frequency and time domain analysis of HRV (extracted
from 24 h duration ECG recordings) [40]. All analysis parameters (both time and
frequency) were significantly reduced in diabetic HRV samples compared to those
from normal people. Mackay measured heart rate variation at different levels of
breathing modes for normal and diabetic patients. It was observed that heart rate
variation was markedly lower in diabetic people [41].
Jelinek et al. studied the consequences of QT dispersion in normal and
diabetic people, ensuring that people belonging to both classes had no previous
history of cardiac disease [42]. Heart rate variability was measured through a
parameter named tone-entropy (T-E), where tone (T) represents the sympatho-vagal
balance and entropy (E) represents the autonomic regularity. T-E was
observed to be reduced in diabetic people. On a similar group of people under similar
conditions, Awdah et al. observed that time domain parameters like St. George
index, RMSSD, SDRR etc. were reduced in diabetic cases in comparison to normal
cases [43]. Chemla et al. used autoregressive frequency modelling to study
HRV signals in people affected by diabetes [44].
Schroeder et al. found that the time domain parameters RMSSD, SDNN and RR
interval were lower in diabetic people. They also observed that as diabetes progresses,
proper autonomic function of the body is badly affected [45]. Seyd et al. performed
time and frequency analysis of HRV [46]. Time domain parameters like mean RR
interval, TINN, RMSSD, SDNN, NN50 count and HRV triangular index were lower
in diabetic patients than in normal people. Frequency domain analysis showed a
considerable difference in power across different frequency ranges between diabetic
people and normal people.
Trunkvalterova et al. proved that multiscale entropy (MSE) is capable of detecting
very small aberrations in the cardiovascular systems of patients having type 1
diabetes. In their work, they used the estimator parameter of SampEn and linear
measures like RMSSD [47]. Faust et al. analysed time, frequency and nonlinear features
derived from HRV signals and showed that nonlinear methods gave better results in
the diagnosis of diabetes compared to time domain and frequency domain methods
[48]. Jian et al. applied principal component analysis (PCA) to HOS bispectrum
magnitude plots obtained out of HRV signals. These were fed to SVM classifier to
obtain diabetes detection accuracy value of 79.93% [49].
Acharya et al. arrived at an innovative diabetic integrated index (DII) making use
of nonlinear features derived from the HRV signal [50]. They obtained a diabetes
detection accuracy of 86% using an AdaBoost classifier. Swapna et al. used HOS based
features for diabetes detection with an accuracy of 90.5% [51]. Acharya et al.
obtained an accuracy of 90% by extracting four nonlinear features and using an
AdaBoost classifier [52]. Acharya et al. used entropies, energy skewness and kurtosis
to achieve a diabetes detection accuracy of 92.02% employing a decision tree (DT)
classifier [53]. Pachori et al. used EMD on HRV signals along with the Morlet wavelet
kernel function to achieve the very high accuracy of 95.63% [54]. Table 3 summarises
all the above works.

Table 3 A summary of machine learning methods used for detecting HRV parameters that were
significantly different in diabetic patients (DM = Diabetes Mellitus)

Authors                     Methods/features                               Observed activity for extracted features for DM
Pfeifer et al. [5]          Time domain
Kirvela et al. [40]         Frequency domain, time domain                  HRV reduced
Singh et al. [33]           Frequency domain, time domain                  Reduced LF power
Awdah et al. [43]           Time domain                                    Reduced
Flynn et al. [55]           DFA                                            Reduced short-term correlation in DM
Chemla et al. [44]          FFT, Autoregressive spectral analysis          Decreased
Schroeder et al. [45]       Time domain                                    Decreased
Seyd et al. [46]            Time, frequency domain                         Decreased
Trunkvalterova et al. [47]  Nonlinear methods (multiscale entropy (MSE))   Decreased MSE
Faust et al. [48]           Time, frequency, nonlinear                     Decreased
Acharya et al. [50]         Nonlinear (RQA, CD)                            Accuracy is 86%
Swapna et al. [51]          HOS                                            Accuracy is 90.5%
Jian et al. [49]            HOS                                            Accuracy is 79.93%
Acharya et al. [52]         Nonlinear features                             Accuracy is 90.0%
Acharya et al. [53]         DWT                                            Accuracy is 92.02%
Pachori et al. [54]         EMD related features                           Accuracy is 95.63%

Several works connect deep learning analysis methods and ECG.
CNN-based deep learning methods were used by Acharya et al. to analyse ECG in order to detect coronary
artery disease [56], detect myocardial infarction [57] and classify heartbeats
[58]. Sujadevi et al. analysed ECG to detect atrial fibrillation
[59].
Diabetes Detection Using ECG Signals: An Overview 321
Table 4 Deep learning methods used for diabetes detection (with HRV as input)
Authors | Methods/features | Accuracy
Swapna et al. [60] | Deep learning (CNN-LSTM) | 95.1%
Swapna et al. [61] | Deep learning (CNN-LSTM) followed by SVM | 95.7%
Regarding diabetes detection using ECG signals, Swapna et al. employed a hybrid
deep learning CNN-LSTM network with HRV as input to achieve an accuracy
of 95.1%, which is comparable to the maximum accuracy achieved so far [60].
Swapna et al. improved this diabetes detection accuracy to 95.7% by adding
an SVM classifier after the CNN-LSTM network [61]. Accuracy details are given in
Table 4.
The hybrid architecture for diabetes detection is discussed in detail in [60, 61]. The
workflow of the hybrid architecture is shown in Fig. 6. The deep learning architecture is
implemented using the TensorFlow software framework [62] in the case of
Fig. 6 The architecture of proposed system of [60, 61] (with and without SVM)
322 G. Swapna et al.
processing of data. Another issue is that, as the data volume is very high, it may not be
possible to store the entire data in memory or on disk. Many training/testing algorithms
are designed assuming that the data is available in its entirety in memory, so
such algorithms cannot be run successfully on data of this size. This is known as the curse of
modularity. Distributed computing and parallelization can be used to tackle this
challenge. Further, there are the challenging issues of high dimensionality of the data,
the highly diverse nature of the data and high variation in the probability of occurrence of
classes in the data, which, if not handled, will degrade the performance of the machine
learning network. In machine learning, proper selection of features using
domain knowledge is crucial. As the dataset grows in dimension as well as in sample size, it becomes
extremely difficult to create relevant features. Feature selection is also very difficult
in high dimensional data. These issues in handling and analysing big data have led
deep learning networks to take centre stage in place of traditional machine
learning networks.
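One way around the assumption that all data fits in memory is to process the data in chunks and keep only running summaries. The following is a minimal sketch of that idea for per-feature mean and variance (useful, for example, for standardising inputs); the class `RunningStats` and the helper `iter_chunks` are hypothetical names, and the in-memory array merely stands in for data that would be read piece by piece from disk or a stream.

```python
import numpy as np

class RunningStats:
    """Accumulate per-feature mean and variance one chunk at a time."""

    def __init__(self, n_features):
        self.count = 0
        self.mean = np.zeros(n_features)
        self.m2 = np.zeros(n_features)       # sum of squared deviations from the mean

    def update(self, chunk):
        chunk = np.asarray(chunk, dtype=float)
        n = chunk.shape[0]
        chunk_mean = chunk.mean(axis=0)
        chunk_m2 = ((chunk - chunk_mean) ** 2).sum(axis=0)
        delta = chunk_mean - self.mean
        total = self.count + n
        # Merge the statistics of the new chunk into the running statistics
        self.mean = self.mean + delta * n / total
        self.m2 = self.m2 + chunk_m2 + delta ** 2 * self.count * n / total
        self.count = total

    @property
    def variance(self):
        return self.m2 / (self.count - 1)    # sample variance

def iter_chunks(data, chunk_size):
    """Stand-in for reading successive chunks from disk or a network stream."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))            # synthetic data, small enough to check the result
stats = RunningStats(n_features=3)
for chunk in iter_chunks(data, chunk_size=128):
    stats.update(chunk)
```

Because each chunk only updates three small arrays, the same pattern can be distributed: separate workers can summarise separate chunks and their summaries can be merged afterwards.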
Among the deep learning techniques applied to ECG-derived HRV data
for the purpose of diabetes detection, the best performing models [60, 61] were applied
to real-time data, and these works can be considered a foundation for
future work in this direction. Further improvements in accuracy may be possible by feeding
the developed architecture with larger input data sets than those used in
the above works.
Present-day ECG measurement equipment needs only a short time (less than
5 min) to acquire an ECG signal for analysis. On the other hand, Holter monitors
record the ECG signal of a person continuously (for at least 24–48 h)
to check for possible abnormalities that cannot be found by short-term ECG
monitoring. Machine learning techniques are sufficient to handle short-term ECG
data. Deep learning networks and algorithms are also suitable for relatively short-term
data, considering that analysis results can be obtained very quickly in
real time. The second case, the analysis of large amounts of data (continuous ECG
signals with a duration of more than 24 h), say from Holter monitors, requires big
data analytics and deep learning algorithms. If long duration ECG data are available
to researchers, deep learning architectures like LSTM and hybrid systems like CNN-LSTM
are capable of analysing this non-invasive data for the
future possibility of being affected by diabetes. Hence, if real time big data is made
available to deep learning networks, the focus will shift quickly from the problem of
detecting a disease to that of predicting a disease in the near future.
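A long recording, such as one from a Holter monitor, is usually cut into shorter segments before it is given to a network. The sketch below shows one simple way to do this for an RR-interval series, assuming non-overlapping windows of about 5 min; the window length and the function name `segment_by_duration` are assumptions for illustration, not the segmentation used in [60, 61].

```python
import numpy as np

def segment_by_duration(rr_s, window_s=300.0):
    """Split an RR-interval series (seconds) into consecutive windows of about window_s seconds."""
    rr = np.asarray(rr_s, dtype=float)
    beat_times = np.cumsum(rr)                          # elapsed time at the end of each interval
    window_index = (beat_times // window_s).astype(int)
    # A new segment starts wherever the window index changes
    boundaries = np.flatnonzero(np.diff(window_index)) + 1
    return np.split(rr, boundaries)

rr_long = np.full(1000, 0.7)                            # synthetic record: 1000 beats, 700 s in total
segments = segment_by_duration(rr_long)
```

Each segment can then be converted into features or passed to a sequence model, so that a 24 h record becomes a few hundred short inputs rather than one very long one.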
9 Conclusion
The body of a person affected by diabetes is either incapable of producing sufficient
insulin or resistant to the insulin it produces, leading to persistently high blood
sugar. Autonomic impairments, which are asymptomatic and can only be detected clinically,
become evident only many years after the onset of diabetes.
Thus, HRV can be used as an early sign of impending diabetic neuropathy and
can be used for diabetes detection with high accuracy. HRV analysis is thus a simple,
non-invasive and reproducible method of diabetes detection. Deep learning methods
can be used to detect diabetes with very high accuracy. Distributed deep learning
systems can give results fast enough to make real time analysis of biosignals a
reality. It can therefore be expected that the future of biomedical engineering belongs
to deep learning based systems that need no hand-crafted features and can perform big data analytics
without relying on domain knowledge.
References
1. Ralston, S.H., Penman, I.D., Strachan, M.W., Hobson, R.P.: Davidson’s Principles and Practice
of Medicine, 23rd edn. Elsevier
2. Viktor, S., Steven, I., Marina, D.I., Aleksander, N., Vojislava, M.: Impact of diabetes on heart
rate variability and left ventricular function in patients after myocardial infarction. Facta Univ.
Ser.: Med. Biol. 12(3), 130–134 (2005)
3. Di Carli, M.F., Janisse, J., Grunberger, G., Ager, J.: Role of chronic hyperglycemia in the
pathogenesis of coronary microvascular dysfunction in diabetes. J. Am. Coll. Cardiol. 41, 1387–1393
(2003)
4. Gresele, P., Guglielmini, G., Deangelis, M., et al.: Acute short-term hyperglycemia enhances
shear stress-induced platelet activation in patients with type 2 diabetes mellitus. J. Am. Coll.
Cardiol. 41, 1013–1020 (2003)
5. Pfeifer, M.A., Cook, D., Brodsky, J., Tice, D., Reenan, A., Swedine, S., et al.: Quantitative
evaluation of cardiac parasympathetic activity in normal and diabetic man. Diabetes 339–345
(1982)
6. Sawicki, P.T., Dahne, R., Bender, R., Berger, M.: Prolonged QT interval as a predictor of
mortality in diabetic nephropathy. Diabetologia 39(1), 77–81 (1996)
7. Okin, P.M., Devereaux, R.B., Howard, B.V., Welty, T.K.: Assessment of QT interval and QT
dispersion for prediction of all-cause mortality and cardiovascular mortality in American Indi-
ans: the Strong Heart Study. Circulation 101, 61–66 (2000)
8. Barrett, K.E., Barman, M.S., Boitano, S., Brooks, H.: Ganong’s Review of Medical Physiology.
McGraw-Hill Companies
9. Stern, S., Sclarowsky, S.: The ECG in diabetes mellitus. Am. Heart Assoc. (AHA) J. (2009)
10. Sokolow, M., Mcllroy, M.B., Chiethin, M.D.: Clinical Cardiology. VLANGE Medical Book
(1990)
11. Constant, I., Laude, D., Murat, I., Elghozi, J.L.: Pulse rate variability is not a surrogate for
heart rate variability. Clin. Sci. 97, 391–397 (1999)
12. Kleiger, R.E., Bigger, J.T., Bosner, M.S., Chung, M.K., Cook, J.R., Rolnitzky, L.M., et al.:
Stability over time of variables measuring heart rate variability in normal subjects. Am. J.
Cardiol. 68, 626–630 (1991)
13. Ge, D., Srinivasan, N., Krishnan, S.M.: Cardiac arrhythmia classification using autoregressive
modeling. Biomed. Eng. Online 1(1), 5 (2002)
14. Akselrod, S., Gordon, D., Madwed, J.B., Snidman, N.C., Shannon, D.C., Cohen, R.J.: Hemo-
dynamic regulation: investigation by spectral analysis. Am. J. Physiol. 249(4 Pt 2), H867–H875
(1985)
15. Gamero, L.G., Vila, J., Palacios, F.: Wavelet transform analysis of heart rate variability during
myocardial ischaemia. Med. Biol. Eng. Comput. 40, 72–78 (2002)
16. Peng, C.K., Havlin, S., Hausdorff, J.M., Mietus, J.E., Stanley, H.E., Goldberger, A.L.: Fractal
mechanisms and heart rate dynamics. J. Electrocardiol. 28(Suppl), 59–64 (1996)
17. Grassberger, P., Procaccia, I.: Measuring the strangeness of strange attractors. Phys. D 9,
189–208 (1983)
18. Pincus, S.M.: Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci.
U.S.A. 88, 2297–2301 (1991)
19. Eckmann, J.P., Kamphorst, S.O., Ruelle, D.: Recurrence plots of dynamical systems. Europhys.
Lett. 4, 973–977 (1987)
20. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press.
http://www.deeplearningbook.org (2016)
21. Poultney, C., Chopra, S., Cun, Y.L., et al.: Efficient learning of sparse representations with an
energy-based model. In: Advances in Neural Information Processing Systems, pp. 1137–1144
(2006)
22. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks.
Science 313(5786), 504–507 (2006)
23. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust
features with denoising autoencoders. In: Proceedings of the 25th International Conference on
Machine Learning, pp. 1096–1103. ACM (2008)
24. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780
(1997)
25. Goldberger, A.L., West, B.J.: Application of non-linear dynamics to clinical cardiology. Ann.
N. Y. Acad. Sci. 504, 195–213 (1987)
26. Rosenstein, M., Collins, J.J., De Luca, C.J.: A practical method for calculating largest Lyapunov
exponents from small data sets. Phys. D 65, 117–134 (1993)
27. Kobayashi, M., Musha, T.: 1/f fluctuation of heart beat period. IEEE Trans. Biomed. Eng. 29,
456–457 (1982)
28. Acharya, U.R., Kannathal, N., Krishan, S.M.: Comprehensive analysis of cardiac health using
heart rate signals. Physiol. Meas. J. 25, 1130–1151 (2004)
29. Acharya, U.R., Paul Joseph, K., Kannathal, N., Lim, C.M., Suri, J.S.: Heart rate variability: a
review. Med. Biol. Eng. Comput. 44(12), 1031–1051 (2006)
30. Chua, K.C., Chandran, V., Acharya, U.R., Lim, C.M.: Computer-based analysis of cardiac state
using entropies, recurrence plots and Poincare geometry. J. Med. Eng. Technol. 2(4), 263–272
(2008)
31. Acharya, U.R., Suri, J.S., Spaan, J.A.E., Krisnan, S.M.: Advances in Cardiac Signal Processing.
Springer Verlag GmbH Berlin Heidelberg (2007)
32. Wheeler, T., Watkins, P.J.: Cardiac denervation in diabetes. Br. Med. J. 4, 584–586 (1973)
33. Singh, J.P., Larson, M.G., O’Donnell, C.J., Wilson, P.F., Tsuji, H., Lloyd-Jones, D.M., Levy, D.:
Association of hyperglycemia with reduced heart rate variability: the Framingham heart study.
Am. J. Cardiol. 86, 309–312 (2000)
34. Villareal, R.P., Liu, B.C., Massumi, A.: Heart rate variability and cardiovascular mortality.
Curr. Atheroscler. Rep. 4(2), 120–127 (2002)
35. Stamler, J., Vaccaro, D., Neaton, J.D., Wentworth, D.: Diabetes, other risk factors, and 12-year
cardiovascular mortality for men screened in the multiple risk factor intervention trial. Diabetes
Care 16, 434–444 (1993)
36. Coutinho, M., Gerstein, H.C., Wang, Y., Yusuf, S.: The relationship between glucose and
incidence cardiovascular events: a meta-regression analysis of published data from 20 studies
of 95783 individuals followed for 12.4 years. Diabetes Care 22, 233–240 (1999)
37. Melchior, T., Kober, L., Madsen, C.R., et al.: Accelerating impact of diabetes mellitus on
mortality in the years following an acute myocardial infarction. Eur. Heart J. 20, 973–978
(1999)
38. Braunwald, E., Antman, E., Beasley, J.W., et al.: ACC/AHA guidelines for the management
of patients with unstable angina and non-ST-segment elevation myocardial infarction. J. Am.
Coll. Cardiol. 36, 970–1062 (2000)
39. Khandoker, A.H., Jelinek, H.F., Palaniswami, M.L: Identifying diabetic patients with cardiac
autonomic neuropathy by heart rate complexity analysis. Biomed. Eng. Online 8, 1–12 (2009)
40. Kirvela, M., Salmela, K., et al.: Heart rate variability in diabetic and non-diabetic renal trans-
plant patients. Acta Anaesthesiol. Scand. 40(7), 804–808 (1996)
41. Mackay, J.D.: Respiratory sinus arrhythmia in diabetic neuropathy. Diabetologia 24(4),
253–256 (1983). https://doi.org/10.1007/BF00282709
42. Jelinek, H.F., Flynn, A., Warner, P.: Automated assessment of cardiovascular disease associated
with diabetes in rural and remote health practice. In: The National SARRAH Conference,
pp. 1–7 (2004)
43. Awdah, A., Nabil, A., Ahmad, S., Reem, Q., Khidir, A.: Time-domain analysis of heart rate
variability in diabetic patients with and without autonomic neuropathy. Ann. Saudi Med. 22,
5–6 (2002)
44. Chemla, D., Young, J., Badilini, F., Maison, B.P., Affres, H., Lecarpentier, Y., Chanson, P.:
Comparison of fast Fourier transform and autoregressive spectral analysis for the study of
heart rate variability in diabetic patients. Int. J. Cardiol. 104(3), 307–313 (2005)
45. Schroeder, E.B., Chambless, L.E., Liao, D., Prineas, R.J., Evans, G.W., Rosamond, W.D., et al.:
Diabetes, glucose, insulin, and heart rate variability: the Atherosclerosis Risk in Communities
(ARIC) study. Diabetes Care 28(3), 668–674 (2005)
46. Seyd, P.T.A., Ahamed, V.T., Jacob, J., Joseph, P.: Time and frequency domain analysis of heart
rate variability and their correlations in diabetes mellitus. World Acad. Sci. Eng. Technol. 2(3)
(2008)
47. Trunkvalterova, Z., Javorka, M., Tonhajzerova, I., Javorkova, J., Lazarova, Z., Javorka, K.,
Baumert, M.: Reduced short-term complexity of heart rate and blood pressure dynamics in
patients with diabetes mellitus type 1: multiscale entropy analysis. J. Physiol. Meas. 29(7)
(2008)
48. Faust, O., Acharya, U.R., Molinari, F., Chattopadhyay, S., Tamura, T.: Linear and non-linear
analysis of cardiac health in diabetic subjects. Biomed. Signal Process. Control 7(3), 295–302
(2012)
49. Jian, L.W., Lim, T.C.: Automated detection of diabetes by means of higher order spectral
features obtained from heart rate signals. J. Med. Imaging Health Inform. 3, 440–447 (2013)
50. Acharya, U.R., Faust, O., VinithaSree, S., Ghista, D.N., Dua, S., Joseph, P., Thajudin, A.V.I.,
Janarthanan, N., Tamura, T.: An integrated diabetic index using heart rate variability signal
features for diagnosis of diabetes. Comput. Methods Biomech. Biomed. Eng. 16, 222–234
(2013)
51. Swapna, G., Acharya, U.R., VinithaSree, S., Suri, J.S.: Automated detection of diabetes using
higher order spectral features extracted from heart rate signals. Intell. Data Anal. 17(2), 309–326
(2013)
52. Acharya, U.R., Faust, O., Kadri, N.A., Suri, J.S., Yu, W.: Automated identification of normal and
diabetes heart rate signals using nonlinear measures. Comput. Biol. Med. 43(10), 1523–1529
(2013)
53. Acharya, U.R., Vidya, S., Ghista, D.N., Lim, W.J.E., Molinari, F., Sankaranarayanan, M.:
Computer-aided diagnosis of diabetic subjects by HRV signals using discrete wavelet transform
method. Knowl.-Based Syst. 42, 4567–4581 (2015)
54. Pachori, R.B., Kumar, M., Avinash, P., Shashank, K., Acharya, U.R.: An improved online
paradigm for screening of diabetic patients using RR-interval signals. J. Mech. Med. Biol. 16,
1640003 (2016)
55. Flynn, A.C., Jelinek, A.F., Smith, M.: Heart rate variability analysis: a useful assessment tool
for diabetes associated cardiac dysfunction in rural and remote areas. Aust. J. Rural Health
13(2), 77–82 (2005)
56. Acharya, U.R., Fujita, H., Oh, S.L., Adam, M., Tan, J.H., Chua, C.K.: Automated detection of
coronary artery disease using different durations of ECG segments with convolutional neural
network. Knowl.-Based Syst. 132, 62–71 (2017)
57. Acharya, U.R., Fujita, H., Oh, S.L., Hagiwara, Y., Tan, J.H., Adam, M.: Application of deep
convolutional neural network for automated detection of myocardial infarction using ECG
signals. Inf. Sci. 415, 190–198 (2017)
58. Acharya, U.R., Oh, S.L., Hagiwara, Y., Tan, J.H., Adam, M., Gertych, A., Tan, R.S.: A deep
convolutional neural network model to classify heartbeats. Comput. Biol. Med. 89, 389–396
(2017)
59. Sujadevi, V.G., Soman, K.P., Vinayakumar, R.: Real-time detection of atrial fibrillation from
short time single lead ECG traces using recurrent neural networks. In: The International Sympo-
sium on Intelligent Systems Technologies and Applications, pp. 212–221, Sept 2017. Springer
60. Swapna, G., Soman, K.P., Vinayakumar, R.: Automated detection of diabetes using CNN and
CNN-LSTM network and heart rate signals. Procedia Comput. Sci. 132, 1253–1262 (2018)
61. Swapna, G., Vinayakumar, R., Soman, K.P.: Diabetes detection using deep learning algorithms.
ICT Express 4, 243–246 (2018)
62. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving,
G., Isard, M., et al.: Tensorflow: a system for large-scale machine learning. OSDI 16, 265–283
(2016)
G. Swapna is a Ph.D. student in the Computational Engineering and Networking, Amrita School
of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India since July 2015. She is also a
faculty at Government Engineering College, Kozhikode, Kerala, India.
K. P. Soman has 25 years of research and teaching experience at Amrita School of Engineer-
ing, Coimbatore. He has around 150 publications in national and international journals and con-
ference proceedings. He has organized a series of workshops and summer schools in Advanced
signal processing using wavelets, Kernel Methods for pattern classification, Deep learning, and
Big-data Analytics for industry and academia. He authored books on “Insight into Wavelets”, “In-
sight into Data mining”, “Support Vector Machines and Other Kernel Methods” and “Signal and
Image processing-the sparse way”, published by Prentice Hall, New Delhi, and Elsevier.
Abstract Deep Learning (DL) is popular among researchers and academicians
due to its reliability and accuracy, especially in the fields of engineering and medical
sciences. In medical imaging for the diagnosis of disease, DL techniques
are very helpful for early detection. The most important features of DL techniques are
that they are uncomplicated with lower complexity, which ultimately saves time
and money, and that they can tackle many tough tasks simultaneously. Artificial Intelligence (AI)
and Deep Learning (DL) technologies have improved rapidly in recent years. These
techniques play an important role in every field of application, especially in the
medical field, in areas such as image processing, image fusion, image segmentation, image
retrieval, image analysis, computer aided diagnosis (CAD), image registration,
image-guided therapy and many more. The aim of this chapter is to describe
DL methods and the future of biomedical imaging using DL in detail, and to discuss
the issues and challenges.
1 Introduction
Currently, DL techniques are among the most frequently used algorithms for obtaining
better, more scalable and more accurate results from data than conventional
Machine Learning (ML) methods. DL is also applied to biomedical images
to detect (diagnose) diseases and to support precisely tailored treatment plans for improving
the patient’s health. EEG, ECG, MEG, MRI, etc. are widely used biomedical signals and images
for the diagnosis of patients with minimal human intervention. These medical
images may also contain noise, which makes it difficult to analyse them accurately.
Deep Learning has the potential to give reliable and precise results with high
accuracy. Like every technology, DL also has
drawbacks: it gives promising outcomes mainly when the data set is large, and it needs GPUs
and higher system configurations to process medical images. Despite
these disadvantages, Deep Learning remains popular
due to its capability of processing huge amounts of data. This chapter discusses
the state-of-the-art approaches of DL for biomedical images. We also discuss
the application of Deep Learning (DL) to classification, registration and segmentation,
the issues and challenges of DL approaches, and the future of Deep Learning in biomedical
imaging.
Within the wide assortment of various Machine Learning (ML) approaches, Deep
Learning has truly marked its presence with its excellent performance, particularly
in the area of medical image processing.
Deep Learning (DL) belongs to the area of ML, which in turn is a branch of AI.
It deals with algorithms motivated by the structure and functioning of the brain. It
permits computational models to learn representations of a dataset with the aid
of numerous hidden processing layers [1]. These layers perform
feature extraction and transformation. The output of each layer is fed into the
subsequent one. It is a way to automate predictive analysis. Moreover, it can perform well
in both supervised and unsupervised approaches (Fig. 1).
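The idea that each layer transforms its input and passes the result to the next layer can be sketched in a few lines. This is only an illustrative forward pass with random, untrained weights; the layer sizes, the ReLU activation and the function names are assumptions chosen for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, layers):
    """Pass x through a stack of (weights, bias) layers; each output feeds the next layer."""
    activation = x
    for i, (w, b) in enumerate(layers):
        z = activation @ w + b
        is_last = i == len(layers) - 1
        activation = z if is_last else relu(z)   # hidden layers use ReLU, the output stays linear
    return activation

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 2]                           # input, two hidden layers, output (arbitrary sizes)
layers = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
batch = rng.normal(size=(4, 8))                  # four synthetic input vectors
outputs = forward(batch, layers)
```

Training would adjust the weights in `layers` so that the outputs match the desired targets; that step is omitted here.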
The working of the Deep Learning approach is shown in Fig. 2. In this approach,
a dataset and a particular Deep Learning algorithm are first chosen for the model
to be designed. In further steps, comprehensive experiments are performed, and
thereafter the results are generated and analyzed.
Numerous Deep Learning models have been developed, such as Convolutional
Neural Networks (CNN), Deep Belief Networks (DBN), Recurrent Neural Network
(RNN) etc. which are discussed in the following subsections:
Encoder-Decoder (ED)
The Encoder-Decoder architecture goes beyond traditional ML methods. It has
become a core technology for prediction with neural networks and for sequence-to-sequence
learning. It is able to handle variable-length input and output.
The encoder takes the input sequence and maps it to an encoded representation. The
decoder then uses this encoded representation to generate the output (Fig. 5).
Deep Learning is a growing and popular research area in medical research for the
diagnosis of diseases. Today, many people suffer from
lifestyle diseases such as type-2 diabetes, obesity, heart disease and neurodegenerative
diseases, which are linked to the consumption of drugs and alcohol, smoking and an unhealthy diet.
Deep Learning is playing a vital role in the prediction of such diseases. In day-to-day
practice, Computer-Aided Diagnosis (CAD) is preferred for testing and diagnosing
disease via Computerised Tomography (CT), Single Photon Emission Computed
Tomography (SPECT), Positron Emission Tomography (PET), Magnetic Resonance
Imaging (MRI) and other modalities. Deep Learning speeds up
diagnosis, and it can work with 2D and 3D data to provide further detail.
It can also resolve issues regarding data labeling and overfitting to some extent.
Many diseases can be classified or diagnosed using DNNs, such as breast
cancer, aphasia, attention deficit hyperactivity disorder (ADHD) and many more.
Deep Learning is a very popular research area that can help in
diagnosing many types of disease. For example, ADHD is a very common mental disorder
334 M. Jyotiyana and N. Kesswani
among children. A child suffering from ADHD may face problems such as
poor concentration, distractibility, weakness and excessive activity. Similarly,
Deep Learning is also used for the detection of cancer, Alzheimer’s disease, Parkinson’s disease, brain
tumors and many more.
1.4 Applications
Deep Learning is now prevalent not only in the field of health informatics
but in daily life too. Many prediction and classification tasks are handled
by Deep Learning because of its promising results, accuracy and faster processing
with less complexity.
There are many applications of Deep Learning; some typical health
informatics applications are:
1. Content-based image retrieval
2. Object detection
   Face detection
   Disease diagnosis
   Lesion detection
3. Machine vision and medical imaging
   Tumor detection
   Tumor stage
   Surgery planning
   Remote surgery
   Intra-surgery navigation
   Virtual surgery simulation
4. Recognition tasks
   Iris recognition
   Pattern recognition.
Machines can be faster and more consistent than humans on well-defined tasks, so
machine/computer-based analysis is often preferred. In medical sciences, Computer-Aided
Diagnosis (CAD) and automatic medical image analysis are preferred, and often
crucial, choices. CAD also plays an important role in modeling disease
progression [2, 3]; in many neurodegenerative disorders (NDD) such as strokes,
Parkinson’s disease (PD), Alzheimer’s disease (AD) and other types of dementia,
Deep Learning and the Future of Biomedical Image Analysis 335
a brain scan is crucial, and detailed maps of brain regions are available for analysis
and prediction of the diseases. The most popular CAD tasks in medical
imaging also include cancer diagnosis and measuring the intensity of lesions. In recent
years, CNNs have become more popular because of their strong performance and
reliability. The efficiency and performance of CNNs are indicated in a survey of CNN
methods/algorithms for brain pathology segmentation [4], and Deep Learning
approaches are used in CAD, shape prediction and segmentation [2].
A major challenge in CAD lies in the variation in the intensity and shape of tumors
and in the variations in imaging protocols within the same neuro-imaging modality. In various
cases, it has been noticed that the intensity of pathological tissues may overlap with that of healthy
tissues, and that different types of noise, such as Rician noise, intensity-based noise and
non-isotropic resolution effects in MRI, cannot be handled easily by elementary
Machine Learning (ML) approaches. To handle such data complications,
hand-crafted features are designed and well established ML methods are then used to classify them in
an entirely distinct step.
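Rician noise arises because an MR magnitude image is the magnitude of a complex signal whose real and imaginary parts each carry Gaussian noise. The sketch below simulates this on synthetic data to show why dark regions acquire a positive bias; the function name `add_rician_noise` and the noise level are assumptions for illustration only.

```python
import numpy as np

def add_rician_noise(image, sigma, rng=None):
    """Return the magnitude of a complex signal whose real and imaginary parts carry Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    real = image + rng.normal(0.0, sigma, size=image.shape)
    imag = rng.normal(0.0, sigma, size=image.shape)
    return np.sqrt(real ** 2 + imag ** 2)

rng = np.random.default_rng(0)
background = np.zeros((64, 64))                  # synthetic region with true intensity 0
noisy_background = add_rician_noise(background, sigma=10.0, rng=rng)
```

Although the true intensity is zero everywhere, the noisy values are all non-negative and their mean is well above zero, which is one reason simple intensity thresholds can fail on such images.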
Deep Learning approaches can automate feature extraction and unify it with classification
[5, 6]. A CNN is capable of learning more complex features; thus, it is
capable of handling image patches centered on unhealthy tissues. In medical
imaging, CNNs have been used to classify tuberculosis manifestation from X-ray images [7]
and to classify lung disease from CT images [8]. CNNs have also been used for hemorrhage
detection in color fundus images [9], where they can extract the least
and most discriminative patches in the pre-training stage. CNN-based
methods have been proposed for the segmentation of iso-intense stage brain cells [10] and the extraction of different
brain regions from multi-modality Magnetic Resonance Images (MRI) [11]. Many
hybrid approaches combine CNNs with other architectures.
For example, in [12] a DL approach is proposed to encode the parameters
of a deformable model and thereby support the segmentation of the heart’s left ventricle from
short-axis Magnetic Resonance Imaging. The CNN detects the left ventricle,
while a Deep Auto-Encoder (DAE) is employed to infer its shape.
2.1 Classification
Classification assigns data to classes according to the task at hand. There
are many well-known techniques for classification, such as Support Vector Machine
(SVM), K-Nearest Neighbor (KNN), Random Forest (RF) and Neural Networks (NN);
the most recent is Deep Learning (DL), which offers several
approaches for classification. CNN is a popular method in the field of biomedical
imaging and health informatics for classification. Details of image classification
are discussed in the next section.
Image classification is a broad area to which Deep Learning has made an immense
contribution. In classification, one or more images are used as input with one variable as the
output, and that output is compared with the desired output to check whether the disease
is diagnosed correctly or not. Different classifiers can be used, such as Support Vector Machine
(SVM), Random Forest (RF), Artificial Neural Networks (ANN) and many more.
Medical image classification is crucial in image recognition; its prime focus is to
classify medical images into categories to support the diagnosis of a disease or to help
researchers in further research. Medical image classification can be performed
by extracting useful features from the images and using those features to build
classification models that classify the images in the dataset.
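The two-step scheme described above, feature extraction followed by a classifier, can be sketched with a K-Nearest Neighbor rule on a few hand-crafted features. Everything in this example is an assumption made for illustration: the three toy features, the synthetic "images" and the function names do not come from any cited study.

```python
import numpy as np

def extract_features(image):
    """Toy hand-crafted features: mean intensity, standard deviation and mean gradient magnitude."""
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)
    return np.array([image.mean(), image.std(), np.hypot(gx, gy).mean()])

def knn_predict(train_x, train_y, query, k=3):
    """Label a query feature vector by majority vote among its k nearest training vectors."""
    dist = np.linalg.norm(train_x - query, axis=1)
    nearest = train_y[np.argsort(dist)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

rng = np.random.default_rng(0)
# Synthetic stand-ins for two image classes: dark and smooth (label 0), bright and noisy (label 1)
class0 = [rng.normal(50, 2, size=(16, 16)) for _ in range(10)]
class1 = [rng.normal(120, 15, size=(16, 16)) for _ in range(10)]
train_x = np.array([extract_features(im) for im in class0 + class1])
train_y = np.array([0] * 10 + [1] * 10)

query_image = rng.normal(118, 14, size=(16, 16))
predicted = knn_predict(train_x, train_y, extract_features(query_image))
```

A CNN replaces the fixed `extract_features` step with features learned from the data, which is the main difference from this classical pipeline.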
Before CAD became as popular as it is today, doctors commonly relied on
their experience to extract features from a medical image and then classify
the image into one of various classes. This is usually a complicated, tedious and
time-consuming job. Deep Learning addresses the problem of accurate prediction: it
can give more precise results than humans on some tasks and is also faster at prediction. It can
also process many datasets from different patients. In recent years, medical imaging
applications have shown great merit not only in solving problems for doctors but in
research too. However, researchers have not yet fully succeeded in this mission.
If classification could be performed efficiently and accurately, it would be a
great help to doctors in the diagnosis of diseases.
In medical image analysis and diagnosis, CAD acts as an assistant that provides an objective second (or
additional) opinion. In recent years, many studies have
shown that incorporating a CAD system makes the diagnosis process faster and
more accurate, improving image-based diagnosis by lessening inter-observer variation
[13, 14]. CAD provides quantitative support for clinical recommendations such as
biopsy [15]. For the identification of tumors, a CAD system is often built from the following
important steps: feature selection, feature extraction and classification
[16–19]. Various ML and DL classification techniques [20] have been proposed to
classify cancerous and healthy cells [21]. The main challenge is to reduce the dimensionality
of the features without losing significant information. In Deep Learning, the size of the dataset
is a major issue: a small dataset makes it more difficult to predict
some instances while keeping the risk of over-fitting low [21]. Researchers have proposed
many solutions for lesion classification, but most of them accomplish feature space
reduction by deriving short feature sets, either by selecting features or by constructing new
features in supervised ways [21].
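Reducing the dimensionality of a feature set while retaining most of its information can be illustrated with principal component analysis (PCA). Note that PCA is an unsupervised technique and is shown here only as a generic example of feature space reduction, not as the supervised methods referred to in [21]; the 95% variance threshold and the function name `pca_reduce` are assumptions.

```python
import numpy as np

def pca_reduce(features, variance_to_keep=0.95):
    """Project features onto the fewest principal components that retain the requested variance."""
    x = np.asarray(features, dtype=float)
    centered = x - x.mean(axis=0)
    # The singular value decomposition of the centered data gives the principal directions
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    n_components = int(np.searchsorted(np.cumsum(explained), variance_to_keep) + 1)
    return centered @ vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))                      # two underlying factors
mixing = rng.normal(size=(2, 10))
# Ten correlated synthetic features generated from the two factors plus a little noise
features = latent @ mixing + 0.01 * rng.normal(size=(200, 10))
reduced, explained = pca_reduce(features)
```

Here ten correlated features are reduced to at most two components, because the synthetic data were generated from only two underlying factors.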
Deep Learning and the Future of Biomedical Image Analysis 337
2.2 Detection
In medical imaging, organ and region detection is an important task, especially in
cancer and neurodegenerative diseases. When organ deformation is recorded in MRI
or another modality, it becomes easier to diagnose the type of disease the subject is
suffering from and the stage of the disease [23]. In the diagnosis of tumors, including
brain tumors, detection plays a vital role in treatment planning. A prime challenge
in microscopic image analysis is to analyze every individual cell for precise
detection, because the distinction between most disease grades depends on cell-level
information [24]. To meet this challenge, academics and researchers have used CNNs
for robust detection and segmentation of cells in histopathological images [24, 25],
with notable use in cancer diagnosis.
As discussed in Sect. 2.1.2, object and lesion detection is similar to classification.
The only difference is that, to detect a lesion, we must first perform segmentation
and then perform classification or prediction for the diagnosis of the disease [23,
26]. At present, Deep Learning provides promising results, so that disease can be
found at an early stage and treatment can reach the patient at the right time. For
example, in 2018 Abraham and Khan proposed a novel lesion segmentation method using
a U-Net Deep Learning architecture to improve segmentation accuracy and disease
diagnosis or prediction [27].
2.3 Segmentation
There are many state-of-the-art approaches for lesion segmentation, but CNNs give
the most promising results on 2D as well as 3D biological data [29]. Yuan [30]
proposed a lesion segmentation method that automatically separates melanoma from
the surrounding skin using convolutional and deconvolutional networks. CNNs and
other DL methods are used for the diagnosis of various types of cancerous cells
because they give higher accuracy and promising results in less time.
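Segmentation quality is commonly summarized by an overlap score between a predicted mask and a reference mask. The sketch below computes the Dice coefficient, which is an assumption about the metric of interest (the text above does not name one); the function name and the toy masks are hypothetical.

```python
def dice(pred, truth):
    """Dice overlap between two binary masks given as nested lists of 0/1 values."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    intersection = sum(a * b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    # Two empty masks agree perfectly, so report 1.0 instead of dividing by zero.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy 2x3 masks (hypothetical): 1 marks lesion pixels, 0 marks background.
truth = [[0, 1, 1],
         [0, 1, 0]]
pred = [[0, 1, 0],
        [0, 1, 1]]
score = dice(pred, truth)  # two shared lesion pixels out of three in each mask
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no lesion pixel.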
2.4 Registration
There are many other tasks in medical imaging that improve image quality and
support the diagnosis of disease. We describe them in the following subsections:
The prime goal of Content-Based Image Retrieval (CBIR) is to assist the physician
in decision making by returning medical cases similar to a given image. It requires
a massive dataset for use in DL, a sharp image representation, and algorithms that
reliably retrieve the most similar images and their interpretations. The first
application of DL to CBIR came in 2015 [35]. In 2019, Pizarro et al. [36] designed
a CNN architecture for automatically inferring the contrast of MRI scans from the
pixel intensities of multiple slices of the MR images [37].
Because massive data is pre-processed in Deep Learning, it gives better results, which
helps the radiologist in disease diagnosis and further research. The report for each
subject includes the nearby instances and the probabilities of occurrence of the
symptoms of the disease, which supports sound decision making.
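The retrieval step of CBIR can be sketched as a nearest-neighbor search over feature vectors. The example below assumes that each stored case has already been reduced to a numeric feature vector (for instance by a CNN) and ranks cases by cosine similarity; the similarity measure, the case identifiers, and the vectors are all hypothetical choices for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query, database, k=2):
    """Return the identifiers of the k stored cases most similar to the query."""
    ranked = sorted(database,
                    key=lambda case_id: cosine(query, database[case_id]),
                    reverse=True)
    return ranked[:k]

# Hypothetical feature vectors for three stored cases and one query image.
database = {
    "case_a": [1.0, 0.0, 0.2],
    "case_b": [0.9, 0.1, 0.3],
    "case_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.25]
similar_cases = retrieve(query, database, k=2)
```

The retrieved identifiers can then be used to look up the stored interpretations of the most similar cases for the physician.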
340 M. Jyotiyana and N. Kesswani
A new era is approaching in the health sector, in which medical imaging and data
will play a vital role. As the human population grows, the number of cases (subjects)
will also grow. Because Deep Learning is applied to massive datasets, the problem of
obtaining a large dataset will ease as more cases are recorded. The fundamental
requirement is that the right treatment should be given to the right subject in
limited time. In this context, the availability of massive datasets brings immense
opportunities as well as challenges.
Many studies report that CAD is more accurate than humans in disease diagnosis and
that it can handle many cases simultaneously. Thus the availability and reliability
of CAD are no longer an issue. In recent years, Deep Learning has been replacing
traditional ML and pattern recognition because of the large number of data-driven
solutions it offers in medical imaging: it permits automatic feature creation and
lessens human intervention during the procedure [20]. It is favorable in many health
informatics problems, and Deep Learning is advancing rapidly for unstructured data
originating from biomedical imaging, bioinformatics, and medical informatics. Most
applications of DL to medical imaging process health data from unstructured
sources [20]. However, plenty of information is also encoded in structured data [20],
which gives complete information about the subject’s history, treatment, diagnosis,
and pathology. In tumor detection, for example, the cytological notes include
information about the tumor stage and its spread [20]. Such information is crucial
for judging the patient’s condition or disease. Deep Learning boosts the reliability
of clinical decision support systems based on artificial intelligence (AI).
As the popularity of Deep Learning grows owing to its reliability and flexibility,
many approaches and frameworks have become popular in biomedical imaging. Recently,
CNNs have often been combined with other methods, for example CNN with an
auto-encoder, CNN with SVM for classification, and CNN with the K-Means algorithm
for image segmentation; similarly, various other methods and architectures are
available for solving real-life and research problems.
There are some CNN models available with different layers and structure, such
as VGG [38], AlexNet [39], GoogLeNet [40], ResNet [41], Highway nets [42],
DenseNet [43], ResNext [44], SENets [45], NASNet [46], YOLO [47], GANs [48],
Siamese nets [49], U-net [50], V-net [51], and many more.
There are various issues and challenges associated with various application domains
in particular with the medical applications that need to be solved:
• Data volume: Deep Learning is computationally intensive and needs a large amount
of data. There is no fixed number of training samples that is required, but a
general rule of thumb is to have at least about 10 times as many samples as there
are parameters in the network. Large volumes of data are available in application
domains such as computer vision, speech, and natural language. In medicine, as
the population of the earth grows, the number of disease cases also grows, so
data collection becomes easier.
• Data quality: Data quality is another pertinent issue in Deep Learning because,
in some application domains, data that is heterogeneous, raw, noisy, and
incomplete may lead to wrongly interpreted results. Training a good DL model on
such a huge and heterogeneous raw database therefore requires attention to issues
such as data scarcity, repeated data, and missing values.
• Interpretability: Despite the successful implementation of Deep Learning models
in a few application domains, they are still treated as black boxes, even though
interpretability is crucial for predictive systems in many of these domains.
• Domain complexity: In the medical domain, the datasets are highly heterogeneous,
and knowledge of the causes and progression of diseases is incomplete. Hence
accounting for domain complexity is a very important aspect of designing and
training Deep Learning models.
• Temporality: In application domains such as medicine, datasets change over time
in a nondeterministic way because diseases progress, whereas many Deep Learning
models are trained on static vector-based inputs and are not designed to handle
the time factor. Designing DL models that take temporal data into consideration
is therefore another open aspect of Deep Learning.
These challenges and issues open the door to future research directions:
• Feature enrichment: The data available is limited because few patients
characterize each disease. The data needed for generating features need not come
from one specific source; it can be collected from various sources such as
wearable devices, surveys, social media, and online communities. Integrating
these data sources with Deep Learning models is another challenge for the
research community.
• Temporal modeling: In the health sector and in real-life problems, time is
crucial. When machines such as CAD systems, EHRs, and other monitoring devices
are involved, timing is very sensitive, and Deep Learning models should be fast,
accurate, and reliable in understanding the subject’s condition and detecting
the stage of the disease. RNNs and architectures coupled with memory are
promising for this problem.
• Interpretable modeling: In Deep Learning, the performance of a model is important,
but its reliability and interpretability are also very important. Deep Learning
is popular because of its promising results and strong performance, yet making
the results more explainable remains an open task. Researchers should focus on
model performance as well as on the algorithms themselves in order to improve
the predictive ability of these systems.
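The data-volume item in the list above mentions a rule of thumb linking training data to network parameters. The sketch below reads it as about ten training samples per trainable parameter, which is an assumption about the intended meaning, and applies it to a small fully connected network. The function names and layer sizes are hypothetical.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def samples_by_rule_of_thumb(layer_sizes, factor=10):
    """Suggested minimum number of training samples: factor x trainable parameters."""
    return factor * dense_parameter_count(layer_sizes)

# Hypothetical network: 64 input features, one hidden layer of 32 units, 2 outputs.
sizes = [64, 32, 2]
n_parameters = dense_parameter_count(sizes)    # (64*32 + 32) + (32*2 + 2)
n_samples = samples_by_rule_of_thumb(sizes)
```

Even this small network has more than two thousand parameters, which shows why limited patient data is a recurring obstacle for larger models.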
References
1. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
2. Greenspan, H., Van Ginneken, B., Summers, R.M.: Guest editorial deep learning in medical
imaging: overview and future promise of an exciting new technique. IEEE Trans. Med. Imaging
35(5), 1153–1159 (2016)
3. Stoyanov, D., Taylor, Z., Sarikaya, D., McLeod, J., Ballester, M.A.G., Codella, N.C., De Rib-
aupierre, S. (eds.): OR 2.0 Context-Aware Operating Theaters, Computer Assisted Robotic
Endoscopy, Clinical Image-Based Procedures, and Skin Image Analysis: First International
Workshop, OR 2.0 2018, 5th International Workshop, CARE 2018, 7th International Work-
shop, CLIP 2018, Third International Workshop, ISIC 2018, Held in Conjunction with MICCAI
2018, Granada, Spain, September 16 and 20, 2018, Proceedings, vol. 11041. Springer (2018)
4. Havaei, M., Guizard, N., Larochelle, H., Jodoin, P.M.: Deep learning trends for focal brain
pathology segmentation in MRI. In: Machine Learning for Health Informatics, pp. 125–148.
Springer, Cham (2016)
5. Nie, D., Zhang, H., Adeli, E., Liu, L., Shen, D.: 3D deep learning for multi-modal imaging-
guided survival time prediction of brain tumor patients. In: International Conference on Medi-
cal Image Computing and Computer-Assisted Intervention, pp. 212–220, Oct 2016. Springer,
Cham
6. Xu, T., Zhang, H., Huang, X., Zhang, S., Metaxas, D.N.: Multimodal deep learning for cervical
dysplasia diagnosis. In: International Conference on Medical Image Computing and Computer-
Assisted Intervention, pp. 115–123, Oct 2016. Springer, Cham
7. Cao, Y., Liu, C., Liu, B., Brunette, M.J., Zhang, N., Sun, T., Curioso, W.H.: Improving tuber-
culosis diagnostics using deep learning and mobile health technologies among resource-poor
and marginalized communities. In: 2016 IEEE First International Conference on Connected
Health: Applications, Systems and Engineering Technologies (CHASE), pp. 274–281, June
2016. IEEE
8. Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A., Mougiakakou, S.: Lung pattern
classification for interstitial lung diseases using a deep convolutional neural network. IEEE
Trans. Med. Imaging 35(5), 1207–1216 (2016)
9. van Grinsven, M.J., van Ginneken, B., Hoyng, C.B., Theelen, T., Sánchez, C.I.: Fast con-
volutional neural network training using selective data sampling: application to hemorrhage
detection in color fundus images. IEEE Trans. Med. Imaging 35(5), 1273–1284 (2016)
10. Zhang, W., Li, R., Deng, H., Wang, L., Lin, W., Ji, S., Shen, D.: Deep convolutional neural
networks for multi-modality isointense infant brain image segmentation. NeuroImage 108,
214–224 (2015)
11. Kleesiek, J., Urban, G., Hubert, A., Schwarz, D., Maier-Hein, K., Bendszus, M., Biller, A.:
Deep MRI brain extraction: a 3D convolutional neural network for skull stripping. NeuroImage
129, 460–469 (2016)
12. Avendi, M.R., Kheradvar, A., Jafarkhani, H.: A combined deep-learning and deformable-model
approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med. Image Anal.
30, 108–119 (2016)
13. Singh, S., Maxwell, J., Baker, J.A., Nicholas, J.L., Lo, J.Y.: Computer-aided classification of
breast masses: performance and interobserver variability of expert radiologists versus residents.
Radiology 258(1), 73–80 (2011)
14. Sahiner, B., Chan, H.P., Roubidoux, M.A., Hadjiiski, L.M., Helvie, M.A., Paramagul, C., Blane,
C.: Malignant and benign breast masses on 3D US volumetric images: effect of computer-aided
diagnosis on radiologist accuracy. Radiology 242(3), 716–724 (2007)
15. Joo, S., Yang, Y.S., Moon, W.K., Kim, H.C.: Computer-aided diagnosis of solid breast nodules:
use of an artificial neural network based on multiple sonographic features. IEEE Trans. Med.
Imaging 23(10), 1292–1300 (2004)
16. Chen, C.M., Chou, Y.H., Han, K.C., Hung, G.S., Tiu, C.M., Chiou, H.J., Chiou, S.Y.: Breast
lesions on sonograms: computer-aided diagnosis with nearly setting-independent features and
artificial neural networks. Radiology 226(2), 504–514 (2003)
17. Sun, T., Zhang, R., Wang, J., Li, X., Guo, X.: Computer-aided diagnosis for early-stage lung
cancer based on longitudinal and balanced data. PLoS ONE 8(5), e63559 (2013)
18. Newell, D., Nie, K., Chen, J.H., Hsu, C.C., Hon, J.Y., Nalcioglu, O., Su, M.Y.: Selection
of diagnostic features on breast MRI to differentiate between malignant and benign lesions
using computer-aided diagnosis: differences in lesions presenting as mass and non-mass-like
enhancement. Eur. Radiol. 20(4), 771–781 (2010)
19. Tourassi, G.D., Frederick, E.D., Markey, M.K., Floyd, C.E.: Application of the mutual informa-
tion criterion for feature selection in computer-aided diagnosis. Med. Phys. 28(12), 2394–2402
(2001)
20. Ravì, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., Yang, G.Z.: Deep
learning for health informatics. IEEE J. Biomed. Health Inform. 21(1), 4–21 (2017)
21. Lu, J., Getz, G., Miska, E.A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Downing, J.R.:
MicroRNA expression profiles classify human cancers. Nature 435(7043), 834 (2005)
22. Cruz-Roa, A.A., Ovalle, J.E.A., Madabhushi, A., Osorio, F.A.G.: A deep learning architecture
for image representation, visual interpretability and automated basal-cell carcinoma cancer
detection. In: International Conference on Medical Image Computing and Computer-Assisted
Intervention, pp. 403–410, Sept 2013. Springer, Berlin, Heidelberg
23. Bowles, C., Qin, C., Guerrero, R., Gunn, R., Hammers, A., Dickie, D.A., Rueckert, D.: Brain
lesion segmentation through image synthesis and outlier detection. NeuroImage Clin. 16,
643–658 (2017)
24. Shen, D., Wu, G., Suk, H.I.: Deep learning in medical image analysis. Annu. Rev. Biomed.
Eng. 19, 221–248 (2017)
25. Chen, H., Dou, Q., Wang, X., Qin, J., Heng, P.A.: Mitosis detection in breast cancer histology
images via deep cascaded networks. In: Thirtieth AAAI Conference on Artificial Intelligence,
Feb 2016
26. Van Leemput, K., Maes, F., Vandermeulen, D., Colchester, A., Suetens, P.: Automated seg-
mentation of multiple sclerosis lesions by model outlier detection. IEEE Trans. Med. Imaging
20(8), 677–688 (2001)
27. Abraham, N., Khan, N.M.: A novel focal Tversky loss function with improved attention U-Net
for lesion segmentation. arXiv preprint arXiv:1810.07842 (2018)
28. Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Larochelle, H.:
Brain tumor segmentation with deep neural networks. Med. Image Anal. 35, 18–31 (2017)
29. Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Cardoso, M.J.: Generalised dice overlap
as a deep learning loss function for highly unbalanced segmentations. In: Deep Learning in
Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 240–248.
Springer, Cham (2017)
30. Yuan, Y.: Automatic skin lesion segmentation with fully convolutional-deconvolutional net-
works. arXiv preprint arXiv:1703.05165 (2017)
31. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., Sánchez, C.I.:
A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017)
32. Lundervold, A.S., Lundervold, A.: An overview of deep learning in medical imaging focusing
on MRI. Z. Med. Phys. (2018)
33. Maclaren, J., Herbst, M., Speck, O., Zaitsev, M.: Prospective motion correction in brain imag-
ing: a review. Magn. Reson. Med. 69(3), 621–636 (2013)
34. Zaitsev, M., Akin, B., LeVan, P., Knowles, B.R.: Prospective motion correction in functional
MRI. NeuroImage 154, 33–42 (2017)
35. Juneja, K., Verma, A., Goel, S., Goel, S.: A survey on recent image indexing and retrieval
techniques for low-level feature extraction in CBIR systems. In: 2015 IEEE International
Conference on Computational Intelligence & Communication Technology, pp. 67–72, Feb
2015. IEEE
36. Pizarro, R., Assemlal, H.E., De Nigris, D., Elliott, C., Antel, S., Arnold, D., Shmuel, A.: Using
deep learning algorithms to automatically identify the brain MRI contrast: implications for
managing large databases. Neuroinformatics 17(1), 115–130 (2019)
37. Sklan, J.E., Plassard, A.J., Fabbri, D., Landman, B.A.: Toward content-based image retrieval
with deep convolutional neural networks. In: Medical Imaging 2015: Biomedical Applications
in Molecular, Structural, and Functional Imaging, vol. 9417, p. 94172C, Mar 2015. International
Society for Optics and Photonics
38. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recog-
nition. arXiv preprint arXiv:1409.1556 (2014)
39. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional
neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105
(2012)
40. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Rabinovich, A.: Going
deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 1–9 (2015)
41. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778
(2016)
42. Srivastava, R.K., Greff, K., Schmidhuber, J.: Training very deep networks. In: Advances in
Neural Information Processing Systems, pp. 2377–2385 (2015)
43. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional
networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recogni-
tion, pp. 4700–4708 (2017)
44. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep
neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 1492–1500 (2017)
45. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
46. Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable
image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 8697–8710 (2018)
47. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time
object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 779–788 (2016)
48. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Bengio,
Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems,
pp. 2672–2680 (2014)
49. Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recogni-
tion. In: ICML Deep Learning Workshop, vol. 2, July 2015
50. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image seg-
mentation. In: International Conference on Medical Image Computing and Computer-Assisted
Intervention, pp. 234–241, Oct 2015. Springer, Cham
51. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: Fully convolutional neural networks for volu-
metric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision
(3DV), pp. 565–571, Oct 2016. IEEE
Nishtha Kesswani is currently assistant professor in the Department of Computer Science at the
Central University of Rajasthan. She did her post-doctorate research from California State Univer-
sity, San Bernardino, USA and doctorate from the University of Rajasthan, Rajasthan, India. She
received her Master’s degree from Malaviya National Institute of Technology, Rajasthan, India.
Her research interests include Algorithms, Human-Computer Interaction and Wireless networks.
She has publications in various international journals, conferences, books and book chapters.
Automated Brain Tumor Segmentation
in MRI Images Using Deep Learning:
Overview, Challenges and Future
Abstract Brain tumor segmentation in MRI images is a crucial task in medical image
processing. Diagnosing a brain tumor in its initial stages is very important because
it improves treatment as well as the patient's chances of survival. Manual
segmentation depends heavily on the doctor, may vary from one expert to another, and
is very time-consuming. Automated segmentation, on the other hand, helps a doctor
make quick decisions, gives reproducible results, and allows records to be maintained
electronically, which improves diagnosis and treatment planning. Numerous automated
approaches for brain tumor detection have been popular over the last few decades,
notably Neural Networks (NN) and Support Vector Machines (SVM). Recently, however,
Deep Learning has taken a central place in the automation of brain tumor segmentation
because deep architectures can represent complex structures, learn on their own, and
efficiently process large amounts of MRI-based image data. The chapter starts with an
introduction to brain tumors and their various types. The next section discusses
various preprocessing techniques; preprocessing is a crucial step for the correctness
of an automated system. Feature extraction and feature reduction techniques, which
are applied after preprocessing, are discussed next. The following section covers
conventional methods of image segmentation and then the deep learning algorithms
relevant to this domain. After that, the challenges that deep learning faces in
medical image segmentation are discussed. In the last section, existing algorithms
are compared in terms of accuracy, specificity, and sensitivity on about 200 brain
images. The motivation of this chapter is to give an overview of deep learning-based
segmentation algorithms in terms of existing work, challenges, and future scope. The
chapter presents the crux of the different algorithms involved in brain tumor
classification, together with a comparative analysis to inspect which algorithm
performs best.
1 Introduction
In earlier times, one could not imagine having access to a huge amount of health
care data, whereas an enormous amount of data (precisely, big data) is available
today owing to improvements in image acquisition devices and tools. This is
engrossing, but it also leads to new challenges in the domain of image analysis. The
growth in the volume and variety of medical data, such as images and techniques,
demands exhaustive and tedious effort from medical professionals, which is not only
error-prone but also varies across professionals. Hence an automated alternative for
the diagnostic process is a pressing concern. Although machine learning can help with
such automation, the traditional approaches do not work efficiently for complicated
problems. Some blending of techniques can therefore be considered to raise accuracy
and precision in such fields: machine learning together with high-performance
computing may help to tackle complex medical images and give reliable and adequate
diagnostic outcomes. Similarly, feature extraction can be more powerful if done
with the help of deep learning, which can also help to build new images. The
conclusions obtained by deep learning would affect many tasks, namely diagnosing
disease, measuring targets accurately, and providing predictive models that suggest
what actions could be performed, eventually guiding the field experts.
In the past few years, fields such as Artificial Intelligence, Machine Learning, and
Deep Learning have evolved rapidly. These techniques have played a crucial role in
medical tasks such as segmenting, registering, and interpreting images, automated
diagnosis, image processing, and analyzing and retrieving image data. Machine
learning assists in extracting data and features from images and in presenting this
information in an organized way. Artificial Intelligence and Machine Learning
techniques can help medical experts predict the likelihood of diseases in a more
detailed and precise manner and, eventually, help to prevent them. Specialists,
experts, and researchers in medical fields gain a clearer view for analyzing the
generic variations that are actually responsible for disease manifestation. Numerous
traditional algorithms form the core of these techniques, namely the K-Nearest
Neighbors algorithm, Neural Networks, etc. [1]. Though these approaches are
effective, they have shortcomings in terms of processing power and time consumption.
They can process images in their raw form but need more time for feature extraction
as well as expert knowledge.
Alongside these conventional approaches, many other approaches have started to
strengthen the domain, namely Long Short-Term Memory, Extreme Learning Model,
Recurrent Neural Network, Convolutional Neural Network, and many more. These
techniques overcome the limitations of conventional approaches in that feature
extraction is automated and learning becomes fast. They automate the representation
of information by learning multiple levels of abstraction from a broad set of images
that exhibit the required data behavior [2]. Although conventional approaches have
given significantly precise results in medical fields, emerging technologies and
advancements have helped to derive accurate solutions for complex problems as well.
Numerous deep learning algorithms have produced significant performance and speed
improvements in major areas such as drug discovery, text and speech recognition, and
facial recognition. This chapter offers an extensive retrospective of deep learning
algorithms in the medical field, particularly medical image analysis, outlining
future perspectives while also considering past work. It aims to provide the
elementary information and the state of the art of deep learning approaches in the
context of the medical domain.
The brain controls all vital functions of the human body. It is one of the most
crucial and complicated organs of the human body and the dominant part of the
Central Nervous System. The skull encloses the human brain, which consists of gray
matter (GM), white matter (WM), and cerebrospinal fluid (CSF). CSF is a clear liquid
that surrounds the brain as well as the spinal cord. It serves several functions for
the central nervous system (CNS): it cushions shocks and contains ions, oxygen, and
glucose that are distributed throughout the nervous tissue. CSF also helps to remove
waste from nervous tissue [3–5].
According to the World Health Organization (WHO), approximately 120 different types
of brain tumor have been identified so far. WHO’s classification criterion is the
cell of origin. Broadly, brain tumors can be classified into two categories
(as shown in Fig. 1):
a. Primary brain tumors: These are tumors that originate in the brain. A tumor
is named after the place where it originates. Tumors can be categorized in
two ways: benign and malignant. Benign tumors are non-cancerous and do not
affect other parts. Malignant brain
350 M. Sharma and N. Miglani
[Figure labels: Glioma (45%), Metastasis]
tumors start in the brain itself and can affect other parts of the body, such as
the spine. Malignant tumors grow fast. Benign brain tumors are easier to treat
than malignant tumors because they are not deeply buried in the brain and have
defined boundaries. A benign tumor that has been removed successfully is also
very unlikely to come back, whereas a malignant tumor may recur even after it
has been removed.
b. Secondary brain tumors: These are tumors that spread to the brain from other
parts of the body (metastasis). A tumor of this type is named after the body part
from which it originally spread. If a brain tumor develops from the lung, it is
known as a metastatic lung tumor, because it arises from the abnormal growth of
lung cells [10].
By Grade
In medical terms, brain tumors are classified into four grades. Grade 1 tumors are
in the initial phase and grow slowly. Grade 2 is known as a benign tumor. Grade 1
and Grade 2 fall into the low-grade tumor category. Grade 3 and Grade 4 are
categorized as high-grade (malignant) tumors and need urgent treatment (refer to
Fig. 2).
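The grade-to-category rule in the paragraph above can be written as a small lookup. This is only a sketch of the grouping stated in the text; the function name and the returned labels are hypothetical.

```python
def grade_category(grade):
    """Map a tumor grade (1 to 4) to the low-grade or high-grade category."""
    if grade in (1, 2):
        return "low-grade"
    if grade in (3, 4):
        return "high-grade"  # described in the text as malignant and needing urgent treatment
    raise ValueError("grade must be an integer from 1 to 4")

categories = [grade_category(g) for g in (1, 2, 3, 4)]
```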
[Figure labels: Brain Tumor, with branches Benign and Malignant]
layers because it would complicate the structure as well. Such neural networks are
called deep neural networks. Training on data is cost-effective in such networks.
The extra layers, specifically the hidden layers, compose features that originate in
the lower layers into more complex features in the upper layers, which makes it
possible to design complex architectures. Deep learning has emerged as a promising
approach for designing and developing automated applications and has set a benchmark
as well. Automated results have outperformed manual observation: in the medical
domain, computer vision applications based on deep learning have provided accurate
and precise results in capturing cancer indicators in tumors and blood in MRIs. Deep
learning can be regarded as an extension of the artificial neural network that
comprises numerous hidden layers, permitting an abstract view and refined image
analysis. The approach has attracted researchers from varied fields because of the
results obtained in applications such as facial recognition, object detection and
medicine. A deep neural network assembles many layers of nodes (neurons) into a
hierarchical structure. The layer count has even exceeded a thousand layers in a
single network. With such modeling capacity, and given a rigorous training process
on a huge database, the network can fit a very wide range of mappings and make apt
predictions for unseen cases. This approach has therefore had a strong impact on
medical imaging and computer vision, and its influence can also be seen in the
fields of voice and text. Researchers are exploring different variants and
extensions of deep neural networks; one example is the convolutional neural network
(CNN), which is currently very popular. Many other architectures have also attracted
the interest of researchers, such as the deep Boltzmann machine (DBM), deep neural
network (DNN),
Automated Brain Tumor Segmentation in MRI Images Using … 353
deep autoencoder (dA), deep belief network (DBN), recurrent neural network (RNN)
and its variants such as MDLSTM or BLSTM (listed with their advantages and
disadvantages in Table 1). The CNN model is attracting attention in the fields of
digital image processing and vision.
Basic Working
The working of a deep neural network is divided into five steps, as depicted in Fig. 4.
The first step is to identify the problem and carry out a feasibility study; it is
crucial to know whether deep learning can solve the given problem. In the second
step, relevant data are collected. Various deep learning algorithms are available,
so the third step is to select an appropriate algorithm. Finally, the fourth and
fifth steps deal with training and testing on the data.
Image acquisition and image interpretation are the two steps on which correct
disease diagnosis depends. In the past few years, image acquisition devices have
improved considerably, so that high-resolution radiological images (CT scans,
X-rays, MRI) are now available for further analysis. Nonetheless, automating image
interpretation has only begun to deliver benefits. Numerous applications of machine
learning, such as computer vision, already exist, yet conventional machine learning
approaches to image interpretation depend strongly on experts for feature
extraction; an instance is brain tumor detection, which entails structural feature
extraction. Conventional approaches, though efficient, yield inaccurate and
unreliable results because of the large dissimilarities between patients' data.
Hence machine learning algorithms play a crucial role in handling disordered and
convoluted data [12].
Furthermore, deep learning, being a more specialized and precise approach, has
attracted great interest in every field, particularly in medicine for analyzing
images; deep learning is expected to account for around $300 million of the medical
imaging market by 2021. It is also expected to attract substantial separate
investment in the medical domain, since it provides better accuracy on complex data.
Deep learning has grown tremendously over the years (as shown in Fig. 5). The
approach falls in the class of supervised machine learning methods. Deep learning
has targeted many fields, one of them being computer vision. Yet its main success
lies in reducing human involvement in disease diagnosis and relying on automated
results to achieve a high level of accuracy, specifically for brain tumors, where a
very small mistake in analysis could cause a serious error. In such cases, deep
learning provides significant results, as it can approximate and mimic the human
brain better than a basic neural network by using leading methodologies and
technologies. Deep learning exploits the utility of
354 M. Sharma and N. Miglani
deep and insightful models of neural networks. The technique proves its worth when
available knowledge is limited and the problem at hand is complicated and realistic.
The crux of a neural network is its basic unit, the neuron, inspired by the working
of the human brain: multiple signals act as inputs, signals are passed from one
layer to another, and layers are linked together by inter-connection weights.
Finally, the combined signals are passed through non-linear operations, resulting in
an output signal.
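The neuron described above can be sketched as a weighted sum followed by a non-linearity. This is a minimal illustration in plain Python, not the chapter's implementation; the sigmoid activation, the function names and the example weights are assumptions chosen only to show the data flow from one layer to the next.

```python
import math


def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a sigmoid non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weight_rows, biases):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Two inputs -> hidden layer of two neurons -> one output neuron.
# All weights below are illustrative, not learned values.
hidden = layer([0.5, -1.0], [[1.0, 2.0], [-1.0, 0.5]], [0.0, 0.0])
output = neuron(hidden, [1.0, -1.0])
```

Stacking more calls to `layer` between the input and the output is what makes the network "deep" in the sense used here.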
Brain tumor classification consists of seven stages, from data collection to tumor
detection, as shown in Fig. 6:
The first step is to develop an MR image database. Images are collected from a 1.5 T
MRI machine, and the images generally used have a size of 256 × 256. The intensity of
a grey-scale image lies in the range [0, 255], where 0 represents black and 255
represents white. The database is divided into two parts: a training database and a
testing database. The images are stored in JPEG format. Examples of brain images are
shown in Fig. 7.
1. MR Image Acquisition
2. Image Pre-Processing
Fig. 8 Brain image before and after histogram equalization. The contrast of the image
is enhanced, which is beneficial for later steps [71]. a The original MRI [5].
b Histogram-equalized MRI
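Histogram equalization, as illustrated in Fig. 8, remaps grey levels through the cumulative histogram so that they spread over the full intensity range. The following is a minimal sketch in plain Python, assuming an 8-bit image stored as a list of rows; the function name and the tiny test image are hypothetical.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grey-scale image given as a list of rows of ints."""
    pixels = [p for row in image for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of grey levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in image]


# A low-contrast patch whose four grey levels are stretched to 0..255.
low_contrast = [[100, 101], [102, 103]]
equalized = equalize_histogram(low_contrast)
```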
I = rgb2gray(RGB)
where RGB is the image to be converted to grey scale and I is the resulting
grey-scale image.
(c) Morphological Operations: Morphological operations can be applied to images to
sharpen regions and to fill gaps in the image. There are four basic operations:
dilation, erosion, opening and closing. Figure 10 shows an image before and after
morphological operations [24, 25].
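The grey-scale conversion and the four morphological operations can be sketched in a few lines. This is an illustration in plain Python rather than MATLAB; the function names, the 3 × 3 structuring element and the clipped-border handling are assumptions, and the luminance weights are the ones documented for MATLAB's rgb2gray.

```python
def rgb_to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples in [0, 255] to grey levels in [0, 255]."""
    return [[round(0.2989 * r + 0.5870 * g + 0.1140 * b) for r, g, b in row]
            for row in rgb_image]


def _apply(image, reducer):
    """Apply max or min over a 3x3 neighbourhood, clipped at the image border."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(reducer(window))
        out.append(row)
    return out


def dilate(image):
    return _apply(image, max)


def erode(image):
    return _apply(image, min)


def opening(image):
    """Erosion followed by dilation: removes small bright specks."""
    return dilate(erode(image))


def closing(image):
    """Dilation followed by erosion: fills small dark gaps."""
    return erode(dilate(image))
```

Closing is the operation that fills gaps inside a region, while opening removes isolated noise pixels.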
Features are important attributes of an image, and one of the vital features is
texture. Extracting features from a pre-processed image is known as feature
extraction. Such features are used in classifying images [49–51]. There are two
approaches to segmenting an image: the structured approach and the statistical
approach. The proposed study uses the statistical approach. Techniques used for
texture measurement include Gabor filters, the co-occurrence matrix, the wavelet
transform and fractals. This study applies the Gray Level Co-occurrence Matrix
(GLCM), which captures feature values numerically by using the spatial relationships
among neighboring pixels. These values further aid in classifying images and in
comparing feature values. The function used to compute these features for any given
image is available in MATLAB:
where image is the input image and offset specifies the four directions in which
features are measured, 0°, 45°, 90° and 135°, with offset values [0 1], [−1 1],
[−1 0] and [−1 −1] respectively.
[Figure: GLCM offset directions: 135° = (−1, −1), 90° = (−1, 0), 45° = (−1, 1)]
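To make the role of the offset concrete, the sketch below counts grey-level co-occurrences for the four directions listed above. It is written in plain Python as an illustration only; the function name `glcm`, the 3-level toy image and the non-symmetric counting are assumptions, not the MATLAB call used in the study.

```python
def glcm(image, offset, levels):
    """Count co-occurrences of grey levels (a, b) at the given (row, col) offset."""
    dr, dc = offset
    h, w = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(h):
        for c in range(w):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:  # skip neighbours outside the image
                counts[image[r][c]][image[rr][cc]] += 1
    return counts


# Offsets for 0, 45, 90 and 135 degrees, as given in the text.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

image = [[0, 0, 1],
         [0, 1, 1],
         [2, 2, 2]]
matrices = {angle: glcm(image, off, levels=3) for angle, off in OFFSETS.items()}
```

Entry `[a][b]` of each matrix is the number of pixel pairs in which a pixel of level `a` has a neighbour of level `b` in that direction.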
These features are used for segmenting the image. Image segmentation can be done in
two ways: the statistical approach and the structured approach. Most researchers use
the statistical approach. There are several statistical techniques for measuring
texture, such as the co-occurrence matrix, fractals, Gabor filters and the wavelet
transform. The proposed research work uses the Gray Level Co-occurrence Matrix
(GLCM). GLCM captures numerical feature values using the spatial relationship among
neighboring pixels. These numerical feature values are then used for comparing and
classifying features. GLCM extracts 20 texture features: “Autocorrelation, Contrast,
Correlation, Cluster Prominence, Cluster Shade, Dissimilarity, Energy, Entropy,
Homogeneity, Maximum probability, Sum of squares, Variance, Sum average, Sum
variance, Sum entropy, Difference variance, Difference entropy, Information measure
of correlation, Information measure of correlation 2, Inverse difference (INV),
Inverse difference normalized (INN), Inverse difference moment normalized” (as shown
in Fig. 13) [52, 53].
GLCM features are extracted for three different brain images, namely Brain image 1,
Brain image 2 and Brain image 3, as depicted in Fig. 14. Table 3 presents the results
obtained from these images.
1. Contrast (contdr): It measures the variation between a pixel and its adjoining
pixel in terms of grey-scale change. Contrast can be computed using the formula
below:
\[ \mathrm{Contdr} = \sum_{a,b} |a - b|^{2}\, P_i(a, b) \tag{1} \]
Table 3 GLCM features for brain image 1, brain image 2, brain image 3
No.  Feature name                                    Image 1    Image 2    Image 3
1    Autocorrelation (autoc)                         0.07978    0.152848   43.1530
2    Contrast (contrd)                               0.95866    0.919698   1.8692
3    Co-relation1 (corrpd)                           295.685    294.6303   0.1392
4    Co-relation2 (cpromd)                           30.1227    30.70965   34.6933
5    Cluster shade (cshad1)                          0.06146    0.093196   5.2662
6    Energy (energd)                                 0.83309    0.787922   0.1233
7    Dissimilarity (dissid)                          0.5369     0.672409   0.6877
8    Entropy (entrod)                                0.97182    0.960849   2.6980
9    Homogeneity (homopd)                            0.97524    0.959027   0.65645
10   Maximum probability (maxprd)                    0.91314    0.886946   0.6411
11   Sum of Squares (sosvhd)                         2.31867    2.48507    0.1973
12   Sum Average (savghd)                            2.31703    2.447587   44.9329
13   Sum Variance (svarhd)                           0.16651    0.622724   13.2626
14   Sum entropy (senthd)                            0.53192    0.96682    133.5676
15   Difference Variance (dvhd)                      10.9731    0.965064   1.8188
16   Difference entropy (denthd)                     1.85308    0.895411   1.8927
17   Information measure of Co-relation1 (inf1hd)    0.15269    2.227421   1.2145
18   Information measure of Co-relation2 (inf2h)     0.65648    0.886946   −0.0322
19   Inverse difference (indncd)                     0.96834    2.48507    0.2863
20   Inverse difference moment                       0.62785    2.447587   0.9107
\[ \mathrm{energd} = \sqrt{a} \tag{3} \]
3. Homogeneity (HOM): It measures how uniform the grey values are. If the variation in
grey values is large, homogeneity is small, and vice versa.
\[ \mathrm{HOM} = \sum_{a,b} \frac{P_i(a, b)}{1 + |a - b|} \tag{4} \]
4. Energy (E): It yields the sum of squared elements in the GLCM. If an image is
constant, the value of Energy is one.
\[ E = \sum_{a,b} P_i(a, b)^{2} \tag{5} \]
6. Variance (VAR): It measures the difference between the gray levels and the mean
value obtained.
\[ \mathrm{VAR} = \sum_{a} \sum_{b} P_i(a, b)\, P_i(a, b) - \mu^{2} \tag{7} \]
\[ \mathrm{IMC1} = \frac{H_{xy} - H_{xy1}}{\max(H_x, H_y)} \tag{8} \]
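Contrast, homogeneity and energy, Eqs. (1), (4) and (5), can be evaluated directly from a normalized co-occurrence matrix. The sketch below is a plain-Python illustration under the assumption that P_i(a, b) is the count matrix divided by its total; the function names and the small count matrix are hypothetical.

```python
def normalize(counts):
    """Turn raw co-occurrence counts into probabilities P(a, b)."""
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Eq. (1): sum of |a - b|^2 * P(a, b)."""
    n = len(p)
    return sum(abs(a - b) ** 2 * p[a][b] for a in range(n) for b in range(n))


def homogeneity(p):
    """Eq. (4): sum of P(a, b) / (1 + |a - b|)."""
    n = len(p)
    return sum(p[a][b] / (1 + abs(a - b)) for a in range(n) for b in range(n))


def energy(p):
    """Eq. (5): sum of squared elements of the GLCM."""
    return sum(v ** 2 for row in p for v in row)


# Illustrative 3-level count matrix (not data from the chapter).
p = normalize([[1, 2, 0], [0, 1, 0], [0, 0, 2]])
features = {"contrast": contrast(p),
            "homogeneity": homogeneity(p),
            "energy": energy(p)}
```

A constant image gives a single non-zero GLCM entry, so its energy is one and its contrast is zero, which matches the description of Energy above.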
Feature reduction selects a smaller feature set out of the total available features,
which enhances the accuracy and precision of segmentation and also reduces time
complexity. The key idea behind feature reduction is to retain only those
features which are more relevant. The most popular feature reduction algorithms are
“Sequential Forward Selection, Sequential Backward Selection, Genetic Algorithm,
Particle Swarm Optimization and Principal Component Analysis [54, 55]”. The Genetic
Algorithm was developed by John Holland in 1975 and relies on the biological
concept that only the fittest survive [56, 57]. It means that only the best parents
produce offspring. In the same manner, only the best solutions lead to further good
solutions. The Genetic Algorithm finds application in many areas, such as
optimization problems, machine learning and pattern recognition.
Generally, a genetic algorithm has the following steps (as shown in Figs. 15 and 16):
(1) Initialization and representation: In the first phase, the initial population is
generated randomly from the available search space. The genetic algorithm uses a
binary coding scheme for representation, where 1 shows that a gene is present
and 0 shows that it is absent.
(2) Selection: Selection is also known as the “survival of the fittest” operator. In
this phase, the worst solutions are removed from the population while the best
items are duplicated. A fitness function is used to decide whether an item is
among the best or the worst.
(3) Crossover and Mutation: In mutation, a position in the string is chosen at
random and the value of that bit is flipped, i.e. 1 to 0 or 0 to 1. In crossover,
two of the best chromosomes are joined at some point to generate new members of
the population.
(4) Stopping Criteria: There must be some stopping criterion for the feature
selection process; otherwise the process will continue indefinitely. There are
various ways to stop the feature selection process: (1) a pre-defined number of
features, which depends on user requirements, can be selected as a stopping
criterion,
(2) the number of iterations can be used as a stopping criterion, (3) the algorithm
must be stopped when the fitness function value no longer changes.
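The four steps above can be sketched as a small binary genetic algorithm. This is a minimal plain-Python illustration, not the chapter's implementation: the function names, the keep-the-fitter-half selection rule, the single-point crossover, the mutation rate and the toy fitness function are all assumptions, and the stopping criterion used is the maximum number of iterations.

```python
import random


def genetic_feature_selection(fitness, n_features=20, pop_size=20,
                              max_iterations=50, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # (1) Initialization: random binary strings, 1 = feature present.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_iterations):  # (4) stop after a fixed number of iterations
        # (2) Selection: keep the fitter half and discard the rest.
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        # (3) Crossover and mutation refill the population.
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)
            child = mum[:cut] + dad[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
        best = max(population + [best], key=fitness)
    return best


# Hypothetical fitness: reward features 0, 3 and 5, penalize subset size.
USEFUL = {0, 3, 5}


def toy_fitness(chromosome):
    hits = sum(chromosome[i] for i in USEFUL)
    return hits - 0.1 * sum(chromosome)


selected = genetic_feature_selection(toy_fitness)
```

In the study the fitness of a chromosome would come from a classifier evaluated on the selected features; `toy_fitness` only stands in for that evaluation.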
1. Initially, the Grey Level Co-occurrence Matrix (GLCM) extracts twenty texture
features from each image. Each feature is assigned a number from 1 to 20,
for example f1, f2 … f20.
2. These 20 features are passed to the genetic algorithm, which reduces them to Y
features. Before the Genetic Algorithm starts, the following parameters should
be set:
Population Size: 20 Maximum Chromosomal Length: Y
3. After the feature reduction phase, only the Y most promising features are
selected out of the 20 features. For initialization, Y features are randomly
selected and assigned to a temporary feature set. Each time the algorithm is
executed, a different feature set is selected. Features are shown in the form of
a binary string such as “10010110101010000000”, where 1 signifies that the
respective feature is present and 0 signifies that the corresponding feature is
absent.
In this example, the stopping criterion chosen is the maximum number of
iterations. When the maximum iteration is reached, the algorithm stops.
4. Fitness Function: For the genetic algorithm to succeed, a fitness function must
be defined to determine whether a particular feature subset is promising or not.
In the proposed research work, ANFIS is used to calculate the fitness value using
Eq. (9).
\[ \mathrm{Fitness}(f_i) = \sum_{i=1}^{20} \left( \frac{TrP}{TrP + FaN} + \frac{TrN}{TrN + FaP} + \frac{TrP + TrN}{TrP + FaN} \right) \tag{9} \]
Total features = A ∪ B
Fitness(A) = F(i) − penalty(A)
where A is the subset of selected features and penalty(A) = w * (|A| − d), where
w is the penalty coefficient. On the basis of the fitness function, the features
of the next generation are selected (as shown in Table 4). The fitness value
helps in deciding whether a selected feature is good or not.
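A fitness of this kind can be sketched from the four confusion-matrix counts. The code below is a plain-Python illustration under stated assumptions: TrP, TrN, FaP and FaN are read as true/false positives and negatives, the third term is computed as standard accuracy (the denominator printed in Eq. (9) shows only TrP + FaN), and `d` is treated as a target subset size, which the text does not define.

```python
def classification_score(tr_p, tr_n, fa_p, fa_n):
    """Sensitivity + specificity + accuracy for one feature subset."""
    sensitivity = tr_p / (tr_p + fa_n)
    specificity = tr_n / (tr_n + fa_p)
    # Assumption: standard accuracy, i.e. all four counts in the denominator.
    accuracy = (tr_p + tr_n) / (tr_p + tr_n + fa_p + fa_n)
    return sensitivity + specificity + accuracy


def penalized_fitness(score, subset, w=0.1, d=7):
    """Fitness(A) = score - w * (|A| - d); w and d are illustrative values."""
    return score - w * (len(subset) - d)


# Hypothetical counts: 50 true positives, 40 true negatives, 10 false positives.
score = classification_score(tr_p=50, tr_n=40, fa_p=10, fa_n=0)
fitness = penalized_fitness(score, subset=list(range(9)))
```

With this penalty, a subset larger than `d` loses fitness in proportion to the number of extra features, which steers the search toward compact subsets.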
The neuro-fuzzy concept was developed in 1995 by J.S.R. Jang. The hybridization of
neural networks and fuzzy logic is one of the most fruitful integrations among soft
computing techniques. A neuro-fuzzy system combines the benefits of both fuzzy
systems and neural networks. Fuzzy logic is capable of modeling vagueness, handling
uncertainty and supporting human-type reasoning.
The Adaptive Network based Fuzzy Inference System (ANFIS) uses a Takagi-Sugeno
fuzzy inference system and has five layers, as shown in Fig. 17. The first hidden
layer maps the input variables to their corresponding membership functions. In the
second hidden layer, a T-norm is applied to calculate the antecedent of each rule.
The final shape of the membership functions is also tuned in the second layer. The
third hidden layer normalizes the rule strengths.
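The layer structure can be illustrated with a forward pass through a two-input Takagi-Sugeno system. This is a plain-Python sketch, not the MATLAB ANFIS used in the study: the Gaussian membership functions, the product T-norm and the first-order consequents in layers 4 and 5 follow the standard Takagi-Sugeno formulation and are assumptions here, as are all parameter values.

```python
import math


def gaussian(x, center, sigma):
    """Gaussian membership function."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def sugeno_forward(x1, x2, mfs1, mfs2, consequents):
    # Layer 1: membership degrees of each input.
    m1 = [gaussian(x1, c, s) for c, s in mfs1]
    m2 = [gaussian(x2, c, s) for c, s in mfs2]
    # Layer 2: rule firing strengths via the product T-norm.
    strengths = [a * b for a in m1 for b in m2]
    # Layer 3: normalized firing strengths.
    total = sum(strengths)
    normalized = [s / total for s in strengths]
    # Layers 4-5: first-order consequents weighted and summed.
    outputs = [p * x1 + q * x2 + r for p, q, r in consequents]
    return sum(w * f for w, f in zip(normalized, outputs))


# Two membership functions per input give 2 x 2 = 4 rules.
mfs = [(0.0, 1.0), (1.0, 1.0)]
rules = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (0.0, 0.0, 3.0), (0.0, 0.0, 4.0)]
y = sugeno_forward(0.5, 0.5, mfs, mfs, rules)
```

Training an ANFIS amounts to tuning the membership parameters and the consequent coefficients; the sketch only shows how an output is computed once they are fixed.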
The ANFIS Editor GUI can be used to initialize the FIS properties. To start the
ANFIS editor in MATLAB, type:
anfisedit
Figure 18 shows the GUI of the ANFIS editor, in which there are 7 inputs and 1
corresponding output. Each input has two membership functions of custom type. The
GUI has different panels for loading the data, generating the FIS, training the FIS
and testing the FIS; loading the data is the first step. The data should be in
matrix form and can be taken either from a file or from the workspace.
The proposed system comprises two steps: training is done in the first step and
testing in the second, as shown in Fig. 19.
[Figure: feature extractor producing the feature vector [Y1, Y2, …, Y7]]
In the training phase, features are extracted from the images using GLCM, reduced to
a subset of 7 features using the Genetic Algorithm, and stored in the database along
with the corresponding output. A total of 57 images are used to train the proposed
system. When a query image arrives for tumor identification, its GLCM features are
first extracted and then sent to the recognizer of the proposed system to find the
best match. After a suitable match is found, the corresponding output is generated.
The output is the type of tumor present and its grade [58].
\[ \sum_{j=1}^{C} \mu_j(x_i) = 1 \tag{10} \]
where
i = 1, 2, 3, …, n
n represents the number of elements to be partitioned into clusters.
j = 1, 2, 3, …, C
C represents the number of clusters into which the elements are to be partitioned.
\(\mu_j(x_i)\) represents the degree to which element \(x_i\) belongs to cluster \(C_j\).
\[ c_j = \frac{\sum_{i} \mu_j(x_i)^{m}\, x_i}{\sum_{i} \mu_j(x_i)^{m}} \tag{11} \]
where m is the fuzzification parameter, whose value generally lies between 1.25 and 2.
Step 3: Calculate dissimilarity between the data points and centroid using
Euclidean distance
\[ D_i = \sqrt{(x_2 - x_1)^{2} + (y_2 - y_1)^{2}} \tag{12} \]
\[ \mu_j(x_i) = \frac{\left( \frac{1}{d_{ji}} \right)^{\frac{1}{m-1}}}{\sum_{k=1}^{C} \left( \frac{1}{d_{ki}} \right)^{\frac{1}{m-1}}} \tag{13} \]
In Fig. 22, four clusters are represented by four colors (red, blue, purple and
green) and each cluster center is represented by “X”.
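The fuzzy C-means iteration defined by Eqs. (10) to (13) can be sketched as follows. This plain-Python version is an illustration only: the function name, the initial centers, the tolerance and the handling of a point that coincides with a center are assumptions, and the membership update uses inverse distances raised to 1/(m − 1) as written in Eq. (13).

```python
import math


def fuzzy_c_means(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Alternate membership and centroid updates until the centers stop moving."""
    for _ in range(max_iter):
        # Eqs. (12)-(13): memberships from inverse Euclidean distances.
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            zeros = dists.count(0.0)
            if zeros:  # point sits exactly on a center
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
            else:
                inv = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in dists]
                row = [v / sum(inv) for v in inv]
            memberships.append(row)  # each row sums to 1, Eq. (10)
        # Eq. (11): centroids weighted by membership ** m.
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[k] for w, p in zip(weights, points)) / total
                for k in range(len(points[0]))))
        moved = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if moved < tol:  # previous and new centers agree: stop
            break
    return centers, memberships


# Two well-separated groups of 2-D points (illustrative data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, memberships = fuzzy_c_means(points, [(1.0, 1.0), (9.0, 9.0)])
```

For image segmentation the points would be pixel intensities or feature vectors rather than 2-D coordinates, but the update rules are the same.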
• Shape features can also be used to increase classification accuracy. Additional
patient information, such as history and age, can also increase classification accuracy.
• A modified Sugeno-type ANFIS can be used.
Although deep learning has numerous benefits and a large number of practical
applications, attaining those benefits may involve the challenges discussed below:
The human brain requires a lot of information and experience to reach any outcome.
Similarly, artificial neural networks demand a huge amount of data for training and
learning. A huge dataset helps in obtaining accurate and precise results. A deep
learning classifier relies heavily on the size and quality of the available dataset.
If only limited data are available, the success of deep learning can be directly
hampered, specifically in medical domains [69]. Besides the need for a huge dataset,
another challenge lies in generating such data for medical imaging, as it depends on
the observations and interpretations provided by experts in the field. To minimize
inaccuracies and human errors, it is important to consider the opinions of multiple
experts, which becomes difficult if field experts are not available. Moreover, for
rare diseases, sufficient cases might not be available. One more issue is data
imbalance: for a rare disease, positive cases are scarce, so the dataset may be
imbalanced.
In deep learning, training on data can yield productive and precise results, but
only for a specific problem. At present, the deep learning approach is highly
domain-specific: to solve a similar kind of problem or pattern, one has to re-assess
and re-train on the data all over again. Although the approach is efficient for
solving a specific problem, it is too inflexible to accommodate multi-tasking.
Research is under way on multi-tasking without the need to revise the complete
architecture. Multi-Task Learning (MTL) and Progressive Neural Networks are being
explored to improve this aspect.
Deep learning algorithms brought new hope to the field of medical imaging and
opened up new opportunities. They provided solutions for problems which were
Parameters whose values are set before the learning process begins are called
hyper-parameters. A small change in these values can greatly affect model
performance. For real-life problems, default parameter values cannot produce
accurate results and can hamper system performance significantly. Considering only
a small number of hyper-parameters and tuning them manually, instead of optimizing
them with standard methods, can also cause performance issues.
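One standard alternative to manual tuning is an exhaustive grid search over candidate values. The sketch below is a plain-Python illustration; `grid_search`, the parameter names and the scoring function are hypothetical, and `toy_score` merely stands in for training a model and returning its validation accuracy.

```python
import itertools


def grid_search(train_and_score, grid):
    """Try every combination in `grid` and return the best (score, params)."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


# Hypothetical stand-in for "train a model and return validation accuracy".
def toy_score(learning_rate, hidden_units):
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(hidden_units - 64) / 1000


best_score, best_params = grid_search(
    toy_score,
    {"learning_rate": [0.1, 0.01, 0.001], "hidden_units": [32, 64, 128]},
)
```

The cost grows with the product of the grid sizes, which is why random search or other optimizers are often preferred when many hyper-parameters are involved.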
Deep learning requires high-capacity hardware, which is costly and consumes a large
amount of power.
A deep neural network can be trained for one domain only and cannot adapt to
another domain. For a different problem, the neurons must be trained again.
Processing power, big data and deep learning algorithms based on the human brain are
the three key factors stimulating the deep learning revolution. Undoubtedly, the
benefits achieved by deep learning are remarkable, but the human effort and cost
incurred in attaining them are also high. Large companies and research laboratories,
together with prominent hospitals, are engaging and working
together towards the most favorable solutions in medical fields. Numerous
companies, such as Hitachi and Siemens, have already stepped forward to invest
heavily in the domain. For the detection of pediatric brain disorders, GE Healthcare
is developing smart imaging technology with Boston Children's Hospital. Research
labs are also spending money on delivering effective image-based applications.
Deep learning methods require a huge dataset, and obtaining such data is itself a
crucial and difficult task. Annotating real-world data is easy in comparison with
annotating medical image data. For instance, labeling objects or distinguishing men
from women in real-world images is a trivial task, whereas interpreting medical
images requires field expertise and is a costly affair that demands a lot of
processing time. In fact, the opinions of multiple experts on the same data, not
just one, are required to achieve accuracy in handling image data. A further issue
is whether data are available at all when a disease is rare; in such cases it
becomes even more difficult to obtain a large dataset. A solution to this problem
could be for different healthcare service providers to share data as far as
possible. In this way the problem of data access could be minimized.
Even though stakeholders are making numerous predictions about the benefits and
growth of deep learning in medical imaging, the replacement of humans with machines
or tools will always remain a debatable issue. Significant improvements in
Both technical and sociological issues can affect data confidentiality, so it needs
to be addressed from both perspectives. As far as privacy in the medical field is
concerned, HIPAA comes to mind. HIPAA, the Health Insurance Portability and
Accountability Act of 1996, is US legislation. It gives patients legal rights over
their individually identifiable information and provides standards and protocols to
secure their personal details and their use in any form. Privacy protection is an
absolute need at present, yet it is challenging to secure and hide patients'
personal information so as to prevent its misuse. Restrictions on data limit the
availability of content, which in turn leads to limited datasets and hence to
inaccurate results. Although it is not mandatory to comply with HIPAA, secure health
information can be stored and maintained as a HIPAA covered entity. HIPAA applies
only if Protected Health Information for transactions is transmitted electronically.
Indian organizations and companies are also being assisted with HIPAA compliance in
order to stay ahead in the world of data protection. Moreover, health care data are
dynamic in nature, so existing methodologies are insufficient to tackle the problem.
8 Performance Comparison
Fig. 23 Comparative analysis between deep learning and other segmentation methods
Table 5 Comparative analysis between deep learning and other segmentation methods (also, refer Fig. 23)
Algorithms                   Sensitivity (%)   Specificity (%)   Accuracy (%)
Fuzzy C means segmentation   96.1              93.4              86.16
ANFIS + Genetic              95.1              93.1              90.1
K-Mean + FCM                 80.1              93.32             83.4
Deep learning (CNN)          97.01             96.1              97.17
9 Conclusion
Deep learning has gained much popularity in recent years for the automation of
daily-life tasks. In the coming years, most routine jobs are expected to be
performed by automatic devices rather than manually. This chapter gives an overview
of different segmentation methods for images. Deep learning methods are more
efficient and can address the problem better than other algorithms. Deep learning provides
10 Future Scope
References
1. Zikic, D., Ioannou, Y., Brown, M., Criminisi, A.: Segmentation of brain tumor tissues with
convolutional neural networks. In: Proceedings of MICCAI workshop on Multimodal Brain
Tumor Segmentation Challenge (BRATS), pp. 36–39 (2014)
2. Pereira, S., Pinto, A., Alves, V., Silva, C.A.: Brain tumor segmentation using convolutional
neural networks in MRI images. IEEE Trans. Med. Imaging 35(5), 1240–1251 (2016)
3. Central Brain Tumor Registry of the United States (CBTRUS): Fact Sheet (2011).
http://www.cbtrus.org/factsheet.html
4. Christ, J.M., Parvathi, R.M.S.: Brain tumors: an engineering perspective. IJCSI 9(4), 392–396
(2012)
5. Schmidt, F.E.W.: Development of a time-resolved optical tomography system for neonatal
brain imaging. Ph.D. thesis, Chapter-2, pp. 25–34 (1999)
6. Thurnher, M.M., Thurnher, S.A., Fleischmann, D., Steuer, A., Rieger, A., Helbich, T., Trattnig,
S., Schindler, E., Hittmair, K.: Comparison of T2-weighted and fluid-attenuated inversion-
recovery. Am. Soc. Neuroradiol. 1601–1609 (1997)
7. Doolittle, N.D.: State of the science in brain tumor classification. Semin. Oncol. Nurs. 20,
224–230 (2004)
8. Wen, P.Y., Teoh, S.K., Black, P.M.: Brain tumors: an encyclopedic approach. Cancer Neurol.
Clin. Pract. 217–248 (2001)
9. Chandrasoma, P.C.P.: Stereotactic brain biopsy. W. J. Med. 1–5 (1991)
10. Kong, N.S.P., Ibrahim, H., Hoo, S.C.: A literature review on histogram equalization and its
variations for digital image enhancement. Int. J. Softw. Eng. Res. Pract. 1(2), 386–389 (2013)
11. Singaravel, S., Suykens, J., Geyer, P.: Deep-learning neural-network architectures and methods:
Using component-based models in building-design energy prediction. Adv. Eng. Inform. 38,
81–90 (2018)
12. Du, X., Cai, Y., Wang, S., Zhang, L.: Overview of deep learning. In: 31st Youth Academic
Annual Conference of Chinese Association of Automation, Wuhan, China, 11–13 Nov 2016,
pp. 159–164
13. Ishak, N.F., Logeswaran, R., Tan, W.H.: Artifact and noise stripping on low-field brain MRI.
Int. J. Biol. Biomed. Eng. 2(2), 59–68
14. Nobi, M.N., Yousuf, M.A.: A new method to remove noise in magnetic resonance and ultra-
sound images. J. Sci. Res. 3(1), 81–89 (2011)
15. Devasena, C.L., Hemalatha, M.: Noise removal in magnetic resonance images using hybrid
KSL filtering technique. Int. J. Comput. Appl. 27(8), 1–4 (2011)
16. Kumar, S., Kumar, P., Gupta, M., Nagawat, A.K.: Performance comparison of median and
wiener filter in image de-noising. Int. J. Comput. Appl. 12(4), 27–31 (2010)
17. Bhatia, A., Kulkarni, R.K.: High density salt and pepper noise removal through improved
adaptive median filter. Int. Conf. Comput. Sci. Inform. Technol. (CSIT-2012). 197–200 (2012)
18. Bagade, S.S., Shandilya, V.K.: Use of histogram equalization in image processing for image
enhancement. Int. J. Softw. Eng. Res. Pract. 6–10 (2011)
19. Chen, S.D.: Contrast enhancement using brightness preserving bi-histogram equalization. IEEE
Trans. Consum. Electron. 1, 1–8 (1997)
20. Wang, C., Zhongfu, Y.: Brightness preserving histogram equalization with maximum entropy:
a variational perspective. IEEE Trans. Consum. Electron. 51(4), 1326–1334 (2005)
21. Ning, C.Y., Liu S.F., Qu, M.: Research on removing noise in medical image based on median
filter method. IEEE Explore. 384–388 (2009)
22. Sawant, H.K., Deore, M.: A comprehensive review of image enhancement techniques. Int. J.
Comput. Technol. Electron. Eng. 1(2), 34–38 (2012)
23. Gonzalez, R.C., Woods, R.E.: Digital image processing, 2nd edn. Prentice Hall (2002)
24. Chen, S.D., Ramli, R.: Contrast enhancement using recursive mean-separate histogram equal-
ization for scalable brightness preservation. IEEE Xplore, 1301–1309 (2001)
25. Dykstra, C., Das, M.: The use of image morphing to improve the detection of tumors in emission
imaging. Nucl. Sci. Symp. 3, 1781–1785 (1998)
26. Marr, D., Hildreth, E.: Theory of edge detection. Proc. Roy. Soc. Lond. B. 187–217 (1980)
27. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell.
6, 679–698 (1986)
28. Schunck, B.G.: Edge detection with Gaussian filters at multiple scales. IEEE Comput. Soc.
Work. Comp. Vis. 208–210 (1987)
29. Bergholm, F.: Edge focusing. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-09, 726–741
(1987)
30. Lacroix, V.: The primary raster: A Multiresolution Image Description. In: 10th International
Conference on Pattern Recognition, pp. 903–907 (1990)
31. Williams, D.J., Shah, M.: Edge contours using multiple scales. Comput. Vis. Graph Image
Process. 51, 256–274 (1990)
32. Goshtasby, A., Marr, D.: On edge focusing. Image visualization. Computer. 12, 247–256
33. Deng, G., Cahill, L.W.: An adaptive Gaussian filter for noise reduction and edge detection. In:
Proceedings IEEE Nuclear Science Symposium, pp. 1615–1619 (1994)
34. Bennamoun, M., Boashash, B., Koo, J.: Optimal parameters for edge detection. Proc. IEEE
Int. Conf. SMC. 2, 1482–1488 (1995)
35. Heric, D., Zazula, D.: Combined edge detection using wavelet transform and signal registration.
Elsevier J. Image Vis. Comput. 25, 652–662 (2007)
36. Shih, M.Y., Tseng, D.C.: A wavelet based multi resolution edge detection and tracking. Elsevier
J. Image Vis. Comput. 23, 441–451 (2005)
37. Bezdek, J.C., Chandrasekhar, R., Attikiouzel, Y.: A geometric approach to edge detection.
IEEE Trans. Fuzzy Syst. 6(1), 52–75 (1998)
38. Wu, J., Yin, Z., Xiong, Y.: The fast multilevel fuzzy edge detection of blurry images. IEEE
Signal Process. Lett. 14(5), 344–347 (2007)
39. Lu, S., Wang, Z., Shen, J.: Neuro-fuzzy synergism to the intelligent system for edge detection
and enhancement. Elsevier J. Pattern Recogn. 36, 2395–2409 (2003)
40. Shrivakshan, G.T., Chandrasekar, C., Bhandarkar, S.M.: An edge detection technique using
genetic algorithm-based optimization. Pattern Recogn. 27(9), 1159–1180 (1994)
41. Zhang, Y., Potter, W.D.: Comparison of various edge detection techniques used in image pro-
cessing. IJCSI Int. J. Comput. Sci. Issues 9(5), 269–276 (2012)
42. Becerikli, Y., Karan, T.M., Cabestany, J., Prieto, A., Sandoval, D.F.: A new fuzzy approach for
edge detection. IWANN 2005, 943–951 (2005)
43. Anver, M.M., Stonie, R.J.: Evolutionary learning of a fuzzy edge detection algorithm based on
multiple masks. Springer, vol. 12, pp. 1–13 (2005)
44. Suliman, C., Boldişor, C., Băzăvan, R., Moldoveanu, F.: A fuzzy logic based method for edge
detection. Eng. Sci. 4, 159–164 (2011)
45. Sharifi, M., Fathy, M., Mahmoudi, M.T.: A classified and comparative study of edge detection
algorithms. In: Proceedings of the International Conference on Information Technology:
Coding and Computing (ITCC.02), IEEE, pp. 1–4 (2002)
46. Yu-Qian, Z., Wei-Hua, G., Zhen-Cheng, C., Jing-Tian, T., Ling-Yun, L.: Medical images edge
detection based on mathematical morphology. In: Proceedings of the 2005 IEEE Engineering
in Medicine and Biology 27th Annual Conference Shanghai, China, pp. 6492–6495 (2005)
47. Saxena, S., Kumar, S., Sharma, V.K.: Comparative analysis of various edge detection
techniques. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3(6), 758–761 (2013)
48. Haralick, R.M., Shanmugam, K., Dinstein, I.H.: Textural Features for Image Classification.
IEEE Trans. Syst. Man Cybern. 610–621 (1973)
49. Prasetiyo, Khalid, M., Yusof, R., Meriaudeau, F.: A comparative study of feature extraction
methods for wood texture classification. SITIS, IEEE Conf., 23–29 (2010)
50. Nithya, R., Santhi, B.: Comparative study on feature extraction method for breast cancer
classification. J. Theor. Appl. Inf. Technol. 33(2), 220–226 (2011)
51. Chadha, A., Mallik, S., Johar, R.: Comparative study and optimization of feature-extraction
techniques for content based image retrieval. Int. J. Comput. Appl. 52(20), 35–42 (2012)
52. Ramamurthy, B., Chandran, K.R., Aishwarya, S., Janaranjani, P.: CBMIR: content based image
retrieval using invariant moments, GLCM and grayscale resolution for medical images. Eur. J.
Sci. Res. 460–471 (2010)
53. Hamza, R.M., Al-Assadi, T.A.: Genetic algorithm to find optimal GLCM features. Inf. Technol.
Univ. Babylon Iraq. pp. 1–16 (2012)
54. Jolliffe, I.T., Potter, W.D.: Principal Component Analysis, 2nd edn, pp. 1–5. Springer, New
York (2002)
55. Scholkopf, B., Smola, A., Muller, K.R.: Kernel Principal Component Analysis, pp. 327–352.
MIT Press, Cambridge, MA (1999)
56. Shapiro, V.A., Veleva, P.K., Sgurev, V.S.: An adaptive method for image thresholding. In: 11th
IAPR International Conference on Image, Speech and Signal Analysis, pp. 696–699 (1992)
57. Sezgin, M., Sankur, B.: Survey over image thresholding techniques and quantitative
performance evaluation. J. Electron. Imaging 13, 146–165 (2004)
58. Elaiza, N., Khalid, A., Ibrahim, S., Manaf, M.: Comparative study of adaptive network-based
fuzzy inference system (ANFIS), k-nearest neighbors (k-NN) and fuzzy c-means (FCM) for
brain abnormalities segmentation. Int. J. Comput. 5(4), 513–524 (2011)
59. Zhang, J., Morgan, N.: Stochastic model based image segmentation using Markov random
fields and multi-layer perceptrons. IEEE Signal Process. 1–8 (1990)
60. Azmi, R., Norozi, N.: A new Markov random field segmentation method for breast lesion
segmentation in MR images. J. Med. Signals Sens. 1(3), 156–164 (2011)
61. Prastawa, M., Bullitt, E., Gerig, G.: A brain tumor segmentation framework based on outlier
detection. Med. Image Anal. 18, 217–231 (2004)
62. Dipali, B.B., Patil, S.N.: Brain tumor MRI image segmentation using FCM and SVM techniques.
Int. J. Eng. Sci. Comput. 3939–3942 (2016)
63. Kannan, S.R., Ramathilagam, S., Devia, R., Hines, E.: Strong fuzzy C-means in medical image
data analysis. J. Syst. Softw. 2425–2438 (2012)
64. Zhang, J.G., Ma, K.K., Chong, V.: Tumor segmentation from magnetic resonance imaging by
learning via one-class support vector machine. IWAIT. 207–21 (2004)
65. Garcia, C., Moreno, J.: Kernel based method for segmentation and modeling of magnetic
resonance images. LNCS. 636–645 (2004)
66. Lee, C.H., Schmidt, M., Murtha, A., Bistritz, A., Sander, J., Greiner, R.: Segmenting brain
tumors with conditional random fields and support vector machines. LNCS 3765, 469–478
(2005)
67. Gibbs, P., Buckley, D.L., Blackband, S.J., Horsman, A.: Tumor volume determination from
MR images by morphological segmentation. Phys. Med. Biol. 2437–2446 (1996)
68. Letteboer, M., Olsen, O., Dam, E., Willems, P., Viergever, M., Niessen, W.: Segmentation of
tumors in magnetic resonance brain images using an interactive multiscale watershed algorithm.
Acad. Radiol. 11, 1125–1138 (2011)
69. Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin,
P.-M., Larochelle, H.: Brain tumor segmentation with deep neural networks. Med. Image Anal.
35, 18–31 (2017)
70. Web Source: https://www.oreilly.com/library/view/deep-learning/9781491924570/ch04.html
71. Magudeeswaran, V., Ravichandran, C.G.: Fuzzy logic-based histogram equalization for image
contrast enhancement. Math. Eng. 1–10 (2013)
72. Vorontsov, A.O., Averkin, A.N.: Comparison of different convolution neural network
architectures for the solution of the problem of emotion recognition by facial expression. In:
Proceedings of the VIII International Conference “Distributed Computing and Grid-technologies
in Science and Education” (GRID 2018), Dubna, Moscow region, Russia, Sep 10–14, pp. 35–40 (2018)
73. Agarwal, V.: Analysis of histogram equalization in image preprocessing. BIOINFO Hum.
Comput. Interact. 1(1), 04–07
74. Yang, Y., Huang, S.: Novel statistical approach for segmentation of brain magnetic resonance
imaging using an improved expectation maximization algorithm. Optica Appl. 125–36 (2006)
75. Vinitski, S., Iwanaga, T., Gonzalez, C.F., Andrews, D., Knobler, R., Curtis, M.: Fast tissue
segmentation based on a 4D feature map. In: 9th International Conference (ICIAP 97), vol. 2,
pp. 445–452 (1997)
76. Revathy, M., Hemalataha, M.: Efficient method for feature extraction on video processing. In:
CCSEIT 2012 ACM International Conference, pp. 539–543 (2012)
Minakshi Sharma received the Ph.D. degree in Computer Science from Banasthali University,
Rajasthan, India, in 2015. In 2017, she joined NIT Kurukshetra as an Assistant Professor in the
Department of Computer Engineering. She has more than 10 papers to her credit in national and
international conferences and journals. Her research interests include Deep Learning, Artificial
Intelligence, Neural Networks, and Fuzzy Logic-Based Systems.
Neha Miglani received her Master's degree in Computer Science from Kurukshetra University,
India, in 2012. Currently, she is working as an Assistant Professor at the National Institute
of Technology, Kurukshetra, India. Her research interests include Cloud Computing, Neural
Networks, and Software Reliability, including Cost Models, Software Reliability Growth Models,
and Reliability Metrics.